Q&A with Yaki Umflat, Data Scientist at Pipl
Yaki Umflat is Pipl’s resident data scientist. His work at Pipl focuses on ways to improve the quality of the search results provided in response to users’ search queries. Before working at Pipl, Yaki worked in the finance and defense industries where he gained experience as a quantitative researcher and algorithm designer. We recently spoke with Yaki to help our readers understand what data scientists do and how they can improve our users’ experience with Pipl products.
Ronen Shnidman: What is a data scientist?
Yaki Umflat: A data scientist is someone, typically with an analytical background, who conducts research backed up by a large volume of data. In my case, I have a Bachelor’s degree and Master’s degree in electrical engineering, but many data scientists have backgrounds in mathematics, physics and chemistry. Data scientists are usually people with a STEM (science, technology, engineering or math) background because you need to know a lot of math and to know how to program. The programming languages data scientists use are mainly easier-to-handle scripting languages like R, Python or MATLAB.
RS: How do data scientists differ from data analysts and programmers?
YU: The key difference between a data scientist and a data analyst is programming knowledge. Most data analysts would stop at Excel and wouldn’t know how to use more elaborate tools like machine learning. For the questions that data analysts seek to answer, that might be enough.
Computer programmers, on the other hand, typically focus exclusively on the design and implementation of computer software, not on statistics and ways to manipulate data. Data scientists share more in common with statisticians, but the latter are usually in more academic positions. The professions of quantitative researcher and data scientist are practically the same.
RS: What do data scientists do?
YU: It would be fair to characterize data scientists as efficiency experts, although I think data science can also lead to innovation beyond improving efficiency.
Today, we generate a huge amount of data with everything we do. Not long ago, a website owner might know just the number of visitors he had to his website. Now, a website owner knows how much time readers spend on each webpage. He knows how much time readers spend reading an article or blog post and where users moved their mouse on the page during their visit. A data scientist can take this information and utilize it to help the website provide a better user experience.
The same is true for software-based services. A data scientist can use the massive amount of user data created when using the software to build a better or more efficient service model. Data scientists can also sometimes help develop an idea for a new product.
RS: Can you name a product that data scientists helped create?
YU: Netflix used to hold a competition for data scientists to create a better way to determine the future value users would assign to a TV series or movie. Netflix then used this algorithm as a basis for its decision to produce the U.S. remake of the TV show “House of Cards.” The company determined, using its own subscriber data, that viewers of the original British show also liked watching movies directed by David Fincher or films starring Kevin Spacey. So when the remake was pitched with Fincher and Spacey attached, Netflix decided to produce it.
RS: What problems do you work on at Pipl?
YU: The main goal of our technology is to identify the person for whom our user is searching and provide all relevant data about them from the web. The main challenge is creating an algorithm that returns all the data and only the data that belongs to a specific person among the billions of records that exist. At Pipl our algorithm uses a special statistical model to determine if a piece of data belongs to the specific person desired. Improving this statistical model is a never-ending quest and one of my main responsibilities.
RS: What do you think is the most interesting use of people data?
YU: I think the most interesting application of people data is for fraud detection. When combined with behavioral data, IP addresses and other types of data, people data can allow risk professionals to implement very sophisticated methods for detecting fraud.
RS: What are the main difficulties with using people data?
YU: The main problem with people data is that it loses its value quickly over time. This month’s data is worth more than last month’s data, which is worth significantly more than last year’s data, etc. The problem in this area is usually a lack of high-value data, not too much of it.
RS: How will Big Data change the way businesses operate in the next 5 years?
YU: I think in the future businesses will optimize more products based on the preferences of their users. When you visit websites you will increasingly find that they’re personalized to appeal to your tastes and usage habits. I also believe that businesses will increasingly require more hard data before embarking on product updates and launches.
RS: When would it be preferable to make a decision based on instinct instead of data?
YU: I think data in many instances will drive you to do the same thing but better. Let’s say you are a TV show producer and there is a good zombie show right now on television. If you ask a data scientist what type of show you should produce next, they will analyze all the data on viewers’ preferences and tell you, “Zombies are very popular right now.” They won’t be able to tell you that the next big TV show will be about aliens if there wasn’t a TV show about aliens when the viewer data was gathered.
Data science is much less capable of identifying a revolutionary or paradigm-shifting innovation. If you are considering doing something that is not mainstream at the moment, then going with your gut may be better than using existing data as the basis for your decision.
More generally, Big Data is useful to identify trends for industries that cater to or rely upon people. There might be a few exceptions, but I think most interesting Big Data solutions are about people.
Original blog post written by former Pipl Technology Evangelist Ronen Shnidman. Ronen is now Managing Editor @ about-fraud.com