Data Outta Thin Air

Of the different metadata fields that appear in a response, inferred is probably the least known. This indicator tells us whether the returned data comes from one of the billions of records located by the search engine or whether it is a statistical data point inferred by our proprietary algorithms.

Inferred Data in the People Data API

With the introduction of inferred data in the results of the API, a new mechanism was built to control the level of statistical probability of the inferred data that is returned. This mechanism is known as the “minimum probability” parameter (often confused with the unrelated “minimum match” parameter).

Minimum probability (0.9 by default) defines the lowest acceptable probability that a piece of inferred data is correct. For example, if you get an inferred gender with the minimum probability at its default of 0.9, that gender is correct in at least 90% of cases (here, based on the person’s name). As you lower the minimum probability, you get more inferred data, each piece of which is expected to be correct at least as often as the threshold you set. For example, inferred data returned from a request with minimum probability set to 0.5 should be correct at least 50% of the time.

It must be noted that the minimum probability does not affect the response itself (i.e. whether or not a person is returned). It only generates statistical data in addition to the real data returned in the response.

How to use minimum probability

As pointed out earlier, minimum probability defaults to 0.9 and is applied to every query, giving you some additional inferred data, which is flagged in the response like this:

"@inferred": true

A value of 0 is the lowest minimum probability and allows the system to infer any data it can, even a whole person. A value of 1 disables the mechanism, so the response includes only data retrieved directly from our data sources.
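To make the range concrete, here is a minimal Python sketch that builds a search URL with an explicit minimum probability. The endpoint URL, the parameter names (key, minimum_probability, first_name, last_name) and the helper build_search_url are assumptions made for illustration, so check them against the API reference before relying on them. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the API reference.
SEARCH_URL = "https://api.pipl.com/search/"


def build_search_url(api_key, minimum_probability=0.9, **person_fields):
    """Build a search URL with a minimum probability (hypothetical helper).

    0 lets the system infer anything it can; 1 disables inferred data.
    """
    if not 0.0 <= minimum_probability <= 1.0:
        raise ValueError("minimum_probability must be between 0 and 1")
    params = dict(person_fields)
    params["key"] = api_key
    # Assumed parameter name for the minimum probability setting.
    params["minimum_probability"] = minimum_probability
    return SEARCH_URL + "?" + urlencode(params)


# Ask for inferred data that should be correct at least 50% of the time.
url = build_search_url(
    "YOUR_API_KEY",
    minimum_probability=0.5,
    first_name="Clark",
    last_name="Kent",
)
print(url)
```

Rejecting values outside 0 to 1 on the client side is a design choice of this sketch, not something the API is known to require.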

Inferred data points can include:

Origin countries – Usually returned with default setting

Languages – Usually returned with default setting

Gender – Usually returned with default setting

Job – Sometimes returned when minimum probability is set lower than the default value

Email – Sometimes returned when minimum probability is set lower than the default value

Generate missing emails

Now that you understand what inferred data is and how we create it, let’s put it to use. In most responses, you will see some inferred data like gender, origin countries and more. But to push the system further, you need to take it to the limit.

A fairly common use case is when a response returns several emails, but none of them is from a current workplace. People change workplaces more often nowadays, and it is becoming extremely difficult for any source to stay up to date with the most current work email.

Pipl can get you the person’s LinkedIn profile, where that person updated their current workplace, but how do you get in touch with them? This is where inferred data comes in handy. To find their new work email, send the person’s details (the more details, the better) in the search query, including the person’s last known email address if you have it, and ask for an inferred email from their latest workplace (as it appears in the person’s details).

The best way to do this is to set the minimum match to 0 and set match requirements to new_email. Together, these parameters make sure that you get a response only if it contains an email that is different from the one you sent in the query and, if possible, the system will infer an email address based on the person’s company details.
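The request described above could be put together along these lines. This is a rough sketch: the endpoint URL, the wire names minimum_match and match_requirements, the email and name fields, and the helper build_new_email_query are all assumptions for illustration. Only the values (0 and new_email) come from the text above.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the API reference.
SEARCH_URL = "https://api.pipl.com/search/"


def build_new_email_query(api_key, last_known_email=None, **person_fields):
    """Build a query asking for an email other than the one sent.

    Hypothetical helper; parameter names are assumptions.
    """
    # Drop empty fields so only real details are sent.
    params = {name: value for name, value in person_fields.items() if value}
    if last_known_email:
        params["email"] = last_known_email
    params["key"] = api_key
    params["minimum_match"] = 0
    params["match_requirements"] = "new_email"
    return SEARCH_URL + "?" + urlencode(params)


query_url = build_new_email_query(
    "YOUR_API_KEY",
    last_known_email="clarkkent@hotmail.com",
    first_name="Clark",
    last_name="Kent",
)
print(query_url)
```

The more person details you pass as keyword arguments, the more the system has to work with when inferring the new address.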

So instead of repeatedly trying to guess a person’s current workplace email, you can let the Pipl API determine the most likely email address for that specific person working at that specific company.

Example

Below you can see an example of inferred emails based on Pipl’s statistical analysis. The first email is known to be related to a record in our index, and the other two were inferred by the system:

"emails": [
    {
        "@valid_since": "2012-10-01",
        "@email_provider": true,
        "address": "clarkkent@hotmail.com",
        "address_md5": "0ce999b1a9d2d467bc94556210fab0aa"
    },
    {
        "@inferred": true,
        "@email_provider": false,
        "address": "clark.kent@savekypton.org",
        "address_md5": "e4bc3a8b5a5332e2f55eb2552fbaaffb"
    },
    {
        "@inferred": true,
        "@email_provider": false,
        "address": "clark.kent@dailyplanet.com",
        "address_md5": "20f32b6d1eb91056ed25b5743c7d309d"
    }
]
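On the client side, you will usually want to keep inferred addresses apart from the ones backed by a record. A minimal Python sketch, assuming the emails list has already been parsed from the JSON response into dictionaries (split_emails is a hypothetical helper, not part of any client library):

```python
def split_emails(emails):
    """Split email entries into (known, inferred) address lists.

    An entry without the "@inferred" key is treated as known,
    as with the first email in the example above.
    """
    known, inferred = [], []
    for entry in emails:
        address = entry.get("address", "").strip()
        if not address:
            continue
        if entry.get("@inferred") is True:
            inferred.append(address)
        else:
            known.append(address)
    return known, inferred


# The three entries from the example response, trimmed to the relevant keys.
sample_emails = [
    {"@email_provider": True, "address": "clarkkent@hotmail.com"},
    {"@inferred": True, "@email_provider": False,
     "address": "clark.kent@savekypton.org"},
    {"@inferred": True, "@email_provider": False,
     "address": "clark.kent@dailyplanet.com"},
]

known, inferred = split_emails(sample_emails)
print("known:", known)
print("inferred:", inferred)
```

Keeping the two lists separate lets you treat inferred addresses as candidates to verify rather than as confirmed contact details.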

So instead of guessing email addresses, gender, location and even jobs, have the Pipl API do it for you and get the data you need out of thin air.