Earlier in our audience series, we defined the various types of data. What we haven’t yet explored are the deterministic and probabilistic data models used to produce and analyze this audience data.
What is deterministic data modeling?
Deterministic modeling relies on definitive proof of a user’s identity, such as a user login. This means that the majority of first-party publisher data falls in the deterministic category. First-party data is so valuable because it can be verified as true or false. If a publisher holds data about a user through a login, the publisher can definitively identify that user the next time he or she visits or logs in.
Another key benefit of deterministic modeling is the implication for cross-device tracking. Whether a user is logged in on their phone, tablet or laptop, a publisher or brand definitively recognizes that user across devices and can provide a holistic, rather than a fragmented, user experience.
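To make the cross-device idea concrete, here is a minimal sketch of a deterministic lookup. It assumes every device is tied to a user only through an observed login; the class name `DeterministicGraph`, its methods, and the sample IDs are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


class DeterministicGraph:
    """Links devices to users based only on observed logins (illustrative)."""

    def __init__(self):
        self._devices_by_user = defaultdict(set)
        self._user_by_device = {}

    def record_login(self, user_id, device_id):
        # A login is definitive proof, so the link is stored as a fact.
        self._devices_by_user[user_id].add(device_id)
        self._user_by_device[device_id] = user_id

    def resolve(self, device_id):
        # Returns the known user, or None if this device has never logged in.
        return self._user_by_device.get(device_id)

    def devices_for(self, user_id):
        # Returns a copy of every device on which this user has logged in.
        return set(self._devices_by_user.get(user_id, set()))


graph = DeterministicGraph()
graph.record_login("angie", "phone-1")
graph.record_login("angie", "laptop-7")
```

In this sketch, a device that has never seen a login simply resolves to no user at all. That gap is exactly where probabilistic modeling comes in.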
What is probabilistic data modeling?
Probabilistic modeling is more complex and nuanced in the way it identifies a user because it relies, as the name suggests, on probability. Probabilistic data is generated by collecting anonymous data points from a user’s browsing behavior and comparing them to deterministic data points. The model identifies an unknown user by matching him or her with a known user who exhibits similar browsing behavior. By aggregating these data points and running them through deduplication algorithms, detailed audience profiles can be built from incomplete information.
For example, lifestylewebsite.com is able to recognize Angie because she has a user login to the site. Angie frequently browses the fashion content of lifestylewebsite.com and other fashion sites. From this information, lifestylewebsite.com is able to identify other fashionistas like Jennifer and Lauren because they exhibit similar browsing behavior. This allows lifestylewebsite.com to operate under the assumption that Jennifer and Lauren share other demographic, psychographic or interest-based traits with Angie, which in turn allows lifestylewebsite.com’s advertisers to reach more of their desired audience.
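The Angie example can be sketched in a few lines of code. This is only an illustration under stated assumptions: browsing behavior is reduced to page-view counts per content category, similarity is measured with cosine similarity, and a visitor counts as a lookalike above a threshold of 0.8. The function names, the threshold and the sample numbers are all hypothetical, and real systems may use very different signals and algorithms.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two {category: page_views} profiles."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_lookalikes(seed_profile, anonymous_profiles, threshold=0.8):
    """Return anonymous visitors whose behavior resembles the known user's."""
    scores = {
        visitor: cosine_similarity(seed_profile, profile)
        for visitor, profile in anonymous_profiles.items()
    }
    return {v: s for v, s in scores.items() if s >= threshold}


# Angie is known through her login; the other visitors are anonymous.
angie = {"fashion": 12, "beauty": 5, "travel": 1}
visitors = {
    "jennifer": {"fashion": 9, "beauty": 4},
    "lauren": {"fashion": 7, "beauty": 2, "travel": 1},
    "visitor_x": {"sports": 10, "news": 3},
}
lookalikes = find_lookalikes(angie, visitors)
```

With these made-up numbers, Jennifer and Lauren score close to Angie and land in `lookalikes`, while the sports and news reader does not. The scores are likelihoods, not proof, which is the key difference from a login-based match.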
Deterministic vs. probabilistic
Deterministic may seem like the better option, since the goal of collecting data is always to come as close as possible to identifying who your audience is. However, that does not mean that probabilistic isn’t valuable. Probabilistic data offers the element of scale. Although it is not certain that you are reaching the exact user or household you want, it is likely, and it is your best bet when a deterministic match is not available.
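One way to picture how the two approaches work together is a simple fallback rule: use the deterministic match when one exists, and otherwise accept the most likely probabilistic match if it is confident enough. The sketch below assumes a login lookup table and precomputed match scores already exist; the function name `identify`, the 0.7 cutoff and the sample data are hypothetical.

```python
def identify(device_id, login_map, probabilistic_scores, min_confidence=0.7):
    """Prefer a deterministic match; fall back to the best probabilistic one."""
    user = login_map.get(device_id)
    if user is not None:
        # A login is treated as certain (a simplification for this sketch).
        return {"user": user, "method": "deterministic", "confidence": 1.0}

    candidates = probabilistic_scores.get(device_id, {})
    if candidates:
        best_user, best_score = max(candidates.items(), key=lambda kv: kv[1])
        if best_score >= min_confidence:
            return {
                "user": best_user,
                "method": "probabilistic",
                "confidence": best_score,
            }

    # Not enough evidence either way.
    return {"user": None, "method": "unknown", "confidence": 0.0}


login_map = {"phone-1": "angie"}
probabilistic_scores = {
    "tablet-9": {"jennifer": 0.82, "lauren": 0.40},
    "tv-2": {"lauren": 0.30},
}
```

Raising `min_confidence` trades scale for accuracy: fewer devices get matched, but the matches that remain are more likely to be right.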
Now that we’ve covered the different types of data modeling, next week we’ll explore the differences between audience buying and contextual buying.
Read previous posts from our audience series:
- What Are Cookies and How Do They Work on Desktop Vs. Mobile?
- What is First Party Publisher Data?
- What is First Party Advertiser Data?
- What is Second Party Data?
- What is Third-Party Data?
- What is Deterministic and Probabilistic Data Modeling?
- The Skinny on Audience Buying and How it Differs from Contextual
This article was written by Lexie Pike, product marketing manager at SpotX.