Your Trust & Safety Metrics Are Lying to You

This is the first in a series of guest posts by trust and safety experts and practitioners hosted by Pipl. Our first guest expert is Assaf Kipnis, a seasoned trust & safety professional with over a decade of experience and a unique blend of military and tech industry expertise. Assaf has successfully led large-scale trust and safety operations at Meta, including investigative leadership and cross-functional collaboration. His career has been marked by a proactive approach to online security, encompassing network-based threat analysis, policy development, and engineering.

One of our greatest challenges in trust and safety is creating meaningful success metrics. Our measure of success is preventing abuse; essentially, we are tasked with measuring what didn’t happen. We need metrics to justify budgets and headcount and to measure progress, efficiency, and effectiveness. This need for measurement creates what I call the “whac-a-mole metric”. In our effort to measure our work, we rely on the most readily available numbers. These tend to be enforcement numbers, such as the number of accounts or entities removed, disabled, or restricted as a result of our efforts.

The core perception of this metric can be boiled down to a single principle: 1,000 accounts removed > 100 accounts removed.

I argue that this type of metric creates a false narrative. Enforcement metrics present a compelling story for stakeholders, but they do not accurately represent the reality and scale of the abuse on the platform. To understand why, we need to explore the rudimentary archetypes of attackers in the wild. Note that these archetypes are neither exhaustive nor mutually exclusive.

  • The opportunist/novice: Low-sophistication actors using a limited number of accounts to cause as much malicious impact as possible in a short amount of time. These actors run small-scale operations and other limited-scope attacks. They create their own accounts and handle their own infrastructure.
  • The small/medium/up-and-coming organization: Medium-sophistication actor groups working in sync. They spend money to make money, creating or buying fake and compromised accounts in bulk and collaborating with other groups and offenders across an array of infrastructure and attack surfaces. This type of actor strategically identifies targets and runs medium-level tests to assess the target platform’s defensive posture.
  • The large organization/criminal enterprise: Highly sophisticated actor groups working across multiple online and offline abuse types. This type of organization may employ smaller malicious organizations to maintain a flow of fake and compromised accounts. These organizations spend significant time and resources building attack infrastructure, identifying vulnerabilities through reconnaissance, and executing their attacks.

How are the metrics lying to us? 

The goal of the adversarial actor is to get past our defenses. Once they are through, the actors are able to leverage their new posture to generate more significant impact. In the same way that we, as defenders, look to collect data to understand the attackers, so do the attackers collect data about us in order to make their attacks more successful. In the Trust & Safety space, most harm can be perpetrated using fake and compromised accounts. To best leverage these types of accounts, attackers need to have intimate knowledge of our mitigation and prevention tactics. 

A popular and effective way of testing defenses is to use “throwaway” or “cannon fodder” accounts. Because fake and compromised accounts can be generated or bought at relatively low cost, adversaries can hit our defenses with hundreds or thousands of cheap accounts. The enforcement actions we apply to these accounts give adversaries critical information, exposing our defensive posture along with its gaps and weaknesses.

When we count the number of assets subjected to enforcement, we equate enforcement numbers with the scale of the abuse and with our success in curbing it. These are false equivalencies.

Attackers of all levels expect to lose some of their assets while they test our defenses. As the sophistication level of the attacker increases, the removal of accounts en masse becomes less and less consequential. For large malicious organizations, it is “the cost of doing business”.  

Excessive focus on enforcement metrics can perpetuate a cycle in which enforcement is prioritized to meet metric targets rather than to reduce abuse, ultimately producing a skewed reliance on enforcement actions. Moreover, when we measure enforcement, how do we know whether we are making a dent in the attack or just giving away our defensive posture? How do we know whether we are actually reducing the level of abuse on the platform?

If we don't measure enforcement, what do we go by? 

Understanding attacker archetypes and their goals is a critical starting point. Recognizing that each account created to attack our users is likely part of a cluster of similar or connected accounts is the first step toward a holistic threat mitigation or threat disruption metric. Being able to clearly communicate the threats the company faces allows us to explain both good and bad user journeys. The clarity we provide about the adversarial space helps us identify the right data points and secure the business buy-in we need. Depending on the maturity of your organization, you will face different levels of adversaries. Gaining a holistic understanding of your attackers and their modus operandi is the foundation of an adversarial threat taxonomy, which in turn allows us to define meaningful metrics.
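The idea that accounts belong to clusters, rather than standing alone, is what separates counting networks disrupted from counting accounts removed. Below is a minimal Python sketch of that idea, not a method described in this post: it assumes each account record carries a set of shared signals (the signal strings and the `cluster_accounts` and `network_summary` helpers are hypothetical), links any two accounts that share a signal using a union-find structure, and reports how many networks an enforcement action touched or fully removed alongside the raw account count. A production system would need weighted links and care with widely shared signals such as carrier NAT IP addresses, which this sketch ignores.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure for grouping account IDs."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving keeps repeated lookups cheap.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_accounts(accounts):
    """Group accounts that share any signal value.

    `accounts` maps account_id -> set of signal strings. The signal format
    (e.g. "device:abc", "ip:203.0.113.7") is a hypothetical assumption.
    Returns a list of clusters, each a set of account IDs.
    """
    uf = UnionFind()
    first_seen = {}  # signal -> first account observed with it
    for account_id, signals in accounts.items():
        uf.find(account_id)  # register accounts with no shared signals too
        for signal in signals:
            if signal in first_seen:
                uf.union(first_seen[signal], account_id)
            else:
                first_seen[signal] = account_id
    groups = defaultdict(set)
    for account_id in accounts:
        groups[uf.find(account_id)].add(account_id)
    return list(groups.values())


def network_summary(accounts, removed):
    """Contrast the raw removal count with network-level disruption."""
    clusters = cluster_accounts(accounts)
    touched = sum(1 for c in clusters if c & removed)
    fully_removed = sum(1 for c in clusters if c <= removed)
    return {
        "accounts_removed": len(removed),
        "networks_total": len(clusters),
        "networks_touched": touched,
        "networks_fully_removed": fully_removed,
    }
```

For example, with three accounts chained together by a shared device and a shared IP, two accounts sharing a second device, and one standalone account, removing four accounts touches two networks but fully removes only one, so the headline count of four overstates the disruption.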

I suggest that instead of treating enforcement actions as a metric, we move to measuring our influence as an organization and a team. Influence comes in many shapes and sizes. What type of influence you measure depends on the type of business your trust and safety organization resides in and that business’s goals. Once we understand the type of attacker(s) we are dealing with, we can measure the holistic outcomes of mitigating or disrupting them.

As I continue to develop my ideas around trust and safety metrics, I want to admit that this is a hard problem without a clear, straightforward solution. That said, here are my thoughts on how an organization can more effectively measure the influence of its efforts:

  • Reduction of bad actor engagement on the platform. This can be measured by collating the following:
    • Fluctuations in user reports of bad interactions
    • Counts of “soft enforcements” on accounts (rate limits, interaction limits, etc.)
    • The level of prior engagement produced by now-removed accounts
  • Enforcement quality
    • Appeal overturn rate, which gives a view of false positive enforcement and good-user friction
    • Appeal rejection rate, which gives a view of true positives and, in turn, of impact on the adversary
  • Bad actor network impact
    • The impact our large-scale account enforcement has had on the volume and intensity of attacks on specific surfaces
  • Product impact
    • The impact of our investigations on product changes, which can in turn lead to a follow-on metric of how those product changes shifted user reporting
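The list above names signals without prescribing formulas, so the following Python sketch uses one plausible set of definitions as assumptions, not the author’s method: overturn and rejection rates are computed over decided appeals, fluctuation in user reports is a relative change between two periods, and prior engagement is a simple sum over removed accounts. The `AppealStats` type, the helper names, and the engagement units are all hypothetical. `period_report` returns the signals together rather than as a single score, in keeping with the point that none of them should be read on its own.

```python
from dataclasses import dataclass


@dataclass
class AppealStats:
    """Hypothetical appeal outcome counts for one reporting period."""
    overturned: int  # enforcement reversed on appeal (read as likely false positive)
    upheld: int      # appeal rejected, enforcement stands (read as likely true positive)


def appeal_rates(stats):
    """Return (overturn_rate, rejection_rate) over all decided appeals."""
    decided = stats.overturned + stats.upheld
    if decided == 0:
        return 0.0, 0.0
    return stats.overturned / decided, stats.upheld / decided


def relative_change(before, after):
    """Fractional change between two periods, e.g. user reports of bad interactions."""
    if before == 0:
        raise ValueError("baseline period has no data")
    return (after - before) / before


def prior_engagement_removed(engagement_by_account, removed):
    """Total engagement (hypothetical units) produced by now-removed accounts."""
    return sum(engagement_by_account.get(a, 0) for a in removed)


def period_report(stats, reports_before, reports_after, engagement_by_account, removed):
    """Bundle the signals so they are reviewed side by side, not as one number."""
    overturn, rejection = appeal_rates(stats)
    return {
        "appeal_overturn_rate": overturn,
        "appeal_rejection_rate": rejection,
        "user_report_change": relative_change(reports_before, reports_after),
        "prior_engagement_removed": prior_engagement_removed(engagement_by_account, removed),
    }
```

Returning a dictionary of separate values, rather than a weighted composite, is a deliberate choice here: any weighting would depend on the business context the post describes and is left to the reader.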

It is critical that we don’t view any of these options as a meaningful metric on its own. We must take a holistic view when measuring our work. This holds true whether you have access to all of the data mentioned above or only part of it. A holistic view lets us correlate seemingly distinct data points into a cohesive picture. Balancing these metrics to show clearly how trust and safety supports growth goals (or at least doesn’t stand in the way) while protecting users would be an important step towards better business alignment.

While I would not say I have trust and safety metrics figured out (not by a long shot), I believe moving away from whac-a-mole counting metrics will bring significant change to how trust and safety works and how we communicate impact to the business.