Towards a Regulatory Framework for Harmful Online Content: Evaluating Reasonable Efforts

1. Introduction

As I have written previously, content recommendation engines (CREs) like the Facebook newsfeed and YouTube’s “watch next” feature appear to sometimes amplify harmful content. In a follow-up post, I advocated for a co-regulatory approach in which the companies behind these CREs would provide data to regulatory authorities to help ensure that they are taking appropriate measures to control the spread of harmful content. In this post I discuss what data would be needed to evaluate whether or not the companies are making reasonable efforts to reduce the spread of harmful content.

I will roughly follow the framework developed by Joshua A. Kroll. His paper challenges the argument that algorithms can be too complex to understand. He writes that “by claiming that a system’s actions cannot be understood, critics ascribe values to mechanical technologies and not to the humans who designed, built and fielded them… inscrutability is not a result of technical complexity but rather of power dynamics in the choice of how to use those tools.” By understanding CREs, we can understand their functions and the values they embody. This understanding can provide the basis on which to address the gap between private and public interests.

CREs are optimized for particular metrics through experimentation and machine learning (ML) models. As Kroll writes: “Systems can be understood in terms of their design goals and the mechanisms of their construction and optimization. Additionally, systems can also be understood in terms of their inputs and outputs and the outcomes that result from their application in a particular context.” The correspondence here is clear: the “design goals” can be understood to be the metrics for which the algorithm is being optimized; the “inputs” correspond to the training data for an ML model or results of an experiment; the “outcomes” are simply the actual exposure of content to users. All of these are concrete things that can be measured and evaluated.

2. Background

Much of the background needed to understand how to evaluate design goals, inputs, and outcomes is covered in my previous post, but we will cover all the essential points here.

Harmful Content

The way we think of harmful content is inherently subjective, but we can create useful and objective operational definitions, which we can use to design “rating policies” that allow a person to categorize content.  Then, we can estimate the amount of harmful content that users of a CRE are exposed to by having humans rate an appropriate sample of the content and subsequently use statistical methods to infer the overall rate. We could augment this process by ML methods that predict the probability for a particular piece of content to be rated as harmful.
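The sampling approach described above can be sketched in a few lines. This is only an illustration, assuming a simple random sample of content impressions, each rated harmful or not by a human following a rating policy. The function names and the choice of a Wilson score interval are my own assumptions for the example, not a description of how any company actually does this.

```python
import math


def wilson_interval(harmful, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = harmful / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_prevalence(ratings):
    """ratings: list of booleans, True meaning the sampled item was rated harmful.

    Returns (point_estimate, lower_bound, upper_bound).
    """
    n = len(ratings)
    harmful = sum(ratings)
    low, high = wilson_interval(harmful, n)
    return harmful / n, low, high


# Hypothetical sample: 12 of 1000 rated impressions were judged harmful.
sample = [True] * 12 + [False] * 988
rate, low, high = estimate_prevalence(sample)
print(f"estimated prevalence {rate:.3%}, interval {low:.3%} to {high:.3%}")
```

The interval matters as much as the point estimate: with rare harmful content, a small sample can leave the true rate very uncertain.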


A metric is a measurement of usage of a product.  One important class is growth metrics. They might count the numbers of users using the product on a particular day, the amount of time they’re spending in the product each week, or how likely they are to have continued using the product after a certain point.  Many different growth metrics are possible, but it suffices to say that they generally indicate how much a product is being used and are proxies for product success. Outside of growth metrics, it is important to note that we can also have metrics for harmful content, such as “on average, how many pieces of harmful content is a user exposed to each day they use the product”.

Optimizing CREs

CREs are adjusted in response to user data in two primary ways: ML models and experiments. An ML model learns what types of content to recommend to a particular user based on information about the content and about the user. The model must be optimized for some particular metric[1] – for example, to maximize the amount of time that the user is likely to spend using the product after seeing the recommendation. The model may be constantly updated as new user data becomes available, which is to say that it can learn every time a user sees a recommendation and chooses to spend some amount of time using the product afterwards (or maybe none). Alternatively, the model may just be updated occasionally with a new batch of training data.
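The metric a model is optimized for may itself be a composition of several metrics, either weighted against each other or combined through a constraint. The sketch below illustrates both ideas. The function names, the weight of 5.0, the 1% threshold, and the candidate dictionaries are all hypothetical, and a real system would optimize a model's parameters rather than pick among a few precomputed candidates.

```python
def weighted_objective(time_spent, harmful_exposure, harm_weight=5.0):
    """Single score that rewards time spent and penalizes harmful exposure.

    harm_weight (an assumed value) encodes how "important" the harm metric is
    relative to time spent.
    """
    return time_spent - harm_weight * harmful_exposure


def best_constrained(candidates, max_harm_rate=0.01):
    """Pick the candidate with the most time spent among those whose share of
    users exposed to harmful content stays at or under max_harm_rate.

    candidates: list of dicts with keys "name", "time_spent", "harm_rate".
    Returns None if no candidate satisfies the constraint.
    """
    feasible = [c for c in candidates if c["harm_rate"] <= max_harm_rate]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c["time_spent"])


# Hypothetical candidate configurations with predicted metrics.
candidates = [
    {"name": "a", "time_spent": 30.0, "harm_rate": 0.020},
    {"name": "b", "time_spent": 25.0, "harm_rate": 0.008},
    {"name": "c", "time_spent": 20.0, "harm_rate": 0.001},
]
print(best_constrained(candidates)["name"])
```

Here candidate "a" maximizes time spent but is excluded by the constraint, which is exactly the kind of trade-off that a documented objective would make visible.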

Experiments are actually quite similar. In the most typical formulation, a modified version of the CRE is tested against the existing (“production”) version. A random set of users will begin to have their recommendations provided using the modified CRE. We can rate the performance of the modified and production versions based on selected metrics and replace the production version with the modified one if it performs better. A key point is that this decision is made based on a particular set of metrics that has been chosen by the CRE’s designers.
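To make the role of the chosen metrics concrete, here is a rough sketch of how such an experiment might be scored. It assumes per-user metric values for each arm and uses a simple two-sample z statistic with unequal variances, which is only a reasonable approximation for large samples. The metric names and data are invented for illustration.

```python
import math
from statistics import mean, variance


def compare_arms(control, treatment):
    """Difference in means (treatment minus control) and an approximate z statistic."""
    diff = mean(treatment) - mean(control)
    se = math.sqrt(variance(control) / len(control)
                   + variance(treatment) / len(treatment))
    if se == 0:
        # No variation in either arm: any nonzero difference is unambiguous.
        z = math.copysign(math.inf, diff) if diff != 0 else 0.0
    else:
        z = diff / se
    return diff, z


def evaluate_experiment(metrics_by_arm, z_crit=1.96):
    """metrics_by_arm maps metric name -> (control_values, treatment_values)."""
    report = {}
    for name, (control, treatment) in metrics_by_arm.items():
        diff, z = compare_arms(control, treatment)
        if z > z_crit:
            direction = "increased"
        elif z < -z_crit:
            direction = "decreased"
        else:
            direction = "no clear change"
        report[name] = {"diff": diff, "z": z, "direction": direction}
    return report


# Hypothetical per-user data for the production and modified versions.
data = {
    "time_spent": ([10, 12, 11, 13, 9, 10, 12, 11],
                   [14, 15, 13, 16, 15, 14, 16, 15]),
    "harmful_exposures": ([0, 0, 1, 0, 0, 1, 0, 0],
                          [0, 0, 1, 0, 0, 1, 0, 0]),
}
for metric, result in evaluate_experiment(data).items():
    print(metric, result["direction"])
```

If "harmful_exposures" were simply left out of the dictionary, the experiment would be decided on time spent alone. The set of metrics in the report is therefore itself a statement of design goals.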

3. Applying the Kroll framework

Kroll suggests that we can understand an algorithm by understanding its goals, inputs, and outcomes.  I focus on the goals, as that is the most important element, but briefly consider the inputs and outcomes here.

The inputs require understanding exactly what training data is being used – what information is being collected about what population of users and what analysis is performed on the content (such as determining a video’s probable topic or demographic).  Does the system measure how long users spend on the site? Is there a way for users to rate the content? Are the ML models trained on all users, or are some users not represented?

The outcomes can be construed quite broadly, but one important outcome to consider is how much harmful content the users are actually being exposed to. I go into depth as to how this can be measured in my previous article.


We have seen that the design of a CRE is ultimately to optimize some set of metrics.  If we could directly measure how happy users are with the content they’re consuming, we might want to optimize for that – recommend to users whatever content will make them the happiest (for whatever definition of “happiness”).  However, it is generally not possible to directly measure how happy users are, so the people developing CREs use two types of proxies: implicit and explicit feedback. Explicit feedback means to simply ask the users how happy they are – for example, allowing users to rate a video, or press a button saying they would like to see more content similar to what they’ve just seen.  Implicit feedback is subtler: whether a user watches a video to completion, slows down their scrolling while the content is on the screen, or shares the content. These may indicate that the user liked the content, but the connection is tenuous.

We can say that explicit feedback is what the users are “saying they like”, while implicit feedback is what the users are “showing that they like”. But the claim that liking a video is the only reason that a user might watch a video to completion is fallacious. As an example, a common mantra in the digital advertising world is that they are not focusing on making money, they are “trying to show users the ads that are most useful to those users”. But they measure this usefulness by how many users click on the ad (incidentally, exactly how they make money). Realistically, there are many reasons a person might click on an ad that do not indicate that the ad was useful to that person. In this case, using implicit feedback (clicks on ads) to measure how useful the ads are to users is perhaps not working well, and instead serves the interests of the advertising network above those of the user. As we see, of the two forms of feedback, implicit feedback is generally much better aligned with the business models of the companies behind the CREs: more content being viewed or shared means more opportunities to show advertising. In a previous post, I have discussed the potential conflict of interest that this creates.

So understanding CREs “in terms of their design goals” can be largely done through the metrics they are designed to optimize for. A CRE that optimizes purely for the amount of time users spend on the site can be understood as such, despite any claims of trying to show users the content they’ll be “most interested in”. The CRE, in this case, is showing whatever content will best get the users to spend more time in the product. Showing interesting content might be one way to achieve this, but it is not the only option, and the CRE has no inherent reason to prefer it.

To bring us back to the issue of harmful content, remember that these companies generally can estimate the amount of harmful content they are exposing their users to. This can be used as a metric that can be optimized against to decrease exposure to harmful content. If the CRE is not designed with this goal, it is very difficult to argue that reasonable efforts are being made to prevent the spread of harmful content. In other words, following Kroll’s argument, if preventing the amplification of harmful content is not one of the explicit design goals then the system as a whole is categorically not intended to prevent amplification of harmful content. Facebook and YouTube both claim to be working to reduce harmful content on their platforms, but so far have provided no evidence of whether their CREs are really designed to do so.

In any case, the choice of metrics to optimize for will have powerful and complex impacts on users of the services, and the businesses behind the CREs should bear real responsibility for understanding this. Article 35 of the GDPR, on data protection impact assessments, provides a potentially useful model to replicate, in that it defines a structured and explicit approach for anticipating and taking action to minimise risk of harm.

4. Data

Now I can return to my original intention, to suggest what data companies should provide to an auditor in order to evaluate their efforts in preventing the spread of harmful content.  As well as the prevalence data I’ve written about previously, an auditor would need data on ML models and experimentation.

For ML models, there should be a list of all models that can have any potential impact on what content is recommended to users. For each model, the metrics for which it is being optimized should be listed and well-documented. They should be clearly defined and a changelog of any modifications to the metric definitions should be available. Crucially, this will reveal the balance between explicit and implicit feedback that is being used, and whether exposure to harmful content is being used as a metric. Additionally, for each model, the training data should be clearly specified, including what population of users or events is included and what characteristics are used.

Similarly, for experiments, a log of all experiments that could potentially impact the content being recommended or viewed (this would include user-interface changes that might result in users clicking through to certain content at a different rate) should be provided.  No trade secrets are needed, just a brief description of the experiment’s purpose, what metrics were used to evaluate its outcome, which of these metrics were found to increase or decrease, and what decision was made for the change being experimented with.
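As a sketch of what such reporting could look like in practice, the records below capture the fields described above for models and experiments. The class names, field names, and the three feedback categories are my own assumptions for illustration. An actual reporting format would need to be worked out with the companies and the regulator.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MetricDefinition:
    name: str
    definition: str
    feedback_type: str  # assumed categories: "explicit", "implicit", or "harm"
    changelog: List[str] = field(default_factory=list)


@dataclass
class ModelRecord:
    model_name: str
    optimized_metrics: List[MetricDefinition]
    training_population: str      # which users or events are included
    features_used: List[str]      # which characteristics are used


@dataclass
class ExperimentRecord:
    purpose: str
    metrics_evaluated: List[str]
    metrics_increased: List[str]
    metrics_decreased: List[str]
    decision: str                 # e.g. "launched" or "abandoned"


def optimizes_against_harm(model):
    """True if any metric the model is optimized for is a harm metric."""
    return any(m.feedback_type == "harm" for m in model.optimized_metrics)


# Hypothetical example record.
model = ModelRecord(
    model_name="watch-next-ranker",
    optimized_metrics=[
        MetricDefinition("time_spent", "minutes in product after recommendation",
                         "implicit"),
    ],
    training_population="all logged-in users",
    features_used=["watch history", "video topic"],
)
print(optimizes_against_harm(model))
print(json.dumps(asdict(model), indent=2))
```

Even this minimal structure lets an auditor ask the central question directly: is exposure to harmful content among the metrics any model is optimized for?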

Ideally, all this data should be available broken down for different geographic or demographic markets, so it can be determined if particular populations are being disproportionately harmed by a change.
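A breakdown of this kind is straightforward to compute once rated impressions are tagged with a market. The sketch below assumes each rated impression is a (market, is_harmful) pair and flags markets whose harm rate exceeds the overall rate by some factor. The factor of 1.5 and the function names are arbitrary choices for the example.

```python
from collections import defaultdict


def harm_rate_by_market(impressions):
    """impressions: iterable of (market, is_harmful) pairs.

    Returns a dict mapping each market to its share of harmful impressions.
    """
    counts = defaultdict(lambda: [0, 0])  # market -> [harmful, total]
    for market, is_harmful in impressions:
        counts[market][0] += int(is_harmful)
        counts[market][1] += 1
    return {market: h / n for market, (h, n) in counts.items()}


def disproportionate_markets(impressions, ratio=1.5):
    """Markets whose harm rate is more than `ratio` times the overall rate."""
    impressions = list(impressions)
    if not impressions:
        raise ValueError("no impressions provided")
    overall = sum(int(h) for _, h in impressions) / len(impressions)
    rates = harm_rate_by_market(impressions)
    return sorted(m for m, r in rates.items()
                  if overall > 0 and r > ratio * overall)


# Hypothetical rated sample from two markets.
sample = ([("A", True)] * 1 + [("A", False)] * 99
          + [("B", True)] * 10 + [("B", False)] * 90)
print(harm_rate_by_market(sample))
print(disproportionate_markets(sample))
```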

5. Conclusion

The precise details of this sort of reporting would require an intensive co-development with the companies being regulated, but by adhering to the basic principles outlined here, a meaningful sort of transparency is possible that could incentivize the creation of CREs that better serve their users and communities.

6. References

Joshua A. Kroll. 2018. The fallacy of inscrutability. Phil. Trans. R. Soc. A 376: 20180084.

Jesse D. McCrosky. 2019. Unintended Consequences: Amplifying Harmful Content. Wrong, but useful.

Jesse D. McCrosky. 2020. Towards a Regulatory Framework for Harmful Online Content: Measuring the Problem and Progress. Wrong, but useful.

Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), Article 35.


    1. Technically, this metric may be some composition of multiple metrics. For example, two metrics can be optimized for simultaneously, but some weighting must be given to indicate which metric is more “important”. Similarly, constraints can be specified: for example, a model might maximize total time spent in the product subject to the constraint that no more than 1% of users are exposed to harmful content on a given day.

