The Benefits of Explainable Content Moderation


Moderating content is a delicate balancing act between ensuring lively debate and minimising harm. But by providing clear explanations when users overstep the mark, platforms can help improve long-term behaviour, creating healthier communities for everyone.

Content moderation is often framed as a “for the greater good” issue: those deemed in violation of nebulous guidelines have their engagement hindered, or are banned outright, in the interest of ensuring “healthy discussion”.

But we also need to be mindful of the knock-on impact these actions have. Platforms can get so caught up in moderating conversation that those whose content has been removed are left in the dark about why the decision was made. This lack of clarity can leave users feeling that the platform has a hidden agenda, suppressing dissenting voices and steering conversation, which in turn breeds anger and resentment (West 2018).

A better approach, then, is to clearly explain why a given piece of content had to be removed. Jhaver, Bruckman, and Gilbert explore this issue, examining the impact that removal explanations have on user behaviour. By analysing millions of Reddit removal explanations spanning a wide range of communities, their paper offers insight into how explainable moderation can improve user behaviour.

The authors note that many of the explanations given to offending users included examples of the behaviour the community expected. In practice this could mean removing posts whose titles did not accurately describe their content, or posts that duplicated existing content. Providing an exact reason for why a given piece of content was removed offers something the user can actually learn from, rather than an opaque “community guidelines violation” message.

That is not to say that providing an explanation will immediately improve discussion. The authors note that in some scenarios, explaining why content was removed could deter the user from posting in the future. Conversely, silently removing the offending content helped keep users engaged, so long as they didn’t notice. This effectively creates a quality-versus-quantity trade-off: moderators can choose livelier discussion from increased user submissions, or higher-quality discussion by carefully curating what content stays up and what gets taken down.

Certainly some questions arise over how well this approach would translate to other sites. Given that Reddit is structured around topic-centred groups, it makes sense that moderators can provide explanations when a user steps outside expected behaviour. But for platforms with less clearly defined boundaries, it’s not certain that a similar approach of clarifying expected behaviour would work, or would even be possible.

Providing the user with a clear reason why their content had to be removed or otherwise censored seems like such an obvious approach, and yet its application is surprisingly rare. In fact, volunteer-moderated sites like Reddit appear to be the outliers, being more likely to provide actionable explanations for content removal than their corporate counterparts (Cook, Patel, and Wohn 2021).

Although it’s important to be mindful of how users interact with a given platform, making it clearer when behaviour steps outside accepted boundaries can help ensure healthier communities in the long term.

Further Reading

Does Transparency in Moderation Really Matter?: User Behaviour After Content Removal Explanations on Reddit

Censored, suspended, shadowbanned: User interpretations of content moderation on social media platforms

Commercial Versus Volunteer: Comparing User Perceptions of Toxicity and Transparency in Content Moderation Across Social Media Platforms