I'm not going to litigate the specifics of this situation, but there are some critical lessons here for people who are thinking of running a labeler (and to some extent they're the lessons of T&S — trust and safety — in general, but they are even more important given the paradigm of composable moderation).
This thread covers the two fundamental things all labelers need to decide on up front and stick to: 1) Who is doing the moderation, what are their biases, and how are those biases mitigated? 2) Are you moderating/labeling objective actions/content, or subjective characteristics?
Each of these two points has a lot (and I mean A LOT) of nuance. (Like everything having to do with T&S!) Let's start with #1: bias mitigation. People who oppose community-driven moderation are now smugly parading around going "of course anyone who wants to be a mod is biased!"
This is the wrong way to look at it. It's not an inherent problem with community moderation: it's an inherent problem with people. Everyone is biased, in a million different ways. We all have our viewpoints of what we think is good vs bad.
Elon Musk thinks the word "cis" is a slur and should be moderated: that's a bias. I think people who create accounts only to advertise things are spammers and should be moderated: bias. You may think associating a wallet name with an account name is doxing and should be moderated: bias. Etc.
T&S, inherently, is a biased process: it involves someone's definitions of what should and shouldn't be actioned. There is no such thing as neutral, unbiased moderation. Anyone who says otherwise is simply asserting societal prejudices that are declared "objective" because of who holds them.
And, crucially, people don't want moderation to be "unbiased", or to fall back solely on external standards such as "is this content legal". Don't believe me? Look at the months-long Discourse on child safety: most of the content many people very loudly want removed is legal under US law.
What people are calling "bias" here, me included (because it's shorter), is actually better termed "viewpoint". Moderation is a function of viewpoint. You choose a viewpoint lens through which to moderate and apply it to your policies and actions.
The neat thing about Bluesky's experiment in composable moderation (which, as everyone who's been following me for ages knows, I am still dubious about the long-term likelihood of success of, but this is *not* the reason why) is that you can pick which viewpoint you want to view the site through.
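(As a rough illustration of what "pick which viewpoint you view the site through" means mechanically: none of the types or names below come from the actual Bluesky/AT Protocol API. They're a hypothetical sketch of a client applying only the labels from labelers the viewer has chosen to subscribe to.)

```typescript
// Hypothetical sketch of "pick your viewpoint" moderation. Invented types,
// not the real Bluesky API: the point is only that the *viewer* decides
// which labelers' judgments get applied to their feed.

type Label = { labelerDid: string; postUri: string; value: string };

type ViewerPrefs = {
  subscribedLabelers: Set<string>; // which viewpoints the viewer opted into
  actionByLabel: Record<string, "hide" | "warn" | "show">; // what each label value should do
};

function resolveAction(
  postUri: string,
  labels: Label[],
  prefs: ViewerPrefs
): "hide" | "warn" | "show" {
  let action: "hide" | "warn" | "show" = "show";
  for (const label of labels) {
    if (label.postUri !== postUri) continue;
    // Labels from services the viewer never subscribed to are simply ignored.
    if (!prefs.subscribedLabelers.has(label.labelerDid)) continue;
    const preferred = prefs.actionByLabel[label.value] ?? "show";
    // Take the most restrictive action across all subscribed labelers.
    if (preferred === "hide") action = "hide";
    else if (preferred === "warn" && action !== "hide") action = "warn";
  }
  return action;
}
```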
What people starting up labelers are going to have to do, though, is work out how to ensure the agents doing the work to action reports are going to apply *the labeling service's* viewpoint and not their own. This is an incredibly, incredibly difficult problem.
The fundamental tension here: a labeler with a strong viewpoint built from the (actual or perceived) consensus of a specific group as to what should be moderated will naturally want to draw its agents from members of that group, who have a familiarity with the group's social norms and practices.
This allows contextual interpretation of reported content. Failures of cultural competency result in problems where the members of the group can easily understand why a post should be moderated, but an outsider has no idea and thinks the post is innocuous. This happens *all the time*.
However, members of the group, who will have social connections within it and will already have formed opinions and reads on people in it, will, always, need to compensate for the human tendency to read charitably when you agree with/like the speaker and uncharitably when you don't.
Let me be very clear here: this is not an individual failing of any specific person. It's fundamental human nature. You can compensate for it when you know the tendency exists, but you can never eliminate it. I do it. You do it. Every moderator ever has done it.
There are various process methods a team manager can use to compensate for it, over and above the methods individuals can use. People commonly propose a double-agree system, where two people have to sign off on an action. That can help, but is deeply impractical at any kind of volume.
You can do escalating levels of agreement needed for more severe actions; this might look like "single agree for labeling a post, double agree for labeling an entire account". But there are problems with that, too! First: 99.9% of whole account actions will be completely uncontroversial.
So you're *still* increasing your workload for no reason. Second, since 99.9% of actions are uncontroversial, the person doing the check is going to be strongly inclined to agree with the first person on the .1% too, because they're used to most of their decisions being right.
Our brains are, fundamentally, bad at spotting .1% events. Again: this is fundamental human nature! There's only so far you can process your way out of it. Third, if everyone on the team comes from the same group, they're more likely to have the same predispositions to read charitably/uncharitably.
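(To make the escalating-agreement idea from a couple of posts up concrete, here's a hypothetical sketch. The action names and thresholds are invented for illustration and aren't any real labeler's policy.)

```typescript
// Illustrative only: a severity-tiered sign-off policy where more severe
// actions require more independent approvals. Names and numbers are made up.

type ModAction = "label-post" | "label-account" | "takedown-account";

const requiredSignoffs: Record<ModAction, number> = {
  "label-post": 1,       // single agree
  "label-account": 2,    // double agree
  "takedown-account": 2, // double agree
};

type Decision = { action: ModAction; agentIds: string[] };

function canApply(decision: Decision): boolean {
  // Sign-offs only count once per distinct agent.
  return new Set(decision.agentIds).size >= requiredSignoffs[decision.action];
}
```

Note that code like this does nothing to solve the problem described above: the second sign-off is still a human who expects the first human to be right 99.9% of the time.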
You can set a policy that agents should not act on reports where they already had a pre-existing opinion on one party to the conflict. That gets you closer to fixing the problem, but it, too, has issues: for one, it's entirely self-reported and relies on agents being honest about recusal.
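(Again purely as a hypothetical sketch: a recusal policy like this usually amounts to agents declaring conflicts up front and the report queue routing around them. All names here are invented, and the weakness above still applies, because the conflict list is only as complete as the agent's self-reporting.)

```typescript
// Sketch of a self-reported recusal check: agents declare accounts they
// have a pre-existing relationship with or opinion about, and the queue
// declines to assign them reports involving those accounts.

type Report = { id: string; subjectDid: string; reporterDid: string };

type Agent = { id: string; declaredConflicts: Set<string> };

function canAssign(report: Report, agent: Agent): boolean {
  return (
    !agent.declaredConflicts.has(report.subjectDid) &&
    !agent.declaredConflicts.has(report.reporterDid)
  );
}

function assignReport(report: Report, agents: Agent[]): Agent | undefined {
  // Naive assignment: first agent without a declared conflict. A real queue
  // would also balance load, but that's beside the point here.
  return agents.find((agent) => canAssign(report, agent));
}
```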