Moved my stuff out of my office today, entering a strange interstitial place between one position ending and hopefully the next. The stack of books on my coffee table awaiting a home on a shelf was a great reminder of how awesome the past decade in science has been.
Tentatively, it *should* look something like this. Much more consistent with ML2 Table 2: ~4-5 hypotheses are consistent with FP properties, but *many* of those that failed to replicate are not.
Check higher up in the paper as well. This isn't *our* definition of a false positive; it's the definition in the literature. "Misleading about explanatory power" is an informal definition (which is fine!), but informal definitions don't create testable claims about the abundance of FPs in the literature.
We address this in the discussion: this is how commonly cited models define a false positive, and many inferences about the replication crisis are drawn from those models. But what does it mean to you for a hypothesis to be false?
Why does all this matter? What's the practical difference between a tiny effect and a perfectly zero one? Our model suggests significance is more common than a false-positive model implies, which means smaller file-drawers and fewer QRPs. With N=100, significance should occur ~40% of the time (it's <10% under a FP model).
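If you want to see the contrast yourself, here's a toy simulation (a hypothetical tiny effect of d = 0.2 and my own parameter choices, not the paper's actual model) of how often a tiny-but-nonzero effect reaches significance versus a perfectly zero one:

```python
# Minimal sketch (not the paper's model): simulated rate of p < .05 for a
# one-sample t-test with N = 100, comparing a hypothetical tiny standardized
# effect (d = 0.2) against a perfectly zero one.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sims = 100, 20_000

def sig_rate(d):
    # Draw `sims` datasets of size n and count how often the t-test rejects.
    x = rng.normal(loc=d, scale=1.0, size=(sims, n))
    p = stats.ttest_1samp(x, 0.0, axis=1).pvalue
    return (p < 0.05).mean()

print(f"tiny effect (d=0.2): {sig_rate(0.2):.2f}")  # well above chance
print(f"true zero   (d=0.0): {sig_rate(0.0):.2f}")  # ~0.05 by construction
```

Only the zero-effect case is pinned at 5%; any nonzero effect drifts upward with N.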
It also suggests that high rates of replication can be achieved simply by increasing the *replication* sample size. In a sense this is a trivial solution to the replication crisis, but it achieves its goals only by dredging up significance for tiny and perhaps meaningless effects.
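A back-of-the-envelope illustration of the "dredging" point, using a normal-approximation power calculation for a hypothetical tiny effect (d = 0.05; the numbers are mine, not from the paper):

```python
# Two-sided power of a one-sample z-test for a tiny effect (d = 0.05)
# as the replication sample size grows: a big enough N makes almost
# any nonzero effect "replicate".
from scipy.stats import norm

def power(d, n, alpha=0.05):
    z = d * n ** 0.5                    # noncentrality parameter
    crit = norm.ppf(1 - alpha / 2)      # 1.96 for alpha = .05
    return norm.sf(crit - z) + norm.cdf(-crit - z)

for n in (100, 1_000, 10_000):
    print(f"N={n:>6}: power = {power(0.05, n):.3f}")
```

At N=100 the effect is nearly invisible; at N=10,000 it's significant almost every time, whether or not it matters.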
And we might not want to just increase sample size for the sake of replicability. Doing so increases sign error, although it will decrease magnitude error. Large-N research may be significant nearly all the time and highly replicable, but more likely to get the direction wrong...
Applying our model to replication efforts, we find that variation in rates of significance across replication efforts is almost entirely explained by the **replication** sample size. OSC 2015 used N=71: low replicability. Soto, Protzko, etc. used N>1000: high replicability!
If you're wondering why your favorite Many Labs study isn't in that chart above, there are two key things: meta-analytic rates of multi-site replicability are distinct from single-shot estimates, and each of the ML studies had sampling frames intentionally including strong/weak effects. Check the SI.
One study, ML5, is worth discussing because it was intended to address criticisms that small sample sizes explain low rates of replication. It used N=500 and retested failed replications from OSC 2015. Our model shows their outcomes are entirely consistent with replication N determining significance rates.
We develop a minimal alternative model. We assume effect sizes are normally distributed as a result of sampling error, variation in studied effect sizes, and their heterogeneity. We model the chance of obtaining significance, replication, sign error, and magnitude error.
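For the curious, a stripped-down toy version of this kind of model (the parameter values are my own illustrative guesses, not the paper's estimates):

```python
# Toy model: per-hypothesis true effects are normally distributed, an
# original study and a replication each observe them with sampling error,
# and we tally significance, replication, sign error, and magnitude error.
import numpy as np

rng = np.random.default_rng(7)
sims = 50_000
mu, tau = 0.10, 0.10        # assumed mean / spread of true effects (my guesses)
n_orig = 100

true_d = rng.normal(mu, tau, sims)                    # per-study true effect
obs = true_d + rng.normal(0, 1 / np.sqrt(n_orig), sims)
sig = np.abs(obs) * np.sqrt(n_orig) > 1.96            # original significant?

def replication_rate(n_rep):
    # Replication "succeeds" if significant in the same direction as original.
    rep = true_d + rng.normal(0, 1 / np.sqrt(n_rep), sims)
    rep_sig = np.abs(rep) * np.sqrt(n_rep) > 1.96
    same_sign = np.sign(rep) == np.sign(obs)
    return (rep_sig & same_sign)[sig].mean()

print(f"original significance rate: {sig.mean():.2f}")
print(f"replication rate at N=71:   {replication_rate(71):.2f}")
print(f"replication rate at N=1000: {replication_rate(1000):.2f}")

# Sign / magnitude error among significant originals
sign_err = (np.sign(obs) != np.sign(true_d))[sig].mean()
mag_err = (np.abs(obs) > 2 * np.abs(true_d))[sig].mean()
print(f"sign error: {sign_err:.2f}, magnitude exaggeration (>2x): {mag_err:.2f}")
```

Even in this crude version, replication rates jump dramatically when the replication N goes from 71 to 1000, with no false positives anywhere in the model.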
A corollary of this is that if a given hypothesis is a *formal* false positive, it should be significant in 5% of multi-site replications. This is easily checked, and we show rates of significance for multi-site replications are inconsistent with false positives causing failed replications.
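The check is just a binomial calculation. With made-up numbers for illustration: if a formal false positive is significant with probability α = .05 at each site, then seeing, say, 8 significant results across 20 sites is essentially impossible:

```python
# Hypothetical numbers (8 of 20 sites significant), not taken from any
# specific multi-site study: probability of that outcome if each site
# independently rejects at alpha = .05 under a true null.
from scipy.stats import binom

p = binom.sf(7, n=20, p=0.05)   # P(X >= 8), X ~ Binomial(20, .05)
print(f"P(>=8 of 20 sites significant under a true null) = {p:.1e}")
```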
Holy shit their website is a hoot. I love that alongside their description of being a K-mart Cambridge Analytica they feel the need to note they're GDPR compliant.
Many are sparse... no really good way to link this to their work and claims without manually going through a *lot* of preprints, papers, OSF repositories, etc., with tons embargoed, private, or wholly missing.
An interesting bit of discourse on the bad place between Craig Sewell, @stephenjwild.bsky.social and myself about Haidt's findings warrants a little thread here. The topic is this plot ostensibly showing an increase in self-harm hospitalizations for 14-18 y/o's. Obviously big if true...
An obvious problem with Haidt's graph is that it doesn't account for uncertainty. You can add the raw yearly uncertainty back in and you wind up with Craig's graph, which tells a less compelling story.
Once we've added some uncertainty, we're modeling, not just plotting, the data. In this case we're assuming each year is independent, and we wind up with quite wide estimates, undermining the ability to draw conclusions. @stephenjwild.bsky.social put together a spline fit to demonstrate the difference.
My contribution was a simple Gaussian process model that lets us extract a linear trend. It seems to suggest an increase, but it's by no means a slam dunk. It's not a perfect model and doesn't directly get at the smartphone hypothesis, but it demonstrates another way of viewing the data.
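For a flavor of the approach, here's a minimal numpy sketch of the same idea on synthetic data (my own kernel choices and fake numbers, not the actual model or the hospitalization data): a GP with a linear + RBF kernel, where the posterior mean of the linear component alone is the extracted trend.

```python
# GP trend extraction sketch: fit y with K = K_linear + K_rbf + noise,
# then read off the linear component's posterior mean, K_lin @ K^{-1} y.
import numpy as np

def rbf(a, b, amp=1.0, ls=3.0):
    # Smooth short-range wiggles (e.g. year-to-year fluctuation).
    return amp**2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def lin(a, b, amp=0.3):
    # Linear kernel through the centered origin: carries the long-run trend.
    return amp**2 * np.outer(a, b)

rng = np.random.default_rng(0)
years = np.arange(-10.0, 11.0)                 # centered "year" axis
y = 0.3 * years + np.sin(years) + rng.normal(0, 0.5, years.size)  # fake data

K = rbf(years, years) + lin(years, years) + 0.5**2 * np.eye(years.size)
alpha = np.linalg.solve(K, y)
trend = lin(years, years) @ alpha              # posterior mean, linear part only

slope = np.polyfit(years, trend, 1)[0]         # summarize the extracted trend
print(f"extracted linear trend slope = {slope:.2f} per year")
```

The nice part of the additive-kernel setup is that the RBF component soaks up the wiggles, so the linear component's slope (and, in a full treatment, its posterior uncertainty) is the quantity you'd actually argue about.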