Joe Bak-Coleman

@jbakcoleman.bsky.social

Associate Research Scientist at Columbia Journalism. Harvard BKC affiliate. Comp. soc. science, collective behavior, stats
Moved my stuff out of my office today, entering a strange interstitial place between one position ending and hopefully the next. The stack of books on my coffee table awaiting a home on a shelf was a great reminder of how awesome the past decade in science has been.
Tentatively it *should* look something like this. Much more consistent with ML2 Table 2... ~4-5 hypotheses are consistent with FP properties, but *many* of those that failed to replicate are not.
Check higher up in the paper as well. This isn't *our* definition of a false positive; it's the definition in the literature. "Misleading about explanatory power" is an informal definition (which is fine!), but informal definitions don't create testable claims about the abundance of FPs in the literature.
We address this in the discussion: this is how commonly cited models define a false positive, and many inferences about the replication crisis are drawn from those models. But what does it mean to you for a hypothesis to be false?
Why does all this matter? What's the practical difference between a tiny effect and a perfectly zero one? Our model suggests significance is more common, which means smaller file drawers and fewer QRPs. With N=100, significance should occur ~40% of the time (it's <10% under a FP model).
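Roughly how those two numbers can arise (a back-of-the-envelope sketch, not the paper's code: I assume a two-sided test on a standardized mean difference with per-group n = 100, and an illustrative spread of true effects, tau = 0.3, my pick rather than a fitted value):

```python
import numpy as np
from scipy.stats import norm

n = 100                       # assumed per-group sample size
se = np.sqrt(2 / n)           # SE of a standardized mean difference
crit = norm.ppf(0.975) * se   # two-sided 5% cutoff on the observed effect

# False-positive model: the true effect is exactly zero.
p_sig_fp = 2 * norm.sf(crit / se)            # = 0.05 by construction

# Tiny-effects model: true effects ~ Normal(0, tau); tau is illustrative.
tau = 0.3
marginal_sd = np.sqrt(tau**2 + se**2)        # observed effects marginalize over truth
p_sig_tiny = 2 * norm.sf(crit / marginal_sd)

print(f"FP model:          {p_sig_fp:.2f}")   # 0.05
print(f"Tiny-effects model: {p_sig_tiny:.2f}")  # ~0.40 with tau = 0.3
```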
It also suggests that high rates of replication can be achieved simply by increasing the *replication* sample size. In a sense this is a trivial solution to the replication crisis, but it achieves its goals only by dredging up significance for tiny and perhaps meaningless effects.
And we might not want to increase sample size just for the sake of replicability. Doing so increases sign error, although it decreases magnitude error. Large-N research may be significant nearly all the time and highly replicable, but more likely to get the direction wrong...
Applying our model to replication efforts, we find that variation in rates of significance across replication efforts is almost entirely explained by the **replication** sample size. OSC 2015 used N=71: low replicability. Soto, Protzko, etc. used N>1000: high replicability!
If you're wondering why your favorite Many Labs study isn't in that chart above, there are two key things: meta-analytic rates of multi-site replicability are distinct from single-shot estimates, and each of the ML studies had sampling frames that intentionally included strong/weak effects. Check the SI.
One study, ML5, is worth discussing because it was intended to address the criticism that small sample sizes explain low rates of replication. It used N=500 and retested failed replications from OSC 2015. Our model shows their outcomes are entirely consistent with replication N determining significance rates.
We develop a minimal alternative model. We assume observed effect sizes are normally distributed as a result of sampling error, variation in studied effect sizes, and their heterogeneity. We model the chance of obtaining significance, replication, and sign and magnitude errors.
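A minimal simulation sketch of that style of model; every parameter value here is an illustrative guess, not an estimate from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (my guesses, not fitted values from the paper).
tau = 0.3              # SD of true standardized effects across hypotheses
sigma_h = 0.1          # between-site heterogeneity SD
n_orig = n_rep = 100   # assumed per-group sample sizes
sims = 200_000

def se(n):
    # standard error of a standardized mean difference with two groups of size n
    return np.sqrt(2.0 / n)

mu = rng.normal(0, tau, sims)  # each hypothesis's mean effect
d_orig = rng.normal(mu + rng.normal(0, sigma_h, sims), se(n_orig))  # original estimate
d_rep = rng.normal(mu + rng.normal(0, sigma_h, sims), se(n_rep))    # replication estimate

sig = np.abs(d_orig) > 1.96 * se(n_orig)  # original study is significant
rep = sig & (np.abs(d_rep) > 1.96 * se(n_rep)) & (np.sign(d_rep) == np.sign(d_orig))

print("P(significant)      ", sig.mean())
print("P(replicates | sig) ", rep.sum() / sig.sum())
print("P(sign error | sig) ", (np.sign(d_orig) != np.sign(mu))[sig].mean())
print("Median exaggeration ", np.median((np.abs(d_orig) / np.abs(mu))[sig]))
```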
A corollary of this is that if a given hypothesis is a *formal* false positive, it should be significant in 5% of multi-site replications. This is easily checked, and we show that rates of significance in multi-site replications are inconsistent with false positives causing failed replications.
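That check amounts to a binomial comparison against the 5% rate. A minimal sketch with hypothetical counts (k significant sites out of m; placeholder numbers, not values from the paper):

```python
from scipy.stats import binomtest

k, m = 9, 20  # hypothetical: 9 of 20 replication sites significant (placeholders)
res = binomtest(k, m, p=0.05, alternative="greater")
print(res.pvalue)  # a tiny p-value is inconsistent with the formal-FP 5% rate
```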
Touchable exploration of the what?
Congress gloating over this seems weird, no?
Billionaires bullshit in bus lanes.
I hope they copy my thesis next, particularly chapter 3. (I may have uploaded an old PDF by accident and keep forgetting to fix it.)
Holy shit their website is a hoot. I love that alongside their description of being a K-mart Cambridge Analytica they feel the need to note they're GDPR compliant.
Reskeet with a hobby that helps your mental health.
A fine start to a Sunday. Butt (left) and chuck (right).
Fantastic paper! There's just so much that can be done here. We even have a book in the works on the topic as it relates to collective behavior.
Many are sparse... there's no good way to link this to their work and claims without manually going through a *lot* of preprints, papers, OSF repositories, etc., with tons embargoed, private, or wholly missing.
An interesting bit of discourse on the bad place between Craig Sewell, @stephenjwild.bsky.social, and myself about Haidt's findings warrants a little thread here. The topic is this plot, ostensibly showing an increase in self-harm hospitalizations for 14-18 y/o's. Obviously big if true...
An obvious problem with Haidt's graph is that it doesn't account for uncertainty. You can add the raw yearly uncertainty back in, and you wind up with Craig's graph, which tells a less compelling story.
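One common way to put rough yearly intervals on hospitalization rates, assuming Poisson counts (all numbers below are placeholders, not the real data):

```python
import numpy as np

# Placeholder yearly counts and population at risk -- illustrative only.
cases = np.array([120, 115, 130, 128, 140, 155])
pop = np.array([40_000] * 6)

rate = cases / pop
se = np.sqrt(cases) / pop                      # Poisson approx: Var(count) = count
lo, hi = rate - 1.96 * se, rate + 1.96 * se    # rough 95% interval per year
for r, a, b in zip(rate, lo, hi):
    print(f"{r:.4f} [{a:.4f}, {b:.4f}]")
```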
Once we've added some uncertainty, we're modeling, not just plotting, data. In this case we're assuming each year is independent, and we wind up with quite wide estimates, undermining the ability to draw conclusions. @stephenjwild.bsky.social put together a spline fit to demonstrate the difference.
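I don't have his exact code; a minimal smoothing-spline sketch in the same spirit, using scipy on placeholder values:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Placeholder yearly rates -- illustrative only, not the real series.
years = np.arange(2009, 2022, dtype=float)
rates = np.array([3.1, 3.0, 3.3, 3.2, 3.6, 3.5, 3.9, 4.1, 4.0, 4.4, 4.3, 4.6, 4.8])

# s controls smoothness: larger s -> smoother curve, less chasing of yearly noise.
spline = UnivariateSpline(years, rates, k=3, s=0.5)
print(np.round(spline(years), 2))
```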
My contribution was a simple Gaussian process model, which lets us extract a linear trend. It seems to suggest an increase, but it's by no means a slam dunk. Not a perfect model, and it doesn't directly get at the smartphone hypothesis, but it demonstrates another way of viewing the data.
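Again a sketch rather than the actual model: a GP with a linear (DotProduct) plus RBF kernel in scikit-learn, with the linear component's posterior mean pulled out via the dual coefficients; the data values are placeholders.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, DotProduct, WhiteKernel

# Placeholder yearly rates -- illustrative only, not the real series.
years = np.arange(2009, 2022, dtype=float).reshape(-1, 1)
rates = np.array([3.1, 3.0, 3.3, 3.2, 3.6, 3.5, 3.9, 4.1, 4.0, 4.4, 4.3, 4.6, 4.8])

x = years - years.mean()   # center inputs so the linear kernel behaves
y = rates - rates.mean()   # center outputs for the zero-mean GP prior

kernel = DotProduct(sigma_0=1.0) + RBF(length_scale=3.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel).fit(x, y)

# Additive kernels decompose: the linear component's posterior mean is
# K_linear(x*, x_train) @ alpha, where alpha are the fitted dual coefficients.
linear_part = gp.kernel_.k1.k1           # the fitted DotProduct term
trend = linear_part(x, gp.X_train_) @ gp.alpha_

slope = np.polyfit(x.ravel(), trend, 1)[0]
print(f"extracted linear trend: {slope:.3f} per year")
```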