Post

Avatar
This little two-step is really something. In the run-up to the launch of GPT-4, he was heralding an astonishing, transformative advance, the harbinger of some cosmic intelligence. Now we all see it’s mostly useless, so… hang onto your hats for the real Great Leap Forward! What a total scam.
'GPT-4 is the dumbest model any of you will ever have to use' declares OpenAI CEO Sam Altman as he bets big on a superintelligence
www.tomsguide.com
Sam Altman talks at Stanford.
Avatar
Every time there’s a shiny demonstration of some new AI regurgitation engine, the zealots always say “and this is the worst AI is ever gonna be!” But it’s not! It really might not get substantially better and could definitely get worse.
Avatar
Sam and Co have basically admitted that they’re out of data and need 5% of world GDP and cold fusion to really make a go of it. Plus the web is filling up with LLM outputs, which they can’t detect, and they’re scraping and training on that shit, which WILL make it worse.
Avatar
The whole barely adequate current thing would not be possible without wholesale IP theft and they’re basically just praying that they’re immunized by the world-historical scale of their crimes and don’t get sued into cinders.
Avatar
But just you wait until the barely incrementally better version, which we spent the GDP of Morocco, ten million hours of hidden underpaid Filipino labor, and half the water in the Colorado River training. Just you wait!
Avatar
I’m ranting like @zitron.bsky.social but good lord the millenarian bullshit coming out of these people just to trick some credulous CTOs into exorbitant Azure contracts really chaps my hide.
Avatar
Their language around this stuff is so teleological that it totally gives the game away
Avatar
There's a sucker born every nanosecond.
Avatar
And it’s all such obvious bullshit and so many idiots still keep falling for it.
Avatar
Did we ever get a good read on what happened the weekend they locked him out? There were some really cult-like stories going around.
Avatar
No need to apologize, true facts deserve to be shouted from the rooftops
Avatar
Avatar
Well Altman is half right anyways; GPT-4 is dumb.
Avatar
I don’t understand how you think AI is “barely adequate.” At what? If its answers can’t be trusted (and they can’t), it has a negative value in every possible intended use case, doesn’t it? Its only positive value comes from the fact that I needed a laugh.
Avatar
If you just need to generate plausibly coherent lorem ipsum text to fill out, say, a web form, it seems like it might be useful for that.
So Utah, having passed a transphobic bathroom bill, has launched an online form for people to snitch on folks they think are in the "wrong" bathroom or locker room. Be a real shame if people on the Internet flooded it with fake reports: ut-sao-special-prod.web.app/sex_basis_co...
Hotline Complaint Form
ut-sao-special-prod.web.app
Avatar
Why would I ever need to— Oh! Yeah, I could see that.
Avatar
Avatar
Already produced more valuable output with 1/100,000 the budget…
Avatar
Powering the World since the 1970s!
Avatar
all the data scale in the world (or, I mean, greater data scale than is available in the world) wouldn't do it. The stuff it can't do now is largely because it's missing the bits to do that; "just throw more unstructured data at it" is pretty well at asymptote
Avatar
Avatar
either I am very dumb (possible) or the "data" includes, like, the collected works of Gateway Pundit? i mean just feeding it more blahblahblah doesn't make it "smarter"
Avatar
the collected works of gateway pundit are actually sort of okay training data for what LLMs actually are, it's just that what they are is inevitably going to be bad at all kinds of tasks because the representational structures for doing those tasks don't really exist in written human language
Avatar
yah but even the very simple things like "Give me a short biography of Barack Obama" aren't helped by loading it up with nonsense that includes "barack obama was born in indonesia and/or kenya"
Avatar
Yes it’s effectively averaging out all the claims on the internet about Obama. The more disinfo there is, the more likely it is to repeat it.
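(To make that concrete, here's a toy sketch. This isn't any real LLM, just a pretend "model" that parrots completions in proportion to how often they appear in its training text, with made-up counts:)

from collections import Counter
import random

# made-up corpus: 90 true claims and 10 disinfo claims about the same prompt
corpus = (
    ["barack obama was born in hawaii"] * 90
    + ["barack obama was born in kenya"] * 7
    + ["barack obama was born in indonesia"] * 3
)

# count how each sentence ends
completions = Counter(s.rsplit(" ", 1)[1] for s in corpus)

def complete():
    # sample an ending for "barack obama was born in ..." by corpus frequency
    words = list(completions)
    return random.choices(words, weights=[completions[w] for w in words])[0]

print(complete())
print({w: c / len(corpus) for w, c in completions.items()})
# roughly {'hawaii': 0.9, 'kenya': 0.07, 'indonesia': 0.03}:
# the more disinfo in the training text, the more often it gets repeated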
Avatar
i wonder how they train this, it will give you whatever answer is near the vibes of your question, which is maybe worse
Avatar
it has no idea what is true or not and adding that facility is plausibly unsolvable, but even gateway pundit, moooost of the things they type on a sentence-to-sentence basis are, like, coherent english sentences; at the insane scale of these corpora it's okay-enough
Avatar
Also - it’s almost always revealed to have a human hand involved. In other words part of the fancy AI isn’t just Langchain and statistics, the engineers had to go in and code the guy’s head because it kept popping off or tell it there is a country in Africa that starts with ‘K’.
Avatar
Yeah it’s mechanical Turks all the way down
Avatar
Hahahaha. Elon Musk’s dancing robot. ‘This isn’t a real robot, but you get the idea.’
Avatar
😂 truly Fucking Amazon stores and so on
Avatar
Plus the original natural language corpus was all ill-gotten at the expense of privacy rights.
Avatar
Avatar
Yup, the more AI-regurgitated crap fills the internet, the worse the new AI-regurgitating models will get. It's a vicious cycle of ever-worsening vomit. The internet was such a useful thing twenty years ago...
Avatar
Avatar
It’s not the case that adding synthetic data necessarily makes models worse. People outside the field love this premise, but it’s behavior that only shows up in experiments that make the most pessimistic assumptions (e.g. earlier data must be discarded at every generation) arxiv.org/abs/2404.01413
Is Model Collapse Inevitable? Breaking the Curse of Recursion by...
arxiv.org
The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent...
Avatar
There’s a whole universe of “worse” short of “collapse”
Avatar
I appreciate your correctives, Ted. However, here the model not collapsing (with accumulation) is hardly good news, right? In fact, loss still happens, just not catastrophically (right side of figure). One can't seem to say from this paper that quality improves with accumulation, but I've only glanced.
Avatar
No, that’s right — but they’re making no effort to improve the model. When synthetic data is produced to improve performance, it’s designed and filtered. It’s not “just train a model on its own output.” The question being tested here is simply, “if that does happen, does it produce collapse?”
Avatar
Ok, but two different scenarios are being suggested here. You're suggesting folks intentionally building & using synthetic data to supplement a training set. The initial reply suggested one where synthetic and real text was being hoovered up in one big training set. Doesn't that make a difference?
Avatar
The paper tests the second scenario, the pessimistic one, and shows that as long as some real data is retained you don’t get collapse. “Retained” could mean “hoovered” or “held over from 2023.”
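(If a concrete picture helps, here's a minimal toy sketch of the two regimes, with made-up sample sizes and generation counts; the "model" is nothing but a Gaussian re-fit to its own samples, not anything taken from the paper:)

import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=100)  # stand-in for the human-written data

def fit(data):
    # the entire "model" here is just a mean and a spread
    return data.mean(), data.std()

def run(generations=1000, accumulate=True):
    pool = real.copy()
    mu, sigma = fit(pool)
    for _ in range(generations):
        synthetic = rng.normal(mu, sigma, size=100)  # the model's own outputs
        # accumulate: keep the real data plus every earlier batch; replace: keep only the newest batch
        pool = np.concatenate([pool, synthetic]) if accumulate else synthetic
        mu, sigma = fit(pool)
    return sigma

# replace-only training lets the fitted spread drift and shrivel over the generations
# (the collapse story); with accumulation it stays in the ballpark of the real data
print("replace only:", run(accumulate=False))
print("accumulate:  ", run(accumulate=True))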
Avatar
The superintelligence has Kuru.
Avatar
filling the web with ai garbage is part of the plan. the machines are going to make us dumber so it's easier to pass the turing test.
Avatar
Avatar
And Google is saying that you can ground your Vertex AI app in Google Search 🤪
Avatar
It used to be hot garbage. Now it will be cold garbage.
Avatar
The only way to train AI is to have human-created content flagged so it can tell the difference, which means we can also use that flag to ignore AI outputs.