
This little two-step is really something. Running up to the launch of GPT-4, he was heralding an astonishing, transformative advance, the harbinger of some cosmic intelligence. Now we all see it’s mostly useless, so… hang onto your hats for the real Great Leap Forward! What a total scam.
'GPT-4 is the dumbest model any of you will ever have to use' declares OpenAI CEO Sam Altman as he bets big on a...
Sam Altman talks at Stanford.
Every time there’s a shiny demonstration of some new AI regurgitation engine, the zealots always say “and this is the worst AI is ever gonna be!” But it’s not! It really might not get substantially better and could definitely get worse.
Sam and Co have basically admitted that they’re out of data and need 5% of world GDP and cold fusion to really make a go of it. Plus the web is filling up with LLM outputs, which they can’t detect, and they’re scraping and training on that shit, which WILL make it worse.
The whole barely adequate current thing would not be possible without wholesale IP theft and they’re basically just praying that they’re immunized by the world historical scale of their crimes and don’t get sued into cinders.
But just you wait until the barely incrementally better version, which we spent the GDP of Morocco, ten million hours of hidden underpaid Filipino labor, and half the water in the Colorado river training. Just you wait!
I’m ranting, but good lord, the millenarian bullshit coming out of these people just to trick some credulous CTOs into exorbitant Azure contracts really chaps my hide.
Their language around this stuff is so teleological that it totally gives the game away.
There's a sucker born every nanosecond.
And it’s all such obvious bullshit and so many idiots still keep falling for it.
Did we ever get a good read on what happened the weekend they locked him out? There were some really cult-like stories going around.
No need to apologize, true facts deserve to be shouted from the rooftops
Well Altman is half right anyways; GPT4 is dumb.
I don’t understand how you think AI is “barely adequate.” At what? If its answers can’t be trusted (and they can’t), it has a negative value in every possible intended use case, doesn’t it? Its only positive value comes from the fact that I needed a laugh.
If you just need to generate plausibly coherent lorem ipsum text to fill out, say, a web form, it seems like it might be useful for that.
So Utah, having passed a transphobic bathroom bill, has launched an online form for people to snitch on folks they think are in the "wrong" bathroom or locker room. Be a real shame if people on the Internet flooded it with fake reports:
Hotline Complaint
Why would I ever need to— Oh! Yeah, I could see that.
Already produced more valuable output with 1/100,000 the budget…
Powering the World since the 1970s!
all the data scale in the world (or, I mean, greater data scale than is available in the world) wouldn't do it. The stuff it can't do now is largely because it's missing the bits to do that, "just throw more unstructured data at it" is pretty well at asymptote
either I am very dumb (possible) or the "data" includes, like, the collected works of Gateway Pundit? i mean just feeding it more blahblahblah doesn't make it "smarter"
the collected works of gateway pundit are actually sort of okay training data for what LLMs actually are, it's just that what they are is inevitably going to be bad at all kinds of tasks because the representational structures for doing those tasks don't really exist in written human language
yah but even the very simple things like "Give me a short biography of Barack Obama" isn't helped by loading it up with nonsense that includes "barack obama was born in indonesia and/or kenya"
Yes it’s effectively averaging out all the claims on the internet about Obama. The more disinfo there is, the more likely it is to repeat it.
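A toy way to picture the "averaging" claim, as a rough sketch only: assume a hypothetical model that repeats each claim in proportion to how often it appears in its training text. Real LLMs are far more complicated than this, and `repeat_probabilities` and the claim labels are made up for illustration.

```python
from collections import Counter


def repeat_probabilities(claims):
    """Return the share of the corpus that each claim accounts for.

    Assumption: the toy "model" repeats a claim with probability equal
    to that claim's share of the training text.
    """
    counts = Counter(claims)
    total = sum(counts.values())
    return {claim: n / total for claim, n in counts.items()}


# Hypothetical corpus: 90 accurate statements and 10 false ones.
clean = ["accurate claim"] * 90 + ["false claim"] * 10
# The same corpus after 40 more copies of the false claim are scraped in.
noisier = clean + ["false claim"] * 40

p_clean = repeat_probabilities(clean)
p_noisier = repeat_probabilities(noisier)
print(p_clean["false claim"], p_noisier["false claim"])
```

Under that assumption the false claim's share rises from 10 in 100 to 50 in 140 once the extra copies are added, which is the "more disinfo, more repetition" point in miniature.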
i wonder how they train this, it will give you whatever answer is near the vibes of your question, which is maybe worse
it has no idea what is true or not and adding that facility is plausibly unsolvable but even gateway pundit moooost of the things they type on a sentence-to-sentence basis are, like, coherent english sentences; in the insane scale of these corpora it's okay-enough
Also - it’s almost always revealed to have a human hand involved. In other words part of the fancy AI isn’t just Langchain and statistics, the engineers had to go in and code the guy’s head because it kept popping off or tell it there is a country in Africa that starts with ‘K’.
Yeah it’s mechanical Turks all the way down
Hahahaha. Elon Musk’s dancing robot. ‘This isn’t a real robot, but you get the idea.’
😂 truly Fucking Amazon stores and so on
Plus the original natural language corpus was all ill-gotten at the expense of privacy rights.
Yup, the more AI-regurgitated crap fills the internet, the worse the new AI regurgitating models will get. It's a vicious cycle of ever-worsening vomit. The internet was such a useful thing twenty years ago...
It’s not the case that adding synthetic data necessarily makes models worse. People outside the field love this premise, but it’s behavior that only shows up in experiments that make the most pessimistic assumptions (e.g., earlier data must be discarded at every generation)
Is Model Collapse Inevitable? Breaking the Curse of Recursion The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent...
There’s a whole universe of “worse” short of “collapse”
I appreciate your correctives, Ted. However, here the model not collapsing (with accumulation) is hardly good news, right? In fact, loss still happens, just not catastrophically (right side of figure). One can't seem to say from this paper that quality improves w accumulation but i've only glanced.
No, that’s right — but they’re making no effort to improve the model. When synthetic data is produced to improve performance, it’s designed and filtered. It’s not “just train a model on its own output.” The question being tested here is simply, “if that does happen, does it produce collapse?”
Ok, but two different scenarios are being suggested here. You're suggesting folks intentionally building & using synthetic data to supplement a training set. The initial reply suggested one where synthetic and real text were being hoovered up in one big training set. Doesn't that make a difference?
The paper tests the second scenario, the pessimistic one, and shows that as long as some real data is retained you don’t get collapse. “Retained” could mean “hoovered” or “held over from 2023.”
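The difference between discarding and retaining earlier data can be sketched with a toy simulation. This is not the paper's experiment: the "model" here is just a Gaussian fitted to a pool of numbers, and the function names, the pool size of 10, and the 200 generations are all assumptions picked for illustration.

```python
import random
import statistics


def fit(samples):
    """Fit the toy 'model': a mean and a (population) standard deviation."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def simulate(generations, n, accumulate, seed=0):
    """Repeatedly train on model output and track the fitted spread.

    accumulate=False: each generation trains only on the previous
    model's samples (earlier data is discarded).
    accumulate=True: new synthetic samples are added to everything
    seen so far, so the original real data is retained.
    """
    rng = random.Random(seed)
    pool = [rng.gauss(0.0, 1.0) for _ in range(n)]  # the "real" data
    mu, sigma = fit(pool)
    history = [sigma]
    for _ in range(generations):
        synthetic = [rng.gauss(mu, sigma) for _ in range(n)]
        pool = pool + synthetic if accumulate else synthetic
        mu, sigma = fit(pool)
        history.append(sigma)
    return history


replace_history = simulate(generations=200, n=10, accumulate=False)
accumulate_history = simulate(generations=200, n=10, accumulate=True)
print("replace, final spread:   ", replace_history[-1])
print("accumulate, final spread:", accumulate_history[-1])
```

In this toy setup the replace run should see its spread shrink toward zero (the collapse case), while the accumulate run should stay in the neighborhood of where it started. That says nothing about quality improving, only that retaining the real data keeps the degradation bounded.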
The superintelligence has Kuru.
filling the web with ai garbage is part of the plan. the machines are going to make us dumber so it's easier to pass the turing test.
And Google is saying that you can ground your Vertex AI app in Google Search 🤪
It used to be hot garbage. Now it will be cold garbage.
The only way to train AI is to have human created content flagged so it can tell the difference which means we can also use that flag to ignore AI outputs.