Post

Avatar
I don’t think it can be emphasized enough that large language models were never intended to do math or know facts; literally all they do is attempt to sound like the text they’re given, which may or may not include math or facts. They don’t do logic or fact checking — they’re just not built for that
Avatar
Someone once said that replies from AI are "what answers may sound like" and I never heard a more accurate description.
Avatar
Artificial Intelligence 🤝 Artificial Answers
Avatar
Did you ever see Stross's masto thread about "Fun facts about Charles Stross"? Thing started out strong, aside from calling his Scottish secessionist ass a _proud Brit_, but it didn't know how to tell him "no" when he repeatedly asked for more. It must give answers even if none exist.
Avatar
I've seen code it generates. Now that is a hallucination if I ever saw one.
Avatar
Early on, work actively exhorted us to use it for code. TBH I have never tried, because every time I started, it felt like I was spending more time describing the problem than I would have spent writing the code.
Avatar
I use GitHub Copilot as an extension of IntelliSense; it can be pretty good at completing lines of code or commenting. But in chat mode, when I asked it to write me some code from scratch, it invented non-existent classes and methods and in general produced garbage.
Avatar
If not answer then why answer shaped?
Avatar
that's literally what they're designed to do lol it just tries to predict what the most likely response would be
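(A toy sketch of what "most likely" means here; the probability table below is made up, and a real model computes it with a huge neural network, but the selection step is the same idea.)
```python
# Toy illustration only: a made-up next-token probability table, not any
# real model's output. An actual LLM estimates these probabilities with a
# neural network, but the selection step is conceptually this simple.
next_token_probs = {
    "blue": 0.62,     # plausible continuations of "the sky is"
    "falling": 0.21,
    "grey": 0.09,
    "a": 0.08,
}

# Pick the most likely continuation; nothing here checks whether it's true.
best = max(next_token_probs, key=next_token_probs.get)
print(best)  # -> blue
```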
Avatar
It's pretty much autocorrect on a larger scale, right? We all know how accurate autocorrect can be...
Avatar
Yep, same general idea
Avatar
Autocorrect once changed 'soggy' to 'doggie style' in a text to my mom so I'm not getting behind autocorrect's big brother just yet.
Avatar
I’m truly hoping your mother didn’t respond with “NOICE.” Although if one of my kids had the same issue, that’s exactly how I would have responded.
Avatar
Lol “not getting behind it”
Avatar
As a colleague noted: Steroid-enhanced autocorrect with mentalist tendencies….
Avatar
The mentalist tendencies, sadly, are more in the audience's perception than the machine.
Avatar
Autocorrupt never ever took the Department Org Chart, "corrected" the spellings of all the names in ways that looked like mockery, and changed the title to Department Orgy Chart, without Microsoft Mail asking if I wanted to do that to the attachment. Fortunately I caught it before it got read.
Avatar
Some years later I had Microsoft as a customer, so I quickly had to train myself out of saying "I *hate* Microsoft!" as many times a day as I previously had been :-) At least the "recall previously sent mail" feature worked, and nobody actually reads attachments anyway, as one coworker let me know.
Avatar
Yes. It’s not much different, fundamentally, from text generators trained on large texts, only at a staggeringly greater scale. Quantity has a quality all its own, of course, so they are remarkably better than autocorrect. But I don’t see the revolution anywhere.
Avatar
It cannot be stressed enough: an LLM accepts a sequence of words and returns a second sequence that is statistically correlated with the first. Any factual truth is purely incidental.
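A minimal sketch of that claim, using a toy bigram chain instead of a real transformer (the corpus and code below are illustrative only, not how any production LLM is implemented):
```python
import random
from collections import defaultdict

# Toy bigram "language model": counts which word follows which in a corpus.
# A real LLM is vastly more sophisticated, but likewise learns only
# statistical structure -- nothing in either system represents "truth".
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat chased the dog .").split()

follows = defaultdict(list)
for prev, word in zip(corpus, corpus[1:]):
    follows[prev].append(word)

def generate(start, length=8):
    out = [start]
    for _ in range(length):
        out.append(random.choice(follows[out[-1]]))  # sample a plausible next word
    return " ".join(out)

print(generate("the"))  # e.g. "the dog sat on the mat . the cat"
```
The output is always grammatical-looking text drawn from the training statistics; whether any sentence it emits happens to be true is, exactly as stated above, incidental.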
Avatar
Truth is incidental enough in some domains (such as IT) to be useful, but it's not reliable in any sense. Ground truth is a problem that AI people have struggled with for decades, and while LLMs give the illusion of having solved it, they really, really haven't.
Avatar
Fully agree with you of course: "incidental" does not mean that it cannot be useful, as long as caveats and limitations are well understood.
Avatar
I think you get people with a vested interest in eliding that fact, a lot of people who really want to believe LLMs are something they're not, and then a much larger third group who get really confused because of the first two groups and use chatgpt for restaurant reviews or chess or math questions.
Avatar
...or to hallucinate citations to submit in a court filing, which I find the most entertaining use case to date :)
Avatar
It doesn’t even work directly with the text. It tokenizes all the data it uses for that statistical correlation. This is why I’m so confounded by claims that an LLM “understands” its interactions with people.
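A toy illustration of that point (a hypothetical five-word vocabulary; real tokenizers such as BPE split text into sub-word pieces, but the principle is the same: the model operates on integer IDs, not words):
```python
# Toy sketch of tokenization with a hypothetical word-level vocabulary.
# The model never sees text, only the integer IDs.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
inverse = {i: w for w, i in vocab.items()}

text = "the cat sat on the mat"
token_ids = [vocab[w] for w in text.split()]
print(token_ids)  # [0, 1, 2, 3, 0, 4] -- all the model "sees"
print(" ".join(inverse[i] for i in token_ids))  # decoded back to text
```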
Avatar
Now, that is a very interesting point. I wonder if it could technically be argued that a human brain tokenizes data in a "similar" way when storing and processing information, but I am not a neuroscientist and therefore I have absolutely no clue.
Avatar
That’s possible, I suppose, but even if so that tokenization would necessarily be only a tiny fraction of the processing that enables understanding. LLMs, as statistical probability engines, are impressive, but there’s literally nothing there to enable them to actually contextualize information.
Avatar
I feel like a big part of the problem is the way everyone insists on calling it AI, when it’s… really not that.
Avatar
I just...really appreciate y'all saying this because I feel like I'm in the upside down listening to people call something AI that just, poorly aggregates/synthesizes *texts* we're feeding into it.
Avatar
It’s a victory for marketing and the desperate hunt for the next profitable thing.
Avatar
I know what you mean. It’s mind boggling.
Avatar
Seriously! It’s a glorified chat bot; it doesn’t “know” anything.
Avatar
It gets even more confusing because LLM research gets presented at AI conferences, so in that sense they can be classified as an AI topic.
Avatar
AI research has a history of classifying problems as "AI" and then when a (partially) successful technique shows up, the problem gets moved out of "AI". Optimizing compilers for example, or even just feedback based control systems. To me, all of machine learning looks like applied statistics.
Avatar
It is an AI topic. It is AI. Pick up an AI book like Russell and Norvig and see how many topics fall under that umbrella and have for a long, long time
Avatar
I know that, but the meaning of the term AI is very much context-dependent, which is what we're seeing here. Heck, even the industry incorrectly equates it to Machine Learning. That's what makes discussions like these confusing.
Avatar
On a side-note, do you really think I haven't read the most common undergraduate textbook on AI when I do research in that field?
Avatar
I mean it is a subset of AI and quite a large one
Avatar
It's an easy way of explaining it to people who don't understand what machine learning and LLMs are, but the consequence of that as always is that it doesn't actually explain what machine learning and LLMs are. I find the Bing model sadly ironic because it's the worst possible use case for LLMs
Avatar
Everyone knows what autocomplete is. It's like a fancy autocomplete that gradually deteriorates over time.
Avatar
Honestly, it’s a deeply misleading way of explaining it. In a world where we have decades of science fiction depicting actual artificial intelligence, most people will assume the wrong thing. And the people who build these systems make scant effort to dissuade those assumptions.
Avatar
Admittedly, I feel slightly hypocritical critiquing it because I've colloquially referred to it as AI before but you're absolutely right. I imagine OpenAI are very happy for people to believe these models are something greater than they are, given they're already struggling to sell the product
Avatar
This has been true of “automation” for decades
Avatar
We should just call them algorithms. That’s all they are.
Avatar
I asked it to write an R script for a particularly onerous problem, and it produced a formula that it took me several minutes to realize was profoundly wrong. The confidence and speed with which it produces wrong answers can only be replicated by a vast team of monkeys on typewriters.