JSTOR when Aaron Swartz values their 'corpus': boo! evil! one million years dungeon!
JSTOR when AI brotech values their 'corpus': hell yeah! this rules! Yeet yourself into this corpus, fam!
I would like JSTOR to clarify what it means by the following words, because it keeps using them, and I don't think it knows what they mean: collaboration, community, interactive, trusted, corpus, empower, deepen, and expand.
Well, an LLM is generative AI, and it looks like they're using one. This is a thing you'll see in many places as we figure out how to use them well. I can explain further how they work and how hallucinations get caught, if you'd like.
Nobody wants you to explain this. People want you and the rest of these AI cultists to stop shoving it down our throats. Nobody wants this shit. Any of it.
The backend is gpt-3.5-turbo, so... no. It only costs about 1.7% of what GPT-4 does to run. You don't need the massive models for tasks like this, which is a major upside.
That's not a huge problem when dealing with a limited corpus. You can take the LLM's output, tokenize and vectorize it, and compare it against the same info in the limited dataset it's supposed to be talking about. If the LLM goes off into la-la land, have it try again.
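A minimal sketch of that check, assuming a crude bag-of-words vectorization and cosine similarity (the corpus passages and answers here are made up for illustration; a real system would use proper embeddings):

```python
import math
import re
from collections import Counter

def vectorize(text):
    # Crude bag-of-words "vector": lowercase word counts.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    # Cosine of the angle between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def grounding_score(answer, corpus_passages):
    # How close is the model's answer to *anything* in the limited corpus?
    vec = vectorize(answer)
    return max(cosine_similarity(vec, vectorize(p)) for p in corpus_passages)

# Toy corpus, invented for this example.
corpus = [
    "The journal was founded in 1923 and ceased publication in 1981.",
    "Volume 12 covers the economic history of the region.",
]
on_topic = grounding_score("The journal ceased publication in 1981.", corpus)
la_la_land = grounding_score("Napoleon invented the telephone on the moon.", corpus)
print(on_topic > la_la_land)  # True
```

An answer that scores low against every passage is exactly the "la-la land" case: nothing in the corpus backs it, so you regenerate instead of serving it.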
That’s where I get lost: the part where it lies and you have to say “no, you’re lying, stop that and try again,” which requires that you know when it is lying.
If I already know when it’s lying, why did I need to ask in the first place?
Ah, that's part of the computer's job, not the user's. Figuring out how close you are to "ground truth" is hard in a generic context but easier when dealing with a limited corpus.
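Here's a sketch of that retry loop doing the computer's job. Everything here is hypothetical: `fake_llm` stands in for a real model call (e.g. gpt-3.5-turbo), and the word-overlap score is a stand-in for whatever grounding metric a real system would use:

```python
def fake_llm(prompt, attempt):
    # Hypothetical stand-in for a real model call; returns canned
    # answers so the sketch is self-contained and runnable.
    answers = [
        "Napoleon invented the telephone on the moon.",   # hallucination
        "The archive holds 400 volumes of the journal.",  # grounded
    ]
    return answers[attempt % len(answers)]

def overlap_score(answer, corpus_passages):
    # Toy grounding metric: fraction of the answer's words that also
    # appear in the best-matching corpus passage.
    words = set(answer.lower().split())
    if not words:
        return 0.0
    return max(len(words & set(p.lower().split())) / len(words)
               for p in corpus_passages)

def grounded_answer(prompt, corpus, score_fn, threshold=0.5, max_tries=3):
    # The computer's job: regenerate until the answer is close enough
    # to something actually in the corpus, or give up honestly.
    for attempt in range(max_tries):
        answer = fake_llm(prompt, attempt)
        if score_fn(answer, corpus) >= threshold:
            return answer
    return "Sorry, I couldn't find that in the corpus."  # refuse, don't lie

corpus = [
    "The archive holds 400 volumes of the journal.",
    "The journal ran from 1923 to 1981.",
]
answer = grounded_answer("How many volumes are there?", corpus, overlap_score)
print(answer)
```

The first attempt scores low against every passage and gets thrown away; the user only ever sees the answer that cleared the threshold, which is why the "you have to know when it's lying" burden doesn't land on them.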
It’s legal, even encouraged, to let the lie robot have all the documents; but if someone wants to let us have all the documents without putting them in the idiot blender, that's bad.