Post

Deleted.
Avatar
exactly. "Jailbreaking" a phone is breaking the security boundaries so you can run stuff on the device that you, the user, want, rather than authorized by the manufacturer. This isn't "jailbreaking". It's getting a stochastic speech engine to say things it's not supposed to
Avatar
it's not even really "breaking" anything. It's the designers don't have a strong sense of how to constrain the engine at all, and it's not really clear what the constraints are, or even if the engine is constrain/able/. Which is just a very different thing.
Avatar
Saying it's a skeleton key or a jailbreak is sort of arguing in the wrong domain, giving users the sense that this is a minor defect that can be corrected, rather than a more fundamental problem that nobody really knows how to put this blob of jello on rails and keep it from sliding off
Avatar
It’s also really funny when considering that these “locks” or “secure containers” are also made up of the exact same blobs of jello, being themselves prompts layered on to the model after the fact.
Avatar
Right. I'd have a bit more respect for attempts to constrain and circumvent protections here if there was a bit more theory underpinning /why/ these protections should work in the first place, rather than just an empirical "they just do sometimes and don't other times and we're not really sure why"
Avatar
“It’s non-deterministic and we don’t understand why.”
Avatar
That’s why I was kind of shocked at meta talking about the us of ai in compiler output optimisation. And then it turned out they decompile things to check and exactly what work are you saving?
Avatar
also ... compilers is almost /the/ canonical task where output correctness is non-negotiable.
Avatar
"this runs 100 micro-ops faster on these inputs but now requires you to spend 4 days reverse-engineering why it crashes sometimes. Also the builds are non-reproducible" statements by the utterly deranged
Avatar
Avatar
If you did want to use a human metaphor, the “jail” is as if you convinced someone to believe that they couldn’t act in a certain way, and the “break” is simply telling them that yes, you can act that way. There’s no systemic-level controls happening, it’s all occurring in the “mind” of the model
Avatar
Or to continue with the jello metaphor, it’s like shouting at a pile of jello to retain the shape of a sphere and then being surprised when it doesn’t
Avatar
I don't know the scientific explanation but I told the AI I was good and it believed me
Avatar
*sitting on the sidelines hoping this metaphor takes a life of its own*
Avatar
Are we talking about LLM (A.I.)? 'Cuz if we are, here's my 2¢: Any value or usefulness that might come from A.I. is almost all DERIVED from something else. Somebody else's research, novel, designs, knowledge, skills. Derivatives are almost always bad. Anyone who lived through '08 would know that.
Avatar
Basically, they need to teach their AIs morals. We don't even know how to reliably teach that to humans.
Avatar