It took several tries, but I finally found a way to ask Copilot why it fucked up. It gave a pretty good answer, but one that implies you just can't rely on an LLM to answer even basic questions.
There is no truth or falsity baked anywhere into the thing whatsoever. What it has learned is "grammar and naturalness of language". Which is, don't get me wrong, an enormous technical achievement. But the thing doesn't know anything at all, and with mathematical things especially, it really sucks.
And that's worth getting at, because math's grammar, especially, is very stripped down. It exists, in a lot of ways, to sort truth from falsity, so "grammatically correct false things" are easy to pose in formal math. Exactly the task ChatGPT is bad at.
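To make that concrete, here's a toy example in Lean 4 (my illustration, not anything the model produced): both statements below are perfectly grammatical, the checker parses each as a well-formed proposition about numbers, but only one of them can actually be proved.

    -- Both propositions are well-formed Lean; grammar alone can't tell them apart.
    theorem fine : 2 + 2 = 4 := rfl   -- type-checks: a real proof

    -- theorem bogus : 2 + 2 = 5 := rfl
    -- ^ just as grammatical as the line above, but uncommenting it makes
    --   Lean reject the file, because no proof of a false statement exists

A model trained purely on surface form has no analogue of that second step: both lines look equally plausible as text, and only a proof checker (or actual knowledge) can tell them apart.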