

These things are interesting for two reasons (to me).
The first is that it seems utterly unsurprising that these inconsistencies exist. These are language models. People seem to fall easily into the trap of believing them to have some kind of “programming” in logic.
The second is just how unscientific NN or ML work is, which is why it’s hard to study ML as a science. The original paper referenced doesn’t really explain the issue or how to fix it, because there’s not much you can do to explain ML (see the second paragraph of their discussion). It’s not like the derivation of a formula, where you can point to one component of the formula and say “this is where you go wrong”.
It was a bit unclear to me how stable this was to adjusting the course. Did they set up the course in a blind fashion?
With a lot of ML, it boils down to how well the training set represents the situation.