Do LLMs "have" the "abillity" to be told they are wrong or incorrect and be able to contest that?

cheese_greater@lemmy.world · 2 days ago

Do LLMs "have" the "abillity" to be told they are wrong or incorrect and be able to contest that?

geneva_convenience@lemmy.ml · 3 hours ago

Yes, but you’ll mostly only activate this when you start asking about dangerous stuff like medical advice. They are glad to go along with any other opinion you might have.

VoodooMischief@lemmy.ca · 5 hours ago

A concept that I think is really helpful for interpreting what an LLM does is the concept of a “Chinese room”. The idea is someone slips a piece of paper containing a message in Chinese under the door and inside that room is someone that doesn’t know Chinese following a set of rules for converting characters and numerals into a response based off their syntax. Afterwards the person in the room creates a response and slips it back out under the door. At no point does the person in the room understand the Chinese in the input or in the output, but the person standing outside of the room might believe there is a Chinese speaker inside of the room. This is the same idea with computerized outputs like LLMs. They only provide the illusion of intentionality and don’t actually have an understanding of inputs or their outputs.

daniskarma@lemmy.dbzer0.com · 4 hours ago

They have the ability to say whatever they are trained to say. You just need a training set with the inputs and outputs you want.

whatiswrongwithyou@lemmy.ml · 3 hours ago

Yeah you gotta specifically prompt for that in great detail though.

The only consistent way I’ve gotten ai disagreement is as part of an agent or harness or with some moe models.

KnightOfOldEmpire@lemmy.ml · edit-2 14 hours ago

Not as such, as you need to find what’s the source of their information and if you do point out the error, they will usually answer with something like “I’m just a Database, I’m not responsible for what’s been put in me, what do you want me to do about it”. Maybe added continuation of “I can agree with the information being disputed, if you want me to do that”.

SkyNTP@lemmy.ml · 1 day ago

LLMs have no opinions. They are merely mathematical models that predict how an average person (mostly Internet people) might respond plus a little bit of invisible priming baked in to steer the behaviour a little (which is also easy to change yourself).

A lot of these chatbots, notoriously chatgpt, are heavily primed to pander to the user.

Jessicat@lemmy.world · 1 day ago

The pandering is really unnerving when you’re not used to it. I’ve used chatGPT only once and it was to edit my resume since I hate doing that. It helped with consistency and wanted active verbs in there which was fine. It was the constant compliments at the top of every response that was so weird. Like I’m asking for feedback, not to hear how wonderful of a writing job I already did. I felt like I was being handled like an oversensitive child.

DisasterTransport@startrek.website · edit-2 14 hours ago

That’s a great observation —the sycophantic nature of ChatGPT can be really offputting when you’re not used to the way the AI writes. Would you like:

tips and tricks on using ChatGPT?
advice on creating your own GPT prompt?
suggestions on less sycophantic AI chatbots?

I’m here to help. Just let me kn

can’t do it anymore the bit isnt worth it ew ew ew

Jessicat@lemmy.world · 7 hours ago

Thanks for the offer but I’m good for now. I find that web searches get me where I need to go. I find scientific papers easily that way.

CultLeader4Hire@lemmy.world · 1 day ago

I think a lot of people turning to these things essentially are over sensitive children in adult bodies

SpaceNoodle@lemmy.world · 2 days ago

Give it a shot. They’re just sycophantic stochastic parrots anyway.

tourist@lemmy.world · 1 day ago

damn, you’re right

immediately folded and apologised on the first message

I don’t know why it needed to “think” about that for three seconds

wewbull@feddit.uk · 1 day ago

“Thinking” means it talked to itself for a while. I expect you can expand the “thinking” by clicking on it.

gravitas_deficiency@sh.itjust.works · 1 day ago

Now ask it why it’s apologizing

KnightOfOldEmpire@lemmy.ml · 15 hours ago

This is the secret question as it often boils down to “I didn’t think it’d do harm so I just did what you thought you wanted me to do”

affenlehrer@feddit.org · edit-2 2 days ago

It depends on training, prompting and reasoning capabilities.

Sometimes they’re prompted to not be assertive. You can often tell them though to e.g. “behave like an XYZ expert in a bad mood who doesn’t accept nonsense”.

I’ve had e.g. ChatGPT contest me a lot even though I was right. It was about a bicycle brake design I had never seen before. It gave me some options of what it could be and helped me to actually find out what it was. However after I did some research and found out what the actual type was it kept doubting my result and insisted it was a different kind of brake.

gravitas_deficiency@sh.itjust.works · 1 day ago

The best sort of methodology I’ve found to coerce Claude or whatever (we are strongly encouraged to use it, because you know, tech these days) is (for a single agent) to define a process that includes proving its work and citing sources. For agentic flow, you basically just assign a contrarian role in particular domains to some of the agents - ideally all of this is also hooked into an MCP server that includes deterministic utilities to improve accuracy and solution arrival speed.

It’s basically just a shitty, brute-forced, massively over complicated Monte Carlo algorithm that’s wildly inefficient in terms of energy usage and infrastructural cost, that also happens to be turning our economy into a highly flammable house of cards.

Can you tell what my opinion of all this bullshit is, despite knowing how to do all of this crap reasonably well? 😛

sloppy_diffuser@sh.itjust.works · 6 hours ago

strongly encouraged for me means my entire bonus is tied to skilling up with LLMs.

“cite-or-stop” with journaling, role definitions, subagents to control context windows, and shifting as much of the work as possible to deterministic scripts has yieled pretty high-quality results.

Expensive af though. Like the self-review skill I maintain before engineers put their code up for human review will find 30-40 things on average, where only a small handful are false positives, can burn 10% of a $20/mo plan’s utilization in a single run on a moderately sized PR. The default one my company setup in our source control usually finds 1-5 things and only 0-2 are of any value.

gravitas_deficiency@sh.itjust.works · 6 hours ago

Oh yeah it 100% explodes your token usage by establishing a contentious paradigm. But that’s what the C-suites want, and they don’t understand (or care to understand) the subtleties of how this development methodology works. So we’ll all (somewhat maliciously) follow orders until they change their minds when they see the absolutely ludicrous LLM bill 🤪

affenlehrer@feddit.org · 1 day ago

I think that’s a good approach. Personally I find LLMs quite fascinating but they’re deeply flawed. They can barely be used in production environments, especially unsupervised. The workflows regarding LLMs are very esoteric with specific prompting techniques etc and while all LLMs have similar flaws each model and model version behaves differently. It’s super weird and unreliable. Like one big workaround that has so much investment that it keeps improving every month but still stays shitty at it’s base.

yermaw@sh.itjust.works · 1 day ago

I’m trying not to judge and to just remain curious here. Why would you keep using AI like that?

KnightOfOldEmpire@lemmy.ml · 14 hours ago

It does sometimes provide a procedure in troubleshooting. As internet gets clogged under bloat, and you haven’t done anything like that before, it’s not the worst idea to give you starting points to think about. It’s best used with combination of other sources.

Deepseek was useful in explaining me in finding out where can I find information for car rims (CoC it was my first time looking for such thing) and how to get the correct nuts… it was honestly more useful than a human agent from one of the shops… who wanted all the documentation I had on the car before even hearing the question, which felt a tad excessive, considering their field of work.

I asked Rufus on amazon DE, if this hub requires a purchase of separate supply to work. It answered no, ofc it did.

With the push towards AI and no one seemingly willing to answer questions or even explaining real alternatives in person, I’m only left with a question what is a person supposed to do? Re-buy or relay on services that either overcharge or they send you to their competitor because they deemed the problem of too little value to solve?

affenlehrer@feddit.org · edit-2 1 day ago

You can use LLMs for things that where not possible or very difficult with traditional search engines.

E.g. I wanted to know what kind of brake my daughters bike used. Traditionally I’d either have to research all possible brake types and compare then with her bike or take a photo and post it to a forum or reddit or something and hope someone knows the answer.

With ChatGPT (I only use the free version) I just took a photo and asked what kind of break it was and got a (actually good) list of 2 possible brakes. It was one of the two.

Very convenient. However, I’m aware how LLMs work and what their limitations are. Of course also the environmental issues.

BTW: It was a band brake.

howrar@lemmy.ca · 1 day ago

It led them to the right answer. That’s positive reinforcement.

CommanderCloon@lemmy.ml · 1 day ago

Because search engines have been dogshit for the past 2 years (give or take~); AI need to be steered hard but as long as you’re asking for sources (and don’t just read what they’re saying) you can get an answer out of them.

I only find what I’m searching for on Google if I already know the exact keywords for what I’m specifically looking for (…and even then…); if I don’t know the exact terms of what I’m looking for (like @affenlehrer@feddit.org with his bike brake issue) then google is useless nowadays, which wasn’t always true. So now my process is to google first, ask an AI second, and I end up using AIs way more than I would like.

MNByChoice@midwest.social · 1 day ago

Yes. I have had either Gemini or ChatGPT tell me I was wrong. There is a video of a person timing a ~5 second run with ChatGPT and it insisted their run was 10 minutes.

Is this ability useful? Debatable.

dosse91@lemmy.trippy.pizza · 1 day ago

Yes, but they usually have a system prompt telling them to be super polite.

I’ve been using Qwen3.6 a lot lately to debug a shitty old game that doesn’t want to work in wine and during the thinking process it said more than once things like “The user is wrong, […]”, but at the end it always said something polite like “You are right to say that X, but it’s more correct to say that Y, …”.

Shellofbiomatter@lemmus.org · 2 days ago

It would have just agreed and reinforced you and claimed you’re the greatest to figure it out. Current LLMs just say whatever you want to hear, those have no understanding of what is right or wrong information.

reparæi [he/they]@lemmy.ml · 1 day ago

Completely depends on context for the most part, approach them not as sovereign beings which follow their own perspectives and opinions but as advanced auto-complete, to test this merely ask it a question using niche terminology, identical question depending on the terminology you use will respond biased towards that opinion and perspective, you can get an LLM to admit to anything, to go along with every single thing you said like a blind yes-man or blindly object to every single thing you say even the most basic of facts merely off of how the context your chat has mapped most cloesly onto which parts of its training data for predicting text.