Microsoft says its Agent Mode in Excel has an accuracy rate of 57.2 percent in SpreadsheetBench, a benchmark for evaluating an AI model’s ability to edit real world spreadsheets.
At a guess, when people ask it to “sum the numbers above”, they usually test it on the sequence 1,2,3,4,5. It’s an LLM: it doesn’t process its input, it returns one of the most probable tokens based on what it’s seen before. If it actually becomes a “thing”, crashing the global economy is the least of our worries.
It is actually already a thing: https://support.microsoft.com/en-us/office/copilot-function-5849821b-755d-4030-a38b-9e20be0cbf62
Also, see this article from last week: https://www.theverge.com/news/787076/microsoft-office-agent-mode-office-agent-anthropic-models
Hmm… -.-
There’s an “it’s in the product” thing, and a “people actually, seriously, use it for actual work” thing. We’ve got the first. I’m hoping that enough people get burnt by it being wrong, in non-serious ways, that no one tries to use it seriously. Hope and expectation are different things though. *sigh*