
Stumping the AI

Bloomberg Technology <noreply@news.bloomberg.com>

August 16, 11:06 am

Tech Daily
Hi, it’s Rya in San Francisco.

Are you a New York Times word game enthusiast? The Mini Crossword, Wordle and the infuriating Connections puzzle have drawn many fans, and some of them tested how AI might fare. But first...

Three things you need to know today:

• Autodesk kept going with a risky sales practice after promising to stop
• China’s AMEC is suing the Pentagon as it tries to void US sanctions
• A CIA fund backed Yale scientists developing quantum error correction 

AIn’t so smart

Like many people on the internet, I have a love-hate relationship with Connections. For the uninitiated, the game presents sixteen words on a 4x4 virtual grid. The player’s job is to sort them into four groups of four, with each group progressively harder to spot. An easy set could be synonyms for conformists (followers, lemmings, puppets and sheep), while a harder one might be words that sound like city names, such as deli, niece, roam and soul (Delhi, Nice, Rome and Seoul). If this metropolis grouping strikes you as bizarre, you aren’t alone.

The game has become notorious for its brain-bending test of abstract reasoning skills. Players have made a pastime of dunking on Connections, writing on the social media site X that the game “chose violence today,” “deserves jail time” and is conditioning people to “find patterns that aren’t there.”

But take solace: artificial intelligence bots aren’t faring any better. They solve the entire game only 8% of the time.

We know this because a group of students in a computer science class at Barnard College decided to test the Connections skills of chatbots. They asked the latest models from OpenAI, Alphabet Inc.’s Google, Anthropic and Meta Platforms Inc. to solve 200 games, and found their performance was worse than human novices and much worse than human experts.

It soon dawned on the students that their project wasn’t just nerdy fun. They had stumbled upon a sophisticated way to test chatbots’ reasoning abilities, which is precisely what researchers are trying to measure and companies are trying to improve.

At a recent OpenAI all-hands meeting, leadership told employees that the startup was on the cusp of its systems becoming “reasoners,” meaning they can do basic problem solving. Executives showed a demonstration of how OpenAI’s most advanced systems can answer word problems that have stumped models in the past.

While it isn’t clear whether Connections was one of those word problems, the research by the Barnard students, who turned the class project into an academic paper with their professor, establishes this viral internet game as a valuable and challenging benchmark for AI’s reasoning abilities.

Connections is designed to test different types of knowledge: encyclopedic, semantic, associative and linguistic. For the 200 games, the researchers classified the type of knowledge needed to solve each category so they could test how well AI handles different kinds of problems.

They found that while AI is good at solving some problems involving semantic knowledge, other categories are much more challenging. For example, AI can easily group together followers, lemmings, puppets and sheep, because they share the same broad semantic meaning. However, it found associative categories harder, such as basketball, carrot, goldfish and pumpkin (things that are orange). It was stumped by categories that combine knowledge types, like deli, niece, roam and soul, a set that requires both linguistic and encyclopedic knowledge.

“When it is required to think outside the box, or do any kind of divergent thinking, it struggles a lot,” said research scientist Tuhin Chakrabarty, who was a teaching assistant for the Barnard class and a co-author of the paper. The team’s findings can be used by researchers to improve specific kinds of abstract reasoning in their models, he added. 

Connections’ designers intentionally place “red herrings,” or distractors, on the grid to confuse players. AI often falls for these red herrings because it leaps into solving the game step by step without considering the big picture.

“It's not good at viewing the whole puzzle as a problem in itself, which is one of the biggest shortcomings,” said Mariam Mustafa, one of the Barnard students and a co-author of the paper. 

If a grid has Monday, Tuesday, Wednesday and Thursday, the AI will likely group them together without considering that the grid also contains Morticia, Gomez and Pugsley, all Addams family characters that could be grouped with Wednesday (the daughter in the family).

Because AI is trained to produce the most likely next word, “it will say the thing that is most obvious without exploring all 16 words,” said Chakrabarty. “It is abstract reasoning in the presence of distractors – that is super hard for humans, and for LLMs it’s even harder.”
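To make the red-herring trap concrete, here is a toy Python sketch that contrasts two ways of sorting a grid. The grid and its candidate categories are hypothetical, loosely modeled on the days-of-the-week and Addams example above. Neither function is the Barnard team's method, and neither describes how a chatbot works internally.

The greedy function commits to the first four matching words for each category in turn, the way a solver that grabs the most obvious group might. The search function backtracks until all 16 words fit into four groups.

```python
from itertools import combinations

# Hypothetical toy grid, loosely modeled on the article's example (not a real puzzle).
# Each category maps to the grid words that *could* belong to it; "Wednesday"
# fits two categories, which makes it the red herring.
CATEGORIES = {
    "days of the week": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    "Addams family": ["Morticia", "Gomez", "Pugsley", "Wednesday"],
    "things that are orange": ["basketball", "carrot", "goldfish", "pumpkin"],
    "conformists": ["followers", "lemmings", "puppets", "sheep"],
}
GRID = {word for words in CATEGORIES.values() for word in words}  # 16 distinct words


def greedy(categories):
    """Commit to the first four unused words of each category, in order, never revisiting."""
    used = set()
    answer = {}
    for name, words in categories.items():
        free = [w for w in words if w not in used]
        if len(free) < 4:
            return None  # stuck: an earlier "obvious" group took a word this one needs
        answer[name] = set(free[:4])
        used |= answer[name]
    return answer


def search(items, used=frozenset()):
    """Backtracking search: try every 4-word choice per category until all groups fit."""
    if not items:
        return {}
    (name, words), rest = items[0], items[1:]
    free = [w for w in words if w not in used]
    for group in combinations(free, 4):
        solution = search(rest, used | set(group))
        if solution is not None:
            return {name: set(group), **solution}
    return None  # no consistent choice for this category; the caller backtracks


print(greedy(CATEGORIES))  # None
solution = search(list(CATEGORIES.items()))
print(sorted(solution["days of the week"]))  # ['Friday', 'Monday', 'Thursday', 'Tuesday']
print(sorted(solution["Addams family"]))  # ['Gomez', 'Morticia', 'Pugsley', 'Wednesday']
```

The greedy solver spends Wednesday on the days group. That leaves only three Addams names, so it gives up. The backtracking search reassigns Wednesday and finds a full solution.

Exhaustive search is cheap at this size, but it only works because the toy lists every category up front. In the real game, guessing the categories is the hard part.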

While AI companies keep working to improve their models’ reasoning skills, the researchers’ takeaway for now is clear: Even after ingesting all of this data, AI still can’t solve the puzzle that everyone loves to hate.

The big story

Google now displays convenient AI-based answers at the top of its search pages — meaning users may never click through to the websites whose data is being used to power those results. But many site owners say they can’t afford to block Google’s AI from summarizing their content, because blocking the AI would also hamper a site’s ability to be discovered online.

Get fully charged

Chinese tech stocks rose after JD.com beat expectations and Alibaba held steady despite stubbornly weak consumer demand.

BetMGM betting is coming to Brazil in early 2025, if a joint venture receives a license from the government this fall.

Starlink rival AST jumped more than 50% to close at a record after confirming an early September window for its inaugural commercial launch.

More from Bloomberg

Get Bloomberg Tech weeklies in your inbox:

  • Cyber Bulletin for coverage of the shadow world of hackers and cyber-espionage
  • Game On for reporting on the video game business
  • Power On for Apple scoops, consumer tech news and more
  • Screentime for a front-row seat to the collision of Hollywood and Silicon Valley
  • Soundbite for reporting on podcasting, the music industry and audio trends
  • Q&AI for answers to all your questions about AI
