Google’s best Gemini AI demo video was fabricated

Key Takeaways:

– Google released a promotional video for its new AI model, Gemini, which appeared to show the model recognizing visual cues and interacting vocally with a person in real time.
– However, Google admitted that the video was misleading and that researchers fed still images to the model and edited together successful responses to misrepresent its capabilities.
– AI experts pointed out that running still images and text through large language models like Gemini is computationally intensive, making real-time video interpretation impractical.
– The video also did not specify that the recognition demo likely used Gemini Ultra, which is not yet available, leading to questions about the marketing efforts of Google.
– While Gemini’s image recognition abilities are impressive on their own, the video edited together the capabilities to make the model seem more capable than it is, generating hype.
– The controversy surrounding the video has caused AI experts to question the credibility of Google’s claims and has led to a backlash against the company’s deceptive marketing tactics.

Ars Technica:

A still from Google’s misleading Gemini AI promotional video, released Wednesday. (Credit: Google)

Google is facing controversy among AI experts for a deceptive Gemini promotional video released Wednesday that appears to show its new AI model recognizing visual cues and interacting vocally with a person in real time. As reported by Parmy Olson for Bloomberg, Google has admitted that was not the case. Instead, the researchers fed still images to the model and edited together successful responses, partially misrepresenting the model’s capabilities.

“We created the demo by capturing footage in order to test Gemini’s capabilities on a wide range of challenges. Then we prompted Gemini using still image frames from the footage, & prompting via text,” a Google spokesperson told Olson. As Olson points out, Google filmed a pair of human hands doing activities, then showed still images to Gemini Ultra, one by one. Google researchers interacted with the model through text, not voice, then picked the best interactions and edited them together with voice synthesis to make the video.

Right now, running still images and text through massive large language models is computationally intensive, which makes real-time video interpretation largely impractical. That was one of the clues that first led AI experts to believe the video was misleading.
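To make that reasoning concrete, here is a rough back-of-the-envelope sketch. The frame rate and per-request latency below are hypothetical placeholders chosen for illustration, not measurements of Gemini or any other model, and the function names are invented for this example. It assumes requests are processed one at a time.

```python
import math


def frames_between_requests(video_fps: float, seconds_per_request: float) -> int:
    """Number of video frames that go by while one still-image request is processed."""
    return math.ceil(video_fps * seconds_per_request)


def effective_fps(seconds_per_request: float) -> float:
    """Frames per second the model can interpret if requests run strictly one at a time."""
    return 1.0 / seconds_per_request


# Hypothetical values for illustration only (assumptions, not benchmarks).
VIDEO_FPS = 30.0            # a common video frame rate
SECONDS_PER_REQUEST = 2.0   # assumed time for one image-plus-text prompt

if __name__ == "__main__":
    skipped = frames_between_requests(VIDEO_FPS, SECONDS_PER_REQUEST)
    print(f"Frames passing per request: {skipped}")
    print(f"Effective frames interpreted per second: {effective_fps(SECONDS_PER_REQUEST)}")
```

Under those assumed numbers, the model would see only one frame out of every 60, which is closer to the still-image workflow Google described than to the fluid, real-time interaction the video suggests.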

The Google Gemini video in question.

“Google’s video made it look like you could show different things to Gemini Ultra in real time and talk to it. You can’t,” wrote Parmy Olson in a tweet. A Google spokesperson said that “the user’s voiceover is all real excerpts from the actual prompts used to produce the Gemini output that follows.”

Playing catch-up with hype

Over the past year, upstart OpenAI has embarrassed Google by pulling ahead in generative AI technology, some of which traces its origins to Google research lab breakthroughs. The search giant has been scrambling to catch up since early this year, putting great effort into ChatGPT competitor Bard and large language models like PaLM 2. Google framed Gemini as the first true rival to OpenAI’s GPT-4, which is still widely seen as the market leader in large language models.

At first, it seemed like everything was going to plan. After Google announced Gemini on Wednesday, the company’s stock rose 5 percent. But soon, AI experts began picking apart Google’s perhaps overhyped claims of “sophisticated reasoning capabilities,” including benchmarks that might not mean much, eventually focusing on the Gemini promotional video with fudged results.

In the contested video, titled “Hands-on with Gemini: Interacting with multimodal AI,” we see a view of what the AI model apparently sees, accompanied by the AI model’s responses on the right side of the screen. The researcher draws squiggly lines and ducks and asks Gemini what it can see. The viewer hears a voice, apparently of Gemini Ultra, responding to the questions.

As Olson points out in her Bloomberg piece, the video also does not specify that the recognition demo likely uses Gemini Ultra, which is not yet available. “Fudging such details points to the broader marketing effort here: Google wants us to remember that it’s got one of the largest teams of AI researchers in the world and access to more data than anyone else,” Olson wrote.

Taken alone, and if represented more accurately (as they are on this Google blog page), Gemini’s image recognition abilities are nothing to sneeze at. They seem roughly on par with the capabilities of OpenAI’s multimodal GPT-4V (GPT-4 with vision) AI model, which can also recognize the content of still images. But when the best responses were edited together seamlessly for promotional purposes, the video made Google’s Gemini model seem more capable than it is, and that had many people hyped up.

“I can’t stop thinking about the implications of this demo,” tweeted TED organizer Chris Anderson on Thursday. “Surely it’s not crazy to think that sometime next year, a fledgling Gemini 2.0 could attend a board meeting, read the briefing docs, look at the slides, listen to every one’s words, and make intelligent contributions to the issues debated? Now tell me. Wouldn’t that count as AGI?”

“That demo was incredibly edited to suggest that Gemini is far more capable than it is,” replied pioneering software engineer Grady Booch. “You’ve been deceived, Chris. And shame on them for so doing.”


AI Eclipse TLDR:

Google is facing criticism from AI experts for a misleading promotional video that suggests its new AI model, Gemini, can recognize visual cues and interact vocally with a person in real time. However, Google has admitted that the video was edited together using still images and text prompts, partially misrepresenting the model’s capabilities. The video led many to believe that Gemini was more advanced than it actually is. The controversy comes as Google tries to catch up to rival OpenAI in generative AI technology. While Gemini’s image recognition abilities are notable, they are not as advanced as portrayed in the video. AI experts have called out Google for the deceptive marketing tactics used in the video.