Unauthorized “David Attenborough” AI clone narrates developer’s life, goes viral

Key Takeaways:

– Developer Charlie Holtz combined GPT-4 Vision and ElevenLabs voice cloning technology to create an unauthorized AI version of David Attenborough narrating his every move on camera.
– The AI-generated narration imitates Attenborough’s style and comments on Holtz’s actions as if in a wildlife documentary.
– The process involves a Python script that takes photos from Holtz’s webcam and feeds them to GPT-4V, which generates text in Attenborough’s style. The text is then fed into an ElevenLabs AI voice profile trained on audio samples of Attenborough’s speech.
– Similar demonstrations have been done, such as using a cloned voice of Steve Jobs critiquing designs created in a design app.
– Voice cloning technology raises ethical and legal concerns, as it can create convincing deepfakes of a person’s voice. ElevenLabs prohibits cloning voices in a way that violates intellectual property and copyright, but enforcement can be challenging.
– While some people express discomfort with unauthorized use of Attenborough’s voice, others find the demo amusing.

Ars Technica:

Screen capture from a demo video of an AI-generated, unauthorized David Attenborough voice narrating a developer’s video feed. (Credit: Charlie Holtz)

On Wednesday, Replicate developer Charlie Holtz combined GPT-4 Vision (commonly called GPT-4V) and ElevenLabs voice cloning technology to create an unauthorized AI version of the famous naturalist David Attenborough narrating Holtz’s every move on camera. As of Thursday afternoon, the X post describing the stunt had garnered over 21,000 likes.

“Here we have a remarkable specimen of Homo sapiens distinguished by his silver circular spectacles and a mane of tousled curly locks,” the false Attenborough says in the demo as Holtz looks on with a grin. “He’s wearing what appears to be a blue fabric covering, which can only be assumed to be part of his mating display.”

“Look closely at the subtle arch of his eyebrow,” it continues, as if narrating a BBC wildlife documentary. “It’s as if he’s in the midst of an intricate ritual of curiosity or skepticism. The backdrop suggests a sheltered habitat, possibly a communal feeding area or watering hole.”

How does it work? Every five seconds, a Python script called “narrator” takes a photo from Holtz’s webcam and feeds it to GPT-4V (the version of OpenAI’s language model that can process image inputs) through an API call that includes a special prompt instructing the model to write text in the style of Attenborough’s narrations. That text is then fed into an ElevenLabs AI voice profile trained on audio samples of Attenborough’s speech. Holtz has published the script that pulls it all together on GitHub; running it requires API tokens for OpenAI and ElevenLabs, which cost money to use.
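For readers curious what such a pipeline looks like in code, the sketch below is a minimal illustration of the loop described above, not Holtz’s actual “narrator” script. It assumes the official OpenAI Python SDK and the ElevenLabs text-to-speech REST endpoint; the model name, voice ID, prompt wording, and output handling are placeholders rather than values taken from Holtz’s repository.

    # Minimal sketch of the capture -> GPT-4V -> ElevenLabs loop (not Holtz's actual code).
    # Assumes OPENAI_API_KEY and ELEVENLABS_API_KEY are set in the environment, and that
    # VOICE_ID below is replaced with a real ElevenLabs voice profile ID.
    import base64
    import os
    import time

    import cv2                 # webcam capture (opencv-python)
    import requests            # plain HTTP call to the ElevenLabs REST API
    from openai import OpenAI  # official OpenAI Python SDK

    client = OpenAI()           # reads OPENAI_API_KEY from the environment
    VOICE_ID = "YOUR_VOICE_ID"  # placeholder for a cloned-voice profile ID

    def capture_frame() -> str:
        """Grab one webcam frame and return it as a base64-encoded JPEG."""
        cap = cv2.VideoCapture(0)
        ok, frame = cap.read()
        cap.release()
        if not ok:
            raise RuntimeError("webcam capture failed")
        _, jpeg = cv2.imencode(".jpg", frame)
        return base64.b64encode(jpeg.tobytes()).decode("utf-8")

    def narrate(image_b64: str) -> str:
        """Ask GPT-4V to describe the frame in the style of a nature documentary."""
        response = client.chat.completions.create(
            model="gpt-4-vision-preview",
            messages=[
                {"role": "system",
                 "content": "You are a nature documentary narrator. Describe the "
                            "human in the image in that style. Keep it brief."},
                {"role": "user", "content": [
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ]},
            ],
            max_tokens=200,
        )
        return response.choices[0].message.content

    def speak(text: str) -> None:
        """Send the narration text to ElevenLabs and save the returned audio."""
        resp = requests.post(
            f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
            headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
            json={"text": text, "model_id": "eleven_monolingual_v1"},
            timeout=60,
        )
        resp.raise_for_status()
        with open("narration.mp3", "wb") as f:
            f.write(resp.content)  # play the saved clip with any audio player

    while True:
        frame = capture_frame()
        line = narrate(frame)
        print(line)
        speak(line)
        time.sleep(5)  # roughly the five-second cadence described above

Running a loop like this continuously is why the tokens cost money: every cycle sends an image to OpenAI and a block of text to ElevenLabs, and both services meter those requests.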

While some of these capabilities have been available separately for some time, developers have only recently begun to experiment with combining them, thanks to API availability, and the combination can produce surprising demonstrations like this one.

During the demo video, when Holtz holds up a cup and takes a drink, the fake Attenborough narrator says, “Ah, in its natural environment, we observe the sophisticated Homo sapiens engaging in the critical ritual of hydration. This male individual has selected a small cylindrical container, likely filled with life-sustaining H2O, and is tilting it expertly towards his intake orifice. Such grace, such poise.”

In a different demo posted on X by Pietro Schirano, you can hear the cloned voice of Steve Jobs critiquing designs created in Figma, a design app. Schirano used a similar technique: an image is streamed to GPT-4V via the API with a prompt to reply in the style of Jobs, and the resulting text is fed into an ElevenLabs clone of Jobs’ voice.
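In code terms, adapting the sketch above to Schirano’s variant mostly amounts to swapping the system prompt, the voice profile, and the image source; the values below are illustrative placeholders, not Schirano’s actual settings.

    # Hypothetical configuration changes for the Jobs/Figma variant (placeholder values).
    SYSTEM_PROMPT = ("You are Steve Jobs reviewing a product design. "
                     "Critique the design in the image bluntly but constructively.")
    VOICE_ID = "JOBS_VOICE_ID"  # an ElevenLabs voice profile trained on Jobs audio samples
    # Instead of webcam frames, capture_frame() would return screenshots of the Figma canvas.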

We’ve previously covered voice cloning technology, which is fraught with ethical and legal concerns because the software can create convincing deepfakes of a person’s voice, making them “say” things the real person never said. This has legal implications for a celebrity’s publicity rights, and the technique has already been used to scam people by faking the voices of loved ones asking for money. ElevenLabs’ terms of service prohibit people from cloning other people’s voices in a way that would violate “Intellectual Property Rights, publicity rights and Copyright,” but that is a rule that can be difficult to enforce.

For now, while some people have expressed deep discomfort with someone imitating Attenborough’s voice without permission, many others seem amused by the demo. “Okay, I’m going to get David Attenborough to narrate videos of my baby learning how to eat broccoli,” quipped Jeremy Nguyen in an X reply.


AI Eclipse TLDR:

In a recent demonstration, developer Charlie Holtz combined OpenAI’s GPT-4 Vision (GPT-4V) and ElevenLabs voice cloning technology to create an unauthorized AI version of David Attenborough narrating Holtz’s actions on camera. The video, which gained over 21,000 likes, shows the fake Attenborough providing humorous commentary on Holtz’s appearance and behavior. To achieve this, a Python script called “narrator” captures a photo from Holtz’s webcam every five seconds and sends it to GPT-4V via an API. GPT-4V then generates text in Attenborough’s style, which is fed into an ElevenLabs AI voice profile trained on audio samples of Attenborough’s speech. Although similar capabilities have existed separately for some time, developers are now experimenting with combining them using available APIs. The demonstration highlights the ethical and legal concerns surrounding voice cloning technology, as it can create convincing deepfakes of a person’s voice. While some people expressed discomfort at the unauthorized use of Attenborough’s voice, others found the demo amusing and suggested using it for other purposes, such as narrating videos of babies learning to eat broccoli.