In the past few years, we've seen a lot of Deepfake videos of celebrities like Donald Trump and Barack Obama on the internet and in the news. These videos have made people curious and started lots of debates.
But how hard or easy is it to make a video like this yourself? This article will give a simple overview of the different tools and services that you can use to make Deepfake videos.
Let's learn more about Deepfakes and what it takes to make these amazing, yet sometimes creepy, videos.
In this experiment, I create a fictional person from a photo and a voice recording. Various AI tools are used to alter the photo and generate the voice. The individual components are then assembled into a video in which the fictional person appears and speaks.
Anyone searching online for this topic typically finds providers that use Face-Swap to create a Deepfake video from existing videos and face photos. For this experiment, however, I want a provider that can create a so-called Talking-Head video from a selfie. Face-Swap would only be an option if a video were already available. Moreover, Face-Swap / Deepfake websites are often dubious, advertise with celebrity photos, or originate from the pornographic sector.
A photo and voice recordings of myself serve as the template, which is then cloned or altered with the help of AI algorithms. I would like to emphasize that every component is created individually, so no stock elements such as digital avatars or pre-made voices are used.
It is particularly important that the process is simple and quick. I want to show a method that works efficiently and still delivers good quality. My goal is not a perfect Deepfake video; that would require more time, different tools, and other source material. Here and now, it's about the quick and easy solution.
Of course, there are certain costs involved, but they should stay within reasonable limits.
One possibility would be to have the photo completely generated by the AI. There are hundreds of providers for this, either for free or for a small fee, that generate images of people from text prompts. The best known are probably Midjourney, Stable Diffusion or DALL·E by OpenAI. You can find many more on Know-Your.AI in the category: IMAGE.
Photo apps with AI filters have been known for years and everyone has probably used Face-Swap in Snapchat before. There are already many small and large providers that specialize in this technology and offer it as an API, SaaS, or app.
For the fake images from which we will later generate a video, I tried out different providers. All of them have more or less the same features and can generate an image with a different face based on a selfie. The face can be made younger or older, and the hairstyle, hair color, or facial features such as eyes, nose, and mouth can be altered.
I have selected three providers as examples that cover the typical functions of photo AI apps.
Fotor is primarily an online image editor, but it also offers some AI functions, including a Text-to-Image generator and AI Photo Effects. With the AI Face Generator, you can have faces completely generated by AI or create variations of faces using the Image-to-Image function.
Game-Character-Variant of my selfie.
Generating images costs "credits", and if you register, you get some of them for free. If you need more, you can subscribe and generate 200 images for about 3 Euros per month.
The Time Machine is a feature of MyHeritage that allows users to bring their ancestors to life in a personalized way. After uploading 10 to 25 of your own selfies, the Time Machine generates many different image variants from different eras.
MyHeritage offers other AI features such as DeepStory and Animated Family Photos which are optimized for transforming old family photos into videos. However, for this experiment, I decided to go with a provider that offers more options for generating the video.
The FaceLab App is an impressive application available on iOS devices. It provides users with the opportunity to alter and edit their faces in a playful and creative way. With a variety of filters, effects, and tools, users can adjust and experiment with their appearance in real time.
The simple solution would be one of the many text-to-speech services that convert any text into speech and then offer it for download as an MP3 file. But that's not what we want. We want a voice that is not pre-made, but that we have created ourselves. For this, there are also providers who specialize in voice cloning.
Voice cloning refers to the generation of a voice based on an existing voice. For this, a text is read aloud and recorded, and AI algorithms then generate a voice that is very similar to the original. The quality varies greatly and depends on many factors, such as the length of the text, the quality of the recording, and the language.
For this experiment, I chose ElevenLabs. For the cloning, I read three minutes of "The Metamorphosis" by Franz Kafka in German and uploaded it as an MP3 file. After a few seconds, the voice was generated and could be selected in the interface.
Afterwards, you can create and download any spoken text in the ElevenLabs interface. The quality is really impressive and the voice sounds relatively similar to mine. What's super cool is that I can suddenly speak Spanish, French and Italian, and even with tough tongue twisters, I make no mistakes.
Example: Different Languages
Tongue Twister Example
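The step of turning text into speech with the cloned voice can presumably be scripted instead of clicked through in the web interface. The following is a minimal sketch, assuming the public ElevenLabs REST endpoint `POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header. The voice ID, the API key, and the model name `eleven_multilingual_v2` are placeholders or assumptions and not values from this experiment. The function only builds the request; sending it is left as a commented-out step.

```python
import json
import urllib.request

# Assumed base URL of the public ElevenLabs API (not taken from the article).
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(voice_id: str, text: str, api_key: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for a cloned voice.

    voice_id and api_key are placeholders; model_id is an assumed model name.
    """
    payload = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=payload,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_request("YOUR_VOICE_ID", "Hola, ¿qué tal?", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # To actually generate audio, send the request and save the returned bytes:
    # with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as f:
    #     f.write(resp.read())
```

Keeping request construction separate from sending makes it possible to inspect the URL and payload without spending any of the character quota.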
ElevenLabs offers a free trial version with 10,000 characters of speech synthesis per month and 3 custom voices. After that, you have to pay for the service. Prices start at $5 per month for 30,000 characters and 30 custom voices.
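To get a feeling for these quotas, here is a small back-of-the-envelope sketch. It uses only the figures quoted above (10,000 free characters, $5 for 30,000 characters); the function names are made up for illustration, and the assumption that every character of a script, including spaces, counts against the quota is mine.

```python
FREE_CHARS_PER_MONTH = 10_000      # free trial quota quoted above
STARTER_PRICE_USD = 5.0            # cheapest paid plan quoted above
STARTER_CHARS_PER_MONTH = 30_000


def cost_per_1000_chars(price_usd: float, chars: int) -> float:
    """Price of 1,000 synthesized characters on a given plan."""
    return price_usd / chars * 1000


def renders_available(script: str, quota: int) -> int:
    """How often a script can be synthesized before the quota runs out.

    Assumes every character (including spaces) counts against the quota.
    """
    if not script:
        raise ValueError("script must not be empty")
    return quota // len(script)


if __name__ == "__main__":
    script = "Hello, this is my cloned voice speaking. " * 10
    print(f"Script length: {len(script)} characters")
    print(f"Renders on the free tier: {renders_available(script, FREE_CHARS_PER_MONTH)}")
    rate = cost_per_1000_chars(STARTER_PRICE_USD, STARTER_CHARS_PER_MONTH)
    print(f"Paid plan: about ${rate:.3f} per 1,000 characters")
```

For a short Talking-Head script of a few hundred characters, the free quota is likely enough for several attempts.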
The video is a so-called Talking-Head video. In such videos, you see a real person speaking a text directly into the camera. Especially for the creation of such videos, there are specialized web services that either offer pre-made avatars or the possibility to create such an avatar from a picture.
The pre-made avatars are usually of higher quality and more realistic in appearance. However, we want to create a video of ourselves. Therefore, I chose a provider who creates an avatar from a photo.
HeyGen has a simple interface. You can create avatars from photos or use pre-made avatars, and HeyGen also offers an online video editor with which you can create different scenes in various templates. Texts can simply be spoken by AI voices, or you can use your own voice files.
With just a few clicks, the photo is uploaded and integrated as an avatar into the video. Then you can add graphic elements, texts, and images and create multiple scenes. At the end, the audio tracks are uploaded and assigned to the respective scenes. HeyGen then combines the audio and the avatar image into a Talking-Head video. Simple videos with few scenes and elements can be created in just a few minutes.
HeyGen Video Editor
HeyGen Result Page
HeyGen also has a voice cloning feature, but that is a separate product that costs $99/year. For video generation, HeyGen offers a free tier with one minute of video, and the cheapest paid plan starts at $30/month for 10 minutes of video.
For this price and the minimal time investment, the results are surprisingly good. On a second look, of course, one recognizes that these are not real humans speaking, but that might not matter for many use cases. For smaller side projects, private use, and semi-professional applications, the quality of these Talking-Head videos is certainly sufficient.
Example: Game character and pre-made voice
Example: Old character and cloned voice
Alternatives to HeyGen with similar functionality include Synthesia, Colossyan and Hour One. However, none of them has a function comparable to HeyGen's "Talking Picture". Although one can have a digital avatar created, this might be too expensive for some customers. The price for this at Synthesia is 1000 USD and in this YouTube video you can see why.
Competitor Elai offers a so-called Selfie Avatar for $259 and a "Studio Avatar" for $500. Compared to the $30 from HeyGen, that's obviously a hefty price.
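The price gap can be made concrete with a rough calculation. This sketch uses only the prices quoted in this article and assumes, for simplicity, that the avatar prices are one-off fees while HeyGen is billed monthly, so it is not a like-for-like comparison. The names are chosen for illustration.

```python
HEYGEN_MONTHLY_USD = 30.0
HEYGEN_MINUTES_PER_MONTH = 10

# Avatar prices quoted in the article, assumed here to be one-off fees.
ONE_OFF_AVATARS_USD = {
    "Elai Selfie Avatar": 259.0,
    "Elai Studio Avatar": 500.0,
    "Synthesia digital avatar": 1000.0,
}


def price_per_minute(monthly_usd: float, minutes: int) -> float:
    """Cost of one minute of generated video on a monthly plan."""
    return monthly_usd / minutes


def months_equivalent(one_off_usd: float, monthly_usd: float = HEYGEN_MONTHLY_USD) -> float:
    """Number of months of the monthly plan that cost as much as a one-off avatar."""
    return one_off_usd / monthly_usd


if __name__ == "__main__":
    per_min = price_per_minute(HEYGEN_MONTHLY_USD, HEYGEN_MINUTES_PER_MONTH)
    print(f"HeyGen: ${per_min:.2f} per minute of video")
    for name, price in ONE_OFF_AVATARS_USD.items():
        print(f"{name}: ${price:.0f} is about {months_equivalent(price):.1f} months of HeyGen")
```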
Another alternative is tokkingheads, a site more intended for creating memes, which offers a free version. You can choose from many pre-made assets or upload your own images, videos, and audio files. The creation is very fast and simple, but the quality is significantly lower than with other providers.