Anyone Can Sing Anything Now!

I have made critical errors in judgement, and they have resulted in AI models being made of people who really shouldn't have AI models made of them. The streamer/podcaster in question: PayMoneyWubby. Enter at your own risk! (Safe for work lol)

If you haven't read the post "An Intro to AI", please do so now: it's short, has videos, and will give you a basic understanding of what we're going to dig deeper into today 🙂

It started with me watching one of my favorite streamers, PayMoneyWubby. If you're unfamiliar with Wubby, he started out as a small-time video game streamer, branched into investigative journalism on YouTube, and went on to become a top-30 Twitch.tv partner earning over $50,000,000 a year. Here's one of his more famous YouTube videos:

WARNING: Video content is not moderated. YouTube safe != school safe

During a recent stream, a random viewer donated money to have Wubby play one song for the entire stream, and this was the song:

This should be safe for all ages – it was on the radio and in TV commercials 🙂

He started singing along, badly, as is his shtick, and while he was doing this I was already setting up the repo and getting ready to train an AI to sing. Using so-vits-svc-fork from GitHub, I fed it sample data of Wubby talking: over a thousand 9-second audio clips that I ripped, isolated the vocals from, and split into small files for the AI to learn from before generating a model.
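For anyone curious what the "split into small files" step can look like, here's a minimal sketch using only Python's built-in wave module. It assumes the vocals have already been isolated and saved as one uncompressed WAV file; the function name and file names are hypothetical, and this is not necessarily how the clips for this model were actually made.

```python
import os
import wave


def split_wav(path: str, out_dir: str, clip_seconds: float = 9.0) -> list[str]:
    """Split a WAV file into consecutive clips of at most clip_seconds each.

    The final clip holds whatever audio is left over, so it may be shorter.
    Returns the paths of the clips that were written, in order.
    (Hypothetical helper for illustration.)
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_clip = int(params.framerate * clip_seconds)
        base = os.path.splitext(os.path.basename(path))[0]
        index = 0
        while True:
            frames = src.readframes(frames_per_clip)
            if not frames:
                break  # reached the end of the source file
            out_path = os.path.join(out_dir, f"{base}_{index:04d}.wav")
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

Usage would be something like `clips = split_wav("wubby_vocals.wav", "dataset/wubby")`. For scale: a thousand 9-second clips is 9,000 seconds, or about two and a half hours of audio.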

Then comes the training, over and over and over again, as you know from the intro post on how AI works.

At 340 epochs, which works out to 5,100 training sessions, here's what my AI spit out. The first file is the source: the vocals isolated from the song above. The AI doesn't need to do anything with the music, since we're eventually going to overlay the finished vocals on top of the source song.

Source Sample Audio

Processed Audio
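The plan is to lay the converted vocals back over the song once the model is done. As a rough sketch of that last step, the Python below mixes two 16-bit PCM WAV files by summing them sample by sample, using only the standard library. It assumes both files share the same channel count and sample rate; the function name and file names are hypothetical, and this is an illustration rather than a tool used in this post.

```python
import array
import sys
import wave


def overlay_wavs(vocals_path: str, backing_path: str, out_path: str,
                 vocal_gain: float = 1.0) -> None:
    """Mix two 16-bit PCM WAV files with matching channels and sample rate.

    Samples are summed and clipped to the 16-bit range; the output is as
    long as the longer input (the shorter one is treated as silence past
    its end). (Hypothetical helper for illustration.)
    """
    with wave.open(vocals_path, "rb") as v, wave.open(backing_path, "rb") as b:
        if v.getsampwidth() != 2 or b.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if (v.getnchannels(), v.getframerate()) != (b.getnchannels(), b.getframerate()):
            raise ValueError("channel count and sample rate must match")
        params = b.getparams()
        voc = array.array("h", v.readframes(v.getnframes()))
        bak = array.array("h", b.readframes(b.getnframes()))
    if sys.byteorder == "big":  # WAV samples are stored little-endian
        voc.byteswap()
        bak.byteswap()
    length = max(len(voc), len(bak))
    mixed = array.array("h", bytes(2 * length))  # zero-filled output buffer
    for i in range(length):
        s = int(vocal_gain * (voc[i] if i < len(voc) else 0))
        s += bak[i] if i < len(bak) else 0
        mixed[i] = max(-32768, min(32767, s))  # clip instead of wrapping
    if sys.byteorder == "big":
        mixed.byteswap()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # frame count in the header is fixed on close
        out.writeframes(mixed.tobytes())
```

Usage would look like `overlay_wavs("converted_vocals.wav", "backing_track.wav", "final_mix.wav", vocal_gain=0.8)`. The pure-Python loop is slow for a full-length song; a library like NumPy can vectorize the same sum-and-clip, but the idea is identical.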

Just a friendly reminder: this AI will need roughly 3,000 epochs to reach studio-quality singing, which at 15 training sessions per epoch works out to about 45,000 training sessions.
The above is only about 10% of the way there (340 of 3,000 epochs), but it's still really impressive for 10% complete!!
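The epoch and session numbers in this post can be sanity-checked with a little arithmetic. Here's a quick Python sketch that takes the figures above (5,100 training sessions at 340 epochs, with a target of roughly 3,000 epochs) and projects the rest of the run. It assumes the number of sessions per epoch stays constant for the whole run, and the function name is just for illustration.

```python
def training_progress(sessions_done: int, epochs_done: int, target_epochs: int):
    """Return (sessions per epoch, projected total sessions, fraction complete).

    Assumes every epoch contains the same number of training sessions.
    (Hypothetical helper for illustration.)
    """
    sessions_per_epoch = sessions_done / epochs_done
    projected_sessions = sessions_per_epoch * target_epochs
    fraction_complete = epochs_done / target_epochs
    return sessions_per_epoch, projected_sessions, fraction_complete


per_epoch, total, done = training_progress(5100, 340, 3000)
print(f"{per_epoch:g} sessions/epoch, ~{total:,.0f} sessions total, {done:.0%} done")
# prints: 15 sessions/epoch, ~45,000 sessions total, 11% done
```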

You can help with training by downloading my model from my repo on Hugging Face, or keep an eye on my website; I'll make another post once it's near completion. The title image was generated using DALL-E 2.
