• Reminder: Do not call, text, or harass anyone in real life, and do not encourage others to do so. Do not talk about killing or using violence against anyone, or about engaging in any criminal behavior. If it is not an obvious joke even when taken out of context, don't post it. Please report violators.

    DMCA, complaints, and other inquiries:

    [email protected]

O&A AI reunion show

Dr. Adam Cox

Yes, I'm trying to raise super kids
Forum Clout
808
It's actually scary how accurate Opie's voice is. I honestly wouldn't be able to tell it apart from the actual opester if I didn't know this was AI.
the opie model came out way better than the ant one. The Ant one was more of a proof of concept, so I just used one of his long gun rants from a random 2012 show to train it because it was easy. As soon as I heard his voice in OP's audio, I knew for sure it was my Ant model because of the artifacts, like how it sounds kind of robotic or auto-tuned when transitioning to certain sounds. I spent more time on the Opie model and used more audio from different shows, with a wider range of vocal inflections, so it sounds much more natural.
On my to-do list is cleaning up and adding more stuff to the Ant dataset and re-training it so it sounds as natural as opie's, then making one for jimmy. After those are done and they start getting used (like in this post), I'll throw up a poll for who should be next.
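For anyone wondering what "cleaning up the dataset" actually involves: RVC training data is basically just clean, single-speaker audio chopped into short clips. Here's a rough sketch using only Python's stdlib wave module (the filenames and the 8-second clip length are made up, and real prep would also involve trimming silence and filtering out crosstalk):

```python
import os
import wave

def slice_wav(src_path: str, out_dir: str, clip_seconds: float = 8.0) -> list[str]:
    """Chop one long WAV into fixed-length clips for a training dataset."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_clip = int(params.framerate * clip_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_clip)
            if not frames:
                break
            out_path = os.path.join(out_dir, f"clip_{index:04d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # same rate/width/channels as the source
                dst.writeframes(frames)  # nframes is corrected when the file closes
            written.append(out_path)
            index += 1
    return written
```

Point a folder of these at the trainer and the "wider range of vocal inflections" part is just making sure the source clips come from lots of different shows and moods.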
 
Forum Clout
786
the opie model came out way better than the ant one. The Ant one was more of a proof of concept, so I just used one of his long gun rants from a random 2012 show to train it because it was easy. As soon as I heard his voice in OP's audio, I knew for sure it was my Ant model because of the artifacts, like how it sounds kind of robotic or auto-tuned when transitioning to certain sounds. I spent more time on the Opie model and used more audio from different shows, with a wider range of vocal inflections, so it sounds much more natural.
On my to-do list is cleaning up and adding more stuff to the Ant dataset and re-training it so it sounds as natural as opie's, then making one for jimmy. After those are done and they start getting used (like in this post), I'll throw up a poll for who should be next.
I vote bobo or intern David
 
Forum Clout
86
the opie model came out way better than the ant one. The Ant one was more of a proof of concept, so I just used one of his long gun rants from a random 2012 show to train it because it was easy. As soon as I heard his voice in OP's audio, I knew for sure it was my Ant model because of the artifacts, like how it sounds kind of robotic or auto-tuned when transitioning to certain sounds. I spent more time on the Opie model and used more audio from different shows, with a wider range of vocal inflections, so it sounds much more natural.
On my to-do list is cleaning up and adding more stuff to the Ant dataset and re-training it so it sounds as natural as opie's, then making one for jimmy. After those are done and they start getting used (like in this post), I'll throw up a poll for who should be next.
I vote Patrick Tomlinson. Imagine how mad he'd be if there was a fake Patrick Tomlinson podcast where he comes clean about being a pedophile and explains in detail all the crimes he's committed.
 

TruckstopBeefers

Beefers
Forum Clout
540
I thought about that a while ago. It's possible, but it's a lot of work, especially if you want their dialogue to actually sound somewhat natural and accurate. Here's the high-level flow I came up with:
  1. The program starts by calling GPT-4 for a list of current events, with a focus on topics relevant to what O&A would discuss, and outputs it in a structured format.
  2. Now you could use GPT-4 to create the actual script, but it won't have enough data to output something that sounds like them actually conversing. To get that, you could use a local LLM and train a LoRA on a dataset of quotes and conversations from O&A, plus certain facts that might not be in there already, like nana's love for a boy named sue.
  3. Send the list of events to the local model with a prompt to create a script in a standard format. Depending on how long you want it to be, the token limit of the model, and the hardware you're running it on, you may have to break it up into separate prompts.
  4. The app then breaks up the conversation and sends each instance of each speaker talking to a text-to-speech program (this can also be done locally).
  5. Most TTS models aren't very accurate, so you'd probably then want to run the audio files through RVC (which was used for OP's audio) to replace the TTS voices with O&A's. The files should be written out with a sequential naming convention.
  6. The program could then re-compile the conversation into a single audio file and throw Street Fighting Man at the beginning.
The only real problem here is spacing out the audio files when creating the combined track. I have no idea how you could automate natural pauses and interruptions, so you'd probably be better off manually editing it all together, but even that isn't easy.
There's a lot of manual stuff that would make it better, like recording yourself reading the script and running RVC on that instead of the TTS output, but this is just a theoretical automated end-to-end solution.
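Glued together, the flow above is really just a pipeline of pluggable stages. Everything below is hypothetical stubs (get_topics, write_script, tts, rvc_convert, and mix don't exist anywhere; they just mark where the real GPT-4, local LLM, TTS, and RVC calls would go):

```python
from dataclasses import dataclass

@dataclass
class Line:
    speaker: str  # e.g. "Opie", "Ant"
    text: str

def generate_show(get_topics, write_script, tts, rvc_convert, mix) -> str:
    """Hypothetical end-to-end pipeline; each argument is a pluggable stage."""
    topics = get_topics()                      # step 1: current-events pull
    script: list[Line] = write_script(topics)  # steps 2-3: LoRA'd local LLM
    clips = []
    for line in script:                        # step 4: one TTS render per line
        raw = tts(line.speaker, line.text)
        clips.append(rvc_convert(line.speaker, raw))  # step 5: RVC voice swap
    return mix(clips)                          # step 6: stitch + intro music
```

The nice part of structuring it this way is that each stage can be swapped out, e.g. replacing the tts stage with your own recorded impression without touching anything else.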

so good luck to whoever wants to make that

what are you? some kind of SUPER kid??
 

Dr. Adam Cox

Yes, I'm trying to raise super kids
Forum Clout
808
I vote Patrick Tomlinson. Imagine how mad he'd be if there was a fake Patrick Tomlinson podcast where he comes clean about being a pedophile and explains in detail all the crimes he's committed.
I never got the obsession with the Patrick shit. He's a fat nobody midwest faggot with no connection to O&A. He had some shitty take on Norm, which made some autists on the O&A sub decide to fuck with him to the point of getting the sub shut down. I don't understand why he's still relevant here.

I'd only make ones for people that would actually be on the show, like comics, staff, and the regular idiots. David and Bobo were at the top of my list because it would give me an excuse to listen to the 5hr D&B vid again. There's also Lady Di, Vos, Bobby, Stalker Patti, Slobbo, and a few others. The only ones I probably wouldn't do are Sam, because I don't want to listen to enough of his voice to get 10 minutes of clean audio, and Patrice, because that just kind of feels wrong.
 

2-4-6-h8

Forum Clout
567
Assuming he's using the RVC models I made a few months ago, I only did Opie and Ant. I was waiting for someone to actually do something with them before I made new ones and refined the existing ones.
I'm shocked at how accurate their individual cadences and inflections are. Is this something that is programmed in some way in the voice model, or does it just pick up the inflection from sourced samples?
 

2-4-6-h8

Forum Clout
567
I thought about that a while ago. It's possible, but it's a lot of work, especially if you want their dialogue to actually sound somewhat natural and accurate. Here's the high-level flow I came up with:
  1. The program starts by calling GPT-4 for a list of current events, with a focus on topics relevant to what O&A would discuss, and outputs it in a structured format.
  2. Now you could use GPT-4 to create the actual script, but it won't have enough data to output something that sounds like them actually conversing. To get that, you could use a local LLM and train a LoRA on a dataset of quotes and conversations from O&A, plus certain facts that might not be in there already, like nana's love for a boy named sue.
  3. Send the list of events to the local model with a prompt to create a script in a standard format. Depending on how long you want it to be, the token limit of the model, and the hardware you're running it on, you may have to break it up into separate prompts.
  4. The app then breaks up the conversation and sends each instance of each speaker talking to a text-to-speech program (this can also be done locally).
  5. Most TTS models aren't very accurate, so you'd probably then want to run the audio files through RVC (which was used for OP's audio) to replace the TTS voices with O&A's. The files should be written out with a sequential naming convention.
  6. The program could then re-compile the conversation into a single audio file and throw Street Fighting Man at the beginning.
The only real problem here is spacing out the audio files when creating the combined track. I have no idea how you could automate natural pauses and interruptions, so you'd probably be better off manually editing it all together, but even that isn't easy.
There's a lot of manual stuff that would make it better, like recording yourself reading the script and running RVC on that instead of the TTS output, but this is just a theoretical automated end-to-end solution.

so good luck to whoever wants to make that
This is seriously fascinating.
 

Dr. Adam Cox

Yes, I'm trying to raise super kids
Forum Clout
808
I'm shocked at how accurate their individual cadences and inflections are. Is this something that is programmed in some way in the voice model, or does it just pick up the inflection from sourced samples?
I'm guessing OP recorded himself doing the voices and then ran it through RVC. Cadence and inflection come from the source performance; tone and timbre come from the model. TTS --> RVC works, but having tried it with different TTS methods, it never sounded natural to me. It's a lot easier to record a half-assed impression, since you only need to mimic the speaking style, not the actual sound of their voice.
 