Synthesized Audio Samples

Basilar membrane and otolaryngology are not auto-correlations.

Scientists at the CERN laboratory say they have discovered a new particle.

Donald Trump and I are great friends.

Peter Piper picked a peck of pickled peppers. How many pickled peppers did Peter Piper pick?

To cancel the payment, press one; or to continue, two.

Performance reviews are stressful, time-consuming, and often meaningless.

A nuclear weapon improvised from radioactive nuclear waste material and conventional explosives.

Tina Fey’s children are Alice Zenobia Richmond and Penelope Athena Richmond.

Deny thy father and refuse thy name.

Life is like a box of chocolates. You never know what you’re gonna get.

Prepare to make a right in 500 feet

Sally sells seashells by the seashore. The shells she sells are seashells I’m sure

My show is a total disaster

Compare Real to Generated Audio

why not use the time to register to vote

Real	Fake

it’s not a great sign for your case when

Real	Fake

Procedure

Data was collected using my YouTube data scraper, YTTTS [1].
The dataset went through several processing steps, including labelling clips in which the speaker was probably not John Oliver. The cleaned dataset is posted on Kaggle [4].
The model was trained using Tacotron 2 [2, 3].
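
The cleaning step above can be sketched roughly as follows. This is a minimal illustration, not the actual YTTTS or Kaggle processing code: the `Clip` record, its `is_target_speaker` label, and the file paths are hypothetical. The only grounded detail is the output format, since the NVIDIA Tacotron 2 repository reads training filelists with one `wav_path|transcript` entry per line.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Clip:
    wav_path: str            # path to the audio clip (hypothetical layout)
    text: str                # transcript of the clip
    is_target_speaker: bool  # hypothetical label: False if probably not John Oliver


def build_filelist(clips: Iterable[Clip]) -> List[str]:
    """Return 'wav_path|text' lines for clips labelled as the target speaker."""
    lines = []
    for clip in clips:
        if not clip.is_target_speaker:
            continue  # drop clips flagged during cleaning
        text = " ".join(clip.text.split())  # collapse repeated whitespace and newlines
        if not text or "|" in text:
            continue  # an empty or pipe-containing transcript would break the format
        lines.append(f"{clip.wav_path}|{text}")
    return lines


clips = [
    Clip("wavs/0001.wav", "why not use the time to register to vote", True),
    Clip("wavs/0002.wav", "a clip from a guest speaker", False),
    Clip("wavs/0003.wav", "it's not a great sign  for your case when", True),
]
filelist = build_filelist(clips)
print("\n".join(filelist))
```

Keeping the speaker label as a field on each clip, rather than deleting flagged audio outright, means the filter can be re-run later with a different threshold or a corrected label.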

Model Download + Kaggle Kernel

The model below was trained with a batch size of 32. The linked folder contains every checkpoint saved during training. As training continues, new checkpoints will be uploaded to the same folder:

Download a model checkpoint [5]

Train it yourself on Kaggle [6]
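
Because the folder holds many checkpoints, it may help to pick the most recent one programmatically after downloading. The sketch below assumes the files follow the `checkpoint_<iteration>` naming used by the NVIDIA Tacotron 2 training script. The names actually present in the shared folder may differ, and the example filenames are made up.

```python
import re
from typing import Iterable, Optional

# Assumed naming scheme: "checkpoint_" followed by the training iteration.
CHECKPOINT_RE = re.compile(r"^checkpoint_(\d+)$")


def latest_checkpoint(filenames: Iterable[str]) -> Optional[str]:
    """Return the checkpoint name with the highest iteration, or None if none match."""
    best_name, best_step = None, -1
    for name in filenames:
        match = CHECKPOINT_RE.match(name)
        if match is None:
            continue  # skip files that are not checkpoints
        step = int(match.group(1))  # compare numerically, not as strings
        if step > best_step:
            best_name, best_step = name, step
    return best_name


# Hypothetical folder listing, for illustration only.
names = ["checkpoint_1000", "checkpoint_15000", "checkpoint_9000", "notes.txt"]
print(latest_checkpoint(names))
```

The iteration is parsed as an integer because a plain string sort would rank `checkpoint_9000` above `checkpoint_15000`.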

Links

[1] https://github.com/Ryan-Rudes/YTTTS
[2] https://github.com/NVIDIA/tacotron2
[3] https://arxiv.org/abs/1712.05884
[4] https://www.kaggle.com/ryanrudes/johnoliver
[5] https://drive.google.com/drive/folders/1jj0Ktck3ZybpDzY1yzveODnh4qFSvsAl?usp=sharing
[6] https://www.kaggle.com/ryanrudes/voice-cloning-with-tacotron-2