Many machine-learning models have been trained using the ASVspoof dataset. But how well do these models generalize to real-world deepfakes?
To investigate this question, we present a dataset of audio deepfakes (and corresponding benign audio) for a set of politicians and other public figures, collected from publicly available sources such as social networks and video streaming platforms.
For n = 58 celebrities and politicians, we collect both bona-fide and spoofed audio. In total, we collect 20.8 hours of bona-fide and 17.2 hours of spoofed audio. On average, there are 23 minutes of bona-fide and 18 minutes of spoofed audio per speaker.
The dataset is intended for evaluating deepfake detection (anti-spoofing) machine-learning models. It is especially useful for judging a model's capability to generalize to realistic, in-the-wild audio samples. Find more information in our paper here.
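As a rough illustration of such an evaluation, the sketch below computes an approximate equal error rate (EER) from detector scores on bona-fide and spoofed clips. The choice of EER as the metric, the function name `equal_error_rate`, and the convention that higher scores mean "more likely bona-fide" are assumptions made for this example; they are not taken from the dataset or its documentation, and the scores shown are made-up placeholders.

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate EER, assuming higher scores mean 'more likely bona-fide'.

    Hypothetical helper for illustration; not part of the dataset tooling.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: bona-fide audio scored below the threshold.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # The EER lies where the two error rates are (nearly) equal.
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Placeholder scores, not real model outputs.
bona_fide = [0.9, 0.8, 0.7, 0.4]
spoofed = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(bona_fide, spoofed))
```

A lower value indicates better separation between bona-fide and spoofed audio. Comparing this figure on a model's training benchmark with the figure obtained on in-the-wild samples gives one possible view of how well the model generalizes.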
The most interesting deepfake detection models we used in our experiments can be found here:
Download the dataset here, or listen to a few samples below. For each speaker, we give three authentic (bona-fide) and three fake (spoofed) audio files.