Millions of Songs Found in AI Training Datasets, Investigation Reveals

A new investigation reveals that massive datasets containing millions of songs are being used to train AI music models, intensifying copyright disputes.
A digital interface displaying a searchable database of millions of song titles and artist names used in AI training datasets.

An investigation has uncovered four large-scale music datasets being shared within the AI-development community, with the largest containing 12 million tracks and another holding 9 million songs. The findings shed new light on the scope of copyrighted material potentially used to train generative AI music models without licensing agreements.

Alex Reisner, who examined the datasets, highlighted the gap between industry claims and the reality of accessible data.

“Companies often claim to use only content that is freely available online, but the datasets reveal the quantity of downloadable music that developers can access even though it is not supposed to be free,” Reisner wrote.

The revelation comes as Universal Music Group and Sony Music recently sought to add more than 61,000 recordings to their copyright infringement lawsuit against AI music service Suno, a move Suno is opposing. The datasets have been downloaded thousands of times, but it remains unclear which companies have used them for training. The investigation has made the collections searchable, allowing the public to see which songs are included.
