21 Million Copyrighted Songs Found in AI Training Datasets

Over 21 million copyrighted songs are circulating in AI training datasets, and a new watchdog tool allows artists to search for their work.
Screenshot of the AI Watchdog search tool showing results for an artist's songs in AI training datasets.

Over 21 million copyrighted songs are circulating in datasets used to train generative AI music models, an investigation has found, alongside the launch of a search tool that lets artists check whether their work has been included.

The findings detail four major music datasets available to AI developers, containing recordings from both chart-topping acts and independent musicians. Artists named in the data include Billie Eilish, Taylor Swift, Nirvana, and Bad Bunny.

Two of the datasets hold more than 100,000 recordings each, while the remaining two are far larger, with between 9 million and 12 million tracks apiece. All four have been downloaded thousands of times, but it is difficult to trace which companies have used them because AI developers rarely disclose their training data. Google and Stability AI have been identified as having used the Free Music Archive dataset to train their models.

A companion search tool, AI Watchdog, lets artists search by name to see whether their music appears in any of the four datasets. Searches show that Eric Prydz has 54 songs in the datasets, Honey Dijon has 126, Björk has 411, Moby has 213, Fatboy Slim has 175, The Chemical Brothers have 153, Daft Punk have 151, and Charlotte de Witte has 89.

Artist Reactions

SZA shared her frustration on Instagram after discovering 238 of her songs in the datasets, including what she believes are unreleased tracks.

Jus checked and music AI has trained off 238 of my songs. I’m certain some unreleased. If your a musician and you support this degenerate shit? Your disgusting and there’s NOTHING YOU COULD EVER SAY TO ME TO MAKE THIS OKAY.

Producer Kenny Beats directed criticism at AI music company Suno.

I can’t imagine going into work daily knowing you are stealing from countless struggling musicians. I can’t imagine being proud to earn a paycheck obliterating the work and dreams of artists.

DJ Sabrina the Teenage DJ reacted on Bluesky after finding 22 of her songs in the data.

To everyone who thought my music sounded like AI slop, did you ever think it was because Suno was using a dataset that contained 22 of my songs? It’s funny how there were no accusations of my music sounding like AI slop until these datasets started getting used to generate slop.

Legal Landscape

The revelations arrive as AI music companies Suno and Udio face multiple lawsuits from artists, record labels, and unions over the use of copyrighted material. Last year, Universal Music settled its own lawsuit against Udio, and the two parties agreed to collaborate on a new platform.

Suno CEO Mikey Shulman previously drew widespread criticism for stating that “it’s not really enjoyable to make music now” and that “the majority of people don’t enjoy the majority of the time they spend making music.”
