Spleeter - fast & free music separation using machine learning

Saw a post on the watmm forum about free music separation using machine learning and thought I’d drop a link for the technically able folks here, as I have no idea how to get it working on my PC & hope for a Renoise tool :wink: .

Find more info here:
" Fast and Free Music Separation with Deezer’s Machine Learning Library


Cleanly isolating vocals from drums, bass, piano, and other musical accompaniment is the dream of every mashup artist, karaoke fan, and producer. Commercial solutions exist, but can be expensive and unreliable. Techniques like phase cancellation have very mixed results.

The engineering team behind streaming music service Deezer just open-sourced Spleeter, their audio separation library built on Python and TensorFlow that uses machine learning to quickly and freely separate music into stems. (Read more in today’s announcement.)

The team at @Deezer just released #Spleeter, a Python music source separation library with state-of-the-art pre-trained models! :notes::sparkles:

Straight from command line, you can extract voice, piano, drums… from any music track! Uses @TensorFlow and #Keras. https://t.co/e4lyVtT2lR pic.twitter.com/tDsBMSYiJD

:woman_technologist: DynamicWebPaige @ #TFWorld :earth_africa: (@DynamicWebPaige) November 2, 2019

You can train it yourself if you have the resources, but the three models they released already far surpass any available free tool that I know of, and rival commercial plugins and services. The library ships with three pre-trained models:

  • Two stems – Vocals and Other Accompaniment
  • Four stems – Vocals, Drums, Bass, Other
  • Five stems – Vocals, Drums, Bass, Piano, Other

It took a couple minutes to install the library, which includes installing Conda, and processing audio was much faster than expected.

On my five-year-old MacBook Pro using the CPU only, Spleeter processed audio at a rate of about 5.5x faster than real-time for the simplest two-stem separation, or about one minute of processing time for every 5.5 minutes of audio. Five-stem separation took around three minutes for 5.5 minutes of audio.

When running on a GPU, the Deezer team report speeds 100x faster than real-time for four stems, converting 3.5 hours of music in less than 90 seconds on a single GeForce GTX 1080."
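For anyone who wants to sanity-check the speed figures in the quote, here is a rough Python sketch that turns them into "x times real-time" factors. The function name `realtime_factor` is made up for illustration, and the inputs are just the rounded numbers quoted above, so treat the results as ballpark only. The GPU figure is a lower bound, since the article says "less than 90 seconds".

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Figures taken from the quoted article (rounded there, so rounded here).
two_stem_cpu = realtime_factor(5.5 * 60, 1 * 60)    # 5.5 min of audio in ~1 min
five_stem_cpu = realtime_factor(5.5 * 60, 3 * 60)   # 5.5 min of audio in ~3 min
four_stem_gpu = realtime_factor(3.5 * 3600, 90)     # 3.5 h of audio in < 90 s

print(f"two-stem CPU:  {two_stem_cpu:.1f}x")    # prints "two-stem CPU:  5.5x"
print(f"five-stem CPU: {five_stem_cpu:.1f}x")   # prints "five-stem CPU: 1.8x"
print(f"four-stem GPU: {four_stem_gpu:.1f}x")   # prints "four-stem GPU: 140.0x"
```

So the GPU claim of "100x faster than real-time" is consistent with the 3.5 hours / 90 seconds example, which works out to at least 140x.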

GitHub source separation library:

If someone could get this working through a tool in the Renoise sample editor, processing samples like the CDP tool does, for example, that would be awesome! Could take digging to a whole new level! :sunglasses:


Good tool, thanks for posting.

I saw this mentioned on Twitter. I’ve installed it on Windows via WSL and I’m looking to split up my existing tracks and use the resulting stems as new source material (i.e., extract “vocals” from an instrumental).

But so far, on one test case, the vocal track was silent. So, props for correctly finding no vocals, but somewhat disappointing for my re-purposing goals : )

Update: I tried the 4-stem model and it’s pretty slick. It adds some interesting artifacts/distortion to drums, bass, and so on, but does a good job with the isolation.
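In case it helps anyone else trying this from Python rather than the command line, here is a minimal sketch of picking one of the three pre-trained models and splitting a file. It assumes Spleeter is already installed and uses its `Separator` class; the helper names `model_name` and `split_track` and the paths `song.mp3` and `stems_out` are placeholders of mine, not anything from the library.

```python
# Stem counts mentioned in the thread, mapped to Spleeter's pre-trained model names.
MODELS = {2: "spleeter:2stems", 4: "spleeter:4stems", 5: "spleeter:5stems"}


def model_name(stems: int) -> str:
    """Return the model identifier for a 2-, 4- or 5-stem split (hypothetical helper)."""
    if stems not in MODELS:
        raise ValueError(f"no pre-trained model for {stems} stems; choose from {sorted(MODELS)}")
    return MODELS[stems]


def split_track(audio_path: str, output_dir: str, stems: int = 4) -> None:
    """Write one audio file per stem into output_dir (hypothetical helper)."""
    # Imported here so model_name() can be used without Spleeter installed.
    from spleeter.separator import Separator

    separator = Separator(model_name(stems))
    separator.separate_to_file(audio_path, output_dir)


# Example call (placeholder paths, needs Spleeter and a real audio file):
# split_track("song.mp3", "stems_out", stems=4)
```

The first run with a given model has to download the pre-trained weights, so it will take longer than later runs.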

Thanks for creating a site automating this! Just tested it on a dance song, probably not the best source for the type of stems it recognizes. Is it too blunt to classify Spleeter as an FFT-based multiband bandpass filter, narrowing the lower and upper limits of the filter to where the particular stem’s characteristics lie? Will try with acoustic music to see if there is more magick going on.