NEW NEURAL NETWORK

LALAL.AI Orion: New AI for Better, Faster, Cleaner Stem Separation

Introducing the cutting-edge solution that takes speed and quality to a whole new level. Learn more about the new LALAL.AI neural network.

Oksana

05 Oct 2023 • 5 min read

In the world of audio source separation, speed and quality are paramount. That's why we're excited to introduce Orion, a brand-new neural network that is set to transform the way we extract vocals, instrumentals, and other stems from a mix. With Orion, users can expect lightning-fast results and truly significant improvements in quality.

🔵Orion in Numbers

⭕2x faster file processing and splitting.

⭕20x more computational power used for training (compared to Phoenix).

⭕70% cleaner results, with fewer sound artifacts (such as distortion).

🔵New Approach to Separation

So, what sets Orion apart from its predecessors? The answer lies in its fundamentally different approach to stem splitting. Previous solutions, including the revolutionary Phoenix, relied on a mask-based method that masks irrelevant elements of the mix, leaving the necessary ones, such as vocals. Orion takes a more advanced and challenging path with the direct synthesis method.

Instead of simply "carving" out stems from the original mix using spectral masks, Orion goes beyond that. It literally synthesizes and recreates stems. This unique approach allows Orion to overcome the limitations of mask-based solutions and push the boundaries of achievable quality.

If we draw a parallel with image processing, then stem extraction using mask-based techniques can be compared to roughly cutting an object from a photo using a mask. The cut-out is very likely to contain errors, such as parts of other irrelevant objects. Using the synthesis method is like asking a professional artist to draw an object of interest from a photo and create a neutral background for it without all the unnecessary objects.
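To make the "carving" idea concrete, here is a minimal sketch of mask-based separation on a toy magnitude spectrogram. The numbers, the `apply_mask` helper and the soft mask are all made up for illustration; this is not LALAL.AI's code, only the general technique the paragraph above describes. Because every mask value lies between 0 and 1, no bin of the extracted stem can be louder than the same bin in the mix.

```python
# Toy magnitude "spectrogram": rows are time frames, columns are frequency bins.
# All values here are invented for illustration.
mix = [
    [0.9, 0.2, 0.5],
    [0.4, 0.8, 0.1],
]

# Hypothetical soft mask with values in [0, 1], as a mask-based model might predict.
vocal_mask = [
    [1.0, 0.0, 0.5],
    [0.25, 1.0, 0.0],
]


def apply_mask(spectrogram, mask):
    """Multiply element-wise: keep the share of each bin the mask assigns to the stem."""
    return [
        [value * weight for value, weight in zip(spec_row, mask_row)]
        for spec_row, mask_row in zip(spectrogram, mask)
    ]


# The vocal stem is "carved" out of the mix; the instrumental gets the remainder.
vocal = apply_mask(mix, vocal_mask)
inverse_mask = [[1.0 - weight for weight in row] for row in vocal_mask]
instrumental = apply_mask(mix, inverse_mask)

# A masked stem can never exceed the mix in any bin.
never_louder = all(
    v <= m for v_row, m_row in zip(vocal, mix) for v, m in zip(v_row, m_row)
)
print(never_louder)
```

A synthesis-based model is not bound by this constraint, since it generates the stem rather than scaling the bins of the mix.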

🔵Improved Quality

One of the metrics we use to evaluate the quality of separation is the signal-to-distortion ratio (SDR), i.e., how much distortion (or artifacts) the separation process introduces into the vocal and instrumental parts. The higher the SDR, the less distortion there is.
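For readers who want to see how such a metric is computed, below is a minimal sketch of the basic SDR definition: the energy of the reference stem divided by the energy of the error, expressed in decibels. The `sdr_db` function and the sample values are illustrative assumptions; the exact evaluation protocol used for our networks is not described in this post.

```python
import math


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in dB for two equal-length sample lists."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # a perfect estimate has no distortion at all
    return 10.0 * math.log10(signal_energy / error_energy)


# Made-up samples: the estimate is the reference scaled down by 10%.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]

# The error energy is about 1/100 of the signal energy, i.e. roughly 20 dB.
print(round(sdr_db(reference, estimate), 2))
```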

The leap in quality from Phoenix to Orion is greater than the entire quality gain achieved throughout the previous history of our project – Orion outperforms Phoenix by an impressive 2.5 dB. To put this into perspective, Phoenix surpassed its predecessor, Cassiopeia, by 0.7 dB, and Cassiopeia exceeded Rocknet, our first network, by 1 dB, for a combined gain of 1.7 dB.
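The comparison can be checked with a few lines of arithmetic. The sketch below adds up the earlier gains and converts decibels to a linear power ratio. The variable names are ours, and the interpretation in the last comment assumes the reference signal energy stays the same, which is a simplification.

```python
# SDR gains between consecutive networks, in dB, as quoted in this post.
gain_rocknet_to_cassiopeia = 1.0
gain_cassiopeia_to_phoenix = 0.7
gain_phoenix_to_orion = 2.5

# Everything gained before Orion, from Rocknet up to Phoenix.
previous_total = gain_rocknet_to_cassiopeia + gain_cassiopeia_to_phoenix


def db_to_power_ratio(db):
    """Convert a decibel difference to a linear power ratio."""
    return 10.0 ** (db / 10.0)


print(gain_phoenix_to_orion > previous_total)

# With the signal energy held fixed, a +2.5 dB SDR gain corresponds to the
# distortion energy shrinking by a factor of about 1.78.
print(round(db_to_power_ratio(gain_phoenix_to_orion), 2))
```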

⭕Orion vs. Phoenix, Cassiopeia & Rocknet

Why such a huge leap? Mask-based solutions like Phoenix and Cassiopeia have one big disadvantage – they can’t produce anything beyond what was initially present in the mix; they extract sections as they are.

Parts of the track, such as the vocals, exist in the finished mixed song in a compressed, corrupted form. Phoenix and other mask-based solutions can only take the vocals out of the mix as is. The vocal sounds good in the song because it was processed to sound good in the mix, but when it is taken out and listened to separately, the defects become apparent.

This is because a significant amount of information in the vocal was lost during mixing. Listeners would not have heard this part in the mix anyway, so its removal does not affect the perception of the mix. But it greatly affects the perception of the vocals when they are played separately from the instrumental part.

Orion leverages its understanding of vocals and vocal processing to enhance the extraction process and fill the gaps. By learning from thousands of other mixes, it can extend and improve the vocal stem during extraction, resulting in noticeably better quality compared to mask-based methods.

⭕Distortion & Phasing Effect

One common complaint we previously received from users was the phasing effect and distortion that occurred when stems were abruptly removed in places where other sources were playing loudly.

With Orion, we have significantly reduced this distortion, resulting in a more seamless and immersive listening experience. Don't just take our word for it – you can hear the difference for yourself in the examples provided below.

🔵Audio Examples

To illustrate the evolution of our neural networks and how Orion qualitatively differs from previous solutions, let's take a look at how each of the four networks handles extracting vocals from different songs.

Toggle the song bars to play the original unprocessed test songs and full stems.

The playlist with all original test songs, full stems and stem excerpts is here.

🟡Song #1 – Full Audio & Stems

Song name: Rival Tides — Vultures

▶︎ Original song

▶︎ Full vocal stem (Rocknet)

▶︎ Full vocal stem (Cassiopeia)

▶︎ Full vocal stem (Phoenix)

▶︎ Full vocal stem (Orion)

lalalai_app · [Rocknet] Rival Tides — Vultures (Vocal Excerpt)

lalalai_app · [Cassiopeia] Rival Tides — Vultures (Vocal Excerpt)

lalalai_app · [Phoenix] Rival Tides — Vultures (Vocal Excerpt)

lalalai_app · [Orion] Rival Tides — Vultures (Vocal Excerpt)

This song has classic vocals in front of a relatively dense background. However, you can hear in this excerpt that Phoenix loses tonal characteristics, which causes a phasing effect and makes the voice sound as if it were coming from underwater.

🟡Song #2 – Full Audio & Stems

Song name: Rival Tides — Sour Milk

▶︎ Original song

▶︎ Full vocal stem (Rocknet)

▶︎ Full vocal stem (Cassiopeia)

▶︎ Full vocal stem (Phoenix)

▶︎ Full vocal stem (Orion)

lalalai_app · [Rocknet] Rival Tides — Sour Milk (Vocal Excerpt)

lalalai_app · [Cassiopeia] Rival Tides — Sour Milk (Vocal Excerpt)

lalalai_app · [Phoenix] Rival Tides — Sour Milk (Vocal Excerpt)

lalalai_app · [Orion] Rival Tides — Sour Milk (Vocal Excerpt)

In this excerpt, you can hear that Orion extracts all the echoes, unison singing and timbral melismas. The other neural networks do not.

🟡Song #3 – Full Audio & Stems

Song name: Ride Free — Overload

▶︎ Original song

▶︎ Full vocal stem (Rocknet)

▶︎ Full vocal stem (Cassiopeia)

▶︎ Full vocal stem (Phoenix)

▶︎ Full vocal stem (Orion)

lalalai_app · [Rocknet] Ride Free — Overload (Vocal Excerpt)

lalalai_app · [Cassiopeia] Ride Free — Overload (Vocal Excerpt)

lalalai_app · [Phoenix] Ride Free — Overload (Vocal Excerpt)

lalalai_app · [Orion] Ride Free — Overload (Vocal Excerpt)

In this case, vocal losses and phasing are observed even in the Phoenix stem, while the Orion stem is free of them. We can also hear that Cassiopeia's stem contains some noise, while Rocknet lets a lot of unnecessary noise through and its vocal breaks off in unexpected places. Except for Orion, all the neural networks fail at the end of this excerpt: they either swallow the vocal or let the instrumental part bleed into the vocal.

🟡Song #4 – Full Audio & Stems

Song name: Milano — Go Off

▶︎ Original song

▶︎ Full vocal stem (Rocknet)

▶︎ Full vocal stem (Cassiopeia)

▶︎ Full vocal stem (Phoenix)

▶︎ Full vocal stem (Orion)

lalalai_app · [Rocknet] Milano — Go Off (Vocal Excerpt)

lalalai_app · [Cassiopeia] Milano — Go Off (Vocal Excerpt)

lalalai_app · [Phoenix] Milano — Go Off (Vocal Excerpt)

lalalai_app · [Orion] Milano — Go Off (Vocal Excerpt)

This is a very illustrative excerpt where you can clearly hear the difference between all the neural networks. Rocknet chews up half of the vocal; Cassiopeia works better with vocals, but the result is still a bit noisy; Phoenix works even better, adding more high-frequency detail, but parts of the vocal are still swallowed; Orion does not swallow the vocal, and there is more detail in the voice.

🟡Song #5 – Full Audio & Stems

Song name: Kissing Candice — Magic Show

▶︎ Original song

▶︎ Full vocal stem (Rocknet)

▶︎ Full vocal stem (Cassiopeia)

▶︎ Full vocal stem (Phoenix)

▶︎ Full vocal stem (Orion)

lalalai_app · [Rocknet] Kissing Candice — Magic Show (Vocal Excerpt)

lalalai_app · [Cassiopeia] Kissing Candice — Magic Show (Vocal Excerpt)

lalalai_app · [Phoenix] Kissing Candice — Magic Show (Vocal Excerpt)

lalalai_app · [Orion] Kissing Candice — Magic Show (Vocal Excerpt)

This composition has a very heavy and thick background. Even the human ear has a hard time picking out the vocals. Rocknet leaves little of the vocals; Cassiopeia’s vocal has a lot of noise. Phoenix suppresses the overtones of the vocal timbre, making it sound very flat.

Orion extracts the unison singing and does it thoroughly so that the vocals sound rich and the unison voices are easily identifiable, even though in the original mix, you can't hear them at all over the heavy background.

🔵What's Next

Currently, Orion supports vocal and instrumental stem separation – you can try it now on our site. Going forward, we are going to gradually expand the list of Orion-supported stems to 10, as we did with all the previous neural networks.

Until then, the piano, drums, bass, acoustic guitar, electric guitar, synthesizer, wind and string instruments will be extracted by Phoenix, and vocal and instrumental by Orion.
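The current split of stems between the two networks can be summarized as a small lookup. This is a hypothetical sketch: `network_for_stem` and the stem labels are illustrative names, not part of any LALAL.AI API.

```python
# Stems handled by each network at the time of writing, per the paragraph above.
# The labels are illustrative and do not come from any official API.
ORION_STEMS = {"vocal", "instrumental"}
PHOENIX_STEMS = {
    "piano", "drums", "bass", "acoustic guitar",
    "electric guitar", "synthesizer", "wind", "strings",
}


def network_for_stem(stem):
    """Return the name of the network that currently extracts the given stem."""
    key = stem.strip().lower()
    if key in ORION_STEMS:
        return "Orion"
    if key in PHOENIX_STEMS:
        return "Phoenix"
    raise ValueError(f"Unknown stem: {stem!r}")


print(network_for_stem("Vocal"))
print(network_for_stem("drums"))
```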

We hope the new neural network improves your experience with our service and makes your work with audio easier.

Happy splitting!

Follow LALAL.AI on Instagram, Facebook, Twitter, TikTok, Reddit and YouTube to keep up with all our updates.