With rapid technological development, the number of stem-splitting solutions on the market keeps growing. Anyone can now extract stems from mixed and mastered songs; no audio source separation knowledge is required. All you need to do is pick one of the splitters available online, upload a song, click a couple of buttons, and voilà: the vocal and instrumental tracks are separated.
The quality of stem separation also improves every year. Thanks to artificial intelligence and machine learning technologies, web-based splitters have equaled, and some have surpassed, professional software. Prices, however, still vary greatly. Online splitters are generally affordable, costing around a dozen dollars, whereas professional software is significantly more expensive, starting at several hundred dollars.
Given the number of options, a legitimate question arises: which is the better choice, a web service or a professional application? Which stem splitter actually provides the best quality? If you turn to the Internet for answers, you will find many contradictory reviews.
We decided to make the most objective and easily verifiable comparison we could of two leading tools for the task: iZotope RX 9, professional audio software, and LALAL.AI, an AI-based online service. In this article, you will find a detailed analysis of the stem separation quality of both tools, using the same songs for each.
🟡What is iZotope RX 9?
RX is an audio restoration suite developed by iZotope, a renowned audio technology company. RX 9 is the latest version of the toolkit, which covers various post-production needs such as extracting clean dialogue, removing interference, hum and reverb, eliminating clicks, pops, and digital impulse noises, as well as rebalancing instruments and splitting audio into 4 stems.
The stem-splitting feature is part of the RX 9 Music Rebalance tool, which is available as a Pro Tools AudioSuite plugin and as a plugin in Logic Pro X. iZotope RX 9 comes in several editions, with pricing ranging from $399 ($299 when on sale) all the way to $1,999 ($999 when on sale).
🟡What is LALAL.AI?
LALAL.AI is a web-based service running on Phoenix, a state-of-the-art audio source separation technology. This neural network, developed in-house, allows for quick and precise stem splitting, voice extraction and noise reduction.
LALAL.AI Splitter is capable of removing and extracting 8 stems – vocals, instrumental, drums, bass, piano, synthesizer, acoustic guitar and electric guitar. LALAL.AI Voice Cleaner eliminates background music, vocal plosives, mic rumble and other extraneous noises from video and audio recordings.
The service can be used on all operating systems, on both desktop and mobile devices. It has a free version and several paid plans ranging from $15 to $300, depending on how many minutes' worth of audio and/or video users want to process.
🟡Conditions of Quality Comparison
- Several music compositions were selected for the test. For the sake of objectivity, the same compositions were processed both in LALAL.AI and in iZotope RX 9.
- For the LALAL.AI test, the Mild processing level was used, which is the gentlest method in terms of artifact removal. If the Normal level had been applied, the results would have been better in many cases. The choice of the Mild method was motivated by the desire to show that even in the softest mode LALAL.AI outperforms RX 9.
- For the iZotope RX 9 test, the Best processing level was set, which is the most aggressive method in terms of artifact removal.
- The test was focused solely on the comparison of vocal stem separation quality because:
a) Stem separation errors are much more audible on the vocal channel.
b) The instrumental channel, on the other hand, is much more difficult for psychoacoustic analysis due to the greater complexity of the content, so the conclusions drawn from it are considerably less accurate.
c) RX 9 doesn’t allow you to extract an entire instrumental stem. Instead, it generates bass, drums and “other” stems (the last one containing the rest of the music). To get an instrumental, you would have to mix these stems, which could introduce additional distortions and affect the objectivity of the analysis.
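Point (c) can be illustrated with a minimal sketch. It assumes the three stems have already been decoded into equal-length lists of float samples in the range -1.0 to 1.0. The function names and sample values are hypothetical and are not part of RX 9 or LALAL.AI. Summing the stems sample by sample can push the mix past full scale, and the gain change needed to bring it back is an extra processing step that was not part of the original separation.

```python
def mix_stems(*stems):
    """Sum equal-length stems sample by sample; return the mix and its peak."""
    if not stems:
        raise ValueError("at least one stem is required")
    length = len(stems[0])
    if any(len(stem) != length for stem in stems):
        raise ValueError("all stems must have the same number of samples")
    mix = [sum(samples) for samples in zip(*stems)]
    peak = max((abs(x) for x in mix), default=0.0)
    return mix, peak


def scale_to_ceiling(mix, peak, ceiling=1.0):
    """Scale the mix down only if its peak exceeds the ceiling."""
    if peak <= ceiling:
        return list(mix)
    gain = ceiling / peak
    return [x * gain for x in mix]


# Hypothetical sample values standing in for decoded stems.
bass = [0.5, -0.4, 0.1]
drums = [0.4, -0.5, 0.0]
other = [0.3, -0.2, 0.2]

instrumental, peak = mix_stems(bass, drums, other)
safe = scale_to_ceiling(instrumental, peak)
print("peak exceeds full scale:", peak > 1.0)
```

In this toy example the summed peak exceeds 1.0, so the rebuilt instrumental has to be attenuated before export. Any such correction would be an artifact of the mixing step rather than of the separation itself.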
🟡Test and Analysis
The analysis was performed using psychoacoustic methods, that is, by listening to the stem separation results produced by LALAL.AI and RX 9.
During the listening process, we identified the parts of the vocal stems that show the most characteristic or most noticeable problems of RX 9. Below you can listen to these parts and read descriptions of the problems we found.
Expand the composition bars to play/download the original unprocessed audio and full vocal stems extracted by LALAL.AI and iZotope RX 9 respectively.
Composition #1 – Original Audio & Full Vocal Stems
In the RX 9 stem excerpt, the female backing vocals are incomplete and shallow. In some places, they get swallowed and are completely absent.
Leakage from the instrumental channel in the form of hissing can be heard throughout the entire RX 9 stem excerpt but especially at the beginning where there is no vocal.
Composition #2 – Original Audio & Full Vocal Stems
The vocal keeps dropping throughout the RX 9 stem excerpt. There are also artifacts that sound like metallic whistles and are particularly noticeable in places where the vocal drops.
Same composition, different section. In the second RX 9 stem excerpt, you can still hear the drops and metallic whistles from the first excerpt, but there are also low-frequency artifacts, which stem from the same incorrect handling of phase.
Another section of the same composition. In the RX 9 stem excerpt, the drop effects are easily perceptible. The phase defects during the drops can be heard very well too.
Composition #3 – Original Audio & Full Vocal Stems
In the RX 9 stem excerpt, the rhythm guitar leaks into the vocal channel. Even though it doesn’t sound like an independent instrument, it sets such a clear rhythm that you involuntarily start nodding to the beat.
In the LALAL.AI stem excerpt, the rhythm guitar also leaks, but to a much smaller extent. What’s more, it happens in a different manner: only the sounds produced by the strings when the guitarist's hands move quickly leak into the vocal channel. Across most of the vocal, such sounds are far less noticeable.
Composition #4 – Original Audio & Full Vocal Stems
In the RX 9 stem excerpt, significant leakage into the vocal channel can be heard. It’s especially evident at the beginning of the excerpt.
In the RX 9 stem excerpt, you can clearly hear the electric guitar leaking into the vocal channel. Sonically, it may seem similar to vocals, yet LALAL.AI managed to remove the guitar sound from the same section of the composition.
Composition #5 – Original Audio & Full Vocal Stems
Throughout the entire RX 9 stem excerpt, you can hear hissing and whistling sounds from the leakage of the non-vocal part, which for obvious reasons are best heard in places with no vocals. In addition, there are jumps in the vocal level.
Composition #6 – Original Audio & Full Vocal Stems
The RX 9 stem excerpt is another example of extremely significant leakage of the instrumental part into the vocal channel. Here the leakage is audible not only in places without vocals but also in sections with vocals.
🟡Test Results and Conclusion
📍 In the test compositions, we could not find any sections where iZotope RX 9 extracted vocals better than LALAL.AI.
📍 iZotope RX 9 processes audio at about half real-time speed, so a song takes roughly twice its own duration to process. While RX 9 was processing one song, the entire set of 6 test compositions was uploaded, split, and downloaded from LALAL.AI. All operations were performed on the same test laptop (MacBook Pro 13" Core i5).
📍 In many compositions processed by RX 9, the instrumental part leaks into the vocal channel in the form of an unpleasant high-frequency hiss. This is particularly noticeable in sections with no vocals. These artifacts are more subtle in vocal parts, but they are still present there as well.
📍 Many compositions processed by iZotope RX 9 have peculiar “metallic” sounds and phase rotation effects in the vocal channel. This indicates either a problem with phase processing or a lack of such processing altogether. Phase problems not only create audible artifacts but can also make it difficult to mix vocals for remixes and other creative tasks.
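The comparison above was made by ear, but readers repeating the test may want a rough number to accompany their listening. One possible approach, offered only as a sketch and not as the method used in this article, is to measure the RMS level of the vocal stem in a region where the original song has no vocals. Whatever signal remains there is leakage. The function name and sample values below are hypothetical, and the samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


def rms_dbfs(samples):
    """Return the RMS level of float samples in dB relative to full scale."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence, no leakage at all
    return 10 * math.log10(mean_square)


# Hypothetical vocal-free regions cut from two different vocal stems.
stem_a_region = [0.02, -0.03, 0.025, -0.015]
stem_b_region = [0.002, -0.001, 0.0015, -0.002]

level_a = rms_dbfs(stem_a_region)
level_b = rms_dbfs(stem_b_region)
print(f"stem A: {level_a:.1f} dBFS, stem B: {level_b:.1f} dBFS")
```

A lower figure in a vocal-free region means less residual signal in the vocal stem. It says nothing about how the leakage sounds, so it complements listening rather than replacing it.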
During the test, LALAL.AI handled the audio more accurately and cleanly, and extracted the vocal stem from each song with fewer errors than RX 9. Based on these results, it’s safe to say that LALAL.AI provides higher vocal stem separation quality than iZotope RX 9.
All original compositions and stem separation results are freely available to the public – they can be downloaded directly from the Test and Analysis section of this article or obtained here. Anyone can use the media files to repeat the comparison test and do the analysis themselves.