Splitting mixed songs into their constituent parts (vocals, background music, individual musical instruments) has long been a notoriously difficult task, and the results have never been perfect.
However, thanks to rapid advances in artificial intelligence and machine learning, stems can now be extracted with unprecedented accuracy and speed.
The ideal has never been closer. Many high-tech AI-based solutions on the market can isolate vocals and other stems in near-studio quality. But which service does it better than the rest?
In the past, we ran stem-splitting quality tests comparing LALAL.AI with Spleeter, PhonicMind and iZotope, and LALAL.AI clearly outperformed each of them. But is LALAL.AI also better than Moises.ai? Let's find out.
What is Moises.ai?
Moises.ai, or simply Moises, is audio separation software for iOS, Android and the web, launched in 2019. Powered by Deezer’s AI, Moises can extract vocals, the instrumental track, drums, bass and piano from songs and videos.
It accepts uploads in a wide range of formats: MP3, AAC, AC3, WAV, FLAC, AIFF, OGG, WMA, MP4, M4V, M4R, MPEG, FLV, MOV, MKV, and WEBM. However, isolated stems can only be downloaded in MP3, WAV, and M4A.
It also has some neat features that can come in handy for musicians, such as a metronome, a speed and pitch changer, key and chord detection, track trimming and looping, volume adjustment, and muting of selected stems.
What is LALAL.AI?
LALAL.AI is a dedicated stem-splitting service launched in 2020. It runs on Phoenix, a unique neural network developed in-house, which enables high-quality splitting of audio and video files into vocals, instrumental, drums, bass, acoustic guitar, electric guitar, piano, and synthesizer.
Users can upload files in MP3, OGG, WAV, FLAC, AVI, MP4, MKV, AIFF, and AAC. Extracted stems are downloaded in the same format and bitrate as the original file without quality loss.
In addition to stem splitting, LALAL.AI offers voice cleaning, which lets users remove background noise and other unwanted sounds (such as vocal plosives, mic rumble, and loud breathing) to enhance the voice in audio and video recordings.
How We Compared the Two
Although both services can extract several different stems, in this comparison we focus solely on how LALAL.AI and Moises handle vocals. There are two reasons for this.
First and foremost, vocal isolation is by far the most sought-after type of audio source separation. Second, vocal extraction is extremely difficult even for state-of-the-art technologies: using AI does not automatically mean a service delivers high-quality results.
To establish which service performs vocal extraction better, we uploaded the same songs to Moises and LALAL.AI and then compared and analyzed the vocal stems we received.
For the sake of objectivity, we made the test songs available for download, so that anyone reading the analysis can perform this test themselves and see that the results displayed here weren’t altered in favor of one or the other service.
We analyzed the results in two ways: psychoacoustically (by listening to the stems produced by both services) and by examining their spectrograms.
While listening, we identified the parts of the vocal stems that best showcase the most characteristic and noticeable problems. These excerpts and their descriptions are given below.
Toggle the song bars to play and/or download the original unprocessed audio and full vocal stems extracted by LALAL.AI and Moises.
🟡Song #1 – Original Audio & Full Vocal Stems
This excerpt has audible dropouts in the backing vocals, specifically in the vocalise (the part where only vowel sounds are sung). The problem is pronounced in the LALAL.AI vocal stem, which has gaps in the second half of the excerpt.
In the Moises vocal stem, the gap is clearly audible, and the vocalise is almost inaudible in the second half of the excerpt.
In this excerpt, the LALAL.AI and Moises stems have the same artifacts. However, in addition to the background noise that runs through the whole track, the Moises vocal stem also has a distortion-like sound, which seems intended to mask leakage of the instrumental part into the vocal channel.
🟡Song #2 – Original Audio & Full Vocal Stems
In this section, LALAL.AI handled the vocals worse than Moises: the guitar is much more audible in the LALAL.AI excerpt than in the Moises one.
Vocals extracted with Moises have background noise that persists throughout the track, and the rhythm part, individual instruments, and the whole instrumental can be heard through it.
🟡Song #3 – Original Audio & Full Vocal Stems
Both services handled vocal extraction about equally, with similar errors, but LALAL.AI produced a better result. Bass leakage is audible in the LALAL.AI part, whereas in the Moises part the entire instrumental can be heard along with some hissing in the background.
In this excerpt, the vocals extracted by both LALAL.AI and Moises have a phaser effect. In LALAL.AI’s case, the effect is smoother and barely noticeable, while in Moises’ case it is harsher on the ear.
🟡Song #4 – Original Audio & Full Vocal Stems
Moises vocals sometimes contain a high-frequency squeak in the right channel, which is clearly audible in this segment. LALAL.AI has no such sound in this or any other example.
In the examples above, you may notice that the vocals extracted by Moises sound slightly more muffled than those isolated by LALAL.AI. In many cases, we also noticed low-frequency noise in the Moises vocal channel.
Since sound perception is subjective and depends on many factors, we also analyzed the test results with a more objective method: examining the spectrograms of the stems produced by both services.
This analysis confirmed that the noise in the Moises vocals is real: its level is up to 8 times higher than in the LALAL.AI vocals.
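The article does not say exactly how the noise level was measured, or whether "8 times" refers to amplitude or power. As a rough illustration only, the Python sketch below assumes the simplest approach: take the RMS level of each vocal stem over hand-picked intervals where the singer is silent, then compare the two levels. The function names, the interval list, and the synthetic test signals are all hypothetical and are not LALAL.AI's actual procedure. For reference, an 8:1 amplitude ratio corresponds to about 18 dB, while an 8:1 power ratio corresponds to about 9 dB.

```python
from __future__ import annotations

import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def silence_noise_level(
    samples: Sequence[float],
    sample_rate: int,
    intervals: Sequence[tuple[float, float]],
) -> float:
    """RMS level pooled over (start, end) intervals, in seconds, where the
    vocal stem should be silent. The intervals are assumed to be chosen by
    hand, e.g. by listening or by reading a spectrogram."""
    pooled: list[float] = []
    for start, end in intervals:
        pooled.extend(samples[int(start * sample_rate):int(end * sample_rate)])
    return rms(pooled)


def level_ratio(louder: float, quieter: float) -> tuple[float, float]:
    """Amplitude ratio between two RMS levels, and the same ratio in dB."""
    ratio = louder / quieter
    return ratio, 20 * math.log10(ratio)


# Hypothetical example: 1 s of "silence" at a 1 kHz sample rate containing
# low-level noise, modeled as a fixed-amplitude square wave for simplicity.
sr = 1000
stem_a = [0.001 * (-1) ** n for n in range(sr)]  # quieter noise floor
stem_b = [0.008 * (-1) ** n for n in range(sr)]  # louder noise floor
silent_intervals = [(0.0, 1.0)]

ratio, ratio_db = level_ratio(
    silence_noise_level(stem_b, sr, silent_intervals),
    silence_noise_level(stem_a, sr, silent_intervals),
)
print(f"noise ratio: {ratio:.1f}x ({ratio_db:.1f} dB)")  # prints: noise ratio: 8.0x (18.1 dB)
```

Pooling all silent intervals before taking the RMS weights each interval by its length, which keeps a few very short gaps from dominating the estimate.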
To keep this already long article from growing even longer, we compare the spectrograms of only one track, Song #4 from the listening tests above.
Here are two spectrograms of an excerpt from Song #4, with LALAL.AI at the top and Moises at the bottom:
The spectrograms below show that in intervals where there should be silence, the noise in the Moises stem is much more intense than in the LALAL.AI stem. These intervals are highlighted with red rectangles, and the Moises spectrogram is much brighter than the LALAL.AI one.
Note that the low-frequency part of the Moises spectrogram is also very bright. This indicates low-frequency noise, even though a vocal stem should contain very little low-frequency content. The low-frequency parts of the LALAL.AI and Moises spectrograms are highlighted with red rectangles below.
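Reading brightness off a spectrogram is a visual judgment. A numeric counterpart, sketched below in Python as an illustration rather than the method used in this test, is to measure what share of a stem's energy falls below some cutoff frequency. The function name, the 100 Hz cutoff, and the synthetic test signal are hypothetical choices; a real check would more likely use a windowed FFT from an audio or signal-processing library than this dependency-free naive DFT.

```python
import math
from typing import Sequence


def low_frequency_energy_fraction(
    samples: Sequence[float], sample_rate: int, cutoff_hz: float
) -> float:
    """Share of total energy in DFT bins strictly below cutoff_hz.

    The total comes from the time domain via Parseval's theorem, and only the
    low bins are computed with a naive DFT. No window is applied, so energy
    from tones that fall between bins can leak; treat the result as a rough
    indicator rather than a calibrated measurement.
    """
    n = len(samples)
    # Parseval: the sum of |X_k|^2 over all bins equals n * sum of x[t]^2.
    total = n * sum(x * x for x in samples)
    if total == 0.0:
        return 0.0
    # Highest bin below the cutoff, kept below Nyquist so mirrored bins are distinct.
    k_max = min(math.ceil(cutoff_hz * n / sample_rate) - 1, (n - 1) // 2)
    low = 0.0
    for k in range(k_max + 1):
        re = im = 0.0
        for t, x in enumerate(samples):
            angle = 2 * math.pi * k * t / n
            re += x * math.cos(angle)
            im -= x * math.sin(angle)
        power = re * re + im * im
        # For real input, bin k (1 <= k < n/2) has a mirror at n - k with equal power.
        low += power if k == 0 else 2 * power
    return low / total


# Hypothetical "vocal stem": a 50 Hz hum plus a 500 Hz tone of equal amplitude,
# one second long at a 2 kHz sample rate. The 100 Hz cutoff is an assumption.
sr = 2000
stem = [
    0.1 * math.sin(2 * math.pi * 50 * t / sr) + 0.1 * math.sin(2 * math.pi * 500 * t / sr)
    for t in range(sr)
]
fraction = low_frequency_energy_fraction(stem, sr, cutoff_hz=100.0)
print(f"energy below 100 Hz: {fraction:.0%}")  # prints: energy below 100 Hz: 50%
```

Computing only the bins below the cutoff keeps the sketch short and dependency-free; for full-length songs, an FFT-based spectrogram would be far faster.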
Our analyses showed that both LALAL.AI and Moises extract vocals at a high level, though occasionally with errors. However, LALAL.AI makes fewer and less noticeable mistakes, so we conclude that LALAL.AI is the better vocal remover.
Overall, Moises isolates vocal stems quite well, and we liked the sound of its vocals. They are clear, without the strong muffling that many other solutions, such as Spleeter, produce. But the ever-present background noise sticks out like a sore thumb. At first, it resembled the hiss of a vinyl record. Unfortunately, after the long listening sessions for this test, the noise left our ears aching.
In addition, many Moises stems exhibit leakage of the instrumental part into the vocal channel with a phase rotation effect, possibly caused by a problem with phase processing. This not only creates audible artifacts but can also cause difficulties for users when mixing vocals for remixes and other tasks.
Feel free to repeat the test yourself: all audio materials can be downloaded from our SoundCloud playlist.
You can also read our previous stem-splitting quality comparison articles: