How Machine Learning in Neutron Elements Track Assistant Makes Mixing Creative
Getting started is often the most difficult part of a creative project, and mixing is no exception. Mixing presets are a great way to quickly audition new ideas, but they aren’t created specifically for your material, so significant adjustment may be required. For example, a voice preset may not have been designed for the range of a specific vocalist or the room resonance of a specific recording environment. Nevertheless, using presets is a great learning experience and can get you closer to the sound you're looking for, faster.
Using machine learning and intelligent DSP (digital signal processing), Neutron’s Track Assistant feature automatically creates a custom preset specific to your track. You can use the settings suggested by Track Assistant to identify potential areas of interest in your track and, hopefully, get to the creative work more quickly.
The Difference Between Machine Learning and Intelligent DSP
Before we begin discussing the nuts and bolts of Track Assistant, it’s important to define the difference between intelligent DSP and machine learning. Intelligent DSP refers to the process of using the properties of the audio signal (e.g. fundamental frequency) to set a parameter of the DSP algorithm (e.g. the position of an EQ node). At iZotope we have a long history of intelligent DSP algorithms such as the Intelligent Release Control (IRC) algorithms in the Ozone Maximizer and the Adaptive mode in RX’s Spectral De-noise plug-in.
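To make the intelligent-DSP pattern concrete, here is a minimal sketch: a simple autocorrelation pitch estimator feeds a made-up rule that places a high-pass cutoff just below the estimated fundamental. This is an illustration under stated assumptions, not iZotope's actual algorithm.

```python
import math

def estimate_fundamental(samples, sample_rate):
    """Estimate the fundamental frequency of a mono signal via
    autocorrelation (a simple illustration, not a production method)."""
    n = len(samples)
    best_lag, best_corr = 0, 0.0
    # Search lags corresponding to roughly 50 Hz .. 1000 Hz
    for lag in range(sample_rate // 1000, sample_rate // 50):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0

def place_highpass(samples, sample_rate, margin=0.9):
    """Intelligent-DSP style rule: set the high-pass cutoff just
    below the estimated fundamental (the margin is an assumption)."""
    return estimate_fundamental(samples, sample_rate) * margin

# Synthetic one-second 110 Hz tone
sr = 8000
tone = [math.sin(2 * math.pi * 110 * t / sr) for t in range(sr)]
cutoff = place_highpass(tone, sr)   # lands a little below 110 Hz
```

The point is the division of labor: the analysis step measures a property of the signal, and a rule turns that measurement into a parameter setting.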
Machine learning algorithms, on the other hand, typically require a large dataset, which we use to detect patterns useful for making predictions when presented with new data. For example, given a large collection of guitar samples, we can teach an algorithm what a guitar sounds like, and then determine whether or not any new sound that the algorithm listens to contains a guitar.
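The learn-from-examples idea can be sketched with a toy classifier. Everything below is invented for illustration: a nearest-centroid model over hand-made two-dimensional features, nothing like the deep networks real instrument ID uses.

```python
# Toy "learn from a labeled dataset, then predict on new data" example.
# Feature vectors here are invented, e.g. (spectral centroid, attack time).

def centroid(vectors):
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]

def train(dataset):
    """dataset maps an instrument label to a list of feature vectors."""
    return {label: centroid(vecs) for label, vecs in dataset.items()}

def predict(model, features):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: dist(model[label], features))

dataset = {
    "guitar": [(0.30, 0.10), (0.35, 0.12), (0.28, 0.09)],
    "vocal":  [(0.60, 0.40), (0.65, 0.45), (0.58, 0.38)],
}
model = train(dataset)
label = predict(model, (0.32, 0.11))   # near the guitar examples
```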
“One way to think about Track Assistant is as an adaptive preset specific to your audio.”
Automatically identifying the type of instrument on your track is the crucial first step of the Track Assistant process. Once we know what type of audio source is being processed, we can use that information to help our intelligent DSP make decisions specific to that instrument. For example, in a vocal track, we may want to place dynamic EQ nodes in the upper mids to help lessen ringing; the intelligent DSP decides exactly where to place the nodes for each specific vocal track. It’s this combination of machine learning and intelligent DSP that makes Track Assistant possible.
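As a hedged illustration of the node-placement step, the function below scans a made-up magnitude spectrum for the loudest bin in an assumed upper-mid band and proposes a dynamic EQ node there. The band limits and single-node strategy are assumptions for the sketch, not Neutron's actual logic.

```python
def place_dynamic_eq_node(spectrum, lo=2000.0, hi=5000.0):
    """Pick the loudest bin in the upper mids as the node frequency.
    `spectrum` is a list of (frequency_hz, magnitude) pairs; the band
    limits here are illustrative assumptions."""
    band = [(f, m) for f, m in spectrum if lo <= f <= hi]
    freq, _ = max(band, key=lambda fm: fm[1])
    return {"frequency": freq, "mode": "dynamic", "gain_db": 0.0}

# A made-up vocal spectrum with a resonance near 3.2 kHz
spectrum = [(500, 0.9), (1000, 0.7), (2500, 0.4), (3200, 0.8), (4500, 0.3)]
node = place_dynamic_eq_node(spectrum)   # proposes a node at 3200 Hz
```

The instrument class (from machine learning) decides *that* a node belongs in the upper mids; the analysis (intelligent DSP) decides exactly *where*.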
How Machine Learning Is Used in Track Assistant
At iZotope, we recognized the ability of machine learning to expand and assist traditional DSP techniques, and in 2012 we acquired Imagine Research to improve our technology in that area. Imagine Research collected a large dataset to use for automatic instrument ID, but the state of machine learning algorithms back in 2012 left something to be desired. Over the past few years, there has been an explosion in machine learning techniques known as deep learning. It was these breakthroughs in deep learning combined with the large dataset of labeled audio files that enabled the automatic instrument ID technology in Track Assistant.
Many of the most successful deep learning algorithms were developed to work with images (e.g. automatic Facebook image tagging), so the first step in our instrument recognition approach is to represent sound as an image using the spectrogram. While we could use a typical image-based deep learning pipeline, it’s important to recognize that the horizontal (time) and vertical (frequency) dimensions of a spectrogram mean something different than the dimensions of a typical image. After a few audio-specific modifications, we can feed a large number of spectrograms, labeled by instrument, into the training process for our ID algorithm. Under the hood, this algorithm learns to recognize a large number of instruments and then maps them to a smaller set of instrument classes, each with specific rules and settings for guiding the intelligent DSP that happens at the next stage.
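The sound-as-image step can be sketched with a naive short-time DFT. The frame size, hop size, and test signal below are arbitrary choices for illustration; real systems use optimized FFTs and perceptually motivated frequency scales.

```python
import math

def spectrogram(samples, frame=64, hop=32):
    """Magnitude spectrogram via a naive DFT: each row is the spectrum
    of one frame, giving a time-by-frequency 'image' of the sound."""
    frames = []
    for start in range(0, len(samples) - frame + 1, hop):
        window = samples[start:start + frame]
        mags = []
        for k in range(frame // 2):   # keep positive-frequency bins
            re = sum(window[n] * math.cos(2 * math.pi * k * n / frame)
                     for n in range(frame))
            im = -sum(window[n] * math.sin(2 * math.pi * k * n / frame)
                      for n in range(frame))
            mags.append(math.hypot(re, im))
        frames.append(mags)
    return frames   # shape: (time, frequency)

sr = 800
tone = [math.sin(2 * math.pi * 100 * t / sr) for t in range(sr)]
image = spectrogram(tone)
# With frame=64 at 800 Hz, each bin spans 12.5 Hz, so 100 Hz is bin 8
peak_bin = max(range(len(image[0])), key=lambda k: image[0][k])
```

The resulting grid is what an image-style network consumes, which is also why the time and frequency axes deserve the audio-specific treatment mentioned above.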
“Mixing is better when you combine the precision of advanced algorithms with human creativity.”
One way to think about Track Assistant is as an adaptive preset specific to your audio. However, unlike traditional presets, this preset contains settings for how intelligent DSP will adapt to your audio in addition to static settings like those in a regular preset. This “adaptive preset” is selected based on the instrument ID and the type of sound you are going for (e.g. Broadband Clarity). These presets were created by our sound design team using best practices in audio engineering, in the same way all the regular presets in Neutron or other iZotope products were created. However, the adaptive presets used by Track Assistant also contain the rules necessary to guide our intelligent DSP, such as setting a high-pass filter below the lowest fundamental frequency observed in your guitar track.
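A minimal sketch of the adaptive-preset idea: static settings plus rules that read an analysis of the audio. The setting names and analysis keys are invented; the single rule mirrors the high-pass example from the text.

```python
# Hypothetical "adaptive preset": fixed values plus per-track rules.

def apply_adaptive_preset(preset, analysis):
    settings = dict(preset["static"])          # regular-preset part
    for parameter, rule in preset["rules"].items():
        settings[parameter] = rule(analysis)   # adaptive part
    return settings

vocal_clarity = {
    "static": {"compressor_ratio": 3.0, "exciter_blend": 0.2},
    "rules": {
        # High-pass just below the lowest fundamental seen in the track
        "highpass_hz": lambda a: a["lowest_fundamental_hz"] * 0.9,
    },
}

analysis = {"lowest_fundamental_hz": 120.0}    # from an analysis pass
settings = apply_adaptive_preset(vocal_clarity, analysis)
```

Unlike a regular preset, the same preset object yields different final settings for different tracks, because the rules re-run against each track's analysis.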
Why iZotope Didn’t Fully Automate Mixing
Recently, several products and academic research groups have offered solutions that fully automate certain portions of the audio engineering workflow, such as mastering. However, ceding complete control to an algorithm can feel heavy-handed and uninspiring. That’s why we took the adaptive preset approach in Neutron: we can provide the benefits of machine learning and intelligent DSP, but you retain full control of the preset. We believe mixing is better when you combine the precision of advanced algorithms with human creativity.
Track Assistant can also be valuable from an educational standpoint. There is no shortage of tutorials on the mixing process, but none of them can provide guidance on your specific audio. While Track Assistant isn’t intended to deliver a finished product every time, it will give you suggestions specific to your track. These may inspire new ways to EQ a synth line or to set the compressor threshold on a problematic drum bus.
What’s Possible in the Future
When it comes to the future of machine learning, the limit of what’s possible seems to expand almost every day. One particularly exciting area is the development of audio-specific deep learning techniques that operate directly on the waveform, as opposed to using techniques developed for images. These advances could enable another leap forward in instrument ID performance. On the intelligent DSP front, an exciting research area is exploring connections between semantic descriptions of music production (e.g. punchy or warm) and the specific DSP parameters associated with them. One initiative, the SAFE Project, hopes to allow producers to apply effects to instruments by typing in keywords, rather than adjusting settings on their plug-ins.
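Purely as an illustration of the keyword-to-settings idea, one could imagine a lookup from semantic descriptors to EQ moves. Every descriptor name and parameter value below is invented and has no connection to the SAFE Project's actual mappings.

```python
# Hypothetical descriptor-to-EQ mapping; all values are invented.
DESCRIPTORS = {
    "warm":   {"low_shelf_hz": 200, "low_shelf_db": +2.0, "high_shelf_db": -1.0},
    "bright": {"high_shelf_hz": 8000, "high_shelf_db": +2.5},
}

def settings_for(keyword):
    """Return EQ settings for a semantic keyword, or {} if unknown."""
    return DESCRIPTORS.get(keyword.lower(), {})

eq = settings_for("warm")   # a producer types a word, gets settings
```

The research challenge is learning such mappings from data rather than hand-writing them, and making them sensitive to the audio they are applied to.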
With machine learning, it is often easier to train an algorithm to fully automate a task than to learn an assistive model that works alongside the user. For a creative task like audio mixing, full automation risks uniformity, so we need to be sure to learn something tweakable enough that creative control always belongs to humans.