Well, if you know anything about spectral analysis, you'll understand how this works… not to say that it isn’t farking amazing, 'cause it is…
Basically, he’s using an FFT to split the waveform into its component sine waves, looking for areas with groupings of strong peaks. Then he’s going, “Oh, that must be a note from one instrument”… then taking those groupings and ripping them out into their own waveforms as isolated notes.
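Here’s a rough sketch of that first step, assuming a synthetic test signal rather than real audio. A naive DFT stands in for the FFT (same numbers, just slower), and `dft_magnitudes` and `find_peaks` are hypothetical helpers, not anything from the actual plugin. Both test tones are placed exactly on DFT bins so each shows up as a single clean peak.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(N^2) DFT of a real signal; returns magnitudes for bins 0..N/2."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def find_peaks(mags, threshold):
    """Bin indices that are local maxima and exceed the threshold."""
    return [k for k in range(1, len(mags) - 1)
            if mags[k] > threshold and mags[k] >= mags[k - 1] and mags[k] > mags[k + 1]]

# Hypothetical test signal: two sine "notes" sitting exactly on bins 10 and 37.
N = 256
signal = [math.sin(2 * math.pi * 10 * t / N) + 0.5 * math.sin(2 * math.pi * 37 * t / N)
          for t in range(N)]
mags = dft_magnitudes(signal)
peaks = find_peaks(mags, threshold=0.1 * max(mags))
print(peaks)  # [10, 37]
```

With real recordings the frequencies won’t land on bins, so each partial smears across neighbouring bins and you’d want windowing and a smarter peak picker; the idea stays the same.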
That’s where it gets nutty, though, 'cause anyone who’s ever messed with spectral editors can tell you that even if you do get a clean visual of a grouping of strong peaks, simply lassoing them and playing them back will result in a very tin-can sounding noise… one that only slightly resembles the original note. That’s because almost every instrument humans have devised produces harmonics that span quite a bit of the audible spectrum, and a note whose energy is spread across a wide frequency range tends to sound fuller and richer. It’s why supersaws sound so huge… they stack several detuned sawtooth waves, so they’ve got a ton of harmonics.
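To see why lassoing only the strong peak isn’t enough, here’s a toy spectral mask, assuming a square-like test note built from odd harmonics 1, 3, 5 and 7 (amplitude 1/n). Keeping only the fundamental’s bin (plus its negative-frequency mirror) and resynthesizing with an inverse DFT gives back a plain sine: most of the energy survives, but every harmonic is gone. Names like `F0_BIN` are made up for the example, and real spectral editors work frame by frame with windowing, which this skips.

```python
import cmath
import math

N = 256                 # frame length in samples
F0_BIN = 8              # hypothetical fundamental, placed exactly on a DFT bin
HARMONICS = (1, 3, 5, 7)

def dft(x):
    """Naive O(N^2) complex DFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; keeps the real part since the input came from a real signal."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def energy(x):
    return sum(v * v for v in x)

# Square-like note: odd harmonics of the fundamental with amplitude 1/h.
note = [sum(math.sin(2 * math.pi * h * F0_BIN * t / N) / h for h in HARMONICS)
        for t in range(N)]

# "Lasso" the strongest peak only: zero every bin except the fundamental and its mirror.
spectrum = dft(note)
keep = {F0_BIN, N - F0_BIN}
lassoed = [X if k in keep else 0j for k, X in enumerate(spectrum)]
playback = idft(lassoed)

print(f"energy kept: {energy(playback) / energy(note):.1%}")
```

It should print about 85%: the fundamental carries most of the energy, but the part that got thrown away is exactly the harmonic content that shapes the timbre.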
But harmonics are a funny thing, and it seems to me that this genius of a man realized this. Within a single note, they tend to follow a mathematical pattern: in most pitched instruments they sit at (roughly) whole-number multiples of the note’s fundamental frequency. Take a square wave, which is harmonically rich compared to a sine wave: it contains only the odd harmonics of the note it’s playing, at 3, 5, 7… times the fundamental, with the amplitude of the nth harmonic falling off as 1/n.
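That pattern is easy to check numerically. A minimal sketch, assuming an ideal square wave with peak amplitude 1, whose Fourier series is (4/π)·Σ sin(2π·n·f0·t)/n over odd n; the helper name `square_wave_partials` is hypothetical.

```python
import math

def square_wave_partials(f0, count):
    """First `count` nonzero partials of an ideal square wave (peak amplitude 1)
    as (frequency_hz, amplitude) pairs: odd harmonics n only, amplitude 4/(pi*n)."""
    return [(n * f0, 4 / (math.pi * n)) for n in range(1, 2 * count, 2)]

for freq, amp in square_wave_partials(110.0, 4):
    print(f"{freq:6.1f} Hz  amplitude {amp:.3f}")

# At a quarter period the square wave sits at +1, so summing many partials
# there should land very close to 1.
t = 0.25 / 110.0
approx = sum(a * math.sin(2 * math.pi * f * t) for f, a in square_wave_partials(110.0, 5000))
print(f"value at quarter period: {approx:.4f}")
```

For a 110 Hz note that lists 110, 330, 550 and 770 Hz: no 220 or 440, so the even multiples (the octaves) are missing entirely.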
So, considering that, if you can isolate the strongest frequencies of a note and figure out what fundamental tones it’s made of, it should be fairly simple to scan the rest of the spectrum and determine which harmonics belong to it. Sure, sometimes notes will share harmonics, but with some more advanced statistical methods, one could conceivably determine which oscillations of those shared harmonics belong to which note.
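A minimal sketch of that harmonic bookkeeping, assuming the peaks and the candidate fundamentals have already been found. The function names, the 3% tolerance and the 16-harmonic limit are arbitrary choices for illustration, and shared peaks are only flagged here rather than split up by the fancier statistics mentioned above.

```python
def is_harmonic(peak_hz, f0_hz, tolerance, max_harmonic):
    """True if peak_hz lies within a relative `tolerance` of n * f0_hz for some n in 1..max_harmonic."""
    n = round(peak_hz / f0_hz)
    return 1 <= n <= max_harmonic and abs(peak_hz - n * f0_hz) <= tolerance * n * f0_hz

def assign_harmonics(peaks_hz, fundamentals_hz, tolerance=0.03, max_harmonic=16):
    """Attribute each peak to the fundamental(s) whose harmonic series it fits.

    Returns (groups, shared, unassigned): peaks fitting exactly one fundamental,
    peaks fitting several (left for a later statistical step), and peaks fitting none.
    """
    groups = {f0: [] for f0 in fundamentals_hz}
    shared = {}
    unassigned = []
    for peak in peaks_hz:
        owners = [f0 for f0 in fundamentals_hz
                  if is_harmonic(peak, f0, tolerance, max_harmonic)]
        if len(owners) == 1:
            groups[owners[0]].append(peak)
        elif owners:
            shared[peak] = owners
        else:
            unassigned.append(peak)
    return groups, shared, unassigned

# Hypothetical example: A3 (220 Hz) and E4 (330 Hz) sounding together, one slightly
# sharp partial at 441.5 Hz, and a stray peak at 505 Hz that fits neither note.
peaks = [220.0, 330.0, 441.5, 660.0, 880.0, 990.0, 505.0]
groups, shared, unassigned = assign_harmonics(peaks, [220.0, 330.0])
print(groups)      # {220.0: [220.0, 441.5, 880.0], 330.0: [330.0, 990.0]}
print(shared)      # {660.0: [220.0, 330.0]}
print(unassigned)  # [505.0]
```

The 660 Hz peak is the "shared harmonic" case: it’s the 3rd harmonic of A3 and the 2nd of E4, so frequency alone can’t decide who owns it.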
Come to think of it, it might be pretty simple to look for patterns in the spectral data by tracking each peak’s level over time the way a compressor’s detector does, to see which peaks and valleys line up… this might be even easier than looking for harmonics.
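One hedged reading of the compressor idea: run each isolated partial through a compressor-style envelope follower (fast attack, slow release) and correlate the resulting envelopes, since partials from the same note should rise and fall together. This assumes the partials have already been pulled apart (here they’re just synthesized separately), and the sample rate, attack/release times and test tones are all made up.

```python
import math

SR = 8000  # sample rate in Hz (hypothetical)

def envelope(samples, sample_rate, attack_ms=5.0, release_ms=50.0):
    """Compressor-style peak detector: rectify, then smooth with a one-pole
    filter using a fast coefficient while rising and a slow one while falling."""
    a_att = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env, out = 0.0, []
    for x in samples:
        level = abs(x)
        coeff = a_att if level > env else a_rel
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def tone(freq, onset_s, decay_s, dur_s=0.5):
    """Exponentially decaying sine starting at onset_s, silent before that."""
    return [math.exp(-(t / SR - onset_s) / decay_s) * math.sin(2 * math.pi * freq * t / SR)
            if t / SR >= onset_s else 0.0
            for t in range(int(dur_s * SR))]

# Two partials of note A (same onset and decay) and one partial of a later note B.
env_a_fund = envelope(tone(220, 0.0, 0.1), SR)
env_a_2nd = envelope(tone(440, 0.0, 0.1), SR)
env_b_fund = envelope(tone(330, 0.25, 0.1), SR)

print(pearson(env_a_fund, env_a_2nd))  # close to 1: these peaks move together
print(pearson(env_a_fund, env_b_fund))  # negative: these peaks don't
```

Partials with a high envelope correlation could then be grouped into the same note, which sidesteps the shared-harmonic problem whenever the notes start or decay at different times.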
I’ve been thinking about this stuff for a long time… long before this innovation came around, and as far as I’m concerned it was only a matter of time before someone did it. As this breakthrough shows, all it took was someone who knew enough about audio processing, FFTs, and statistical methods to figure it out. BTW, if you feel like ripping off this post and using it to devise your own VST hackery, I request that you please send me a free copy of your resulting plugin.