Mono summing - is there a better way?

I have been thinking a bit about summing stereo to mono lately and thought I’d share some ideas to see what the knowledgeable guys on this forum think.

Basics: Summing stereo to mono is usually done as L+R. This means we get silence in the case of L being the polarity-inverted copy of R (totally out of phase). That is the extreme case of what we call “not being mono compatible” when mixing, i.e. 100% stereo.

L+R is also the same as M (the mid signal), so what mono summing basically does is strip out the S (side) channel content. That’s not fair, I think :slight_smile: There can be a lot of information in S, in terms of frequency content and character, that we would want to preserve in mono. I believe there ought to be a better way, one where the “message” and “meaning” (our auditory interpretation) of the signal is preserved better.
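To make the basics concrete, here is a tiny numpy sketch (my own toy example, not from any particular tool) of the worst case: a sound hard-panned to one side with its polarity-inverted copy on the other.

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr

# A 440 Hz tone on the left, its polarity-inverted copy on the right:
# the "100% out of phase" extreme described above.
left = np.sin(2 * np.pi * 440 * t)
right = -left

# Conventional mono sum, M = (L + R) / 2: total silence here.
mono = (left + right) / 2
print(np.max(np.abs(mono)))  # → 0.0

# The side channel S = (L - R) / 2 carries everything the sum discarded.
side = (left - right) / 2
print(np.max(np.abs(side)))  # close to 1.0
```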

My conclusion is that we have to go into the time domain and bring the stereo content into phase before summing, by delaying some part of the signal. There is no other direct mathematical way to bring two out-of-phase signals into phase, AFAIK. What I propose:

  1. Use a phase auto-align tool between what is unique to L and what is unique to R. There are some auto-align tools on the market, mainly targeted at those who record with multiple mics.

  2. Let the result be new_S

  3. Sum M + new_S (possibly phase auto-align here too)

This should generate a mono signal where S is not completely cancelled out.
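Reduced to a toy, the steps might look like this in numpy. Note that the “auto-align” step is collapsed here into a single broadband delay found by cross-correlation, which is a big simplification of what real auto-align tools (which work per frequency band) actually do:

```python
import numpy as np

def delay_aligned_mono(left, right, max_lag=64):
    """Toy version of the proposal: instead of summing L+R directly,
    find the delay (within +/- max_lag samples) that best aligns R with
    L by circular cross-correlation, shift R, then sum. A single
    broadband lag will comb-filter on complex material; this only
    illustrates the principle."""
    lags = range(-max_lag, max_lag + 1)
    best = max(lags, key=lambda k: np.dot(left, np.roll(right, k)))
    return (left + np.roll(right, best)) / 2

# Worst case for a plain L+R sum: R is the polarity-inverted copy of L.
sr = 44100
t = np.arange(sr) / sr
left = np.sin(2 * np.pi * 440 * t)
right = -left

plain = (left + right) / 2                 # exactly silent
aligned = delay_aligned_mono(left, right)  # roughly full amplitude again
```

On a pure tone the half-period shift happens to undo the polarity inversion, which is exactly the kind of salvage the idea is after; on broadband material the same shift would comb-filter, which is why a real tool would have to align per band.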

What do you guys think?

I’d be interested to hear the results of it, especially to see if you can get it to work without some form of comb filtering going on.

It’s been a while since I studied DSP, but the first thought that comes to mind… Whenever you calculate the FFT of a signal, i.e. when you want to see what frequencies are inside, you get two things: the amplitude spectrum (what we usually see on spectrograms) and the phase spectrum. Most people don’t understand the phase spectrum, which is why we look at the amplitude spectrum, but that does not change the fact that those two pieces of information are inseparable.

If you think about what the FFT is, this becomes clearer. You want to represent an audio signal with a set of sine waves of varying frequencies and amplitudes. But you cannot reconstruct an arbitrary signal with sine waves that all start at the same angle. That is why, if you want to use a sine wave as a building block to recreate the original signal, you not only need to alter its amplitude but also choose the starting angle that fits your purpose.

If you start playing with that angle, you will get different values, and the resulting output will differ from the original: you will not recreate the original signal from the sine waves, you will create something new.
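A quick numpy illustration of that inseparability (my own example): the amplitude spectrum alone describes “what frequencies are inside”, but without the phase spectrum you rebuild a different signal.

```python
import numpy as np

# A short test signal: two sines with different starting angles.
n = 1024
t = np.arange(n) / n
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t + 0.7)

spectrum = np.fft.rfft(x)
amplitude = np.abs(spectrum)   # what spectrograms show
phase = np.angle(spectrum)     # the part most people ignore

# Amplitude and phase together reconstruct the signal exactly...
rebuilt = np.fft.irfft(amplitude * np.exp(1j * phase))
print(np.allclose(rebuilt, x))      # → True

# ...but discarding the phase (setting every angle to zero) keeps the
# same amplitude spectrum while producing a different waveform.
zero_phase = np.fft.irfft(amplitude)
print(np.allclose(zero_phase, x))   # → False
```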

This leads to a question: do you want to extract only the piece of the original signal that is identical in both channels (which L+R does), meaning you don’t change the original but only take a part of it, or do you want to create something new based on the original? In my understanding, mono by definition means that all channels have the same values, so keeping only the common parts of both channels seems perfectly reasonable to me.

I am reluctant to introduce something completely new into a process as simple as stereo-to-mono, because it will not be easy to understand what the new signal is. Besides, if you lose something, you lose it. You can’t have your cake and eat it too. If you go mono, you will lose content: spatial information.

I think I understand what you are saying. Let’s say you have an interesting sound that is 100% R, and its phase-reversed version is 100% L. By going mono you lose it completely. But if we try to align its phase to keep it, we change it completely. Not to mention that the operation will change all the other sounds in the same wave.

Just my 2 cents…

I’m with you, carmazine (except I haven’t wrapped my head around FFT yet)

It’s just an idea that might (or might not) be somewhat useful. Results will probably be a bit unpredictable, or inconsistent at the very least. My thought is that if it works at all, it would probably be more useful on single instruments (tracks) than on buses or mixdowns.

I’ll have a go with Mxxx from Melda (a modular matrix VST) to see if anything useful can be done. Another idea: instead of phase-aligning L-R with R-L, a cheap routine might be to pass S through a subtle effect (a short reverb or delay) before summing S+M to mono. Some of the character might be preserved as mono-compatible without altering the sound too much.
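For what it’s worth, the “subtle effect on S” idea can be sketched like this in numpy (the delay time and wet amount below are made-up illustrative values, not recommendations):

```python
import numpy as np

def mono_with_delayed_side(left, right, sr=44100, delay_ms=0.5, wet=0.5):
    """Sketch of the cheap routine above: decode M/S, run the side
    channel through a short delay (standing in for a subtle effect),
    then sum mid + processed side down to mono."""
    mid = (left + right) / 2
    side = (left - right) / 2
    d = int(sr * delay_ms / 1000)
    delayed = np.concatenate([np.zeros(d), side[:-d]]) if d else side
    processed = (1 - wet) * side + wet * delayed
    return mid + processed

sr = 44100
t = np.arange(sr) / sr
left = np.sin(2 * np.pi * 440 * t)
right = -left  # fully out of phase: a plain L+R sum would be silent

mono = mono_with_delayed_side(left, right, sr)
print(np.max(np.abs(mono)))  # non-zero: some side content survives
```

The delayed copy interferes with the dry side signal, so the comb-filtering risk raised earlier in the thread applies here too.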

Gotcha, I thought more about mixdowns, you got me there :slight_smile:

I like the idea, I am just unsure about making it come true. I am more than interested in seeing how your approach works - good luck!

A cool thing I know of no good free plugin for is to really separate mid and side, i.e. into “real sides” and “center”. Center being only what is in both channels, and “real sides” being a stereo signal, the original minus the center (so only what is going on in the stereo field). This could be a lot more useful for sound sculpting than the traditional (crooked) mid/side stuff, as you could happily run both signals through standard effects and sum them back up without the really bad side effects of the traditional way. I believe one way is to take FFTs and classify the results as common to, or different between, the two stereo channels, though this will generate quite some latency if you try to do it in high quality in realtime, I think?

Center being only what is in both channels, and “real-sides” being a stereo signal or the original minus center (so only what is going on in the stereo field).

Interesting, but I don’t quite get the difference from M/S. Would you care to elaborate? Do you mean processing “S” as an actual stereo track, instead of a mono signal?

Yes, “S” would be a stereo signal, free of any content that is present in both channels. Likewise, “center” would only contain information shared by the left and right channels, but nothing unique to either of them. Basically, stereo gets expanded into three channels: left, right and center. There actually is an open-source plugin (a stereo source separator), but it suffers from very bad audio quality. Because some frequencies are present in both channels at different amplitudes, once they are identified there are choices in how to treat/separate the information, i.e. choose some degree of separation like in the mentioned plugin, or blend with the side signal in some manner instead of hard-cutting.

The technique is called “center channel extraction”. I think it might be patented, and maybe not feasible in realtime because of the FFT sizes needed for good quality.

Just wanted to mention this because it might be relevant to the original post. While I still think your idea of making two signals (stereo or arbitrary) combinable with minimal phase cancellation sounds interesting, it is surely yet another level above center extraction, comparable in complexity to the really good pitch shifters.

Sounds like M/S to me… ? nevermind :slight_smile:


Muting the M channel of a stereo track is not that difficult. It can be done with Mxxx from Melda, for example. I think there might even be a not-so-obvious way of doing it in Renoise if you really put your mind to it (but I’m not sure). Yeah… it could perhaps be done with some dual panning and phase inversions back and forth.
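In case it helps, the M-mute itself is just the standard M/S identity; a minimal numpy version (nothing Renoise- or Melda-specific):

```python
import numpy as np

def mute_mid(left, right):
    """Side-only stereo: subtract the shared (mid) component from each
    channel, leaving (L-R)/2 on the left and (R-L)/2 on the right."""
    mid = (left + right) / 2
    return left - mid, right - mid

# The shared first sample vanishes; the differing second sample survives.
out_l, out_r = mute_mid(np.array([1.0, 0.5]), np.array([1.0, -0.5]))
```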

Anyway, here is a first example of summing a stereo track with lots of phase issues. Just a first iteration of the principle I’ve suggested:



Feel free to compare this summing to summing the source normally on your own and let me know what you think.

No, what I mean is a tad different…more complex.

I know the traditional M/S tools. I’ve just tested Voxengo MSED, and it is probably similar to the Melda plugin; it also sports “side mute” and “mid mute” buttons. But this is very simple stuff; you can do it in Renoise too, with gainers, stereo expanders and some send channels, and I’ve even tried applying the traditional M/S processing that way once. The “mid” channel will contain L+R, including side-panned content from both sides, and the “side” channel will be L-R or R-L. As a stereo channel (I think each channel will contain the inverse of the other) it sounds very weird and twisted on headphones (especially the bass), chronically out of phase, so to say. A bit like the Renoise stereo tools at maximum width, which also sound odd with bass on headphones, but more extreme.

The thing is, these stereo tools put the intermediate signals out of phase. They can be reconstructed back to the original, but once you manipulate them in strong ways, the final result may be partially or completely out of phase, similar to the “side” channel as perceived in stereo mode by the plugins mentioned above. This is because both components need to be there to reconstruct the original; if you cancel content in one part but not the other, the reconstruction will overshoot and put things together the wrong way.

The center channel extraction stuff, on the other hand, won’t simply subtract the channels from each other; it correlates frequencies, trying to find shared and individual components. This makes it possible to build a channel containing only what sits in the middle of the stereo field (not, as in the M/S way, with the sides simply added in) and also pure L or R side signals, i.e. a stereo signal containing only the action from sounds panned left or right, or wide reverb tails, but then only the wide content, not the centered components. So it is different from M/S. Spooky, isn’t it? But I believe this could be a really nice way to craft stereo signals, and could take side EQing and the like to new dimensions. Put back together, there are no artefacts, so you can run any effects you like on mid and side, as destructive as you wish, with no phase issues in the final result. Even the L/R side signal might be usable directly without phasing, given an unproblematic original source.

As for phase issues in tracks, I’m already used to placing a mono mixer on the master channel, right next to the HP filter I use to hear how things will sound on speakers with weak bass. After every stereo move I check a mono mix now and then to see if it destroyed something badly, or if the music is still recognisable and pleasant. It is a lot of trial and error. I think this stereo out-of-phase stuff is relevant not only for mono speakers but also for arbitrarily placed stereo speakers. If a mono sum cancels too much, then in stereo the speakers might fight each other too, creating weird effects depending on the listener’s position. It is probably a bad idea to put width on sub/bass frequencies, for example; they are better kept mono or simply panned.

Huh, yeah, and the example does sound like an M/S-style width button has been used. I hate this; I always try to use other effects for stereo width. You can feel the pain on headphones: I literally get sick from the asymmetric out-of-phaseness of bass/mid frequencies. You did some good. I could achieve a similar effect by splitting the signal with MSED, applying a tiny delay to one channel of the side signal, then summing back together. The out-of-phase instrument in the background could be heard clearly again in the mono sum, just like in your summed example. But there was a very slight comb effect apparent with my own trick when I A/B’ed my chains. I couldn’t judge whether you did the same trick; your wav was also a bit louder than the original and than my attempt at it.

Very interesting Oops!

I know some people get “sick” from out-of-phase content, and I wish I were one of them. I actually have a hard time detecting it confidently by ear, without the aid of visualizers or a mono button to check whether things get cancelled.

We seem to be kind of on the same page here, since you understood exactly what I tried in the experiment :slight_smile: I think I used a 50% wet delay that was very short, but some comb filtering might have occurred. However, it could be an acceptable compromise, considering you get rid of some phase issues. I believe this effect is potentially best used per track, when a particular instrument has phase issues; quite a few VST instruments suffer from this. Then you can also tweak the delay to what works best with that instrument. My initial goal was to make a processor to put on the master bus for better mono compatibility in one go, but that might not be advisable due to the effects on transients and the comb filtering you mention. However, I like using the side-delay trick on individual tracks. It kind of puts the tracks in the same “stereo width context” and makes the stereo field coherent between tracks in some weird way. I suspect it makes the phase “angles” (?) more coherent between tracks and avoids having information subtly flying all over the place, stereo-wise. Definitely usable if you do wall-of-sound mixing.

I’m still not quite getting your way of doing the M/S-like processing. Do you use some kind of boolean operation instead of simple addition/subtraction? It sounds interesting, but I couldn’t wrap my head around how it differs from normal M/S extraction.