If you can't upload the song, let me try to explain how the multicore stuff works in Renoise, so you get a better picture of what's happening: what works and what doesn't. You can't simply say that you get a 4x speedup with 4 cores compared to one core. That is not the case in any audio application:
It's enough for a single core to be at 100% to push Renoise's CPU usage to 100%. Renoise splits the tracks that are currently running into (in your example) “4 units”, each of which gets assigned to one CPU. If one of these units reaches its limit (goes above 100%) while the other cores/units are already done with their processing, you simply end up waiting for the one core that is overloaded.
This means that as soon as you play back, for example, one synth that needs more than one core can handle (this one synth plus the tracks it is playing on), you end up at 100% audio usage.
We cannot “spread” one single instrument across all 4 cores; we can only split up tracks. Only the VSTi itself could do this (Kontakt, for example, does).
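To make this a bit more concrete, here is a rough sketch in Python. It is only an illustration, not how Renoise actually schedules tracks: the greedy assignment, the function names and all load numbers are made up, with each load given as a percentage of what one core can process in time. The point is just that the reported usage follows the busiest core, not the average.

```python
def assign_tracks(track_loads, num_cores):
    """Place each track on the currently least-loaded core, heaviest first.

    Hypothetical strategy for illustration only; a track is never split,
    so its whole load lands on exactly one core.
    """
    cores = [0.0] * num_cores
    for load in sorted(track_loads, reverse=True):
        least_loaded = cores.index(min(cores))
        cores[least_loaded] += load
    return cores


def audio_load(core_loads):
    """All cores must finish before audio can be output, so the busiest one decides."""
    return max(core_loads)


# Eight equally heavy tracks spread nicely over 4 cores.
balanced = assign_tracks([40, 40, 40, 40, 40, 40, 40, 40], 4)

# One track with a very heavy synth plus three light tracks.
heavy = assign_tracks([120, 10, 10, 10], 4)

print(balanced, audio_load(balanced))
print(heavy, audio_load(heavy))
```

In the second case the total work is far less than in the first, but the one heavy track still overloads its core while the other three sit mostly idle.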
So does one of the Korg M1/Wavestation instruments produce this overload, even when you play it back alone (without the other instruments)?
Please also see our MultiCore FAQ.