Rate my resampling algorithm!

Hi everyone,

I’m Mark, casual Renoise user and one-time mod musician in the mid-to-late 1990s (I went by the handle Arcturus - I was, at best, a footnote in the demoscene, and I’d be surprised if anybody remembered me - also there may have been more than one Arcturus). I’m now a professional software developer and I was feeling nostalgic, so I decided to try writing a mod player. Specifically, a player for 4-channel ProTracker mods - mostly just to see if I could do it. I decided to implement it in Kotlin running on the JVM, since that’s what I use a lot in my day job.

My goal is to get the player to play Space Debris correctly.

As you may be aware, the resampling algorithm is at the heart of any mod player. I was curious if my implementation is any good - it sounds fine so far, but I’m sure there’s probably a better way to do it. I’m posting this here to solicit feedback: What do you think of my implementation? Is there a better way to do it? Are there any obvious pitfalls? I don’t have a background in audio programming, but I know enough to get this far.

Anyway, here’s the repo: GitHub - mabersold/kotlin-protracker-demo

Direct link to the resampler code: kotlin-protracker-demo/ChannelAudioGenerator.kt at main · mabersold/kotlin-protracker-demo · GitHub

Summary of the code organization:

  • model package contains the data classes that hold information about the song.
  • player package contains the class that actually sends the audio to the output device.
  • pcm package contains the classes that convert the song into a PCM audio stream. This is where the resampler lives.

The basic idea is that there is a single AudioGenerator class that keeps track of the position in the song. It holds four ChannelAudioGenerator instances - one for each channel - each of which produces the audio for its channel. As each row passes, the main AudioGenerator sends data to the ChannelAudioGenerators (note commands and non-resampled audio data, basically), and the ChannelAudioGenerators send PCM data back to the main AudioGenerator, which mixes the channels and passes the result to the main class (which sends it to the output device).
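Purely as an illustration of the mixing step - the function name and the clamping strategy here are my own assumptions for the sketch, not necessarily what the repo actually does:

```kotlin
// Hypothetical sketch of mixing one output sample from four channels:
// sum the per-channel samples and clamp to the signed 8-bit range.
// mixFrame is an illustrative name, not code from the repo.
fun mixFrame(channelSamples: List<Byte>): Byte {
    val sum = channelSamples.sumOf { it.toInt() }
    return sum.coerceIn(-128, 127).toByte()
}

fun main() {
    // Four channels contributing one sample each: 100 + 50 - 30 + 10 = 130,
    // which clamps to 127.
    println(mixFrame(listOf(100, 50, -30, 10)))
}
```

Simple summing with a clamp is the crudest possible mixer - dividing the sum by the channel count (or widening to 16-bit before mixing) would avoid clipping, which may matter once all four channels are loud at once.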

Summary of my algorithm:

The ChannelAudioGenerator has the following information: the period (basically the pitch that we need to resample to), an instrument number, and the audio data of the instrument.

The basic idea behind the algorithm is to linearly interpolate between each pair of values in the original audio data to get the correct pitch. I’m operating under the assumption that I will not need to increase the frequency beyond what the original audio data already has (which seems to hold true so far in my testing) - I’ll only be reducing the frequency, and thus need to interpolate new values between the existing audio data.

Calculate samples per second

It performs the following calculations. First, it takes the given period and calculates samples per second with the following formula:

samplesPerSecond = 7093789.2 / (period * 2)

7093789.2 is the PAL clock rate, in case you were curious. The period comes directly from the pattern data in the Protracker mod. For example, 428 represents a C-2 note, so that would be calculated at 8287.1369 samples per second.
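For concreteness, the conversion can be sketched like this - the constant and formula are from above, while the function name is just my own for the sketch:

```kotlin
// PAL Amiga clock rate, as used in the formula above.
const val PAL_CLOCK_RATE = 7093789.2

// Hypothetical helper name: converts a ProTracker period to the
// instrument's effective sample rate.
fun samplesPerSecond(period: Int): Double = PAL_CLOCK_RATE / (period * 2)

fun main() {
    // Period 428 is a C-2 note; prints roughly 8287.1369 samples per second.
    println(samplesPerSecond(428))
}
```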

Find out how many bytes we need to interpolate

So now I have samplesPerSecond. Next, I need to calculate how many bytes I need to interpolate. To do this, I basically find out how many times samplesPerSecond fits into our sampling rate of 44100. The functions for this are as follows:

    private fun getIterationsUntilNextSample(samplesPerSecond: Double, counter: Double): Int {
        val iterationsPerSample = floor(SAMPLING_RATE / samplesPerSecond).toInt()
        val maximumCounterValueForExtraIteration = SAMPLING_RATE - (samplesPerSecond * iterationsPerSample)

        return iterationsPerSample + additionalIteration(counter, maximumCounterValueForExtraIteration)
    }

    private fun additionalIteration(counter: Double, maximumCounterValueForExtraIteration: Double): Int =
        if (counter < maximumCounterValueForExtraIteration) 1 else 0

“counter” is a double, starting at 0.0, that we continually add samplesPerSecond to as we resample. When it exceeds 44100, we subtract 44100 from it, and then switch to the next pair of bytes in the original audio data to interpolate. So, the number of times we can add samplesPerSecond to that counter before it exceeds 44100 is the number of output samples we generate between the first and the second byte (including the first byte itself, which we technically don’t interpolate).

For a C-2, if we start at the first two bytes of audio data from the instrument, these functions conclude that there are six “iterations” before we switch to the next pair of bytes. So, we would start with the first byte value, and then interpolate five times before we reach the second byte. The number of iterations will not be uniform across the audio data: for a C-2, sometimes it will be five, sometimes six.

So if, for example, the first two bytes were 6 and 18, we would need to interpolate five values between them. We know this because 8287.1369 × 5 is still under our sampling rate - we would need 8287.1369 × 6 before we exceed it.
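To sanity-check the counter logic, here is a self-contained sketch. The function reproduces the two helpers above collapsed into one; the driver loop is my own illustration of how I understand the counter to advance, not code from the repo:

```kotlin
import kotlin.math.floor

const val SAMPLING_RATE = 44100.0

// Same logic as the two functions above, collapsed into one for this sketch.
fun getIterationsUntilNextSample(samplesPerSecond: Double, counter: Double): Int {
    val iterationsPerSample = floor(SAMPLING_RATE / samplesPerSecond).toInt()
    val maximumCounterValueForExtraIteration = SAMPLING_RATE - (samplesPerSecond * iterationsPerSample)
    return iterationsPerSample + if (counter < maximumCounterValueForExtraIteration) 1 else 0
}

fun main() {
    val samplesPerSecond = 7093789.2 / (428 * 2) // C-2
    var counter = 0.0
    // Walk the first six byte pairs and print how many output samples each
    // pair covers. The counts are not uniform: this prints 6 5 5 6 5 5.
    repeat(6) {
        val iterations = getIterationsUntilNextSample(samplesPerSecond, counter)
        print("$iterations ")
        counter += iterations * samplesPerSecond
        counter -= SAMPLING_RATE
    }
}
```

The long-run average of those counts works out to 44100 / 8287.1369 ≈ 5.32 output samples per input byte, which is exactly the resampling ratio you'd expect.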

Doing the actual interpolation

Finally, now that we know how many interpolations we need, we calculate the difference between the two bytes, calculate a slope, and then fill in the bytes between them using a simple linear function. In practice it ends up looking like this:

(slope * currentIteration) + firstByte

So going back to the example of bytes with values 6 and 18, it would calculate a slope of 2, so we would end up with interpolated values of 6, 8, 10, 12, 14, 16 before the counter exceeds 44100 and we move on to the next pair of bytes (18 and whatever’s after it).
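That per-pair step can be sketched as follows - the function name and the truncation to Int are my assumptions, but the slope formula and the resulting values match the example above:

```kotlin
// Hypothetical sketch of the per-pair interpolation: given two adjacent
// bytes of audio data and the iteration count for that pair, emit the
// output samples using (slope * currentIteration) + firstByte.
fun interpolatePair(firstByte: Int, secondByte: Int, iterations: Int): List<Int> {
    val slope = (secondByte - firstByte).toDouble() / iterations
    return (0 until iterations).map { i -> ((slope * i) + firstByte).toInt() }
}

fun main() {
    // Bytes 6 and 18 with six iterations: prints [6, 8, 10, 12, 14, 16].
    println(interpolatePair(6, 18, 6))
}
```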

A few notes:

  • To reduce confusion, I only use the word “sample” to refer to the individual bytes in a PCM audio stream or collection. I do not use “sample” to refer to the instruments; for those I use either “instrument” or “audio data.”
  • Effects aren’t implemented yet.
  • For now, I’ve kept all the audio data at 8-bit, which is why I’m just dealing with bytes rather than widening them to words/shorts.
  • Part of my goal with this project is to make the code readable without too many bitwise operations, though some of that is unavoidable - I’m not going for performance.
  • For now I’m just retrieving one byte at a time, but I may eventually switch to retrieving collections of bytes to reduce calculations.