All of a sudden, it just occurred to me, how weird it is, that a computer stores sound. I don’t know why, but I was looking at a flac sample and just about to drop it into a Renoise Instrument, and I thought, “how does this work?” What is in the flac file?
Cause what is shown to us… (guys who do not understand the mechanics of this) is a visual representation of the audio… but obviously, the computer does not see that… what does the computer see? How does this work? How can a computer record a sound, hold on to a sound, and play back the sound as it was?
What does the computer see? What is a wav file made out of? Numbers? Some type of computer code?
A WAV file starts with some bytes (called the “header”) telling the computer which kind of WAV file it is (how many bits of depth, how fast the sampling frequency is, how many channels, and so on).
By reading these bytes, the computer knows how to read the ones that follow: if a file is identified as a stereo 16-bit 44.1 kHz file, for example, the computer will read the following bytes of the file in groups of 16 bits, alternating left and right channels, and will play them back at a rate of 44100 samples per second for each channel.
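To make the header idea concrete, here is a minimal sketch using Python's standard `wave` module. It assumes plain uncompressed PCM and builds a tiny WAV in memory (so no file on disk is needed), then reads the header fields back. The variable names are only for illustration.

```python
import io
import wave

# Build a tiny stereo 16-bit 44.1 kHz WAV in memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(2)       # stereo
    writer.setsampwidth(2)       # 2 bytes = 16 bits per sample
    writer.setframerate(44100)   # samples per second, per channel
    # 4 silent frames; each frame holds one left and one right sample.
    writer.writeframes(b"\x00\x00" * 2 * 4)

# Read it back: the wave module parses the header for us.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    channels = reader.getnchannels()
    bits = reader.getsampwidth() * 8
    rate = reader.getframerate()
    frames = reader.getnframes()

print(channels, bits, rate, frames)  # prints: 2 16 44100 4
```

With a real file you would pass a path to `wave.open` instead of the in-memory buffer.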
Compressed files (such as FLAC, MP3, OGG, …) also require the signal to be decoded, because it is not written “as it is” the way it is in WAV files, where you basically have a sequence of numbers giving the amplitude of the wave at each point in time: a silent 16-bit WAV file will be filled with zeroes, a full-volume 16-bit square wave will be filled with runs of (2^16)/2-1 and -(2^16)/2 (that is, 32767 and -32768), and so on.
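As a rough sketch of what those sequences of numbers look like for 16-bit audio (the 4-sample square-wave period is an arbitrary choice for illustration):

```python
BIT_DEPTH = 16
max_value = 2 ** BIT_DEPTH // 2 - 1    # (2^16)/2 - 1 = 32767
min_value = -(2 ** BIT_DEPTH // 2)     # -(2^16)/2   = -32768

# Silence: every sample is zero.
silence = [0] * 8

# Full-volume square wave with a period of 4 samples: two high, two low.
square = [max_value if (n // 2) % 2 == 0 else min_value for n in range(8)]

print(silence)
print(square)
```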
If you open a WAV file in a text editor, you won’t see the actual numbers anyway: each number will be shown as a sequence of (probably unreadable) characters, which is the textual equivalent of the so-called “binary representation” of the number.
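A small sketch of why that happens, using Python's `struct` module. The sample value 16706 is hand-picked so that its two bytes happen to be printable letters; most sample values are not so lucky.

```python
import struct

sample = 16706                       # arbitrary value chosen for illustration
raw = struct.pack("<h", sample)      # "<h" = little-endian signed 16-bit integer
as_text = raw.decode("latin-1")      # roughly what a text editor would display

print(raw)      # b'BA'
print(as_text)  # BA

# A more typical case: the bytes do not map to readable characters.
print(struct.pack("<h", -32768))     # b'\x00\x80'
```

So the number 16706 would show up in the editor as the two letters "BA", which says nothing obvious about the amplitude it encodes.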
As you can see from the image above, there are 16 steps (0-15) into which an amplitude can be placed. This means the audio has a bit depth of 4 bits (2^4 = 16). As a comparison, CD audio is 16-bit (2^16 = 65536 steps) and DVD audio can be 24-bit (2^24 = 16777216 steps). The higher the bit depth, the closer the stored amplitude can be to the real value. Think of bit depth as resolution in the vertical axis.
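Here is a minimal sketch of that 4-bit quantisation, assuming the input is a sine wave in the range -1.0 to 1.0 sampled 32 times per cycle. The `quantize` helper is a made-up name for this example, and real converters may round differently.

```python
import math

BIT_DEPTH = 4
LEVELS = 2 ** BIT_DEPTH        # 16 steps, numbered 0..15
SAMPLES_PER_CYCLE = 32

def quantize(value, levels=LEVELS):
    """Map a value in [-1.0, 1.0] to the nearest integer step in 0..levels-1."""
    scaled = (value + 1.0) / 2.0 * (levels - 1)
    return round(scaled)

steps = [
    quantize(math.sin(2 * math.pi * n / SAMPLES_PER_CYCLE))
    for n in range(SAMPLES_PER_CYCLE)
]
print(steps)
```

Raising `BIT_DEPTH` to 16 gives 65536 possible steps, so each stored value lands much closer to the true amplitude.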
The horizontal axis is the sample rate. This is the number of samples taken per unit of time. For example, let’s assume that the sine wave is at a pitch of A (440 Hz). This means that one entire cycle of the waveform takes 1/440 s = ~2.273 ms to complete. In the example image, there are 32 steps across this cycle, therefore each step is (1/440) / 32 = ~0.071 ms, which is a frequency (sample rate) of 440 × 32 = 14080 Hz, or about 14 kHz. CDs have a sample rate of 44100 Hz and DVDs have a rate of 48000 Hz. Think of sample rate as resolution in the horizontal axis. Because of the Nyquist limit, to accurately reproduce audio at a frequency of X, you need to sample it at a rate of more than 2X. Any frequency above half the sample rate will cause aliasing.
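Working through the numbers for a 440 Hz tone drawn with 32 steps per cycle, as a quick sketch. The `aliased_frequency` helper is a hypothetical name; it uses the usual folding formula to estimate where a too-high tone would appear after sampling.

```python
PITCH_HZ = 440           # the A used in the example
STEPS_PER_CYCLE = 32     # samples across one cycle

cycle_ms = 1000 / PITCH_HZ                    # about 2.273 ms per cycle
step_ms = cycle_ms / STEPS_PER_CYCLE          # about 0.071 ms per sample
sample_rate_hz = PITCH_HZ * STEPS_PER_CYCLE   # 14080 Hz
nyquist_hz = sample_rate_hz / 2               # 7040 Hz

def aliased_frequency(tone_hz, rate_hz):
    """Frequency a sampled tone appears at, folded into 0..rate_hz/2."""
    return abs(tone_hz - round(tone_hz / rate_hz) * rate_hz)

print(round(cycle_ms, 3), round(step_ms, 3), sample_rate_hz, nyquist_hz)
print(aliased_frequency(440, sample_rate_hz))     # below Nyquist: unchanged
print(aliased_frequency(10000, sample_rate_hz))   # above Nyquist: folds down
```

At this sample rate a 10000 Hz tone would be indistinguishable from a 4080 Hz one, which is the aliasing problem in a nutshell.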
This wave data is stored in the audio file as a header explaining the sample rate (e.g. 44.1 kHz) and bit depth (e.g. 16-bit), followed by a list of all the positions (samples) for the recording. It is common for the samples to be stored as signed numbers, but unsigned numbers are also possible. The endianness of each number (the order of its bytes) can also be big or little endian.
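To show why the header has to settle signedness and endianness, here is a small sketch that reads the same two bytes three different ways. The byte values are arbitrary picks for illustration.

```python
raw = bytes([0xFF, 0x7F])  # two bytes as they might sit in a file

little_signed = int.from_bytes(raw, "little", signed=True)   # 0x7FFF
big_signed = int.from_bytes(raw, "big", signed=True)         # 0xFF7F, signed
big_unsigned = int.from_bytes(raw, "big", signed=False)      # 0xFF7F, unsigned

print(little_signed)  # 32767
print(big_signed)     # -129
print(big_unsigned)   # 65407
```

Same bytes, three very different amplitudes. For what it's worth, 16-bit PCM in WAV files is little-endian and signed, while 8-bit WAV samples are unsigned.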
Compressed audio formats (FLAC, MP3, Ogg Vorbis, etc.) essentially store the sample data in a compressed way. FLAC is a lossless compression of the list of samples (think of zip-compressing a WAV file, though with a scheme designed for audio), which means no data is thrown away to save space, while the MP3 format discards data that meets certain criteria (roughly, detail judged hard to hear) to save space (lossy compression).
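A rough sketch of the lossless versus lossy distinction. To be clear about the assumptions: this uses `zlib` (the same family of compression as zip) as a stand-in, not FLAC's actual codec, and the "lossy" step just throws away the low 8 bits of each sample, which is nothing like what MP3 really does. It only illustrates the round-trip property.

```python
import math
import struct
import zlib

# One second of a 440 Hz sine as 16-bit samples at 44.1 kHz.
samples = [round(32767 * math.sin(2 * math.pi * 440 * n / 44100))
           for n in range(44100)]
raw = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: compress, decompress, and get back exactly the same bytes.
packed = zlib.compress(raw)
restored = zlib.decompress(packed)
print(len(raw), len(packed), restored == raw)

# Crude lossy stand-in: drop the low 8 bits of every sample.
# The discarded detail cannot be recovered afterwards.
lossy = [(s >> 8) << 8 for s in samples]
print(lossy == samples)  # False
```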
Hope this helps as an overview and I can go into further detail if you wish.