Sample Vs Frequency Packets.

Please vote after reading

Not sure exactly what you mean, but it sounds a lot like time stretching?

i don’t follow either. please elaborate, or try it with some examples.

Well… instead of direct sampling at a fixed sample rate, frequency-spectrum analysis ought to be performed, and playback done from that. This is because the human ear hears sound using over 40,000 sensors, each responding to its own frequency (at least I read that somewhere, if I’m not mistaken).

Then the sample would consist of frequency packets and amplitudes, both represented with precise digital arithmetic.

Thus the frequency can be changed without harming the play time of the sample.
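To make the “frequency packets and amplitudes” idea a bit more concrete, here is a minimal sketch of one possible reading of it, assuming NumPy and a 44.1 kHz sample rate; the peak-picking analysis, partial count and amplitude scaling are my own simplifications, not anything proposed in the thread.

```python
# Sketch only: store a sound as (frequency, amplitude) "packets" and
# re-synthesize it additively, so pitch can be scaled without touching
# the duration. Amplitude scaling is approximate (Hann window losses).
import numpy as np

SR = 44100  # assumed sample rate in Hz

def analyze(signal, n_partials=32):
    """Very naive analysis: pick the strongest FFT bins as 'packets'."""
    spectrum = np.fft.rfft(signal * np.hanning(len(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / SR)
    mags = np.abs(spectrum) / len(signal)
    strongest = np.argsort(mags)[-n_partials:]
    return [(freqs[i], mags[i]) for i in strongest]

def resynthesize(packets, duration, pitch_scale=1.0):
    """Additive resynthesis; pitch_scale changes frequency, not duration."""
    t = np.arange(int(duration * SR)) / SR
    out = np.zeros_like(t)
    for freq, amp in packets:
        out += amp * np.sin(2 * np.pi * freq * pitch_scale * t)
    return out

# Example: a 440 Hz tone re-pitched up a fifth, still one second long.
tone = np.sin(2 * np.pi * 440 * np.arange(SR) / SR)
packets = analyze(tone)
shifted = resynthesize(packets, duration=1.0, pitch_scale=1.5)
```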

Kind of. But more native.

as far as I know, the human ear (and not only the human one) works more as an amplitude modulator, since each of its parts is a pressure sensor rather than a frequency sensor.

your idea, although quite vague, may work, but it would invalidate years of development of audio processing based on fixed-frequency sampling, and it would also require much more processing power and more complex structures. I’m not saying it is necessarily worse than the current approach, but you should elaborate on it more

I highly doubt this would be true

Maybe…
You know better.

I said it before! Sometimes the reaction can be like this:

(I’m just kidding :D )

And it is followed by another idea:
For a natural-sounding sample, it should be represented as a compressed hologram.
The meaning is: all notes (or even smaller pitch steps) are played and analyzed, then compressed by an algorithm that removes unnecessary redundancy. For whatever frequency is required, the hologram will give the corresponding frequency packets.
And the specific sounds of the sample, like the noise of touching, scratching or striking the strings, will be transposed according to the hologram, so they will be reproduced correctly no matter what note you play.

The idea is here. It cannot be stopped :)

I don’t know how many individual sensors/hairs they estimate there to be in the human ear, but they are grouped into approximately 30-35 tone-sensing areas, hence the introduction of 31-band EQs, supposedly meant to match up roughly with their frequencies and spacing. (You may think it quite amazing we can pick up such subtle differences in pitch with so few receptor groups, but then think about the fact that the average male only has three different colour sensors in the eye for perfect 20:20 colour vision.)

What you are talking about forms the basis of most audio compression: MP3, Ogg etc. Most probably FLAC too (before other processing), which Renoise uses. There is an uncompressed/lossless format that definitely uses it but its name escapes me.

As far as I know nobody has tried doing processing on the data while it’s in this form. It does sound like it should be possible, and it would quite possibly help stop artefacts such as aliasing when changing the pitch of a played sample (and quite possibly make time stretching easier??)

Interesting idea but a bit much for the small dev team of this wonder software to concentrate their time and energy on!

not only that, but I think it can be said that it is at the base of the whole of DSP theory: most effects act in the frequency domain, to which the sample data is converted using the Fast Fourier Transform. That’s why I am saying that his idea should be elaborated more: either I am missing the basic point of the “frequency packet”, or he is proposing to do what the FFT already does
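For anyone following along, this is roughly what “acting in the frequency domain via the FFT” looks like in practice. A minimal NumPy sketch; the brick-wall low-pass, cutoff and test signal are arbitrary choices for illustration, not any particular effect’s implementation.

```python
# FFT the block, modify the bins, inverse-FFT back to samples.
# Here the 'effect' is a crude brick-wall low-pass made by zeroing
# bins above a cutoff frequency.
import numpy as np

def frequency_domain_lowpass(block, sample_rate, cutoff_hz):
    spectrum = np.fft.rfft(block)                    # time -> frequency
    freqs = np.fft.rfftfreq(len(block), 1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0                  # the 'effect'
    return np.fft.irfft(spectrum, n=len(block))      # frequency -> time

sr = 44100
t = np.arange(sr) / sr
noisy = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 9000 * t)
filtered = frequency_domain_lowpass(noisy, sr, cutoff_hz=1000)
```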

this topic is definitely out of my league. which explains why i didn’t understand. you boys have fun, i’m out. :slight_smile:

Doh! Yeah of course. One of a fair few methods, mind. IIRC MP3 uses the DCT (the same as they use for the video in MPEG-1, 2 and basic 4).

I only have the idea; the technical details you know better than I do.

something like that.

Actually the human eye has over 300 sensor systems.
For the boundary between red and blue, blue and yellow, green and blue (and so on) it has specialized sensors. To sense an increase in the brightness of each colour a special type of sensor is used, and another for a decrease. There are also sensors for angles, vertical or horizontal lines, movement, brightness, and more.

It seems close to the truth.

This is exactly what I mean.

Evidence? From all my learning on eyes and vision from studying television broadcasting and related subjects, there are two main categories of sensors, namely the Rods and the Cones.

Rods are the outer sensors: they do your peripheral vision, are only black and white (monochrome), but are much better at sensing changes/movement. It is actually a lot easier to see when something moves if you’re not looking directly at it, as anybody who has done any hunting will tell you. They are also a lot more sensitive, which is why you lose colour vision at low light levels.

Cones are the centre sensors and give detail and colour. Classically we are taught there are three types: Red, Blue and Green. This is true for the majority of human males, and it is the lack of one of these that causes colour blindness, the most common version confusing Red and Green because one of those two is missing. It is rare for females to have fewer than four types of Cones, hence why females are carriers of the colour blindness gene but you don’t actually find very many women who are colour blind themselves.

Any textbook you can find on the subject will agree with these facts (although it may ignore that people do often have more than three types of Cones operating at different wavelengths).

The human ear & sound processing system (brain) work as both amplitude and frequency sensors. The eardrum and bones of the inner ear amplify sound mechanically. In the cochlea, certain hair cells vibrate at their own fundamental frequency, triggering neurons to fire (also at the frequency the hair cell vibrates at!). Thus we resolve frequency at the physical level and at the higher processing level; in fact these nerve impulses still occur at the frequency of the sound deep inside the brain where they are processed. You can play a sound in one ear, place electrodes on the sound processing centres, and if you amplify that electrical signal you hear the same sound back!

This idea does have some potential. What length would a frequency packet need to be in time for it to be imperceptible? We can already process 32-bit floating-point numbers at 96 kHz, so maybe a spectrum of frequencies at 1 kHz would be possible, but it would be very computationally expensive. Remember that if this is on a computer it still has to be converted to audio, and doing that would either require going through the sampling process as we know it or building new hardware to do it.
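To put rough numbers on that cost, here is a back-of-envelope comparison; the 1000-frames-per-second rate and the 1024-bin spectrum are purely assumed figures, not anything settled in the thread.

```python
# Raw PCM data rate vs. storing a full spectrum for every short
# analysis frame (illustrative assumptions only).
BITS_PER_FLOAT = 32

# Raw audio: 96 kHz, one 32-bit float per sample.
pcm_bits_per_sec = 96_000 * BITS_PER_FLOAT

# "Frequency packets": 1000 spectra per second, each with 1024 complex
# bins (two floats per bin).
frames_per_sec = 1000
bins_per_frame = 1024
spectral_bits_per_sec = frames_per_sec * bins_per_frame * 2 * BITS_PER_FLOAT

print(pcm_bits_per_sec / 1e6, "Mbit/s raw PCM")        # ~3.1 Mbit/s
print(spectral_bits_per_sec / 1e6, "Mbit/s spectral")  # ~65.5 Mbit/s
```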

True, but remember of course that our experience of “perfect” colour vision and pitch perception is more an expression of our own neurological makeup and physiology than a real-world phenomenon. Colours don’t exist in the real world, and if they did there would be far more of them than we can actually see. Our perception of amplitude is pseudo-logarithmic. And our perception of pitch is actually quite poor. A good example of our pitch-perception inadequacy is heard in FM synthesis, where a reasonably low modulation frequency becomes imperceptible as a vibrato (which it actually is) and sounds like a pitch or timbre / overtone series.
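A quick way to hear that FM observation for yourself; the carrier, the two modulator rates and the modulation index below are arbitrary choices of mine, not anything specific from the thread.

```python
# Same modulation depth, two modulator rates: a 5 Hz modulator reads as
# vibrato, a 200 Hz (audio-rate) modulator reads as a change of timbre.
import numpy as np

SR = 44100
t = np.arange(2 * SR) / SR  # two seconds

def fm_tone(carrier_hz, mod_hz, index):
    return np.sin(2 * np.pi * carrier_hz * t
                  + index * np.sin(2 * np.pi * mod_hz * t))

vibrato = fm_tone(440, 5, index=6)    # slow modulator: heard as pitch wobble
timbre = fm_tone(440, 200, index=6)   # fast modulator: heard as new overtones
```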

It seems we’ve been reading different books, man… :)

That is what I’m talking about.

The signal goes to the brain already FFT-like encoded.
So actually, what you see does not exist the way you see it.
What you see is the hologram memory of your brain, brought up every time you perceive visual information…
The world is strange, you know… :blink:

Just as it has been told. :)

Velocity sensors (think ribbon mic rather than diaphragm) tuned to a particular resonant frequency.

Of course they do. As much as a sound wave and frequency do, anyway. And there is much more than we can see, from heat to radio, TV, microwaves, X-rays and radiation, all different frequencies of the same thing. The world would be a bit messy if our eyes could pick it all up though, so they are targeted at the “visible spectrum.” There are plenty of animals out there that do have infra-red (heat) vision though.

4, 5, maybe 6. Not the 300+ you were talking about!

It was about eyesight. It is not as obvious and simple as what is written in school books. Recent research has given us a lot of new information. I’m just trying to rely on the most recent data.

I think you must have the human eye confused with a compound eye, in which case the answer to this question:

is most definitely yes!

Maybe this link will interest you:

It’s a softsynth based on the approach you suggested.

I’ve tried it and found it kinda tricky to get good results, but may have missed something…

But, basically speaking, modern pitch-shifting algorithms do extract the frequencies of a sound and shift them to change the perceived tone, and that should lead to almost the same results.
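As a very rough illustration of “extract the frequencies and shift them”, here is a deliberately naive NumPy sketch of my own; real pitch shifters (phase vocoders and friends) also handle phase continuity and overlapping frames, which this skips entirely.

```python
# Take the FFT of a block, move every bin up or down by a ratio, and
# inverse-FFT back to samples. Crude, but it is literally "shift the
# frequencies".
import numpy as np

def naive_spectral_pitch_shift(block, ratio):
    spectrum = np.fft.rfft(block)
    shifted = np.zeros_like(spectrum)          # same complex dtype
    for src_bin in range(len(spectrum)):
        dst_bin = int(round(src_bin * ratio))
        if dst_bin < len(shifted):
            shifted[dst_bin] += spectrum[src_bin]
    return np.fft.irfft(shifted, n=len(block))

sr = 44100
t = np.arange(sr) / sr
up_a_fifth = naive_spectral_pitch_shift(np.sin(2 * np.pi * 330 * t), 1.5)
```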

300 sensor systems… I can’t say I’m aware of all of those, but I do know that there’s a whole lot of processing going on before what you see reaches the visual cortex.

In a loose sense, the entire visual cortex could be considered a large set of sensor systems, given that’s where you sense orientation (‘bar’, ‘grating’, ‘end-stopped’ a.k.a. corners), motion (direction-sensitive cells) and other higher-order stimuli. A stroke in the right place could mean something as specific as corners of things would be less attention-grabbing, or even altogether imperceptible to you. I once smoked salvia divinorum and I am pretty sure it did exactly this; the edges of things seemed to ‘run off past the corners’ – combined with the odd sense of somebody else controlling my movement, I naturally concluded that these were definitely force fields and I had to carefully step over each and every one of them :) …Anyway

So even before visual input comes near the brain, all this happens, and probably more:

  • In the retina, rods, cones and photosensitive ganglion cells directly take visual input (the ganglion cells are much deeper in the retina than the others)
  • Cells in the retina perform edge-detection to compress the visual information enough to travel the optic nerve; basically, a high-pass/sharpen filter made of neurons
  • ‘Retinal ganglion cells’ (all 5+ different classes of them) send the results on their merry way into the brain, some types extending all the way through the optic nerve. Types include:
    • Midget cells, which sense colour changes well, and also sense great contrast changes (but not minor ones so much)
    • Parasol cells, which sense minor contrast changes well, but not colour changes
    • Bi-stratified cells, which are known to sense moderate contrast changes and are only affected by inputs from blue cones

Below: the insane number of cells which process your vision before it gets to your brain. Direct input starts at the bottom of the pic and moves up.

I should think the auditory system would be a lot simpler. I know there’s some sort of high-pass filter that works on a per-frequency basis, i.e. it effectively ‘tunes out’ unchanging, continuous sounds. I should expect the auditory cortex picks up on sudden stops and starts, since those are fairly attention-grabbing things. Probably some more cells for rising and falling tones, helping speech recognition, supplementing general sound recognition, fuelling the ‘infinite’ perception of the Shepard tone etc.

Basically, the rule of perception seems to be: think of every ‘event’ you recognise and there’s probably a cell or set thereof to ‘sense’ it. I guess a great portion of the entire brain is some sensor or another.

EDIT: Back on topic – my ‘naive’ algorithm for baby’s-butt-smooth sound stretching, preserving pitch. Feel free to pilfer as desired, no patents here (that I am aware of).

Following on from basic FFT theory, sound could be considered to be made of sine-wave ‘grains’ of all different frequencies, amplitudes and offsets in time (roughly, I would guess that phase = grain start time MOD period).

  • A grain is a single period of a sine-wave; an ‘atom’ of sound
  • Grains/sec is low for low frequencies => little storage requirement
  • Grains/sec for treble is high => greater storage requirement

We may need some sort of timing information for every grain if we would like to preserve phase information. For now, that’s a bit over my head so I will politely forget about it :)

Anyway, into the meat of the pitch-shift. Let’s say we take the ‘grid’ approach, where you have a sort of grid, but every row is chopped up finer and finer as you go higher in frequency. Each cell represents a grain, represented by its amplitude only (frequency is inferred from the row number, timing is inferred from the column number).

  • A 2x speed-up would be trivial: remove every second (time-domain) cell, making the entire grid half-length.
  • A naive but decent 2x slow-down would simply involve playing every grain twice

Finer stretch amounts would entail removing/repeating every Nth grain in each frequency row, at its simplest (i.e. nearest-neighbour sampling). As you can imagine, we could apply any of the usual interpolations to recover any stretch factor we like.

:walkman: And for pitch-shifting, just stretch the sound as above, then re-pitch in the usual Renoise way in the opposite direction, e.g. to double the pitch: double the length using the above method, then play it back an octave higher.
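In case anyone wants to play with the idea, here is a heavily simplified sketch of the grid approach, entirely my own and cheating in two places I should flag: it uses a plain STFT grid with the same time resolution in every row (rather than rows chopped finer at higher frequencies), and it reuses the analysed phases instead of tracking per-grain timing, so artefacts are expected. Frame and hop sizes are arbitrary.

```python
# Grid of "grains": rows = frequency bins, columns = time slices.
# Time stretch = repeat/drop columns, nearest-neighbour style.
import numpy as np

def stft(x, frame=1024, hop=256):
    window = np.hanning(frame)
    cols = [np.fft.rfft(window * x[i:i + frame])
            for i in range(0, len(x) - frame, hop)]
    return np.array(cols).T, frame, hop          # rows = freq, cols = time

def istft(grid, frame, hop):
    window = np.hanning(frame)
    out = np.zeros(hop * (grid.shape[1] - 1) + frame)
    for col in range(grid.shape[1]):
        start = col * hop
        out[start:start + frame] += window * np.fft.irfft(grid[:, col], n=frame)
    return out  # (no overlap normalisation; output level is only approximate)

def stretch_grid(grid, factor):
    """Nearest-neighbour time stretch: pick a source column for each output column."""
    n_out = int(grid.shape[1] * factor)
    src = np.minimum((np.arange(n_out) / factor).astype(int), grid.shape[1] - 1)
    return grid[:, src]

# Double the length; playing the result back an octave higher, as
# suggested above, would then give a pitch-shift with no length change.
sr = 44100
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
grid, frame, hop = stft(tone)
stretched = istft(stretch_grid(grid, 2.0), frame, hop)
```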