Audiophiles Just Got Pwnd

The Emperor’s New Sampling Rate

http://mixonline.com/recording/mixing/audio_emperors_new_sampling/

The arguments about sampling rates and word lengths in digital audio are long over with, aren’t they? I mean, no less a personage than James A. “Andy” Moorer — former director of Stanford’s CCRMA, co-founder of Sonic Solutions, recipient of a Lifetime Achievement Award from the AES and now senior scientist at Adobe — wrote the following in an unpublished (but oft-quoted) paper a dozen years ago: “Let us start with observations that are largely beyond question. These observations are not a subject of debate, but they beg further discussion: Ninety-six-kHz audio universally sounds better than 48- or 44.1kHz audio” (his emphasis). The great unwashed consumer base hasn’t caught on to this because we’re still waiting for that new medium to come along that will prove it to them and begin a long overdue renaissance in high-end audio, right?

Well, SACD and DVD-A have been on the scene for some time, but haven’t made much of a splash in the consumer market. Direct Stream Digital (DSD) is being used quite a bit as a recording format in high-end classical and jazz circles; Telarc’s doing everything in DSD these days. However, the problems of editing, processing and mixing recordings in DSD have never been solved well enough for the format to be adopted by the pop music world. Yet no matter how good they sound at the mastering level, the truth remains: The vast majority of DSD recordings are still delivered to the public on ordinary CDs.

According to a remarkable new study, however, the failure of new audio formats — at least the ones that claim superiority thanks to higher sample rates — to succeed commercially may in reality be meaningless. The study basically says that (with apologies to Firesign Theatre) everything you, I, Moorer and everyone else know about how much better high-sample-rate audio sounds is wrong.

The study was published in this past September’s Journal of the Audio Engineering Society under the title “Audibility of a CD-Standard A/D/A Loop Inserted Into High-Resolution Audio Playback.” The study blew me away for a number of reasons. One is that it was almost identical to a study I proposed some years ago at the school where I was teaching, but it never got past the proposal stage. Second, the two authors of the study, David Moran and Brad Meyer, happen to be people whom I’ve known for several decades (we were all part of the crew covering audio and other technologies at The Boston Phoenix when I was starting out as a writer), but I had little idea what they were up to these days.

The main reason it knocked the wind out of me was its conclusions. It was designed to show whether real people, with good ears, can hear any differences between “high-resolution” audio and the 44.1kHz/16-bit CD standard. And the answer Moran and Meyer came up with, after hundreds of trials with dozens of subjects using four different top-tier systems playing a wide variety of music, is, “No, they can’t.”

THE TRIAL
The experiment was wonderfully simple: The authors set up a double-blind comparison system in which one position played high-end SACDs and DVD-As through state-of-the-art preamps, power amps and speakers. At the other position, the output from the SACD player was first passed through the AD/DA converters of an HHB CD recorder and then through the same signal chain. The levels of the two sides were matched to within 0.1 dB, with the amplifier doing the matching in series with the CD recorder so no one could claim that it degraded the SACD signal. The test subjects used an “A/B/X” comparator to switch the signals, meaning that in some of the tests, when the subjects hit the Change button they didn’t know if the signal actually changed.

There were 60 subjects, almost all of whom were people who know how to listen to recorded music: recording professionals, nonprofessional audiophiles and college students in a well-regarded recording program. In all, there were 554 trials during a period of a year. The experiment was done on four different systems, all employing high-end components and all in very quiet rooms designed for listening in both private homes and pro facilities. All subjects were given brief hearing tests to determine their response to signals above 15 kHz. That data, as well as the subject’s gender and professional experience, was tabulated with the results.

MAY I HAVE THE ENVELOPE, PLEASE?
The number of times out of 554 that the listeners correctly identified which system was which was 276, or 49.82 percent — exactly the same thing that would have happened if they had based their responses on flipping a coin. Audiophiles and working engineers did slightly better, or 52.7-percent correct, while those who could hear above 15 kHz actually did worse, or 45.3 percent. Women, who were involved in less than 10 percent of the trials, did relatively poorly, getting just 37.5-percent right.
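
For anyone who wants to check the coin-flip claim, here is a minimal sketch of an exact two-sided binomial test. The counts (276 correct out of 554) come from the article. The function name `binomial_two_sided_p` is made up for illustration, and this is not necessarily the analysis the paper's authors ran.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value against a fair-coin null (p = 0.5)."""
    # With p = 0.5 the distribution is symmetric around trials / 2, so we
    # add up every outcome at least as far from the mean as the observed one.
    distance = abs(successes - trials / 2)
    tail = sum(comb(trials, k) for k in range(trials + 1)
               if abs(k - trials / 2) >= distance)
    return tail / 2 ** trials

correct, trials = 276, 554  # figures quoted in the article
rate = correct / trials
p_value = binomial_two_sided_p(correct, trials)
print(f"{rate:.2%} correct, two-sided p = {p_value:.3f}")
```

The p-value comes out at roughly 0.97, so the pooled result is statistically indistinguishable from guessing.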

So how did the audio community respond to this? Meyer tells me that he got a lot of “thank you” and “it’s about time” responses. He also says that the article passed through the Journal’s rigorous review process without any argument. But some loud screams were heard from various members in the audio-tweak community, and a number of heated and sometimes nasty flame wars erupted on several audio forums within hours of the article’s release — many of them started by people who hadn’t bothered to read it first.

Most of the objections were based on the fact that the authors didn’t include in their paper the list of equipment and recordings that they used. Meyer explains that part of that reason was to keep the article from getting too long. But anyone familiar with the type of debate that often occurs in tweak circles knows that had the authors been specific about the components, they would have immediately been attacked on the basis that their equipment was, of course, inferior to what they should have used, and so, of course no one would hear any difference.

In fact, Meyer and Moran posted all the information about the signal chains and the source material within a couple of weeks of the article’s publication on the Website of the Boston Audio Society, a venerable 37-year-old, independent non-profit organization, in which both authors have long been active. The equipment list included amplifiers from high-end manufacturers like Adcom, Carver, Sim Audio and Stage Accompany, and speakers from Snell and Bag End, as well as the oft-worshipped Quad ESL-989 electrostatics, which are supposed to have usable response up to 23 kHz — which is, of course, above the Nyquist frequency of the HHB recorder’s converters. The subjects listened to discs that covered a wide range of material and included classical instrumental, choral, jazz, rock and pop, from audiophile labels like Mobile Fidelity, Telarc and Chesky.

So the objectors really didn’t have much to object to. But if you think about it, the exact equipment list is largely irrelevant. If you assume the equipment, the listening environment and the listeners’ critical faculties are all at least good, then what’s most amazing about their findings is that the results were always the same, no matter what equipment they used or who was listening to it or what they were listening to. Not one listener, under any circumstances, could consistently distinguish between high-resolution audio that was passed through the 44.1kHz/16-bit CD “bottleneck” and audio that wasn’t.

Does this mean that someone else couldn’t do a similar experiment and end up with different results? Not at all — and Meyer and Moran are urging others to do just that. After all, this is what the scientific method is all about: If your experiment comes up with a certain result, then by publishing it you are inviting the rest of the world to copy (or expand on) what you’ve done and to see if their results agree or disagree with yours. I would love to see this experiment duplicated often, and I would be delighted to see someone come up with different results.

WHAT’S GOIN’ ON?
But wait a minute — haven’t we all heard the superiority of high-sample-rate audio? Leaving the tweak-heads aside, there are a huge number of people in this field for whom I have real respect — Moorer among them — who have experienced high-sample-rate audio to sound more “spacious” or “detailed” or “enveloping.” You might even be one of them.

As it happens, I’m not, which is not to say I think everyone else is full of beans; I’ve just never experienced it in an environment that I feel was controlled enough for me to be comfortable making that kind of judgment. It’s not that I’m lazy: As Meyer and Moran realized, setting up a test that could really be considered objective is not trivial. Even if I were the sole subject of the test, I’d still want lots of time, multiple music sources, incontrovertibly great equipment, an excellent level-matching system and a very quiet (and consistent) room.

I have had one experience that came close to this, but the result was inconclusive. At the press roll-out a dozen years ago of DSD at Sony’s studios in New York, a group of audio writers got a demonstration of how the new system compared with a 20-bit PCM digital stream, as well as with a direct analog feed from a live band in the studio. I could hear some differences. Yet how to describe them — or whether I would hear them again in another time and place — I couldn’t tell you. I did, however, mention a preference at the session for the way instrument decays sounded in PCM, to which David Smith (R.I.P.) replied, “We’ve heard that from others. In fact, you’d be very flattered if you knew who else said that same thing.” What the significance of that was, I guess I’ll never know, but it didn’t seem to get in the way of DSD ending up with plenty of fans among the recording community.

But something is causing people to say they are hearing differences. If a double-blind test can’t confirm those differences, then what’s going on? For one possible reason, let’s go back to Moorer’s paper that I quoted earlier (called “New Audio Formats: A Time of Change and a Time of Opportunity,” which can be found on his Website, www.jamminpower.com). Later in the paper, Moorer noted that humans can distinguish time delays — when they involve the difference between their two ears — of 15 microseconds or less. Do the math, and you can see that while the sampling interval at 48 kHz is longer than 15 µs, the sampling interval at 96 kHz is shorter. Therefore, he says, we prefer higher sampling rates because “probably [my emphasis] some kind of time-domain resolution between the left- and right-ear signals is more accurately preserved at 96 kHz.” It’s an interesting starting point for a discussion, but to my knowledge it’s never gotten past that point — as a theory, it has never been expanded upon or tested. And judging from the results of Meyer and Moran’s experiment, it doesn’t seem to be a factor.
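
"Do the math" can be spelled out in a few lines. This sketch only compares the sampling interval at each rate with the 15 µs figure quoted from Moorer's paper; the constant and function names are assumptions made for illustration.

```python
ITD_THRESHOLD_US = 15.0  # interaural delay figure quoted from Moorer's paper

def sampling_interval_us(rate_hz: float) -> float:
    """Time between consecutive samples, in microseconds."""
    return 1e6 / rate_hz

for rate_hz in (44_100, 48_000, 96_000):
    interval = sampling_interval_us(rate_hz)
    relation = "longer" if interval > ITD_THRESHOLD_US else "shorter"
    print(f"{rate_hz} Hz: {interval:.2f} us ({relation} than {ITD_THRESHOLD_US} us)")
```

The interval is about 20.8 µs at 48 kHz and about 10.4 µs at 96 kHz, which is the comparison the article describes.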

Some folks think it’s all simply wishful thinking on everybody’s part: The system costs more and has better specs; therefore, we make ourselves believe it sounds better. There’s something to that reasoning. Humans are a notoriously imperfect lot and tend to see and hear what we want to see and hear. Another very plausible reason is something that the authors discovered in their research. Despite the fact that no one could hear the difference in playback systems, they reported that “virtually all of the SACD and DVD-A recordings sounded better than most CDs — sometimes much better.” As it wasn’t the technology itself that was responsible for this, what was? The authors’ conclusion is that these recordings are simply engineered better. Because high-end recordings are a niche market, “Engineers and producers are being given the freedom to produce recordings that sound as good as they can make them, without having to compress or equalize the signal to suit lesser systems and casual listening conditions. These recordings seem to have been made with great care and manifest affection by engineers trying to please themselves and their peers.”

WAIT, THERE’S MORE!
But there’s one more reason worth examining, among whose proponents is Ethan Winer — a musician, engineer, studio owner, manufacturer and iconoclast who’s been in the recording business for some 40 years — who is definitely of the “show-me” school of audio theory and is an outspoken critic of “subjectivism” — that school of thought that encourages people to discuss the performance of audio components and systems using vaguely definable and often irrelevant adjectives instead of hard data. Winer’s company, RealTraps, manufactures modestly priced acoustic treatment products for studios, so it’s not surprising that he contends that anomalies caused by the listening space and our place in it far outweigh any possible subtleties we might be picking up when we change sample rates.

In an article on his Website (www.ethanwiner.com), Winer points out that in a typical room, moving one’s head or listening position as little as four inches can result in huge changes in the frequency-response curves one is hearing. What could be a 10dB dip in one spot at one frequency could be a 6dB boost a couple of inches away. These wide variations are caused primarily by comb-filtering effects from the speakers and from the various reflections bouncing around the room, which are present no matter how well the room is acoustically treated. Winer blames this phenomenon for most of the unquantifiable differences people report hearing when they are testing high-end gear.

He writes, “I am convinced that comb filtering is at the root of people reporting a change in the sound of cables and electronics, even when no significant change is likely. If someone listens to their system using one pair of cables, then gets up and switches cables and sits down again, the frequency response heard is sure to be very different because it’s impossible to sit down again in exactly the same place. So the sound really did change, but probably not because the cables sound different!”
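
To get a feel for the size of the effect, here is a rough sketch of the simplest possible case: direct sound plus a single reflection. The reflection strength (0.7), the path differences (1.00 m and 1.10 m) and the test frequency are all assumed values chosen for illustration. They are not taken from Winer's article, and a real room has many reflections.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in room-temperature air (assumed)

def comb_gain_db(freq_hz: float, path_diff_m: float,
                 reflection_gain: float = 0.7) -> float:
    """Level in dB of direct sound plus one reflection, relative to direct alone."""
    delay_s = path_diff_m / SPEED_OF_SOUND
    phase = 2 * math.pi * freq_hz * delay_s
    # Sum of the direct sound (1) and the delayed, attenuated reflection.
    real = 1 + reflection_gain * math.cos(phase)
    imag = -reflection_gain * math.sin(phase)
    return 20 * math.log10(math.hypot(real, imag))

FREQ_HZ = 1886.5  # sits in a notch when the reflection travels 1.00 m farther
for path_diff_m in (1.00, 1.10):
    gain = comb_gain_db(FREQ_HZ, path_diff_m)
    print(f"path difference {path_diff_m:.2f} m: {gain:+.1f} dB")
```

Under these assumptions, changing the path difference by 10 cm (about four inches) moves this one frequency from a dip of more than 10 dB to a boost of more than 4 dB.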

The test subjects in the Meyer/Moran experiment didn’t get up and move around, and so the fact that they couldn’t discern any differences in the two signal paths fits nicely into Winer’s theory. In fact, his response when I sent him the article was, “Nothing in here surprises me.”

Am I sure that Winer is right? No, although I think he’s onto something, the way I think Moorer’s thoughts about microscopic phase differences may be important in some way we haven’t yet figured out. But I am delighted to read Meyer and Moran’s paper for two reasons: It confirms something I’ve long suspected and it throws down the gauntlet for further research to be done.

Well what do you want me to say?

Like the loudness war, any concerns about audio fidelity have been mooted lately since people’s modes of listening have changed.

If everybody’s going to be listening to my stuff on iPod headphones using ReplayGain, I guess I don’t really give a shit about that last 1% of audio fidelity.

reminds me a bit of this:

As any honest violin dealer will tell you (and there are a few), the sound of a violin can be priced in a range from $50 (bad but playable), to $5,000 (good-sounding) to $50,000 (extremely good tone and projection) to $100,000 (over-priced). The rest is snotty-nosed hubris. As has been proven on a number of occasions, most notably by the BBC in 1974, a well-made, top modern violin can sound just as good if not better than the prized golden age models. In a recording studio, behind a screen, the violins of Isaac Stern, Charles Beare, and Pinchas Zukerman were played back to them. The instruments played were a Strad, a Guarneri del Gesù, a Vuillaume, and a Ronald Praill (a modern instrument less than a year old). None of the esteemed violin experts had a clue which violin was which. Furthermore, most of them couldn’t even tell which was their own instrument. They were left mumbling platitudes about the personal relationship between fiddle and player - bloody obvious if you spend most years of your life playing the violin.

Word.

isn’t 24 bit recording really just a consideration for mixing/mastering purposes, ie more headroom?

did peeps actually believe that you could tell the difference between 16/24 recordings?

IMHO, just like with gfx, you can hardly have “too much” initial resolution. Not even thinking of fx and resampling, but while manually repairing a vocal sample I often find myself zoomed in so much I see the individual samples… in these cases “hearing the difference” is NO HELP AT ALL - it’s all about seeing it. At this zoom, I have 100-200 big fat blocks on my screen – having 10 or 100 times more resolution would make worlds of difference.

Sound is not only for hearing with your ear. It’s also for looking at it and putting it through algorithms, and these applications are much, much more sensitive to increase in resolution.

So, uhm, bleah. I never hear of these “audiophiles” unless they get “owned” – to me such articles are all just a conspiracy for the industry to save a few cents and rob me of pleasure :( :lol:

recording-wise there’s definitely a difference between 16/24, see http://www.tweakheadz.com/16_vs_24_bit_audio.htm.
But for samples I’m not so sure anymore - agreed.
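
For reference, the headroom argument can be put in numbers with the textbook formula for an ideal quantizer driven by a full-scale sine wave (about 6.02 dB per bit plus 1.76 dB). This is a theoretical sketch only; real converters fall short of these figures, and the function name is made up for illustration.

```python
import math

def ideal_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine wave."""
    # 20*log10(2**bits) is about 6.02 dB per bit; 10*log10(1.5) adds about 1.76 dB.
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24):
    print(f"{bits}-bit: about {ideal_snr_db(bits):.1f} dB")
```

That works out to roughly 98 dB for 16-bit and 146 dB for 24-bit, which is extra room to work with while recording and mixing.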

Regarding kHz:
I think I can hear a difference between 44.1 and 48 kHz (I haven’t done a blind test, but I have a top signal chain and sensed more presence),
but definitely no difference between 96 and 48. Anything > 96 is totally insane.
But we shouldn’t forget that “Emperor’s New Clothes” is not about recording rates, it’s about playback rates!
And let’s not forget that VST audio processing, for example, sometimes uses very high internal sample rates to avoid aliasing.

And I agree: garbage in, garbage out is the most important point in this discussion.
Besides that, I think the signal chain matters just as much. A crappy MP3 player
with crappy earbuds won’t reproduce HQ audio well.

Last but not least:
Does anybody need a 2.5-meter loudspeaker cable for 3000 €? ;-D -> http://www.sommercable.de/excelsior/2/index.html
That’s really, really deeply insane!

I posted this more for the people who claim they can hear a difference between the two, Johann. Obviously there are advantages to working with higher-quality-than-you-can-hear source material. Once you hit the mixdown though, you can easily sacrifice that extra resolution because it’s just not audible.

If you read the article, you’ll notice that professional sound producers and self-proclaimed audiophiles couldn’t really tell the difference either.

I love how you presume this was aimed at you :P … STOP TAKIN THINGS SO PERSONALLY MMD xD

Well… yeah.

Threads like these are big invitations to heated technical debate, one which I just don’t have the energy for anymore. I don’t know what more to say really.

Do you have any reference to back up this statement? Using DSD as a recording format is something I have yet to hear of in practice. That’s of course because delta-sigma modulation and the like are insanely hard to process in the digital domain. Of course it might be that Telarc is doing all the processing in the analog domain before recording, or doing no processing at all, but anyway.

That’s a very interesting article. I am kind of a born skeptic and I often tease my friends and sometimes professors about certain things they say they hear. However, I once went to an interesting presentation by Piotr Nykiel, a sound engineer working at the Technical Academy in Warsaw, who is kind of an audiophile extremist. Being a very talented engineer with huge knowledge of electronics and signal theory, he built his own set of audio playback equipment. It consists of his own digital mastering setup, which he programmed himself on a programmable DSP processor, his self-designed PCM D/A converters, a self-designed tube amp and self-designed speakers. During the presentation we had a chance to test some things in practice, thanks to the capabilities of his virtual mastering rig. The presentation mostly revolved around jitter and downsampling, but what’s important is that on his equipment certain things I wouldn’t have believed were clearly apparent. The most interesting thing was about jitter between the left and right channels in a stereo setup. His set of converters actually used its own clock, which adjusted itself to the SPDIF input’s rate. The idea was that the left and right channels each had their own separate clock and separate jitter. This prevented any correlation of jitter between the channels, which made a HUGE difference in the overall sound.

This is of course off-topic, but the presentation proved to me how complicated the whole discussion is. Once you go past CD-Audio quality, things require EXTREME precision in equipment and acoustic treatment of your room if you even want to hear any difference at all. And even then, things are never easy to prove as the details here are almost metaphysical. I actually wonder what Mr. Nykiel would say about this study.

One of my professors, who gave lectures about PCM and delta-sigma A/D/A converters and digital audio formats, once described CD-Audio as a very well designed audio format. Its huge and lasting popularity is the best proof that this is true. It’s designed just to cover the usual capabilities of human hearing, and it’s been doing that very well for 30 years. Let’s celebrate 44.1, 16-bit.

Avoiding jitter influences between left/right = huge differences?
Why didn’t the two channels just use the same clock? Isn’t the usual approach to avoiding jitter to sync digital gear to one and the same clock?
Using two different clocks doesn’t make sense to me.

It’s been brought to my attention that some people may think this article is condoning using poor quality source material for production… and to that I say “hells no.” You should always strive to use the highest possible quality during production… and then distort the fuck out of it in post :P … it’s a proven fact that distortion sounds like shit if the input’s low quality ;)

It’s also a proven fact that distortion is the best compensation for lack of talent. :P

Ethan Winer put together an awesome and revealing documentary a while back which is on YouTube. It’s a bit long but very interesting, and I’d highly recommend that anyone with even a general interest in audio watch it. I found the part where he compares low- and high-end sound cards, and the clincher, the null test, really interesting:

As much as I doubt anyone could tell the difference between say a final mixdown in 48 and one at 96, even so, I have to play devil’s advocate here. The methodology used in the experiment gives the results of the -group-, not the individuals. (remember the old adage… ‘all mice have tails’…) A proper experiment would have to test all of the participants individually, over a period of time and list individual results as well. I can’t imagine how difficult it would be to do an experiment like that, though!
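
To illustrate why per-listener results need care, here is a small sketch of how often a listener who is purely guessing would still post an impressive-looking score. The 60 listeners come from the article. The 10 trials per listener and the "9 or more correct" cutoff are assumptions for illustration only, since the article gives just the total of 554 trials.

```python
from math import comb

def prob_at_least(k: int, n: int) -> float:
    """Chance of k or more correct answers in n fair-coin guesses."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

TRIALS_PER_LISTENER = 10  # assumed; not stated in the article
LISTENERS = 60            # from the article

p_lucky = prob_at_least(9, TRIALS_PER_LISTENER)
expected_lucky = LISTENERS * p_lucky
print(f"chance of 9+ out of 10 by guessing: {p_lucky:.2%}")
print(f"expected lucky listeners among {LISTENERS}: {expected_lucky:.2f}")
```

Under these assumptions a single guesser scores 9 or better about 1% of the time, so with 60 listeners one such score would not be surprising. That is why an individual who seems to hear a difference would need to be retested over more trials.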

Well, that’s because the real hi-fi doesn’t start until you use 192 kHz.

Just kidding. I think a limit has been reached with 48 kHz, which is enough to play most sounds, plus people don’t really know the difference. Maybe if you did a three-way test with, e.g., a real crash cymbal and then 48/44.1 kHz and 96/192 kHz samples of the same crash cymbal, you might be able to spot a difference. But probably not.

Since my name is mentioned above, I thought I might add my few cents worth: The statement I made “Ninety-six-kHz audio universally sounds better than 48- or 44.1kHz audio” was true at the time I made it, more than a decade ago. I was expecting to get a lot of static from publishing it. I did not. Not a peep. No angry letters from manufacturers begging me to listen to their converters. It was not the least bit controversial at the time. I’m glad that the state of the art of converter manufacture has improved to the point where there are some converters (not all) that perform very, very well. I would be terribly disappointed if there weren’t. In the article, I did not say there was any theoretical reason that 44.1 couldn’t sound as good as 96, but just that there were some very difficult engineering problems in an actual implementation. I tried to point out that there are a number of facets of human hearing that complicate the engineering. In fact, I claim that we still don’t understand spatial hearing very well. The current state of the art of 44.1 converters is quite a bit better than we had in the 90’s, and light-years better than the original converters that were available in 1984 when the CD first came out. I look forward to even more improvement in all aspects of audio delivery and enjoyment. I personally think that limiting ourselves to 2, or 5.1 or 7.1 channels is a travesty as well and can’t wait until we have loudspeaker wallpaper with thousands of channels all around us (see my related unpublished paper on “Ultra-Directional Microphones” at http://www.jamminpower.com/main/unpublished.html). Let 44,100 blossoms bloom . . .
James A. Moorer
www.jamminpower.com

That is a very true statement. Though I have no idea what this means for us musicians (Sounds effin hard to work with/for) I think that surpassing the 2.0-standard is the next logical step.

I would like to read a study about this involving Synthetes.

You mean people with synesthesia?