I once found a really good article that explained dither...

Post by Andi » Sun Jun 30, 2013 12:34 pm

...and then lost it. The article went into some detail about the stats of how dither causes quantized audio to toggle between values in a very clear and simple way and fills in the "yes, OK, but......" that usually hangs at the end of a dither explanation. In my memory it has the air of a Hugh article, but I can't find it and it may be from elsewhere (unfortunatley "dither explanation" isn't very specific as a Google term). Does this ring any bells with anyone?

A.

Post by The Elf » Sun Jun 30, 2013 1:12 pm

This is a good document.

Post by Andi » Sun Jun 30, 2013 6:43 pm

Thanks Elf,it was actually re-reading the Ozone guide that reminded me of the article I`m seeking. The one I`m after explains how the added noise interacts with the signal such that it toggles the value of the bit correctly, lots of articles get as far as "you add low level noise and all is good", a few get far as toggling, but this one actually explained the stats behind it relatively simply.

Thanks all the same.

A

Post by dmills » Mon Jul 01, 2013 12:25 am

I posted a couple of things that might fit that description a few years ago if forum posts are what you are looking for:
"Re: We record linearly but hear logarithmically....", post number #625671
"Re: Sample Rate/Bit Depth.......Just how much difference do these make?", #596334

I also remember an series of articles on digital audio basics by Hugh.

Regards, Dan.

Post by Tony O'Shea » Mon Jul 01, 2013 8:23 am

Hi Andi,
Possibly one of Dan Lavry white papers ? They're always worth a read even if it isn't.
Best,
Tony

Post by Tomás Mulcahy » Mon Jul 01, 2013 9:39 am

Andi wrote: this one actually explained the stats behind it relatively simply.

Probably Nika Aldrich so, he explained the stats very well. Here you go:

http://www.users.qwest.net/~volt42/cadenzarecording/DitherExplained.pdf

Post by Andi » Mon Jul 01, 2013 11:11 pm

I'm struggling to find time to read the suggestions but just wanted to acknowledge the help here. The article I'm thinking of had the sort of level of detail of a Dan Lavry paper, but I also recall being impressed with something from Nika Aldrich in the past too. It wasn't the Hugh article series nor either of Dan's posts - but I'll put them on the Nexus for a re-read anyway.

I'll post back when I get there.

A.

Post by Matt Houghton » Wed Jul 03, 2013 3:27 pm

Andi wrote:...and then lost it. The article went into some detail about the stats of how dither causes quantized audio to toggle between values in a very clear and simple way and fills in the "yes, OK, but......" that usually hangs at the end of a dither explanation. In my memory it has the air of a Hugh article, but I can't find it and it may be from elsewhere (unfortunatley "dither explanation" isn't very specific as a Google term). Does this ring any bells with anyone?

A.

Might it be this one?

Post by Hugh Robjohns » Wed Jul 03, 2013 4:04 pm

There's this explanation from Bob Katz but it's still fairly simplistic. I'd be interested in a really good non-mathmatical and approachable explanation of dither -- it is a notoriously hard concept to get across to non-engineering graduates!

H

Post by Andi » Thu Jul 04, 2013 4:42 pm

Matt and Hugh, sadly not. The Hugh authored article covers most of it, but the piece I have in mind actually gave a little more detail about the stats involved beyond sinply the use of the TPDF.

I'm beginning to wonder if I simply ate too much cheese one evening and had a Coleridge moment.

I'll post back if or when I find it; if it ends-up being less than I recall I'll either pretend I didn't find it or else slink away and change my userID.

Still some very good reading offered though.

A.

Post by DC-Choppah » Fri Jul 05, 2013 11:05 pm

The best non-technical explanation of dither I have seen (heard actually) is the recording of a cymbal crash that decays to silence.

The dithered one sounds smooth. The non-dithered one sounds non musical as it gets quiet and the systematic repeating errors due to quantization can be heard as distortion.

Dither transforms a systematic repeating error into a random one.

Post by Andi » Tue Jul 16, 2013 4:20 pm

OK, it was actually in Bob Katz' Mastering Audio book, and the bit that approximately 1/2 stuck in my mind (see what I did there?) was the explanation of analogue domain dither where the value of the analogue signal modulates the dither noise. That's the bit (again!) where I fall in a heap with purely digital domain dither, what stops the LSB from simply toggling on and off randomly and "eases" it towards the correct probablility ?

Post by The Korff » Tue Jul 16, 2013 5:26 pm

Andi wrote:what stops the LSB from simply toggling on and off randomly and "eases" it towards the correct probablility ?

Dither acts precisely to make the LSB toggle on and off randomly! That's its purpose...

But instead of thinking of the LSB "toggling on and off randomly", think of it as being on half the time — which is precisely what would happen if you were to feed an A-D converter some random noise, with an average value equivalent to the LSB (ie. dither). Does that help? It's not a very easy thing to explain, I'm afraid, without the help of an over-head projector and some slides...! But the Katz book does have some very good diagrams, if you've got a copy handy.

Cheers!

Chris

Post by Tim Gillett » Tue Jul 16, 2013 6:40 pm

Just to confuse things more, in discussions about digital audio, quantisation and dither, the term "zero crossing" sometimes comes up. What is meant by this?

Post by Richie Royale » Tue Jul 16, 2013 6:57 pm

http://en.wikipedia.org/wiki/Zero_crossing

Post by Andi » Tue Jul 16, 2013 8:20 pm

Chris

I always figured that there was some mechanism by which the toggling of the bit would be biased towards the true intended value - although I have no idea where this value would be derived from. The article that I thought I recalled explains how this works, but in dithering an analogue signal - where obviously there is a "true" value available. I guess the answer is in fact that the process is akin to running a bit of glasspaper along a piece of wood to soften the edge, whereas I was fancying that we had a way to somehow cut a specific profile - without knowing what that profile asctually is. Or something.

Forgive me - it's too hot in here and I'm running too many valve amps - I think I'm becoming light headed. About Dither.

Post by Tim Gillett » Wed Jul 17, 2013 2:55 am

I think of say a 8 or 16 bit audio file as having a built in noise gate. Dithering is simply adding some random noise to just keep the gate open, at all times, that is, for every sample point.

We're tricking the noise gate to stay open. That's all.

Tim

Post by Hugh Robjohns » Wed Jul 17, 2013 11:40 am

Andi wrote:I always figured that there was some mechanism by which the toggling of the bit would be biased towards the true intended value

It is.

Essentially, the bits to be truncated are summed with random noise (generated with a specific statistically probability) and the resulting value used to determine the value of the LSB of the new sample with shortened wordlength. So the LSB is effectively being modulated by the removed audio and consequently all of the audio information conveyed in the longer wordlength signal is retained, albeit within a new higher noise floor.

We can perceive audio content as much as 20dB below the noise floor in some situations, so it is entirely possible and practical to convey signals with dynamic ranges of 120dB or so within a 16 bit medium. This is more noticeable when using noise-shaping which allows the noise floor to be made subjectively less audible and therefore the encoded low-level audio even more audible, too.

H

Post by Andi » Wed Jul 17, 2013 1:40 pm

Try to imagine a cartoon light bulb switching on!

Sometimes my own dimness astounds me:headbang:

Thanks H!

Post by Hugh Robjohns » Wed Jul 17, 2013 1:55 pm

Dither is one o the hardest digital audio concepts for people to grasp. It's easy to demonstrate that it works, but really hard to understand why! And most 'understandable explanations' -- mine above included -- are inherently superficial and flawed in some respects.

But then we don't need to know how to design a good 48V DC power supply to be able to capture great sounds with a phantom powered mic. We just need to know that power is needed and how it gets to the mic. The same is true of dither -- as long as you know broadly what it's doing and why it is needed in A-D converters and in word-length truncation(and what noise-shaping does to it), that's usually enough.

H

Post by Andi » Thu Jul 18, 2013 10:06 am

Hugh

totally agreed - I don't understand gravity either, but I use it daily.

Can I test some assumptions purely for the curiosity of it? Ignoring the realities of using binary arithmetic, no parity bits, no 2s compliment etc, .... and ignoring realities of internal noise in analogue components and real world values throughout.

If I begin with a piece of data in a pure, theoretical, 24 bit start to finish environment (for the sake of simplicity) I have somewhere over 16 1/2 million values that I can present to my DAC to tell it how big a voltage to produce. In a 16 bit environment I have a bit over 65 thousand values. The lowest value will tell the DAC to produce 0V, the highest value will tell the DAC to produce whatever is its upper limit. Between these 2 voltages I can have 16+ million values or 65+ thousand values. Writers write about "steps" but we don't get steps, we get a reconstructed waveform which contains quantisation errors in amplitude only (presume perfect clocking) - it's deformed by the approximations required to describe the varying level of a continuous signal by means of discrete values. No dither yet! The 24 bit system does not reconstruct a higher resolution wave, but it reconstructs one with lower level errors. The size of the errors is proportional to the difference between any 2 adjacent values that can be represented, and the lowest and highest voltages produced are the same for either system and are represented as 000.....0000 and 111.....1111. The actual errors are constantly changing and are uncorrelated, so they appear to be random, and manifest as noise. Because the errors in a 24 bit system are lower than in a 16 bit system, the 24 bit system is quieter - lower noise floor.
If we apply dither to an analogue signal at the AD stage, we add low level noise that causes the least significant bit of every word, irrespective of its value, to toggle on and off. Any instantaneous observation of the LSB could see it as 1 or 0, but over a suitable period of time it will average such that the mean value will be between the 0 and 1 value, with the probability of its mean value effected by the true analogue value. As an example, if a 0 value of a 1 bit system represented 0 Volts, and a 1 value represented 1 Volt, and the analogue value was 0.5V then the LSB would be a 1 for half the samples and a 0 half the samples; if the analogue value was 0.75V then the LSB would be a 1 for 75% of the samples, a 0 for 25%. This occurs for every level of signal being coded, from no signal to the highest the ADC will handle.

When word length increases we add bits. The analogue values to be recreated by the DAC still have the same limits, 11....1111 will still be the highest Voltage the DAC can create, and 000.....0000 will still be the 0V. I presume that the process of word length expansion re-writes individual sample values so that they represent the same "analogue" values; there are simply more different values available between the ones actually used? This is no problem, the data will sit quite happily until something is changed - processed, at which time new values will be written as required.

If we reduce word length, we lose data. The highest and lowest values must represent the same analogue values, so we have less discrete values between, and so the quantisation errors will be bigger. This is truncation. When we dither in the digital domain, "noise" is added to the data in the form of un-correlated modification of the LSB. As we are losing data - in the case of reduced wordlength - the probablity of the LSB being toggled high or low is effected by the more precise information that we are about to lose. The "noise" is added to the data and is stored in the new file, it becomes part of the signal, so if we perform any processing on the data, we are processing the dither too - which writes new LSB data and therefore undoes the value of the dither. Hence, dither as the last step of export when reducing word-length.

?

Post by The Korff » Thu Jul 18, 2013 10:28 am

Morning!

I'm sure Hugh will be along to lay the smack down soon, but in the meantime...

Andi wrote:[The reconstructed waveform is] deformed by the approximations required to describe the varying level of a continuous signal by means of discrete values.

No it isn't! Given perfect clocking and filtering (ie. the anti-aliasing and reconstruction filters), there's no reason for the reconstructed waveform to be deformed at all. The discrete nature of the amplitude values doesn't cause any kind of distortion, because the bandwidth of the input is less than half of the sample rate.

Cheers!

Chris

Post by Andi » Thu Jul 18, 2013 10:53 am

Chris - understood, I'm trying to get my head round the process a step at a time, so in the example, if we were to encode and reconstruct without dither but with perfect clocking and filtering and analogue circuitry, we would have perfectly measured, time accurate samples, imperfectly stored as data to the nearest available value - and the DA process will perfectly reconstruct from that slightly wrong data to give a distorted version of the original signal? I appreciate that this is not how it's done in the real world and that dither corrects (or reduces?) the errors.

A.

Post by Hugh Robjohns » Thu Jul 18, 2013 10:59 am

Andi wrote:Can I test some assumptions purely for the curiosity of it?

We can try

The lowest value will tell the DAC to produce 0V, the highest value will tell the DAC to produce whatever is its upper limit.

Not quite -- this is where the two's complement comes in to play. Remember audio is a bi-directional -- there are positive and negative peaks either side of the '0V' centre line. So in practice roughly half the total number of quantisation levels represent the positive side of the waveform and the other half the negative side.

Writers write about "steps" but we don't get steps

Correct. The 'steps' thing is a useful step (excuse the pun) along the way of understanding quantisation, but most commentators stop short of introducing dither (because it starts to get difficult to explain) which removes the steps completely. So no steps!

we get a reconstructed waveform which contains quantisation errors in amplitude only (presume perfect clocking)

Correct. But considering a quantisation process without dither is misleading because you end up worrying about things that don't exist in practice.

The actual errors are constantly changing and are uncorrelated, so they appear to be random, and manifest as noise.

For an undithered system, this is true only if the signal crosses a very large number of quantisation levels. If the signal level is very small and so only crosses a few quantisation levels (of if the system has a very low wordlength) then the errors become correlated to the audio and manifest as obvious quantisation distortion. Adding dither essentially moves the quantisation levels around at random and all errors become uncorrelated and thus noise-like.

Because the errors in a 24 bit system are lower than in a 16 bit system, the 24 bit system is quieter - lower noise floor.

Yes. The maximum possible quantisation error is much smaller, and thus the maximum size of error is smaller. If the error is random (either because the audio signal is crossing a lot of quantising levels or because we have introduced dither), then it manifests as noise and thus the noise floor is lower in a 24 bit system than a 16 bit one.

If we apply dither to an analogue signal at the AD stage, we add low level noise that causes the least significant bit of every word, irrespective of its value, to toggle on and off.

Yes -- although in practice it will be more than just the LSB. The best A-D converters have a noise floor approaching -130dB, not the theoretical -141dB that should be possible in a dithered 24 bit system.

As an example, if a 0 value of a 1 bit system represented 0 Volts, and a 1 value represented 1 Volt, and the analogue value was 0.5V then the LSB would be a 1 for half the samples and a 0 half the samples; if the analogue value was 0.75V then the LSB would be a 1 for 75% of the samples, a 0 for 25%.

Correct.

I presume that the process of word length expansion re-writes individual sample values so that they represent the same "analogue" values; there are simply more different values available between the ones actually used? This is no problem, the data will sit quite happily until something is changed - processed, at which time new values will be written as required.

Correct.

If we reduce word length, we lose data. The highest and lowest values must represent the same analogue values, so we have less discrete values between, and so the quantisation errors will be bigger. This is truncation.

Correct.

When we dither in the digital domain, "noise" is added to the data in the form of un-correlated modification of the LSB.

No. The truncated LSBs are added to a noise-like signal, and that combination is then used to form the new LSB of the truncated sample word.

if we perform any processing on the data, we are processing the dither too - which writes new LSB data and therefore undoes the value of the dither. Hence, dither as the last step of export when reducing word-length.

It doesn't 'undo' the value of dither, but it is better to do any processing on signals with the maximum possible wordlength, and only when all processing is complete, reduce to a lower wordlength with dither.

The analogue equivalent is mixing and mastering with 1/2-inch 30-ips tape before creating the cassette tape for the end consumer, rather than bouncing the mix out to a cassette and expecting the mastering engineer to work his magic on that before handing it on to the consumer.

Does that help?

H

Post by The Korff » Thu Jul 18, 2013 11:00 am

Ah, right - sorry, I think I read your post a little too quickly. Yes, in short!

Cheers,

Chris

I once found a really good article that explained dither...

I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...

Re: I once found a really good article that explained dither...