Digital Audio Question

DXDXDX · Jul 30, 2012

I have what may appear to be a pretty dumb question, but keep in mind, I'm in the business end of things! Bear with me as I try to ask this in as intelligent a way as I can, which is no doubt miles below most of you guys!!! I understand (I think) that to convert an analog signal to digital, the "waveform" is scanned many thousands of times per second. The end result is that something tells the A to D converter to assign either a one (high) or zero (low) to each value of that analog signal when it is scanned. We can then get a perfect reproduction of that waveform by assembling the highs and lows. Suppose we have a signal which is 1000 cycles for one second but then switches to 400 cycles the next second at one half the loudness of the 1000 cycle signal, and this pattern continues. This is the tough question...how does the "scanning" deal with the change in frequency and loudness? Once I get an answer I can understand, I will ask just one more! Tnx!!!

konbaasiang · Jul 30, 2012

A sampled analog signal preferably uses more than one bit per sample. More common is 16 or 24 bits per sample. We can think of the ones and zeroes as representing a decimal number between -1.0 and 1.0. We don't have to worry about the fact that they're stored as 1s and 0s under the hood -- we think only of the number they represent.

Assuming 3-bit signed sampling (to make it easy):

011 - value 3
010 - value 2
001 - value 1
000 - value 0
111 - value -1
110 - value -2
101 - value -3
100 - value -4

We then scale these values to the voltage levels we need. Value 0 is of course 0 volts.
Indeed 3 bits is not enough for accurate audio reproduction -- that's why we use more bits than that.
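
The table above can be reproduced with a short Python sketch. This is only an illustration: the function names are made up, and dividing by 4 to land in the -1.0 to 1.0 range is one assumed way of doing the scaling, not the only one.

```python
def decode_sample(bits):
    """Read a string of 0s and 1s as a signed (two's complement) integer."""
    value = int(bits, 2)
    if bits[0] == "1":              # top bit set means the value is negative
        value -= 1 << len(bits)
    return value

def to_unit_range(value, n_bits):
    """Scale a signed integer sample into the range -1.0 up to just under 1.0."""
    return value / (1 << (n_bits - 1))

codes = ["011", "010", "001", "000", "111", "110", "101", "100"]
decoded = [decode_sample(code) for code in codes]
scaled = [to_unit_range(value, 3) for value in decoded]

for code, value, level in zip(codes, decoded, scaled):
    print(code, "- value", value, "- scaled", level)
```

With 16 bits the same functions give 65,536 levels instead of 8, which is why more bits means finer voltage steps.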

As long as we sample often enough to represent both the positive and negative peaks of the analog waveform, with no peaks slipping between two samples, we have accurately captured the original waveform, and can accurately reconstruct it. We don't care at all *what* frequency components actually make up the analog signal, as long as the highest frequency component is less than half of our sampling frequency -- because we don't have to! It's not important for the sampling/reconstruction process. We're just capturing the signal "exactly" as it was.
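
A small Python sketch of why the half-the-sampling-frequency rule matters. It is an illustration only; the 2000 Hz rate and the 800 Hz and 1200 Hz tones are assumed values. A 1200 Hz tone sampled at 2000 Hz produces the same numbers as an inverted 800 Hz tone, so once sampled the two cannot be told apart.

```python
import math

SAMPLE_RATE = 2000  # samples per second (assumed for illustration)

def sample_tone(freq_hz, n_samples):
    """Sample a unit-amplitude sine wave of freq_hz at SAMPLE_RATE."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

above_half = sample_tone(1200, 50)   # above half the sampling rate
below_half = sample_tone(800, 50)    # below half the sampling rate

# Largest difference between the 1200 Hz samples and the inverted 800 Hz samples
largest_difference = max(abs(a + b) for a, b in zip(above_half, below_half))
print(largest_difference)  # effectively zero
```

This is why anything above half the sampling frequency has to be filtered out before sampling.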

Best regards,
///Leif

K6JHU · Jul 30, 2012

So for your example, the sampling frequency needs to be at least 2,000 samples per second. It can be anything above that. 2,000 is the minimum. It is the 1,000 Hz that establishes the minimum sampling rate. BTW, for CD the sampling is 44,100 samples per second, giving a frequency response of a little over 20 kHz.

Let's assume that full volume is 1000 (binary). Then half volume would be 0100 (binary). So if I do this correctly and the input is a square wave (for simplicity), a waveform of 1000 Hz followed by 500 Hz at half volume (at 2000 samples per second) would look like:

1000
1000
0100
0100
0100
0100
1000
1000

DXDXDX · Jul 31, 2012

Sorry...still not getting it! Could be it's so simple I can't see it. Let me ask it another way...the analog waveform is sampled 44k times per second. The more sampling in a given time, the better the "resolution" or quality. When the sampled signal, now represented by digital base two numbers, is converted back into an analog signal, what tells the D to A converter the signal is, as per my example, 1000 or 400 cycles? I can't wait to ask my second and final question but let's get over this mountain first!

konbaasiang · Jul 31, 2012

Okay, I'll try one more time:

DX, the point you're missing is that we're not storing frequency at all. We're storing the voltage of the original signal at the particular moment of every sample!
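
To make that concrete, here is a minimal Python sketch of the signal from the original question (an illustration only; the 1.0 and 0.5 amplitudes and the function names are assumptions). Nothing in the stored list says "1000" or "400". It is just a run of voltages, and the frequency and loudness are implied by how fast and how far those voltages swing.

```python
import math

SAMPLE_RATE = 44100  # samples per second, the CD rate mentioned earlier in the thread

def voltage_at(n):
    """Voltage of the example signal at sample n (amplitudes 1.0 and 0.5 are assumed)."""
    t = n / SAMPLE_RATE
    if (n // SAMPLE_RATE) % 2 == 0:
        return 1.0 * math.sin(2 * math.pi * 1000 * t)   # 1000 cycles, full level
    return 0.5 * math.sin(2 * math.pi * 400 * t)        # 400 cycles, half level

samples = [voltage_at(n) for n in range(2 * SAMPLE_RATE)]
first_second = samples[:SAMPLE_RATE]
next_second = samples[SAMPLE_RATE:]

def upward_zero_crossings(chunk):
    """Count how often the stored voltage goes from negative to non-negative."""
    return sum(1 for a, b in zip(chunk, chunk[1:]) if a < 0 <= b)

for name, chunk in (("first second", first_second), ("next second", next_second)):
    print(name, "peak:", round(max(chunk), 3),
          "upward crossings:", upward_zero_crossings(chunk))
```

The crossing counts come out at roughly 1000 and roughly 400 and the peaks at about 1.0 and 0.5, even though only voltages were stored.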

See the following image: http://i.imgur.com/qYKjU.png

This is a 2000 Hz sampled signal containing:

907 Hz (left channel - upper)
400 Hz (right channel - lower)

Contrary to popular belief you absolutely cannot represent a 1000 Hz wave in a 2000 Hz sampled signal, because there is nothing to tell the D to A converter either the amplitude or the phase of the original signal. The samples will *not* magically be at the peaks of the original waveform, they can end up anywhere!
I chose the odd frequency of 907 to illustrate this. Reconstructing the waveform from samples is *not* a matter of connecting the dots, as Robert Orban once told me!

In the image, the continuous line is the signal that will come out of a proper D to A converter (DAC).
As long as there are irrefutable clues often enough, the DAC will handle the job perfectly. However, you cannot go too close to half the sampling frequency, because if you're infinitely close (for example, trying to represent ~999.999999999 Hz in a 2000.0 Hz sample rate signal) you'd have to have an infinitely long reconstruction filter in the DAC, with infinite delay.

So, when sampling at 2000 Hz, 1000 Hz is out. So is 999 Hz, and probably also 950 Hz. 900 is most probably okay!

The fact that the samples will not magically end up at the peaks of the original waveform is an important lesson. This is why we have to oversample our digital clippers to prevent overshoots (more samples = more likely that one of them will be close to the peak).

Two more examples:

http://i.imgur.com/HbIKO.png

Both channels here show a 500 Hz tone sampled at 2000 Hz. I picked an even fraction of the sampling rate for simplicity.

The difference between them is the phase! They are phased 45 degrees apart. Both are perfectly valid, and will reconstruct fine.
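
For anyone who wants to see the numbers rather than the picture, a short Python sketch (an illustration with assumed unit amplitude, not how the image was produced):

```python
import math

SAMPLE_RATE = 2000  # Hz
TONE = 500          # Hz, exactly one quarter of the sample rate

def sample_tone(phase_degrees, n_samples=8):
    """Sample a unit-amplitude 500 Hz sine starting at the given phase."""
    phase = math.radians(phase_degrees)
    return [round(math.sin(2 * math.pi * TONE * n / SAMPLE_RATE + phase), 4)
            for n in range(n_samples)]

channel_a = sample_tone(0)    # samples land on the zero crossings and the peaks
channel_b = sample_tone(45)   # samples land halfway up, never on a peak

print(channel_a)
print(channel_b)
```

The first channel lands on 0, 1, 0, -1, while the second sits at about 0.707 or -0.707 on every sample. Both describe a tone that peaks at 1.0, so the largest sample value is not the same thing as the peak of the waveform.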

But, what if we do the same thing with two 997 Hz tones, spaced 45 degrees apart?

http://i.imgur.com/eAgAn.png

I had to zoom out to illustrate the problem.
The outline is the peak level of what will be reconstructed. You can see we didn't get a continuous tone like in the previous examples, because the clues in the samples were simply too infrequent. We'd need a much longer reconstruction filter to get it done. It would be possible to construct a DAC to reconstruct this signal properly, but the delay of that DAC will probably be half a second!
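
The slow outline also shows up in the raw sample values. A rough Python sketch (an illustration only; one channel, with zero starting phase assumed): for a 997 Hz tone at 2000 Hz, the size of each sample follows a slow 3 Hz pattern, 3 Hz being the distance between the tone and half the sample rate. Near the nulls of that pattern the samples are all close to zero, so for a while they say very little about how loud the tone really is.

```python
import math

SAMPLE_RATE = 2000  # Hz
TONE = 997          # Hz, just below half the sample rate

samples = [math.sin(2 * math.pi * TONE * n / SAMPLE_RATE)
           for n in range(SAMPLE_RATE)]

# Slow pattern at (SAMPLE_RATE / 2 - TONE) = 3 Hz
envelope = [abs(math.sin(2 * math.pi * (SAMPLE_RATE / 2 - TONE) * n / SAMPLE_RATE))
            for n in range(SAMPLE_RATE)]

worst_mismatch = max(abs(abs(s) - e) for s, e in zip(samples, envelope))
print("sample sizes vs 3 Hz pattern, worst mismatch:", worst_mismatch)
print("first few sample sizes:", [round(abs(s), 3) for s in samples[:5]])
print("sample sizes near 1/12 second:", [round(abs(s), 3) for s in samples[165:169]])
```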

Best regards,
Leif Claesson

konbaasiang · Aug 3, 2012

Did you have any more questions, DX?

RolfTaylor · Aug 3, 2012

I think what you are missing is that "sampling frequency" or "sampling rate" means we look at the analog signal every so often, as measured by a clock. Think of a stop watch. Every second you read the meter and write down the value.

At the D to A converter you must use the same clock rate. If the stop watch is off, the frequency of the reconstructed signal will be off.

So let's assume the stop watch at the D to A is running twice as fast as the stop watch at the A to D. In this case two things will happen:

All reconstructed sounds will be at twice the frequency of the originally digitized audio

and, if the system is a real-time transmission system (such as a real-time codec like a Zephyr), you will get periods of silence periodically (in a literal sense) interspersed with audio, since the D to A is "consuming data" faster than the A to D is creating it.

I hope this helps

bilco · Aug 3, 2012

Rolf,
You are mostly right, but most modern D-A conversion devices have circuitry/software that eliminates the silences, or at least masks those periods. One of the common ways of doing this is similar to the error correction technique. If the decoder, because of mismatched data rates, finds a "hole", it fills it with the previous data bits. Since this occurs infrequently, it's virtually inaudible to the listeners. Other systems take the average between the data bytes/bits before and after the "hole". From a purist point of view you are correct, but the end result is not likely to be perceptible to the listeners unless it's a very big hole, usually caused by a complete data loss rather than a slight mismatch in data rates.

konbaasiang · Aug 3, 2012

Differing input and output sample rates is exactly like recording and playing a tape back at a different speed. It's a perfectly clean pitch-up or pitch-down process.
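
The arithmetic behind that is simple enough to sketch in Python (hypothetical function names, purely for illustration): the pitch scales by the ratio of the two rates, and the running time scales by the inverse.

```python
def playback_frequency(original_hz, record_rate, playback_rate):
    """Frequency heard when samples recorded at record_rate are played at playback_rate."""
    return original_hz * playback_rate / record_rate

def playback_duration(original_seconds, record_rate, playback_rate):
    """Running time of the same recording at the new playback rate."""
    return original_seconds * record_rate / playback_rate

# A 1000 Hz tone recorded at 44,100 samples/s and played at twice that rate
print(playback_frequency(1000, 44100, 88200))   # 2000.0
print(playback_duration(60, 44100, 88200))      # 30.0
```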

The tape is a very accurate analogy actually, because just like a tape, you can only do this with something *pre-recorded*, where either the player alone or the recorder alone controls the speed.

Imagine if you have a real-time system, where you're recording and playing on the fly. In the tape analogy, this would be a tape loop.
Now try to play and record that back at different speeds! You will either stretch then snap the tape, or you'll have tape salad.

In a real-time system, if you need to record and play at different rates, you need a sample rate converter. It is absolutely not okay to duplicate or discard individual samples -- this causes audible clicks on any program material that doesn't have enough high frequencies to mask the click. In a sample rate converter, the output values come from *between* the available samples of the input. It is not okay to just interpolate to find these missing values, because again, reconstructing a waveform from samples is not a matter of connecting the dots. A proper sample rate converter is akin to properly converting to analog and then sampling it again.
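
Here is a rough Python sketch of the difference between "connecting the dots" and a reconstruction-style estimate of a value between two samples. Treat it as an illustration under stated assumptions, not as the design of any real converter: the Hann-windowed sinc, the 32 samples on each side, the 700 Hz test tone and the function names are all choices made just to keep the example short.

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def value_between_samples(samples, position, half_width=32):
    """Estimate the signal at a fractional sample index with a Hann-windowed sinc."""
    center = math.floor(position)
    total = 0.0
    for k in range(center - half_width + 1, center + half_width + 1):
        if 0 <= k < len(samples):
            distance = position - k
            window = 0.5 * (1 + math.cos(math.pi * distance / half_width))
            total += samples[k] * sinc(distance) * window
    return total

def connect_the_dots(samples, position):
    """Straight-line interpolation between the two neighbouring samples."""
    left = math.floor(position)
    fraction = position - left
    return samples[left] * (1 - fraction) + samples[left + 1] * fraction

SAMPLE_RATE = 2000  # Hz
TONE = 700          # Hz, well below half the sample rate
samples = [math.sin(2 * math.pi * TONE * n / SAMPLE_RATE) for n in range(200)]

position = 100.5    # halfway between sample 100 and sample 101
true_value = math.sin(2 * math.pi * TONE * position / SAMPLE_RATE)
sinc_error = abs(value_between_samples(samples, position) - true_value)
line_error = abs(connect_the_dots(samples, position) - true_value)
print("windowed sinc error:", sinc_error)
print("connect-the-dots error:", line_error)
```

The straight-line guess is off by a large fraction of the tone's amplitude, while the windowed sinc estimate lands very close to the true value. A sample rate converter does this kind of estimate for every output sample.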

There is a great deal of misconception about how digital audio works. No wonder most people think it's magic!
It isn't -- it's really quite straightforward. There are simple concepts with simple rules one must follow. The implementation is difficult but the concepts and rules are simple.

w2xj · Aug 6, 2012

DXDXDX said:
Sorry...still not getting it! Could be it's so simple I can't see it. Let me ask it another way...the analog waveform is sampled 44k times per second. The more sampling in a given time, the better the "resolution" or quality. When the sampled signal, now represented by digital base two numbers, is converted back into an analog signal, what tells the D to A converter the signal is, as per my example, 1000 or 400 cycles? I can't wait to ask my second and final question but let's get over this mountain first!

Forget about frequencies. Look at it this way - the analog to digital converter is sampling at a constant rate in a linear system. Let us say that is 44.1 kilohertz. It is sampling the VOLTAGE of the analog signal. That is all it cares about. The important factor is the Nyquist limit. According to the theory, a signal can be accurately reproduced if its frequency is 1/2 or less of the sampling rate. So in theory an A to D running at 44.1 kilohertz can sample a signal of 22.05 kilohertz. There are real world limits that make the upper limit slightly less, but they are not important in this discussion.

The 22.05 kilohertz signal is sampled twice each cycle, so there is some ambiguity as to where in the cycle the signal is sampled. In a linear system, those samples are fed at a constant rate, so the digital to analog converter outputs those samples at the same rate they were sampled. Now, to keep the math simple, take a signal of 2.205 kilohertz, or one tenth of 22.05 kHz. Rather than being sampled twice per cycle, it is sampled 20 times per cycle. Now the digital to analog converter gets 20 samples per cycle of the 2.205 kilohertz signal, so the sine wave is more accurately sampled. The digital to analog converter is very dumb. It sees each sample of a voltage and outputs that voltage. It has no way of knowing whether there are two samples per cycle of 22.05 kilohertz or 20 samples per cycle of 2.205 kilohertz. It just keeps outputting whatever voltage the samples represent.
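
The samples-per-cycle arithmetic, as a tiny Python sketch (illustration only; half of 44.1 kHz is 22.05 kHz, and one tenth of that is 2.205 kHz):

```python
SAMPLE_RATE = 44100  # Hz

def samples_per_cycle(tone_hz):
    """How many samples are taken during one cycle of a tone at SAMPLE_RATE."""
    return SAMPLE_RATE / tone_hz

half_rate = SAMPLE_RATE / 2      # 22050.0 Hz
one_tenth = half_rate / 10       # 2205.0 Hz

print(samples_per_cycle(half_rate))   # 2.0
print(samples_per_cycle(one_tenth))   # 20.0
```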

In the real world, signals are complex, and the analog signal voltage is a single mathematical addition of all the input frequencies, which depends on the phases and amplitudes of the individual frequencies. But in the end, it is just one voltage, and that is what is sampled.

Does that help clarify things?
