Programming Volume Controls

a.k.a. Notice to Programmers of Audio Software and Hardware

There is a single very annoying thing about lots of audio software products which is due either to lack of the programmers' knowledge about the human auditory system, to laziness, or even both. If you are a programmer who could ever be involved, even remotely, in the development of a software or hardware product involving sound, please read this text carefully, burn its core message into your memory, and spread the news!

To the Point

Because people have so little time nowadays, here is the essence of this text, compressed into a few sentences:

  • Volume sliders must not be linear. Linear volume sliders are an annoyance to users because human hearing is not linear at all; it is logarithmic. That's why all audio equipment worthy of the name uses the dB scale to indicate volume and gain settings. For a relative volume level x, the dB value is equal to 20*log10(x). Positive dB values mean amplification, negative values mean attenuation.
  • The ideal volume slider follows an exponential curve, with its lowest setting corresponding to 0dB(A) and its highest setting to the loudest volume the user's audio equipment produces. This is quite impractical to work with because you can only make vague assumptions about what equipment the user has, so forget about this unless you are working on a very high-end product.
  • A good all-around and computationally cheap approximation of an exponential curve which fits low-to-medium powered consumer audio systems (up to about 60dB(A)) is the 4th power of the volume slider's position, so: volume scale factor = x^4, where x is the volume slider's position, rescaled to the interval [0,1]. For systems with higher maximum loudness, a higher exponent is needed, e.g. x^8 for 90dB(A). If you can't afford implementing a true exponential curve, use this simple formula for all your volume sliders. It's not perfect, but a billion times better than a linear slider!!!

If you want to know more, read on. Otherwise, read the third point again and make sure you'll never forget it.
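
As a minimal sketch of the third point, assuming the slider position has already been rescaled to [0,1] (the function name and the clamping are illustrative choices, not part of any particular audio API):

```python
import math

def slider_to_gain(x: float) -> float:
    """Map a slider position in [0, 1] to an amplitude scale factor using x^4."""
    x = min(max(x, 0.0), 1.0)  # assumption: out-of-range positions are clamped
    return x ** 4

# Show the scale factor and the corresponding relative level in dB.
for pos in (0.25, 0.5, 0.75, 1.0):
    gain = slider_to_gain(pos)
    level_db = 20.0 * math.log10(gain)
    print(f"slider {pos:.2f} -> gain {gain:.6f} ({level_db:.1f} dB)")
```

With this mapping the middle of the slider gives a scale factor of 0.0625, which is about 24dB below the maximum, instead of the mere 6dB a linear slider would give.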

About Volume Controls

Most audio software nowadays has sliders or even rotating knobs to control the volume. The intention is to simulate the sliders of ‘classic’ audio hardware. Unfortunately, there is one thing about a lot of volume sliders which makes them a pain in the ass: they are LINEAR. You might ask, what could possibly be wrong with a linear slider: it is zero at the one end, 100% at the other end, and neatly linear in between, isn't that just ideal? The answer is a big no.
Just try this: open your favourite audio player, start playing a song, grab the volume slider, and wobble it to and fro at the ‘loud’ end of the volume range. Next, do the same at the ‘silent’ end of the volume range. Chances are that you will experience the following: almost no audible volume variations at the ‘loud’ end, and extreme volume variations at the ‘silent’ end, even if you made smaller excursions with the latter. In that case you can be pretty sure the slider is linear.
A few popular applications that suffer from this flaw at the time of this writing, are:

  • QuickTime Player
  • iTunes (fixed in the newer versions!)
  • Windows Media Player

The evil has even spread to hardware. Velleman sells a solderable kit of a graphic equalizer, K4302. I don't know if this has been corrected now, but when I bought the kit around 1995 it had linear sliders while they should be logarithmic (C law if I'm correct). Even the G3 iMac's volume control was linear, and I'm afraid that this is just one of many examples. The result is that, even with a lot of steps, the most silent volume setting is still way too loud, and the maximum volume level is almost reached already in the middle of the slider. Ultimately this leads to frustrated people cursing the damn volume control, or feeling uneasy while using your software without really knowing why. Luckily there are lots of products with correct volume controls, but I have the feeling that they are only a minority.

What is going wrong?

Now what exactly is wrong with a linear volume slider? The answer lies within the way our ears perceive sound. The point is that our sensation of ‘loudness’ is LOGARITHMIC. This means that with silent sounds, we are much more sensitive to small variations in volume than with loud sounds. This allows us to cope with a very large dynamic range of sound volumes. It also means that with a linear volume slider, we have a logarithmic sensation of volume variations, and that just doesn't feel right. At the right you can see a logarithmic curve. Two identical sections are marked on the horizontal axis (read: the volume slider). The vertical axis shows the perceived volume changes. As you can see, the corresponding section marked by the curve at the ‘silent’ end is much larger than at the ‘loud’ end.
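
The wobble experiment can also be checked with numbers. The sketch below assumes a linear slider whose position is used directly as the amplitude, and compares the level change caused by the same slider movement (one tenth of the travel) at both ends; the positions are arbitrary examples.

```python
import math

def db_change(old_gain: float, new_gain: float) -> float:
    """Level change in dB when the amplitude goes from old_gain to new_gain."""
    return 20.0 * math.log10(new_gain / old_gain)

# Same excursion of 0.1 on a linear slider, at the quiet end and at the loud end.
quiet_step = db_change(0.1, 0.2)
loud_step = db_change(0.8, 0.9)

print(f"0.1 -> 0.2: {quiet_step:+.2f} dB")
print(f"0.8 -> 0.9: {loud_step:+.2f} dB")
```

The movement at the quiet end changes the level by roughly 6dB, while the identical movement at the loud end changes it by only about 1dB.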

The solution to implementing a REAL volume slider is fairly simple: instead of being linear, a volume slider should be EXPONENTIAL. For, log(exp(x)) = x, so the sensation of volume variations will be linear, and that's what we want.
In this text I will assume that the amplitude of the audio hardware is controlled by giving it a value between zero (silence) and 1.0 (maximum).

Finding the ‘ideal’ curve

Exponential functions, however, have two annoying properties. The first is that they only reach zero at minus infinity. One can't make a slider that's infinitely long (but as will be explained below, in an ideal setup there is no need to reach absolute zero). The second is that in the general form y = a*exp(b*x)+c, an exponential function going through two points can have various shapes. Even a linear function is a limit case of such a curve. Luckily in the case of our volume control, we can and should limit the equation to y = a*exp(b*x) because our ears don't have an offset. This means that two points suffice to obtain a unique solution for the constants a and b. We already know one of those points, because we want the function to have a value of 1 for x = 1. This means that a = 1/exp(b). So the problem is reduced to determining the correct value of b, which controls the shape of the curve. Large values produce a very ‘sharp’ curve, while small values produce a more linear-like curve.

If you're still thinking linearly, you might be tempted to pick (0,0) as the second point, but that is not the right choice. As I said above, our exponential volume control will inevitably still have a non-zero amplitude at the zero slider position. This is not a problem because the logarithmic response curve of our ears also hits zero below a certain non-zero input loudness, the hearing threshold. The major problem is that even though this threshold is roughly the same across different persons, the loudness produced by any audio system for a given signal amplitude depends on a multitude of parameters. To determine the correct value for b, we need more information. If we want to provide the user with a “fully linear volume control sensation”, we would need to know how ‘loud’ his/her audio equipment plays at its loudest setting. You'll immediately understand that this is not a practical question. There simply is no specific answer to it, unless you are developing software for some very specific audio hardware. So we will have to make some assumptions. First something about how sound ‘loudness’ is measured.

Measuring sound levels

Because the human auditory system has a logarithmic sensitivity curve, a special unit of ‘sound loudness’ was invented. The unit is the deciBel, abbreviated to dB. Actually the original unit is the Bel, but this unit was so large that it's always used with a factor 1/10, hence deci-Bel (1 Bel = 10 dB). There are two kinds of dB scales, an absolute scale and a relative scale.
The absolute scale tries to give an indication of how loud a certain sound is perceived by an average human listener, aka the “sound pressure level” (SPL). There are some variations on this scale, but the most widely used one is the “dB(A)” scale. To determine the dB(A) value for a certain sound, the sound has to be filtered through a filter which corresponds to the frequency response curve of the “average human”. Next, the 10-base logarithm of the power is taken and the result is multiplied by 10. I won't go into more detail on this, because it is not of much use here. What you should know, is that the most silent audible volume level (the ‘hearing threshold’) corresponds to 0dB(A), and the loudest volume level (the ‘pain threshold’) is about 120dB(A). A classical orchestra can produce about 94dB(A). Note that, because of the calculation, multiplying the power of a sound by a factor of 10 means adding 10 to the dB(A) value.
The relative scale is used for all kinds of physical quantities, and indicates the relative amplitude of a signal compared to another. The symbol is simply “dB”. The calculation of the dB value depends on whether you are working with amplitude values or power values. For power values, the formula is 10*log10(x), with x the relative power. For amplitude values, the formula is 20*log10(x). The reason is that power ~ amplitude^2, and the second power becomes a factor 2 after taking the logarithm.
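
The two relative-scale formulas can be sketched as follows. The function names are made up for this illustration; only the formulas themselves come from the text.

```python
import math

def power_ratio_to_db(ratio: float) -> float:
    """Relative level in dB for a ratio of two powers."""
    return 10.0 * math.log10(ratio)

def amplitude_ratio_to_db(ratio: float) -> float:
    """Relative level in dB for a ratio of two amplitudes."""
    return 20.0 * math.log10(ratio)

def db_to_amplitude_ratio(level_db: float) -> float:
    """Inverse of amplitude_ratio_to_db."""
    return 10.0 ** (level_db / 20.0)

# Doubling the amplitude quadruples the power; both views give the same dB value.
print(amplitude_ratio_to_db(2.0), power_ratio_to_db(4.0))
```

Both printed values are about 6.02dB, which is the factor 2 from the squared amplitude at work.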

Finding the not-so-ideal-but-still-quite-good curve

Now we know more about the dB scale, we can go back to our problem of determining a good b value. We should make sure the resulting curve results in a near linear loudness experience with the listener. Let's assume that the maximum volume that can be produced by the user's equipment is 60dB(A). That's not very loud, but should be a realistic value for the average everyday volume setting of everyday users' speakers, especially built-in speakers in PCs and laptops. Better equipment may go up to 90dB(A) and Hi-Fi or PA systems will go even higher.
We now know two points of our y = a*exp(b*x) curve, namely: (0, 0dB(A)) and (1, 60dB(A)). Since we work with amplitudes, 60dB is 10^(60/20) = 1000 times the amplitude of 0dB. So, 1000 = exp(b*1), hence b = ln(1000) = 6.908. The value of a is simply 1/1000. (For the 90dB(A) case, b=10.36 and a=3.1623e-5.) So we now have a practical curve which should produce an agreeable result in most situations. Theoretically, the lowest position on the slider should be at 0dB(A), the hearing threshold. Although this means there's no real need to force the output to zero, in practice this is desirable because people expect absolute silence at the zero setting, and this isn't guaranteed with all the guesswork we did. So we should add “if(x == 0) ampl = 0;” to the slider code.
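
Put together, the exponential slider could look like the sketch below. The names and the range_db parameter are assumptions made for this example; the default of 60 corresponds to the 60dB(A) guess used above, and the slider position is assumed to be rescaled to [0,1].

```python
import math

def curve_constants(range_db: float) -> tuple[float, float]:
    """Return (a, b) for y = a*exp(b*x) spanning range_db decibels over [0, 1]."""
    ratio = 10.0 ** (range_db / 20.0)  # amplitude ratio between top and bottom
    return 1.0 / ratio, math.log(ratio)

def exp_slider_gain(x: float, range_db: float = 60.0) -> float:
    """Amplitude scale factor for slider position x, with forced silence at zero."""
    if x <= 0.0:
        return 0.0  # people expect absolute silence at the zero setting
    a, b = curve_constants(range_db)
    return a * math.exp(b * min(x, 1.0))

print(curve_constants(60.0))
print(curve_constants(90.0))
```

The printed constants should match the values of a and b given above for the 60dB(A) and 90dB(A) cases.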

However, lots of programmers will not like including an entire math library just to make their volume slider right. Luckily, there is an alternative which sufficiently approximates an exponential curve, is much cheaper and reaches zero at zero automatically. If you look at the graph at the right, you'll see 3 curves: the linear curve (yuck), the 60dB exponential curve (red), and the curve of the function x^4 (blue). As you can see, the blue curve lies pretty close to the red curve, and you can also see how monstrously the linear curve deviates. The 4th power-function demands only 3 multiplications (or 2 at the cost of an extra line of code), and it starts from zero, what else could you want? I tried this curve in some experiments and for most volume settings it appears to have a very natural ‘feel’, so I can highly recommend it. Depending on your personal taste, you may find x^5 an even better approximation. Keep in mind that in situations where the maximum volume is rather quiet, you may need a less ‘strong’ curve like x^2, and a ‘stronger’ curve if the maximum volume is really loud. For the 90dB(A) case, x^8 is a good approximation.
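
For readers without the graph, the following sketch prints the three curves side by side as relative levels in dB. It also shows the two-multiplication variant of the 4th power. The function names are illustrative, and the 60dB curve uses the constants a = 1/1000 and b = ln(1000) derived earlier.

```python
import math

def pow4_gain(x: float) -> float:
    """x^4 using only two multiplications."""
    x2 = x * x      # first multiplication
    return x2 * x2  # second multiplication

def exp60_gain(x: float) -> float:
    """The 60dB exponential curve y = a*exp(b*x) with a = 1/1000, b = ln(1000)."""
    return 0.001 * math.exp(math.log(1000.0) * x)

def to_db(gain: float) -> float:
    return 20.0 * math.log10(gain)

print("slider  linear    x^4     exp60   (all in dB)")
for x in (0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"{x:5.2f}  {to_db(x):7.1f} {to_db(pow4_gain(x)):7.1f} {to_db(exp60_gain(x)):7.1f}")
```

At the middle of the slider the linear curve sits at about -6dB, x^4 at about -24dB and the exponential curve at -30dB, so the 4th power is far closer to the target than the linear mapping.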

If you're going to use a discrete control instead of a slider, e.g. a volume control that can be increased or decreased in steps by pressing an ‘up’ and ‘down’ button, you should be aware that the smallest difference in volume that humans can perceive is about 1dB, or 10%. Actually this also counts for many other perceptions, like the size of an object, or speed. So it's useless to make your increments smaller than 10%, but you shouldn't make them too large either, or your volume control will be too coarse.
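
One way to build such a stepped control is to give every button press the same change in dB. The sketch below is only an illustration: the 30 steps and the 2dB per step are arbitrary assumptions (giving a range of 58dB between the lowest audible step and the highest), and step 0 is treated as mute.

```python
def step_to_gain(step: int, num_steps: int = 30, db_per_step: float = 2.0) -> float:
    """Amplitude scale factor for a discrete volume step.

    Step 0 is mute and step num_steps is full volume; every step in
    between lowers the level by db_per_step relative to the next one.
    """
    step = max(0, min(step, num_steps))  # clamp to the valid range
    if step == 0:
        return 0.0
    attenuation_db = (num_steps - step) * db_per_step
    return 10.0 ** (-attenuation_db / 20.0)

for step in (0, 1, 10, 20, 30):
    print(step, step_to_gain(step))
```

Because each step multiplies the amplitude by the same factor, every press of the button should feel like a similar change in loudness.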

Remember that all this doesn't only apply to sliders. It also applies to rotating knobs (although these are quite rare in software, but you bet that all potentiometers of decent audio equipment have an exponential characteristic) and menus with volume presets. It also counts for equalizers, because these are volume controls in their own right, even if they only control a part of the frequency spectrum. After reading this text it should be clear that implementing volume controls is not exact science except in well-controlled situations. However, the core message that you should take home is: volume must be exponential, or at least look like it!

About Frequency Controls and Analysis

This is somewhat less of an issue, because few applications have to deal with frequencies at the user end. However, for those that do, a similar story holds, but with a slight difference. The human sensation of ‘tone’ is also far from linear. But it's not exactly exponential either. At the low frequency side, it's more linear, while at the high frequency side it's exponential. However, an exponential curve is a much better overall approximation than a linear curve. So please, no linear frequency controls either! Ever seen a piano that was tuned to a linear scale?
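
A frequency control with an exponential characteristic can be sketched in the same way as the volume control. The 20Hz to 20kHz range and the function name are assumptions for this example, and as said above the exponential mapping is only an overall approximation of how we perceive tone.

```python
def slider_to_frequency(x: float, f_min: float = 20.0, f_max: float = 20000.0) -> float:
    """Map a slider position in [0, 1] to a frequency on an exponential scale."""
    x = min(max(x, 0.0), 1.0)  # clamp out-of-range positions
    return f_min * (f_max / f_min) ** x

for pos in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"slider {pos:.2f} -> {slider_to_frequency(pos):8.1f} Hz")
```

With these limits, equal slider movements multiply the frequency by equal factors, so the middle of the slider lands at roughly 632Hz instead of about 10kHz.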
This doesn't only apply to sound generation, but also to sound analysis. If you want to create a spectral analysis, the graph should have a logarithmic scale (on both axes!!!), unless there are specific reasons to use a linear scale. With a linear frequency scale, all low frequencies will be squeezed into a few lines while the high frequencies will be smeared over a wide area. Mind that even though the audible sound range reaches up to 20kHz, "high frequencies" already start at around 2kHz! The most interesting stuff in music happens below 2kHz. For speech, you can't do much with frequencies above 4kHz (that's why telephones filter these out). Yet these would occupy 80% of a linear spectrogram!
Unfortunately, it's not easy to generate a spectrum with a logarithmic frequency axis. FFT's are linear and the only way of getting a log scale from an FFT is to warp the output, resulting in poor resolution at the low frequencies and exaggerated resolution at the high frequencies. There's no such thing as a variation on the FFT which produces a log scale right away, but other approaches can be used, for instance a filter bank with filters whose bandwidth increases with frequency. The only problem with that approach is that the time interval of the lower frequency filters needs to be longer than the higher frequencies, which could make it difficult to provide a unified frequency response at any given time.
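
The resolution problem of warping an FFT onto a log scale can be made visible by counting how many FFT bins end up in each logarithmically spaced band. The sketch below does no actual FFT; it only does the bookkeeping, and the sample rate, FFT size and number of bands are arbitrary assumptions.

```python
def log_band_edges(f_low: float, f_high: float, num_bands: int) -> list[float]:
    """Band edges spaced by a constant frequency ratio."""
    ratio = (f_high / f_low) ** (1.0 / num_bands)
    return [f_low * ratio ** i for i in range(num_bands + 1)]

def bins_per_band(edges: list[float], sample_rate: float, fft_size: int) -> list[int]:
    """Count the FFT bins whose centre frequency falls inside each band."""
    bin_width = sample_rate / fft_size
    counts = [0] * (len(edges) - 1)
    for k in range(fft_size // 2 + 1):
        freq = k * bin_width
        for i in range(len(counts)):
            if edges[i] <= freq < edges[i + 1]:
                counts[i] += 1
                break
    return counts

edges = log_band_edges(20.0, 20000.0, 10)
counts = bins_per_band(edges, 44100.0, 1024)
for i, count in enumerate(counts):
    print(f"{edges[i]:8.1f} - {edges[i + 1]:8.1f} Hz: {count} bins")
```

With these settings the lowest band receives no bin at all while the highest band receives more than two hundred, which is exactly the poor low-frequency and exaggerated high-frequency resolution described above.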

Copyright 2002-2009 Alexander Thomas
This text may be copied freely and there is no limit whatsoever in implementing the described techniques. In fact I would be glad if you helped in spreading these advices around! However, the text (including this copyright message) may not be modified without permission of the author.
Contact me at my mail page with questions or remarks.