Better audio engineering with the Fourier Transform
The Fourier Transform has made a huge contribution to almost every area of audio engineering. From individual filters to full mastering suites; from tuning your guitar to how we store and transmit sound, the Fourier Transform is there ferreting out every bit of frequency information from your time-based audio signal.
In this episode, we look at a few examples of how the Fourier Transform has revolutionized the way engineers work with sound.
Resources
A video on how the Dolby tape hiss reduction system works
Masking by tones – Richard Ehmer
Transcript
Introduction
Hello and welcome to episode 3 of a podcast all about the Fourier Transform.
My name is Mark Newman. Despite never being particularly good at maths at school, I managed to get a degree in Electrical and Electronic Engineering from the University of Manchester in the UK and have been working as an electronics engineer for the past 25 years.
In that time, I’ve come to love and yes, maybe even obsess a little about the Fourier Transform and how it works. Having always found maths a bit of a challenge, I’ve had to develop my own methods of understanding its unique language by employing a more visual way of looking at the subject.
I’d like to share some of those methods with you via these podcasts and also through an online course I’m developing called “How the Fourier Transform works.” You can visit the course homepage at https://howthefouriertransformworks.com. More about the course and how it is progressing towards the end of this episode.
Housekeeping
I know that this episode has been a long time in coming. This is because, over the last few months, I’ve been working on a number of different projects related to the course. You can hear more about those projects towards the end of the podcast.
In the last episode we looked at how the Fourier Transform can be used to keep machines healthy in the field of preventative maintenance. After the episode went out, I was contacted by Erez Shaul who works for a company called Augury which does just that.
Their Fourier Transform based AI is helping their customers eliminate downtime and reduce the maintenance costs of critical machines working in industrial and commercial applications.
We’re in the process of recording an interview about how Augury have made use of the Fourier Transform, which I hope to release in a future episode. To get notified when new episodes are available, subscribe to the podcast in your podcast app, or join the course mailing list at https://howthefouriertransformworks.com/mailing-list.
But now to the main part of this podcast: How the Fourier Transform has revolutionized what audio engineers can do with sound.
A short history of sound recording
The Phonautograph
Back in the mid 19th century, if you wanted to listen to music, you had to go and hear it performed live. That was about to change however, for on the 9th of April, 1860, the French inventor Édouard-Léon Scott created the first-ever sound recording in history. He recorded it on his invention: the Phonautograph.
The sound would be recorded by a subject speaking into a huge bell-like chamber. The vibrations produced by the speaker’s voice would be mechanically transferred to a stylus which would trace out the vibrations onto a smoke-blackened piece of paper which was attached to a rotating drum. The interesting thing is that the Phonautograph was never actually intended to play the sound back. It was intended solely as a laboratory instrument for the study of acoustics.
It is only with modern optical techniques that the sound wave stored on the drum can now be played back. Here is the first ever sound recording.
The Phonograph
Seventeen years later, in 1877, the American inventor Thomas Edison invented the Phonograph, later to become known as the gramophone. Although it recorded the sound in the same way as the Phonautograph did, the stylus etched the recording, not onto paper, but onto a rotating cylinder. This cylinder was initially made out of metal and later out of wax. Once etched, the groove in the metal or wax could move the stylus, causing it to vibrate. This meant that, for the first time, the sound could be played back. A horn-shaped assembly would then amplify the small vibrations of the stylus, reproducing an audible sound signal.
This was to revolutionize the way in which we recorded events for the benefit of future generations. Originally, the only way of recording history was in writings and paintings. But now the phonograph enabled us to record the sound as well.
Remarkably, in 1890, the pioneer of modern nursing: Florence Nightingale herself, stopped by Edison’s lab to provide us with the only recording we have of her voice.
The recording is, understandably, a little unclear so I’ll read out what she is saying:
“When I am no longer even a memory – just a name, I hope my voice may perpetuate the great work of my life. God bless my dear old comrades of Balaclava and bring them safe to shore.”
Florence Nightingale
Magnetic Storage
A groove scratched into a material by a stylus wasn’t the only method of recording sound being researched at the time. In 1878, an American engineer, Oberlin Smith, conceived a different method of storing audio information. However, it took another 20 years and the work of Danish engineer Valdemar Poulsen for Smith’s idea to become reality. Unlike the physical groove of Edison’s phonograph, Smith and Poulsen’s method used a magnetic field to store the audio information; initially on a piece of wire and later on a piece of magnetic tape.
Thus the gramophone (later the record player) and the tape recorder became the audio storage devices of choice for much of the 20th century.
Digital Storage
As early as 1938, a British telephone engineer Alec Harley Reeves filed the first patent describing a technique known today as pulse-code modulation. This was a way of storing audio data as a discrete series of numbers.
It would be many years before anyone was able to develop this idea into a practical application. However, it paved the way for the invention, in 1979, of the compact disc. It then took another decade for CDs to catch on. But by the mid-1990s they took their place beside records and cassettes and eventually largely replaced these two mediums.
There were many advantages to digital storage, not least of which was the way that you could copy a recording again and again without loss. A big problem with analogue methods of reproduction was the degradation in the sound quality as you made copies from other copies. The digital nature of CDs solved this issue, much to the dismay of the music industry.
However, there was a more significant advantage for audio engineers. Storing the information as a stream of numbers meant that they could operate on those numbers mathematically. This fact, together with a 200-year-old algorithm, massively expanded what audio engineers could do with sound. That 200-year-old algorithm was, of course, the Fourier Transform.
The Magic of the Fourier Transform
The Fourier Transform enabled engineers to operate on the audio signal in a way they had never been able to before. This was because it treated the signal as if it were a collection of sine waves.
Here are just 5 examples of ways that the Fourier Transform has… well, transformed audio engineering.
#1: Filtering
Things started modestly. The most obvious thing you could do with your signal was to filter it.
Filtering using analogue components had long been used as a method of improving the sound quality of a recording, attenuating frequencies that muddied the sound or accentuating those that enhanced it.
But the Fourier Transform took filtering to a new level. No longer would the cut-off frequency or phase response of your filter drift because of inaccurate analogue components like resistors, capacitors or inductors. You could even tailor-make your own filter with its own specific frequency response just for your signal and apply it with pinpoint accuracy. What is more, you could save your settings and recall them for use again at a later date.
Does your recording suffer from mains hum?
Simple! A 50Hz notch will take care of that in just a few lines of code. You even have easy control of the filter’s Q-factor.
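As a rough sketch of what those few lines of code might look like, here is one way to apply a 50Hz notch in the frequency domain using NumPy. The function names, the test tones and the default Q of 30 are my own assumptions for illustration. The gain curve is the magnitude response of a standard second-order notch, applied with zero phase shift, which is only one of several ways to build such a filter.

```python
import numpy as np

def notch_gain(freqs, f0=50.0, q=30.0):
    """Magnitude response of a second-order notch centred on f0 (Hz).

    The -3 dB bandwidth of the notch is roughly f0 / q, so a larger q
    gives a narrower notch.
    """
    num = (freqs**2 - f0**2) ** 2
    den = num + (freqs * f0 / q) ** 2
    return np.sqrt(num / den)

def remove_hum(signal, fs, f0=50.0, q=30.0):
    """Transform to the frequency domain, scale each bin, transform back."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return np.fft.irfft(spectrum * notch_gain(freqs, f0, q), n=len(signal))

# Illustrative test signal: a 440Hz tone polluted by 50Hz mains hum.
fs = 8000                       # sample rate in Hz (assumed)
t = np.arange(fs) / fs          # one second of audio
music = np.sin(2 * np.pi * 440 * t)
hum = 0.5 * np.sin(2 * np.pi * 50 * t)

cleaned = remove_hum(music + hum, fs)
print("largest remaining error:", np.max(np.abs(cleaned - music)))
```

Changing `q` is all it takes to make the notch wider or narrower, which is the kind of control that was awkward to achieve with resistors and capacitors.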
As the technology became more and more widespread, engineers became more inventive with what they did with the frequency information the Fourier Transform had given them.
#2 Filter Banks
As filtering became easier and more accurate, audio engineers could make better use of filter banks. A filter bank is like a lot of band pass filters all positioned side by side. The EQ sliders on your stereo system are one example of a filter bank.
This allowed audio engineers to more accurately isolate a greater number of different frequency bands from their signals. They could then operate on each band independently of all the others.
For example, perhaps you are mastering an action film’s audio track. You want the audience in the theater to really feel the explosions in their bones. Using a filter bank you can isolate the really low bass band frequencies of the explosion. You can then widen them by panning the sound right and left, whilst adding a little bit of reverb or delay for that bone shaking effect.
Maybe you’re mastering a song and want to add a little bit of sparkle to the cymbals. Using a filter bank you isolate the treble frequencies of the cymbals. By adding a small amount of phase distortion you can generate additional complementary frequencies where there weren’t any before.
Although filter banks were possible in the days of analogue mastering, the Fourier Transform made them easier to implement. It also gave a much greater frequency resolution. This made them more flexible and easier to control than they had ever been before.
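To make the idea concrete, here is a minimal sketch of a Fourier Transform based filter bank in NumPy. The band edges, the test tones and the function name are assumptions chosen for illustration. It uses hard-edged "brick wall" bands, which are easy to follow but can cause ringing on real recordings, so a production tool would normally use smoother, overlapping bands.

```python
import numpy as np

def fft_filter_bank(signal, fs, edges):
    """Split a signal into adjacent frequency bands.

    edges is an ascending list of band edges in Hz. Each band keeps the
    bins from its lower edge up to (but not including) its upper edge.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return bands

# Illustrative mix: a low rumble plus a quiet high "cymbal" tone.
fs = 8000
t = np.arange(fs) / fs
rumble = np.sin(2 * np.pi * 60 * t)
cymbal = 0.3 * np.sin(2 * np.pi * 3000 * t)

# Three bands: bass, mid and treble (the last edge sits just above Nyquist).
edges = [0, 200, 2000, fs / 2 + 1]
low, mid, high = fft_filter_bank(rumble + cymbal, fs, edges)

# Operate on one band independently of the others, then remix.
remixed = 2.0 * low + mid + high
```

Because the bands do not overlap, adding them back together unchanged returns the original signal, so any difference you hear after remixing comes only from what you did to the individual bands.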
#3 Noise Reduction
One of the most important tasks of any recording setup is to get a clean recording of the subject. However, sometimes the environmental conditions are not ideal and noise gets into the recording. Noise can also get into the recording later on: scratches on records or tape hiss, for example. Is it possible to get rid of such noise?
While recordings were analogue, noise occupying a different frequency range to the main part of the signal could be reduced by filtering: noises like high-pitched hissing or maybe mains hum. If the noise occupied the same frequency range, then a noise gate could be employed. This turned the volume of the signal down to zero during quiet periods when the noise was most noticeable. However, if the noise was still present when the amplitude of the signal exceeded the threshold of the noise gate, it was much harder to get rid of.
What to do with noise at the same frequency as your signal?
Analogue noise reduction techniques like the Dolby system employed ingenious methods to try and get round this. However, they were limited to reducing cassette hiss. What is more, engineers had to make the recordings in a very specific way. I’ve put a link to a video explaining how the Dolby system worked in the resources section of the show notes for this episode which you can find at https://howthefouriertransformworks.com/audio-engineering
With analogue recordings, there wasn’t really a way of isolating the noise and removing it from the signal. But then, digital recordings and the Fourier Transform changed all that.
If you find a portion of the recording containing only noise, you can measure its spectrum. A computer then identifies the noise energy at each frequency within the recording and subtracts that energy from the signal.
The method does still have its limitations though. If the noise you want to get rid of is too dominant, then the noise correction becomes too severe. This produces a sort of twinkling sound at the higher frequencies. However, if your signal already has a reasonably good signal to noise ratio, then the noise reduction filter can remove it pretty convincingly.
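The technique described above is often called spectral subtraction. Here is a minimal sketch of it in NumPy, under some stated assumptions: it works on short overlapping frames, subtracts the average noise magnitude (rather than energy) from each frequency bin, and clips anything that would go negative to zero. The frame length, the test tone and the synthetic noise are all illustrative choices of mine, and real noise reduction tools are considerably more refined.

```python
import numpy as np

def spectral_subtract(noisy, noise_only, frame=512):
    """Reduce steady noise using a noise-only excerpt as a reference."""
    hop = frame // 2
    # Periodic Hann window: overlapping copies at 50% overlap add up to 1.
    window = np.hanning(frame + 1)[:frame]

    # Average magnitude spectrum of the noise-only excerpt.
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(noise_only[i:i + frame] * window))
         for i in range(0, len(noise_only) - frame + 1, hop)],
        axis=0,
    )

    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - frame + 1, hop):
        spec = np.fft.rfft(noisy[i:i + frame] * window)
        # Subtract the noise estimate from each bin, never going below zero.
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        # Keep the original phase and overlap-add the frame back in.
        out[i:i + frame] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=frame)
    return out

# Illustrative data: a 440Hz tone buried in hiss, plus a separate hiss-only excerpt.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
noisy = tone + 0.1 * rng.standard_normal(fs)
noise_only = 0.1 * rng.standard_normal(fs)

cleaned = spectral_subtract(noisy, noise_only)

# Compare the error before and after, ignoring the tapered ends.
inner = slice(512, 7424)
err_before = np.mean((noisy[inner] - tone[inner]) ** 2)
err_after = np.mean((cleaned[inner] - tone[inner]) ** 2)
print(err_before, err_after)
```

The clipping to zero is where the twinkling artefact comes from: bins that randomly poke above the noise estimate survive as short isolated tones, and the more aggressive the subtraction, the more noticeable they become.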
#4 Pitch Correction
When singing in a choir, the sound becomes quite dire if someone starts to sing a little flat.
It may be hard to believe, but even professional singers have off days and sing flat sometimes. However, we rarely hear this on their recordings.
The moment we sing a particular note, the frequency of that note in our voices becomes louder than all the others. Using the Fourier Transform, a computer can easily identify which frequency this is.
Each musical note has its own frequency. The A above middle C, for example, has a frequency of 440Hz. But what if the singer went flat? Say by half a tone so that the frequency he was singing was 415Hz. The Fourier transform identifies the note sung. The audio engineer can then tell the computer that the singer was meant to be singing the note A. A has a frequency of 440Hz. With the spectrum of the singer’s voice already identified, the computer can easily shift that spectrum up by 25Hz, correcting the singer’s voice.
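Here is a small sketch of the detection step in NumPy, using a synthetic "voice" made of a 415Hz tone and one harmonic (both assumptions for illustration). One detail is worth spelling out. A real voice contains harmonics at multiples of the sung note, so pitch correction tools generally scale every frequency by the same ratio rather than adding a fixed 25Hz to all of them. The 25Hz figure is what that scaling amounts to for the fundamental.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs          # one second, so FFT bins are 1Hz apart

# Stand-in for a flat singer: a 415Hz fundamental with a quieter harmonic at 830Hz.
sung = np.sin(2 * np.pi * 415 * t) + 0.4 * np.sin(2 * np.pi * 830 * t)

# The loudest bin in the magnitude spectrum gives the sung frequency.
magnitude = np.abs(np.fft.rfft(sung))
freqs = np.fft.rfftfreq(len(sung), d=1.0 / fs)
detected = freqs[np.argmax(magnitude)]

# The engineer supplies the intended note: A above middle C.
target = 440.0
ratio = target / detected                 # scale factor for every frequency
semitones = 12 * np.log2(ratio)           # how flat the singer was

print("detected:", detected, "Hz")
print("correction ratio:", ratio, "=", semitones, "semitones")

# Scaling keeps the harmonic exactly one octave above the corrected note,
# whereas adding 25Hz to it would not.
print("harmonic after scaling:", 830 * ratio, "Hz")
print("harmonic after adding 25Hz:", 830 + 25, "Hz")
```

Actually resynthesising the corrected voice without changing its length or character takes more machinery than this, so treat the sketch as covering only the "work out how far off the note is" half of the job.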
So long as the singer isn’t too flat, this fix is barely noticeable.
Alternatively there are songs that make a positive virtue of pitch alteration making it noticeably part of the song. Copyright issues prevent me from playing you an example here, but head over to the resources section of the show notes for this episode. There, you’ll find a link to a YouTube clip of the song Believe by Cher. You’ll clearly hear the pitch alteration filter doing its work.
#5 MP3s
There is, however, perhaps one example where almost everyone has experienced the wonders of the Fourier Transform first hand: the MPEG-1 Audio Layer III format, or as you probably know it better, the MP3 file.
First released in 1991 by the Moving Picture Experts Group (hence MPEG), MP3 is a lossy compression format. It takes advantage of the way we humans perceive sound. Lossy compression has a big advantage. It can compress files until they are around a tenth of their original size, but there is a cost. Lossy means that once you compress the original signal, you cannot completely recover it. You lose some of the original audio data. If this is true, how come MP3 files sound so good?
The MP3 format’s secret lies in its use of Psychoacoustics. Sometimes sounds are made that we cannot hear. They might be too low in volume or too low or high in pitch. They might also be too close in time to other sounds which mask them.
Imagine you are at a busy, noisy railway station and someone on the opposite platform whispers something to you. You’re probably not going to be able to hear what they say. The sound energy of their speech does reach your ears, but because it occupies a similar frequency range to other louder sounds, the louder sounds mask the whisper. This renders it all but inaudible to your ears.
The equal-loudness contour
However, how loud something sounds to us is not uniform across the whole frequency range of our hearing. There is what is known as an equal-loudness contour to our hearing. For example: in order to hear a 250Hz tone, that tone would have to be as loud as 10dB. However, to hear a tone of 70Hz, that tone would have to be louder, maybe 30dB, for us to hear it.
MP3s take advantage of this. They perform a Fourier Transform on the audio signal and measure the relative sound pressure level for each frequency. If the measurement falls below the perception threshold of human hearing for that frequency, they discard it.
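As a sketch of the "is this frequency audible at all?" test, the code below uses a widely quoted approximation of the threshold of hearing in quiet, usually attributed to Terhardt. This is not the MP3 standard’s own psychoacoustic model, and the function names and example levels are assumptions for illustration. It also assumes that the level of each frequency is already known in dB SPL, which in practice depends on how loud the listener plays the file.

```python
import numpy as np

def threshold_in_quiet_db(f_hz):
    """Approximate level (dB SPL) a pure tone needs before we can hear it."""
    khz = np.asarray(f_hz, dtype=float) / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * np.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def audible_mask(freqs_hz, levels_db_spl):
    """True where a component is at or above the threshold, False where it can be dropped."""
    return np.asarray(levels_db_spl) >= threshold_in_quiet_db(freqs_hz)

# The two frequencies from the example above.
print(threshold_in_quiet_db(250.0))   # roughly 11dB
print(threshold_in_quiet_db(70.0))    # roughly 30dB

# Three components, all at 20dB SPL: only the 70Hz one falls below the threshold.
freqs = np.array([70.0, 250.0, 1000.0])
levels = np.array([20.0, 20.0, 20.0])
keep = audible_mask(freqs, levels)
print(keep)
```

The approximation lands close to the 10dB and 30dB figures quoted above, which is reassuring, although the exact numbers vary from listener to listener.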
The Psychoacoustic model also takes into account the idea of acoustic masking. If a louder sound exists within the same band of frequencies as a softer one, it raises the perception threshold of the softer sound. This means that the softer sound would have to be louder than normal for you to hear it. If a masking sound is present, the frequency containing the softer sound can also be discarded.
MP3s split the signal into little chunks called frames. The masking sound may not be present in the current frame. However, if it’s present within a certain time range of the quieter sound, it can still mask it. This means the MP3 can discard the quieter frequency in this case as well.
As MP3s store their information in the frequency domain, discarding this frequency makes for smaller files. As the missing information is inaudible, this maintains the apparent quality of the audio signal.
The Discrete Cosine Transform
MP3s convert the time domain audio signal into the frequency domain using two parallel methods. They use the Fourier Transform and its greater frequency resolution to apply the Psychoacoustic model and decide which frequencies to discard; and use the Discrete Cosine Transform to actually convert the signal for storage.
The Discrete Cosine Transform is a cousin of the Fourier Transform. It’s used in MP3s as it can store more frequency information in fewer bytes than the Discrete Fourier Transform can, making for a smaller file.
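To give a feel for why the Discrete Cosine Transform packs information more tightly, here is a small NumPy sketch that stores one short frame using the same budget of eight real numbers in both transforms: eight DCT coefficients versus four complex Fourier coefficients. The frame, the budget and the helper names are assumptions chosen for illustration. MP3 encoders actually use a modified version of the DCT working on overlapping frames, so this shows the principle rather than the real codec.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0] /= np.sqrt(2.0)
    return c

def keep_largest(coeffs, count):
    """Zero everything except the `count` largest-magnitude coefficients."""
    kept = np.zeros_like(coeffs)
    idx = np.argsort(np.abs(coeffs))[-count:]
    kept[idx] = coeffs[idx]
    return kept

# A 64-sample frame holding 2.5 cycles of a tone, so it does not fit the frame neatly.
N = 64
n = np.arange(N)
frame = np.cos(2 * np.pi * 2.5 * n / N)

# DCT: keep 8 real coefficients.
C = dct_matrix(N)
dct_approx = C.T @ keep_largest(C @ frame, 8)

# DFT: keep 4 complex coefficients (also 8 real numbers).
dft_approx = np.fft.irfft(keep_largest(np.fft.rfft(frame), 4), n=N)

dct_err = np.linalg.norm(frame - dct_approx) / np.linalg.norm(frame)
dft_err = np.linalg.norm(frame - dft_approx) / np.linalg.norm(frame)
print("relative error, DCT:", dct_err)
print("relative error, DFT:", dft_err)
```

For this frame the DCT reconstruction is noticeably closer to the original. Loosely speaking, the Fourier Transform treats the frame as if it repeated end to end, which creates a jump at the join, while the DCT treats it as if it were mirrored, which does not.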
Do you use the Fourier Transform in an interesting way in your work? Why not contact me at: [email protected]?
Course Update
As I mentioned at the beginning, it has taken me quite a while to get round to recording this episode as I have been concentrating on a number of different projects all connected to the course.
For some time, I’ve been trying to get two eBooks together to work in parallel with the online course. The manuscript for the first book “How the Fourier Series works”, is finally complete and is in the proofreading stage. I hope to launch it in early 2022.
The second book, which I hope to launch towards the end of 2022, will be called “How the Fourier Transform works” and will follow the work which Dirichlet did to turn the Fourier Series into the Fourier Transform and look at the different forms of the Fourier Transform culminating in Cooley and Tukey’s Fast Fourier Transform algorithm.
I’ll be announcing the book launches here on the podcast. All of my Patrons on the 10 dollar tier and higher at the time of publishing will automatically receive a copy in their Patreon feed. For non-Patrons, the books will be available to buy on the course website.
Several people have asked me to go back to making YouTube videos about the Fourier Transform. After quite a break, during which time I was working mainly on the eBooks, I’m happy to announce that I have resumed work on these videos, the most recent being a series I’ve produced on Complex Numbers.
New video series
I always felt that complex numbers were taught in an unnecessarily complicated and baffling way which makes for a huge barrier to understanding how the Fourier Transform works.
Therefore, I’m in the process of putting together a series of YouTube videos on the subject. Some are taken from the relevant parts of the Fourier Transform course, but others are totally new. For example, I recently produced a video answering the question “Why is ‘i’ the square root of minus one?” In the video, I use geometry to try and explain the link between the imaginary number ‘i’, a 90° rotation and how squaring ‘i’ gives a negative result.
I’m now working on the next video which explains how to understand the output of the FFT. The FFT gives its results as a list of complex numbers and quite a few people have asked me how to convert this list into the frequency, amplitude and phase information that they expect. I aim to explain all that in the video. The video currently has the working title of: “How to understand the output of the FFT.” However, if anyone can think of a more catchy and search-engine-optimized title for the video, I’d love to hear your suggestions. Please email them to me at [email protected].
You can find my YouTube channel by going to: https://howthefouriertransformworks.com/youtube.
Conclusion
So that’s it for this episode.
To find out when any new video, podcast episode, product or any piece of Fourier Transform related content is available, please subscribe to the mailing list at: https://howthefouriertransformworks.com/mailing-list.
Alternatively, you can keep up with all the latest content, while at the same time helping me to produce it, by becoming a Patron of the course. This will grant you access to the second half of the course while it is still in development, as well as additional video material and, for the 10 dollar tier and above, access to the eBooks when they are published. Simply head over to https://howthefouriertransformworks.com/Patreon.
Please keep your requests and suggestions coming in, and if you are doing something interesting with the Fourier Transform and would like to share it with me, then I’d love to hear from you.