
M 391C - Wavelets: Theory and Applications

Fall 2001

Final Project: An Inadvertent Audio Processor
(or Why You Should Use Matlab If You Want Clean Results)

I had several exciting ideas for a final project in wavelets. I knew that a few were probably too large in scope, but I never imagined that all of them were. Below is a quick list of some of the ideas I had. I did get a project (of sorts) near completion, and I will describe it, along with the problems (read "learning experiences") that I had along the way. Finally, I do have some sound files for your consumption, but I will save the best for last.

1.0 Ideas

The grand idea that I had was to create a biorthogonal wavelet transform and then, using simple thresholding compression, study whether spline scaling functions or their dual filter counterparts (or neither) were better suited for the analysis portion of the wavelet transform. This seemed pretty straightforward, except that I wanted to code all of it by hand in C++ in order to see what was really happening.

Once I started looking at this problem, I realized that the first step should be to get an orthonormal wavelet transform working, which I could then modify into the biorthogonal wavelet transform. I set forth upon this task first. After working on it for a while, I realized that I probably needed to scale back the project just a bit: instead, I could explore how different threshold levels affected the error on different types of audio signals, including speech, classical music, and pop music.

After I had continued with this project for a while, I realized that I wasn't even going to get the full wavelet transform coded. This was a big disappointment, because now I wasn't going to be able to really explore thresholding like I wanted.

2.0 Achievement

What I do have is a program that pulls in a soundfile, generates a set of wavelet coefficients (currently using a D4, D10, or D20 wavelet), thresholds those coefficients, resynthesizes an audio signal, and writes that to another soundfile. If everything worked as hoped, a threshold of 0.0 would make the output soundfile an exact copy of the input. This does not happen, so what I have created is an audio effects processor.

Distortion enters into the system, but unfortunately, I have not been able to track down where the problem is. One very important aspect of this distortion, however, is that it seems to be based on musical harmony (or octaves). Distortion from most other sources would not sound nearly as musical. I shouldn't have been surprised, since wavelets are octave based, but it was surprising to actually hear it.

3.0 Points of Interest

3.1 Construction

I made two personal discoveries while coding this project. First, the analyzed sound and the synthesized sound are probably going to be of different lengths due to the continual division by two; the problem is obvious when considering a signal of odd length. This could presumably be solved with zero-padding. Second, the scaling function coefficients at the coarsest level of resolution can be assumed to be zero. This amounts to saying that there should not be any audio information below 20 Hz, and that even if there is, it should not affect the audio quality of the resynthesized signal. I did, of course, try it both ways to see if that was the source of my distortion.
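The zero-padding idea can be sketched as follows. Padding to the next power of two (rather than just the next even length) means the repeated halving never meets an odd length at any level; the function name is illustrative, and the original program did not do this.

```cpp
#include <vector>

// Zero-pad a signal up to the next power of two so that the repeated
// division by two in the wavelet transform never meets an odd length.
// The appended zeros keep analysis and synthesis lengths in agreement.
std::vector<double> padToPowerOfTwo(const std::vector<double>& x) {
    std::size_t n = 1;
    while (n < x.size()) n *= 2;
    std::vector<double> out(x);
    out.resize(n, 0.0);
    return out;
}
```

After resynthesis, the padded samples can simply be trimmed off to recover the original length.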

3.2 Considerations

In trying to determine where I went awry, I considered that most errors I could make in the coding should cause the system to perform horribly wrong, not merely add distortion. This led me to look at the algorithms I was using. I am using the Fast Wavelet Transform that Dr. Gilbert gave me in class to analyze the sound file, and Dr. Mallat's algorithm for synthesis. One obvious difference is that Dr. Gilbert's algorithm incorporates the downsampling and filtering in a single convolution process, while Dr. Mallat's separates the upsampling and filtering. Convolution is handled in a different order as well, which may be another place where I have misread one or the other algorithm.
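A minimal sketch of the first approach, one analysis step that folds the downsampling into the convolution by evaluating the filters only at even shifts. The D4 filter taps are the standard Daubechies values; the periodic wrapping at the signal edge is my assumption, not necessarily what either algorithm specifies.

```cpp
#include <cmath>
#include <vector>

// One analysis step of the pyramid algorithm with the D4 filters.
// Filtering and downsampling happen in a single pass: the outer loop
// steps the filter by two samples at a time (periodic boundary).
void d4AnalyzeStep(const std::vector<double>& x,
                   std::vector<double>& approx,
                   std::vector<double>& detail) {
    const double s3 = std::sqrt(3.0), norm = 4.0 * std::sqrt(2.0);
    const double h[4] = {(1 + s3) / norm, (3 + s3) / norm,
                         (3 - s3) / norm, (1 - s3) / norm};
    const double g[4] = {h[3], -h[2], h[1], -h[0]};  // g[k] = (-1)^k h[3-k]
    std::size_t n = x.size();  // assumed even
    approx.assign(n / 2, 0.0);
    detail.assign(n / 2, 0.0);
    for (std::size_t i = 0; i < n / 2; ++i) {
        for (int k = 0; k < 4; ++k) {
            double xi = x[(2 * i + k) % n];  // wrap around at the edge
            approx[i] += h[k] * xi;          // lowpass -> scaling coefficients
            detail[i] += g[k] * xi;          // highpass -> wavelet coefficients
        }
    }
}
```

A quick sanity check on a constant signal: the approximation coefficients come out as the constant times sqrt(2) and the detail coefficients vanish, since the D4 lowpass taps sum to sqrt(2) and the highpass taps sum to zero. An off-by-one in the filter indexing here is exactly the kind of error that would pass this test yet still distort real audio.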

I did think that edge effects might have some significant repercussions, so I zero-padded one audio signal to see if that made any difference. It did not, so I do not believe that the distortion is simply edge effects.

4.0 Conclusions

My first thought is that I probably should have tried to use Matlab if my goal had been to produce stunning and beautiful results. I probably could have completed my original project idea if I had used Matlab. However, this project has been a learning experience, and I am happy to have approached it the way that I have. Unfortunately, I am not sure that "learning experience" equals "grade points."

My second thought is that using wavelets as a mechanism for audio processing might hold some significant musical treasures. Since wavelets are inherently octave based, sound synthesis or processing using wavelets might have some unusually musical properties.

Appendix A. Sounds

As promised, here are some sounds to accompany what was discussed above. All soundfiles are monaural MP3s with a high enough bit rate that the distortion heard is only from the wavelet processing that I did. All soundfiles are also very short.

                      original   processed
                                 w/D04   w/D10   w/D20
    monotonic trumpet     x        x       x       x
    speech                x        x       x       x

                      original   thresholded (with D20)
                                 w/0.01%  w/0.1%  w/1.0%  w/10.0%
    speech                x         x       x       x       x
    coefficients lost     0%       35%     71%     92%     99%

Appendix B. Source Code

Here is the source code as well. It is written in C++ and compiles under Linux with the libaudiofile library.

wvaudproc.zip (12k)

I did eventually get the program working. The error was caused by a shift in indexing between Dr. Gilbert's and Dr. Mallat's algorithms. I changed to using Dr. Mallat's algorithm for both analysis and resynthesis, and the results were beautiful.

This page last updated on 2007-04-01.