M 391C - Wavelets: Theory and Applications
Fall 2001
Final Project: An Inadvertent Audio Processor
(or Why You Should Use Matlab If You Want Clean Results)
I had several exciting ideas for a final project in wavelets.
I knew that a few were probably too large in scope, but I never
assumed that all of them were. Below I give a quick rundown of
some of the ideas I had. I did get a project (of sorts)
near completion, and I will tell you about that, along with the
problems (read "learning experiences") that I had with it.
Finally, I do have some sound files for your consumption,
but I will keep the best for last.
1.0 Ideas
The grand idea that I had was to create a biorthogonal wavelet
transform and then, using simple thresholding compression, study
whether spline scaling functions or their dual filter counterparts (or
neither) were better suited for the analysis portion of the wavelet
transform. This seemed pretty straightforward, except that I
wanted to code all of it by hand in C++ in order to see what was
really happening.
Once I started looking at this problem, I realized that the first step
should be to get an orthonormal wavelet transform working, which I
could then modify to do the biorthogonal wavelet transform. I set
forth upon this task first. After working on this for a while, I
realized that I probably needed to scale back the project just a bit.
I could explore how different threshold levels affected the error on
different types of audio signals including speech, classical music,
and pop music.
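For concreteness, by "error" I had in mind something simple like the
relative energy of the difference between the original signal and the
resynthesized one. The sketch below shows one way that could be
measured; the function name and the metric itself are my own
illustrative choices, not anything fixed by the project.

  #include <algorithm>
  #include <cmath>
  #include <cstddef>
  #include <vector>

  // Relative L2 (energy) error between the original and the resynthesized
  // signal: ||original - processed|| / ||original||.  Just one reasonable
  // choice of metric; nothing in this project pins the measure down.
  double relativeError(const std::vector<double>& original,
                       const std::vector<double>& processed)
  {
      double diffEnergy = 0.0, origEnergy = 0.0;
      const std::size_t n = std::min(original.size(), processed.size());
      for (std::size_t i = 0; i < n; ++i) {
          const double d = original[i] - processed[i];
          diffEnergy += d * d;
          origEnergy += original[i] * original[i];
      }
      return origEnergy > 0.0 ? std::sqrt(diffEnergy / origEnergy) : 0.0;
  }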
After I had continued with this project for a while, I realized that
I wasn't even going to get the full wavelet transform coded
completely. This was a big disappointment to me, because now I wasn't
going to be able to really explore thresholding like I wanted.
2.0 Achievement
What I do have is a program that pulls in a soundfile, generates a set
of wavelet coefficients (currently using a D4, D10, or D20 wavelet),
thresholds those coefficients, resynthesizes an audio signal, and
puts that into another soundfile. If everything worked as hoped,
with a threshold of 0.0, the output soundfile would be an exact
copy of the input soundfile. This does not happen, so what I have
created is an audio effects processor.
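To give a flavor of the thresholding step, here is a rough sketch. The
keep-if-above-a-fraction-of-the-maximum rule and the function name are
illustrative guesses for this write-up rather than a transcription of
the program in Appendix B, but either way a threshold of 0.0 discards
nothing.

  #include <algorithm>
  #include <cmath>
  #include <cstddef>
  #include <vector>

  // Zero out every wavelet coefficient whose magnitude falls below a cutoff
  // given as a fraction of the largest coefficient magnitude.  Returns the
  // number of coefficients discarded.  (Illustrative only -- the rule in the
  // actual program may differ, but a fraction of 0.0 discards nothing.)
  std::size_t thresholdCoefficients(std::vector<double>& coeffs, double fraction)
  {
      double maxMag = 0.0;
      for (std::size_t i = 0; i < coeffs.size(); ++i)
          maxMag = std::max(maxMag, std::fabs(coeffs[i]));

      const double cutoff = fraction * maxMag;
      std::size_t discarded = 0;
      for (std::size_t i = 0; i < coeffs.size(); ++i) {
          if (std::fabs(coeffs[i]) < cutoff) {
              coeffs[i] = 0.0;
              ++discarded;
          }
      }
      return discarded;
  }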
Distortion enters into the system, but unfortunately, I have not been
able to track down where the problem is. One very important aspect of
this distortion, however, is that it seems to be based on musical
harmony (or octaves). Distortion from most other sources would not
sound nearly as musical. I shouldn't have been surprised, since
wavelets are octave based, but it was surprising to actually hear it.
3.0 Points of Interest
3.1 Construction
I made two personal discoveries while coding this project.
First, the analyzed sound and the synthesized sound are probably going
to be of different lengths due to the continual division by
two. The problem is obvious when considering a signal of odd length.
This could presumably be solved with zero-padding. The second
discovery is that the scaling function coefficients at the coarsest
level of resolution can be assumed to be zero. This is basically
stating that there should not be any audio information below 20 Hz,
and even if there is, it should not affect the audio quality of the
resynthesized audio signal. I did, of course, try it both ways to see
if that was the distortion that I was having.
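Here is a sketch of the zero-padding idea: extend the signal with zeros
so its length stays even through every level of the transform. Padding
to a multiple of 2^levels and the function name are my own choices for
illustration; the code in Appendix B does not necessarily do it this way.

  #include <cstddef>
  #include <vector>

  // Append zeros so the signal length is divisible by 2^levels, guaranteeing
  // an even-length input at every stage of the transform.  (A sketch of the
  // fix discussed above, not necessarily how the shipped code handles it.)
  std::vector<double> zeroPad(const std::vector<double>& signal, unsigned levels)
  {
      const std::size_t block = std::size_t(1) << levels;  // 2^levels
      const std::size_t padded =
          ((signal.size() + block - 1) / block) * block;    // round up

      std::vector<double> out(signal);
      out.resize(padded, 0.0);   // extend with zeros up to the padded length
      return out;
  }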
3.2 Considerations
In trying to determine where I went awry, I did consider that most
errors I could have made in the coding should cause the system to fail
badly, not just add distortion. This led me to look at the algorithms
that I was using. I am using the Fast Wavelet Transform that Dr.
Gilbert gave me in class to analyze the sound file, and I am using Dr.
Mallat's algorithm for synthesis. One obvious difference is that Dr.
Gilbert's algorithm incorporates the downsampling and filtering in a
single convolution process, while Dr. Mallat's separates the upsampling
and filtering. Convolution is handled in a different order as well,
which may be another place where I have misread one or the other
algorithm.
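To make the indexing question concrete, here is a sketch of one level
of a periodized D4 analysis/synthesis pair, written in the two styles
described above: the analysis loop folds the downsampling into the
convolution, while the synthesis loop separates the upsampling (the 2*i
stride) from the filtering. This is my own illustration, not a
transcription of either handout; the periodic wrap-around, the function
names, and the even-length assumption are all mine.

  #include <cstddef>
  #include <vector>

  // Daubechies D4 filter coefficients, h[k] = { (1+sqrt 3), (3+sqrt 3),
  // (3-sqrt 3), (1-sqrt 3) } / (4 sqrt 2), with the highpass filter built
  // by the usual alternating flip g[k] = (-1)^k h[3-k].
  static const double h[4] = {  0.4829629131445341,  0.8365163037378079,
                                0.2241438680420134, -0.1294095225512604 };
  static const double g[4] = { h[3], -h[2], h[1], -h[0] };

  // Analysis: filtering and downsampling folded into one convolution -- the
  // output is computed only at even shifts of the input (x is assumed to
  // have even length, and indices wrap around periodically).
  void analyzeD4(const std::vector<double>& x,
                 std::vector<double>& approx, std::vector<double>& detail)
  {
      const std::size_t N = x.size();
      approx.assign(N / 2, 0.0);
      detail.assign(N / 2, 0.0);
      for (std::size_t i = 0; i < N / 2; ++i)
          for (std::size_t k = 0; k < 4; ++k) {
              const double xv = x[(2 * i + k) % N];
              approx[i] += h[k] * xv;   // lowpass branch
              detail[i] += g[k] * xv;   // highpass branch
          }
  }

  // Synthesis: upsampling (the 2*i stride) and filtering kept separate --
  // each coefficient is scattered back through the filters.  Because the
  // same (2*i + k) alignment is used as in analysis, the reconstruction
  // is exact.
  void synthesizeD4(const std::vector<double>& approx,
                    const std::vector<double>& detail, std::vector<double>& x)
  {
      const std::size_t N = 2 * approx.size();
      x.assign(N, 0.0);
      for (std::size_t i = 0; i < N / 2; ++i)
          for (std::size_t k = 0; k < 4; ++k)
              x[(2 * i + k) % N] += h[k] * approx[i] + g[k] * detail[i];
  }

Shifting the synthesis indexing by even one sample relative to the
analysis breaks the exact reconstruction, which, as the note at the end
of this page explains, is essentially the mistake I had made.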
I did think that edge effects might have some significant repercussions,
so I zero-padded one audio signal to see if that made any
difference. It did not, so I do not believe that the distortion is
simply edge effects.
4.0 Conclusions
My first thought is that I probably should have tried to use Matlab if
my goal had been to produce stunning and beautiful results. I
probably could have completed my original project idea if I had used
Matlab. However, this project has been a learning experience, and I
am happy to have approached it the way that I have. Unfortunately,
I am not sure that "learning experience" equals "grade points."
My second thought is that using wavelets as a mechanism for audio
processing might hold some significant musical treasures. Since
wavelets are inherently octave based, sound synthesis or processing
using wavelets might have some unusually musical properties.
Appendix A. Sounds
As promised, here are some sounds to accompany what was discussed
above. All soundfiles are monaural MP3s with a high enough bit rate
that the distortion heard is only from the wavelet processing that
I did. All soundfiles are also very short.
                    |          |         processed
                    | original |  w/D04  |  w/D10  |  w/D20
  monotonic trumpet |    x     |    x    |    x    |    x
  speech            |    x     |    x    |    x    |    x

                    |          |   thresholded (with the D20 wavelet)
                    | original | w/0.01% | w/0.1%  | w/1.0%  | w/10.0%
  speech            |    x     |    x    |    x    |    x    |    x
  coefficients lost |    0%    |   35%   |   71%   |   92%   |   99%
Appendix B. Source Code
Here is the source code as well. It is written in C++ and compiles
under Linux with the libaudiofile library.
wvaudproc.zip (12k)
I did eventually get the program working. The error occurred because of a
shift in indexing between Dr. Gilbert's and Dr. Mallat's algorithms. I changed
to using Dr. Mallat's algorithms for both analysis and resynthesis, and the
results were beautiful.
This page last updated on 2007-04-01.