Sound Localization.

Well, it’s elementary simple in theory, how to do sound localization based on phase difference of signals, that received by two spatially distant microphones. The devil, as always, in details. I’ve not seen any such project created for arduino, and get curious if it’s possible at all. Long story short, here I’d like to present my project, which answer this question  - YES!

Moreover, quantity  of electronics components not much differs from what I’ve used in my previous blog.  Compare two drawings, you will notice only 4 resistors and 4 electret microphones were added! All circuitry is just a few capacitors, 9 resistors, one IC and mics.  Frankly speaking, writing a remix of oscilloscope, I was testing  an arduino analog inputs, keeping in mind to use it in junction with electret microphones in other projects, like sound pressure measurements (dBA),  voice recognition or something funny in “color music” series. As they call it – “a pilot” project?.  There are some issue (simplest ever) oscilloscope has when doing fast rate sampling on 4 channels (settings 7, 8 and 9 Time/Div ) I already described, so I slightly reduce sampling down to 40 kHz here.



Note: *Hardware would be different for arduino boards based on different chips, and must include pre-amplifiers with AtMega328 uCPU. 

One more important things to mention in this short introductory, as I used FFT algorithm for phase calculation ( I like FFT very much, you probably, already notice it ),

Arduino is capable not only track a MOSQUITO flying in your room, it could tell if it’s MALE of FEMALE !!!!!

                  SOFTWARE.

  As I say above, I choose 40 kHz for sampling rate, which is a good compromise between accuracy of the readings  and maximum audio frequency, that Localizator could hear. Getting signals from two mic’s simultaneously, upper limits for audio data is 10 kHz. No real-time, “conveyor belt” include 4 major separate stages:

  • sampling X dimension;
  • FFT
  • phase calculation
  • delay time extracting
  • sampling Y dimension;
  • FFT
  • phase calculation
  • delay time extracting

4 mic’s split in 2 groups for X and Y coordinate consequently. Picking up 4 mic’s simultaneously is possible, but would reduce audio range down to 5 kHz, so I decided to process two dimension (horizontal and vertical planes)  separately, in series. Removing vertical tracking from the code, if it’s not necessary, would increase speed and accuracy in leftover plane, of course. I’d refer you for description of the first and second stages to other blogs, FFT was brought w/o any modification at all. Essential and most important part of this project, stages 3 and 4.

Phase Calculation (3).

 Mathematical tutorial on a topic, I’m not any good as a teacher, so you better read somewhere else, to brush up a basic concept. Core of the process is arctangent function. This link says a number of cycles. In two words – too slow.  LUT ( Look Up Tables ) is the best solution for no-float uCPU to do complex math extremely fast, and reasonably (?) precise. Drawback of LUT is limited size, so it could be saved in FLASH memory, which in next tern  is also limited. This is what I did on “resource management” side: 1 kWords ( 16-bit integers, 2 kBytes) , 32 x 32 ( 5 x 5 bites) LUT, scaled up to 512 to get better “integer” resolution. There are a few values in top-right corner, that melted together as their differences are less than “1″ (not shown on the picture on right side). The “worst” resolution is in top-left corner, where “granularity” is reaching 256, or unacceptable 50% of the dynamic range. To stay as far away from this corner, I put a “Rainbow Noise Canceler” – single line with ” IF ” statement, which “disqualifies” any BIN with magnitude, calculated at the FFT stage, lower than 256.

IF(((sina * sina) + (cosina * cosina)) < 256) phase = -1;

 I called it “Rainbow” because of it’s shape, “red line” is an arc, going from 16 on top line to 16 on left side. Also, “Gain Reset” – 6 bit ( depends on the FFT size, has to be 6 bits for 128) reduced to 5 bits, in order to get better sensitivity. This two parameters / settings, 5-bit and 3.5 bit magnitude limit, create a “threshold” for weak spectral peaks. Basically, depends on application, both values can be adjusted in  different proportions.

 There are two category of tracking technics, with mic’s installed on moving platform, and stationary mic’s. First one is a little bit easier  to understand and build, requires Relative direction to sound source. This what I’ve done. Stationary mic’s approach, when motors are moving laser pointer (or filming camera) alone, would require Absolute direction to sound source, and must include stage #5 – angle calculation via known delay time. Math is pretty simple, acrsine function, and at this point only one calculation per several frames would be necessary, so floating point math wouldn’t be an issue at all. No LUT, scaling, rounding/truncation. Elementary school geometry knowledge – thats all you need.

Delay Time Extraction (4).

 Subtraction phase value of one “qualified” mic’s data pull from another, produce phase difference. To turn phase difference in delay time, division by BIN number is performed. Lets call this operation “Denominator” process.  The denomination is necessary, because all data after this step going to be combine and process together, doesn’t matter of wave length, which is different for every bin. Frequency and wavelength related to each other via simple formula:  Wavelength = Velocity / Frequency, where velocity is a speed of sound wave in the air ( 340 m/sec at room temperature). As distance between two mic’s is a constant,  sound with different wavelength ( frequency ) produce different phase offset, and denomination make them proportional. (WikiPedia, I’m sure, would explain this much better, mind you, I’m a Magician, not mathematician).

First picter on right side shows  ”Nuisance 3: Incorrect arctan” correction. You will find two lines with “IF” statements in the code relaited to stage #3.

Second one,  gives you idea why other correction at stage #4 is necessary As you can see, subtraction one arctan from another generates a rectangular “pulse” ( Diff. n. corr., violet line) whenever one function changes sign but other (delayed version) not yet. Light blue line (DIFF(B)) doesn’t have such abnormality. Math is simple, just two lines with “IF’s” in the same manner, only “double size” constants this time. 2048 on my scale corresponds to 2 x PI, 1024 – PI, and 512 – PI / 2.

Arduino has only 1 ADC, so there is always constant delay time equals to one sampling period ( T = 1/40 kHz = 25 usec), which also should be subtracted ( or added, depends how you associate input 1 and 2 – left / right side mic.)

Filtering.

 To fight reverberation and noise, I choose a Low Pass Filter, which I’d call here as a “Rolling Filter”. My research with regular LPF, shows that this class of filters is completely NOT appropriate for such type of data, due their high susceptibility to “spikes”, or sudden jump in magnitude level. For example, when system getting steady reading from 2-3 test frequencies with low values, let say -10, simple averaging ( should be -10 ) results will be corrupted with one accidental spike (magnitude +2000) during next 60 – 100 consecutive frames !!!  The Median Filter doing well eliminating sudden spikes, the same time is very hungry to CPU cycles, as it’s using “sort” algorithm each time new sample was arrived to the data pull. Having 64 frequencies, and setting filter kernel to 5 – 8 samples, arduino would be buried doing sorting at almost 40 ksps.  Even processing each frequencies data not individually,  and sorting only one 64 elements array still very time consuming job.  After thinking a while, I came up to conclusion, that “Rolling Filter” has almost the same efficiency as Median, but instead of “sorting” requires only 1 additive operation! On long run, the output value will “roll” and “stick” to the middle of the pull. ( Try to model it in LibreOffice. )  Adjusting “step” of the “Rolling Filter”, you can easy manipulate responsiveness,  which is almost impossible with Median Filters. (Things TO DO: Adaptive Filtering, real time adjustment depends on input data “quality”).

 To be continue…. Video will follows !

( Predicting your question, how “good” is localization?  Its about same, as Laser TRF (tracking range finder) has, look at other blogs for now to get impression.  In other words: ASTONISHINGLY GOOD,   a few (1 – 5) angular degree in closed environment.

Link to Arduino (Leonardo) sketch:  Localizator-beta-9.


[original story: coolarduino]