Posts with «c program» label

Speech / Voice Recognition – remix.

It’s about the right time to release one more “remix” of one of my blog posts, published almost a year ago. I haven’t received as many comments on the topic as I was expecting. There are some reasons that would explain this, but I’d rather start by outlining what I did in the new release; those of you who tried the old version should be impressed by the progress I’ve made!

The basic structure was left almost intact. The essential parts of the project, 2D Filtering and Cross-Correlation, are the same, so please read my old post if you came across this one w/o having seen it first. What differs is the “preprocessing” before we get to the filtering stage. First, I ported the code to the Leonardo board. The ease of connecting an electret mic to the Leonardo just didn’t give me a choice! I already played with the Leonardo ADC and electret mics in my previous post, and I can assure you that these guys were designed to work as a team. Uno followers have to solder a pre-amplifier: not a big deal, but not really interesting either. The code would run on the Uno, except for the Timer and ADC settings, which you could always “copy/paste” from the old version. Feel free, be my guest.

The analog front-end is exactly the same as the one I used in the Sound Localization project. You need only one mic here, so just reduce the number of electrical components accordingly.

The sampling subroutine is based on Timer 4, with the Arduino Leonardo internal PGA set to a gain of x40. There are comments in the code, so you can adjust the gain up or down depending on the sensitivity of the mic you have.

The windowing LUT is a slightly modified Hamming/Hann cosine function; I’d say my table is an “intermediate” version between those of the two inventors mentioned.

The FFT is my best achievement of this year: RADIX-4. Compared to the old code, it is about 3x faster. This is why I was able to increase FFT_SIZE up to 128 while still having plenty of time. The magnitude calculation is based on an approximation and is very fast, because no square root extraction is required. Accuracy in the worst-case scenario is ~95%, which is more than enough for this project.
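For readers wondering how a magnitude can be computed without a square root: one common family of tricks is the “alpha max plus beta min” estimate. The sketch below uses max + 3/8 min, which stays within roughly 7% of the true value for inputs well above 1 LSB. This is only an illustration of the technique; which coefficients the actual sketch uses is not stated here, and the function name `approx_magnitude` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Approximate sqrt(re*re + im*im) as max + 3/8 * min, with no square root. */
uint16_t approx_magnitude(int16_t re, int16_t im)
{
    uint16_t a  = (uint16_t)abs(re);
    uint16_t b  = (uint16_t)abs(im);
    uint16_t mx = (a > b) ? a : b;
    uint16_t mn = (a > b) ? b : a;
    return (uint16_t)(mx + ((3u * mn) >> 3));
}
```

Only a comparison, one small multiply (or two shifts and an add) and one addition are needed per bin, which is why this kind of estimate is popular on 8-bit parts.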

I changed the non-linear compression algorithm, as there are now more bins (64) to pack into 16 bands. The math is simple, and I hope it doesn’t need extensive comments. Packing is necessary due to memory limits: a 1 sec password in the current configuration (16 bands at a 4 kHz sampling rate) occupies 1 kByte, the full size of the EEPROM on Leonardo or Uno boards.
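As a rough illustration of packing 64 bins into 16 bands non-linearly, the sketch below uses narrow bands at low frequencies and wide ones at high frequencies. The band widths (eight bands of 2 bins, four of 4, four of 8) and the averaging rule are assumptions chosen so that the numbers add up to 64; the real sketch may group the bins differently.

```c
#include <stdint.h>

#define N_BINS  64
#define N_BANDS 16

/* Hypothetical band widths (in bins): narrow at low frequencies, wide at high.
   8 x 2 + 4 x 4 + 4 x 8 = 64 bins in total. */
static const uint8_t band_width[N_BANDS] = {
    2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8
};

/* Pack 64 magnitude bins into 16 bands by averaging the bins of each band. */
void pack_bands(const uint8_t bins[N_BINS], uint8_t bands[N_BANDS])
{
    uint8_t bin = 0;
    for (uint8_t b = 0; b < N_BANDS; b++) {
        uint16_t sum = 0;
        for (uint8_t k = 0; k < band_width[b]; k++) {
            sum += bins[bin++];
        }
        bands[b] = (uint8_t)(sum / band_width[b]);
    }
}
```

With one byte per band, 64 time frames of 16 bands come to exactly 1024 bytes, which is the EEPROM budget mentioned above.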

The Command Line Interface is preserved. Here are the instructions on how to get everything set up and running.

And here is how the spectrogram looks in LibreOffice, test phrase “Front right” (OS: Linux Ubuntu 12.04):



 Got your microphone wired / connected? Everything checked at least a couple of times with a multimeter, voltages look OK? Good, now you are ready to start! (Don’t forget to upload the sketch to your Arduino Leonardo ;-)

1. X. First of all, open the serial monitor window and check the baud rate.
It should be 115200. Next, type “x” and press Enter. Got a response? Does
it look like a table? Excellent, the data you are looking at is the “raw”
sampling data. Probably just noise, acoustical or electrical.

2. F. Second test: type “f” and press Enter. Again, the Arduino prints out
a table. This time the data represents the analog signal after FFT
processing. Each bin corresponds to a frequency range of about 32 Hz.
If you have a signal generator, it's the right time to do a detailed
check-up of the microphone, wiring, and software. If you don't have one,
you can use your computer's sound card and a tone generator program;
there are plenty of them available on-line free of charge. Play the tone
through the PC's USB speakers, anything in the audible range 16 – 2000 Hz.
Using the “f” command, check again that the Arduino registers the signal
and that it's in the right bin. Due to the “windowing” function, even
a pure single tone will show up in at least 2-3 neighboring bins.
The amplitude depends on the sensitivity of the mic and the volume of
the sound. By sending “x” you can confirm that there is no “clipping”,
that is, not too many “–511” or “+511” values.

3. S. Next, if all has gone well up to this step, you have probably
already noticed that whenever the mic picks up a sound, the yellow
on-board LED lights up. Now it's better to shut down your TV, iPad, radio.
Make your environment as quiet as possible. Try to say something
with your own voice, and see if the LED lights up when you start
talking, and then goes off after ~1 sec. Repeat a few times,
adjusting the volume and / or the distance to the mic, so that the LED
goes “on/off” reliably. Send the “s” command when the LED is off. Don't
worry, there should be no report on screen until you say a word. The
procedure is simple: send “s”, say something. After you talk and get
a print-out, look carefully. Your objective is to get a “spectrogram”
which consists of a few spots / blobs of digits, randomly distributed
all over the “surface”. Repeat a few times with one word, then try
another one and so on. With short words, the reported data would very
likely be concentrated at the top (you may need to scroll up to see the
beginning); if this is the case, better choose longer words or talk
at a slower tempo. Just remember, there are three dimensions: time,
frequency and volume. You can use a signal generator here again; the
spectrogram should look like a vertical line, maybe 2 - 3 parallel
lines for low-frequency test tones.
 By changing the frequency, you can even get some curves. What is
important: there must be no negative numbers. If you see them,
"overloading" has happened; decrease the volume. The dynamic range is
limited to 127; the closer you can get to this value w/o negatives,
the better. And the last dimension is frequency. For a man, it would
be a little bit hard to “detach” a spectrogram from the left border.
It's true for everyone who is not an opera singer, me included....

4. R and G. After you have practiced enough and received many nice
looking spectrograms, it's time to check whether the Arduino
agrees with you / thinks similarly. Send “r” (recording) and say
the word you got your best spectrogram with. Wait a few seconds;
writing to EEPROM takes time. Now send “g” and say the same word,
in the same manner, tonality and volume. Look at the output to see what
cross-correlation factor you received. More than 50% is very
good for a beginning. If less, try a few more times, sending “g” and
repeating the same word, maybe slightly varying the pronunciation. Try
to reach the best recognition, “crack” your own password code!
(Note: negative numbers must be present in the “G” report.)
If no luck, try another password code. Now you know the drill:
S – repeat, repeat, repeat....., R, G – repeat, repeat, repeat.....
(joke). Can't get a good match? Try your computer's test sounds:
beeps, horns, clicks, barks, whatever your OS has. The sound doesn't
have to be shorter than 1 sec, but only the first second
would be stored / compared. My computer is able to repeat the same
sound track (speakers test - "front right") with an enormously high
cross-correlation factor of 99 %! I'm not a computer; my best shot so
far is 86 %...

5. P. This command simply reads the content of the EEPROM,
so you can always verify what you stored last time.
By editing / formatting this data you can store a table in the Arduino
FLASH memory using PROGMEM, about 10 – 15 commands. Of course,
storing data on an external SD card or EEPROM could greatly increase
the “vocabulary”; at the same time, designing a fast cross-correlation
algorithm with multiple “patterns” would be another brain-teasing
puzzle ;-).
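For the signal generator check in step 2 it helps to know in advance which bin a test tone should land in, and for the “x” print-out it helps to count how many samples sit at the rails. A small sketch of both checks, assuming a 4 kHz sampling rate and a 128-point FFT as described above; the helper names `expected_bin` and `count_clipped` are hypothetical and not part of the sketch.

```c
#include <stdint.h>

/* Index of the FFT bin closest to a test tone.
   fs_hz: sampling rate, fft_size: number of FFT points. */
uint16_t expected_bin(uint32_t tone_hz, uint32_t fs_hz, uint16_t fft_size)
{
    /* bin = tone / (fs / N), rounded to nearest */
    return (uint16_t)((tone_hz * fft_size + fs_hz / 2) / fs_hz);
}

/* Count samples sitting at the ADC rails (+/-511 in the "x" print-out). */
uint16_t count_clipped(const int16_t *x, uint16_t n)
{
    uint16_t clipped = 0;
    for (uint16_t i = 0; i < n; i++) {
        if (x[i] >= 511 || x[i] <= -511) clipped++;
    }
    return clipped;
}
```

For example, a 1 kHz tone should peak around bin 32, with some leakage into the neighboring bins because of the window.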

Have fun!

Link to arduino sketch, VOR (VOice Recognition).


Visual Navigator. Making it MOBILE !

An obstacle-avoiding vehicle, continuing the “3D Laser Range Finder” series (project 1, project 2). The basic idea is the same: measuring distance using red laser pointers, an analog CCD camera and an Arduino UNO. The modification was made in the geometry. Two lasers were set up for “far field” obstacle detection, a few meters in front of the vehicle on the left or right side. Their primary mission is to trigger a left / right turn before the car gets too close to a “continuous” but not necessarily “high” object, for example a sidewalk stone. Of course, this distance depends on the vehicle speed, and the “alert” should be dispatched in the right time “window”, or there would be no space left to make a turn (proportional speed adaptation is not implemented yet). The low height of such road infrastructure makes an ultrasound-based range finder useless.


Two additional lasers were set up in a “cross” configuration, in order to detect any object that comes dangerously close to the front of the vehicle: “near field” obstacle detection, or “head-on collision” avoidance. Their two beams form reflective “trip-wires” and are able to detect objects as narrow as the leg of a chair or desk or an open door frame, anything that is at least 1 mm wide. One laser, pointed to the left, also works as a sidewalk / wall following navigation system, keeping this distance constant.

Now a couple of words on the “autopilot” algorithm. The three main features of the project:

  1. wall / sidewalk following;
  2. “far field” obstacle avoidance;
  3. “near field” head on collision avoidance.

were classified into 3 priority levels: 1 – warning, 2 – major, 3 – critical.

Level 0 (clear) corresponds to normal R/C radio control, that is, navigation by a “man / operator” via the remote R/C module. The operator also has the “authority” to decline a warning-class navigator status. But this is not the case when the navigator’s “autopilot” subroutine performs a class 2 or 3 maneuver, with status “major” or “critical”. When the vehicle performs maneuver 2, “left / right” commands from the R/C remote module are ignored; the same goes for “forward / backward” commands in status “3 – critical”, making the algorithm completely “fool-proof”.
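The priority scheme can be sketched as a small arbitration function that decides, per axis, whether the operator or the autopilot wins. Everything below is an illustration under assumptions: the type `command_t`, the function `arbitrate`, the rule that a warning-level steering suggestion is “declined” simply by moving the stick, and the choice to hand over both axes at the critical level are not taken from the Visual_Navigator sketch.

```c
#include <stdint.h>

/* Navigator status levels, as listed in the post. */
enum { LEVEL_CLEAR = 0, LEVEL_WARNING = 1, LEVEL_MAJOR = 2, LEVEL_CRITICAL = 3 };

/* A command is a pair of axes: steer (-1 left, 0 none, +1 right)
   and drive (-1 backward, 0 stop, +1 forward). */
typedef struct { int8_t steer; int8_t drive; } command_t;

command_t arbitrate(uint8_t level, command_t rc, command_t autopilot)
{
    command_t out = rc;                 /* level 0: operator has full control */

    if (level == LEVEL_WARNING) {
        /* Operator may decline: autopilot steers only if the stick is idle. */
        if (rc.steer == 0) out.steer = autopilot.steer;
    } else if (level == LEVEL_MAJOR) {
        out.steer = autopilot.steer;    /* R/C left/right ignored */
    } else if (level >= LEVEL_CRITICAL) {
        out = autopilot;                /* R/C forward/backward ignored too */
    }
    return out;
}
```

The point of the structure is that the operator's input is always read, but only forwarded to the H-bridges on the axes the current status level leaves open.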

More video will be posted. Link to Arduino UNO sketch: Visual_Navigator.

 5 August 2012.

I’d like to publish more pictures from “inside”, which show the interface between the Arduino and the R/C receiver module in the car. Well, not quite an Arduino: I built a “clone” using a pre-programmed ATmega328. As you can see, the receiver was left almost intact. What I did was just identify the two on-board H-bridges which supply power to the steering control motor and the vehicle’s main drive motor. Then I removed the 4 resistors in series with the control lines and routed 8 wires to the Arduino (4 inputs from the R/C receiver and 4 outputs to the H-bridges). Here you are: now the Arduino can intercept any command coming from the R/C transmitter and, based on data from the sensors, decide whether it makes sense to follow it. Also, the “autopilot” function can address the two motors “directly” in order to execute an “obstacle avoiding” maneuver without asking anyone’s permission! What’s more, the Arduino controls the power delivered to the motors via software PWM, making 7 (!) different speed levels available, like in a real vehicle. Unfortunately, the model I “hacked” doesn’t use proportional steering control, but PWM power management is still helpful for saving battery energy by limiting unnecessary current delivered to the motor.
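Software PWM with a handful of speed levels can be as simple as comparing a free-running tick counter against the requested level. A minimal sketch, assuming a period of 7 timer ticks so that levels 1..7 map to 1/7..7/7 duty cycle; the function name `soft_pwm` and the tick source are hypothetical, and the real sketch may time its PWM differently.

```c
#include <stdint.h>

#define PWM_LEVELS 7   /* speed levels 1..7 from the post; 0 means motor off */

/* Called once per timer tick; returns 1 when the H-bridge input should be
   driven, 0 otherwise. tick is a free-running counter. */
uint8_t soft_pwm(uint8_t level, uint8_t tick)
{
    if (level > PWM_LEVELS) level = PWM_LEVELS;
    return (uint8_t)((tick % PWM_LEVELS) < level);
}
```

Over any 7 consecutive ticks the output is high for exactly `level` of them, so the average motor power scales with the level.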


Speech / Voice Recognition. Arduino project, next in a series FFT and Arduino.

 Finally, I’d like to present the most sophisticated project I’ve done so far, built around the idea of turning an Arduino board into a DSP. The results are really impressive for a small microprocessor with little memory and low MIPS. IMHO, the Arduino provides better results than the Windows Vista VR system with 1 GB / 2.2 GHz hardware, for short one- or two-word commands, of course.
No HMM, neural networks, or other very popular and “scientific-sounding” theories were considered for the algorithm. Google brings up millions of links on the topic, just ask, but only a few of them are built on a really scientific concept rather than dumb database “sharpening”. I’m not saying they are completely wrong, and I’m not an expert in the field, but they are not smart either. My choice is simple 2D cross-correlation. Basically, the heart of the recognition algorithm is similar to an image matching program, which works the same way for voice/sound. To create a spectrogram image, the Arduino continuously monitors the sound level via the microphone and starts capturing data when the VOX threshold is exceeded. After the input array “X” is filled up, the data is transferred to the next level to calculate the FFT. The same “conveyor belt” works between FFT and Filtering: flags are raised when data is ready and lowered when the process has finished. The only difference is speed: the conveyor belt runs faster passing data from ADC to FFT, and slower at the Filter-Correlation stage, as it takes 64 regular cycles to complete a spectrogram image in one SuperCycle. The most time-consuming part is the Edge Enhancement / HPF filtering of the spectrogram. I’m still looking for ways to improve the performance of this stage, as it holds the whole process back from being fully “Real Time”.
-  4 kHz sampling rate:  2 kHz voice freq. range;
-  64 FFT subroutine,    62.5 Hz spectral resolution;
-  16 x 64 Spectrogram Image, around 1 second max voice password;
-  duration of the Cross-Correlation < 5 milliseconds;
-  duration of the FFT+SQRT+Compression < 4 milliseconds;
-  duration of the Edge Enhancement ~ 35 milliseconds;

The main cycle time frame is 16 milliseconds; it’s defined by the sampling period x FFT size, 0.25 ms x 64 = 16 milliseconds. The super-cycle of 1.024 seconds is needed only because the Edge Enhancement prevents all processes from being completed in less than 16 milliseconds. There are resources left to increase the sampling rate up to 8 or even 12 kHz; I just had no time to conduct experiments to see if it is beneficial.
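To give a feel for what “simple 2D cross-correlation” of two spectrogram images means, here is a minimal sketch that compares a stored and a fresh 16 x 64 image at zero lag and reports the match in percent. The normalization (dividing by the geometric mean of both energies) and the integer square root helper are my assumptions for illustration; the sketch itself may normalize differently.

```c
#include <stdint.h>

#define SPEC_SIZE (16 * 64)   /* 16 bands x 64 time frames, as in the post */

/* Integer square root (floor) of a 64-bit value, bit-by-bit method. */
static uint32_t isqrt64(uint64_t v)
{
    uint64_t root = 0, bit = (uint64_t)1 << 62;
    while (bit > v) bit >>= 2;
    while (bit != 0) {
        if (v >= root + bit) { v -= root + bit; root = (root >> 1) + bit; }
        else                 { root >>= 1; }
        bit >>= 2;
    }
    return (uint32_t)root;
}

/* Zero-lag normalized correlation of two filtered spectrograms, in percent
   (-100..100). Returns 0 if either image is all zeros. */
int8_t correlation_percent(const int8_t *a, const int8_t *b)
{
    int64_t  sum_ab = 0;
    uint64_t sum_aa = 0, sum_bb = 0;
    for (uint16_t i = 0; i < SPEC_SIZE; i++) {
        sum_ab += (int32_t)a[i] * b[i];
        sum_aa += (uint32_t)((int32_t)a[i] * a[i]);
        sum_bb += (uint32_t)((int32_t)b[i] * b[i]);
    }
    uint32_t norm = isqrt64(sum_aa * sum_bb);
    if (norm == 0) return 0;
    return (int8_t)((sum_ab * 100) / (int64_t)norm);
}
```

Because the filtered images contain both positive and negative values, two unrelated words tend to average out towards 0%, while a good repeat of the same word pushes the result towards 100%.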

There is a Command Line Interface built into the software, which controls the “record” and debug “print” functions; 7 commands for now:
if (incomingByte == 'x') {           // INPUT ADC DATA
if (incomingByte == 'f') {           // FFT OUTPUT
if (incomingByte == 's') {           // SPECTROGRAM PRE  FILTERED
if (incomingByte == 'g') {           // SPECTROGRAM POST FILTERED
if (incomingByte == 'r') {           // RECORD SPECTROGRAM TO EEPROM
if (incomingByte == 'p') {           // PLAY SPECTROGRAM FROM EEPROM
if (incomingByte == 'm') {           // FREE MEMORY BYTES

The software is written for the ATmega328P microprocessor (Arduino Uno board or similar). For other chips, all referenced registers have to be replaced with the appropriate names for that microprocessor. It compiles on the 022 IDE; there are some conflicts with the 1.0 IDE that I haven’t felt like troubleshooting yet. For a better understanding of the math background, have a look at my previous posts.

Link to download a sketch:   Voice_Recognition_24_01

The analog front-end is the same as the one I used in my first project: Color Organ
There is not much that could be improved in this part, and I again used both inputs: the microphone, for tests with my own voice, and the “line” input, for single-tone tests generated by the computer during debugging. The next picture shows the “s” command print-out in the serial monitor window after I pronounced the word “Spectrogram”. Due to the limited size of the window, the data is printed with a 90 degree rotation: left-right is the frequency band direction and up-down is time. Lower frequencies are on the left side (60 Hz) and higher ones (2 kHz) on the right. The 3D images, by contrast, are generated at the right viewing angle.

This is how the spectrogram looks after the “g” command is entered in the serial monitor and the word is spoken right after that:

The next couple of images were created with a single-tone frequency (320 Hz), just to show the “internal properties” of the filtering more clearly; again, the “s” and “g” commands were entered:

Well, as the tone sounds continuously, it shows filtering in one direction only and is not the best tutorial on edge-enhancement theory (“home brew” lab limits). At the same time, the last picture shows that each “peak” in the original spectrogram becomes surrounded by smaller negative peaks, resulting in a “0” overall sum over the 3×3 footprint, and consequently over the whole map. In electronics this goes under the name HPF, and the essence of the process is to remove the DC component and attenuate low frequencies.
Excellent on-line book
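The zero-sum 3×3 footprint described above is what a classic high-pass (edge enhancement) kernel produces. Here is a minimal sketch with the textbook kernel of 8 in the center and -1 around it; these coefficient values and the zero-padded border handling are assumptions for illustration and are not necessarily the ones used in the sketch.

```c
#include <stdint.h>

#define BANDS  16
#define FRAMES 64

/* 3x3 high-pass kernel whose coefficients sum to zero (assumed values). */
static const int8_t kernel[3][3] = {
    { -1, -1, -1 },
    { -1,  8, -1 },
    { -1, -1, -1 }
};

/* Filter in[][] with the kernel; pixels outside the image count as zero. */
void edge_enhance(int16_t in[FRAMES][BANDS], int16_t out[FRAMES][BANDS])
{
    for (int t = 0; t < FRAMES; t++) {
        for (int f = 0; f < BANDS; f++) {
            int32_t acc = 0;
            for (int dt = -1; dt <= 1; dt++) {
                for (int df = -1; df <= 1; df++) {
                    int tt = t + dt, ff = f + df;
                    if (tt < 0 || tt >= FRAMES || ff < 0 || ff >= BANDS) continue;
                    acc += (int32_t)kernel[dt + 1][df + 1] * in[tt][ff];
                }
            }
            out[t][f] = (int16_t)acc;
        }
    }
}
```

A single isolated peak of height 10 comes out as +80 surrounded by eight values of -10, so the sum over the footprint is zero, which is exactly the “DC removed” behavior of an HPF.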

Short manual:
to be completed later

Optical Magnet, Arduino project next in a series Laser Tracking 3D

This post is the next stage in the series published earlier, concentrated around the idea of tracking an object in space. An enormous number of similar projects could be developed on this platform. I’ll name a few:

- star / rocket / vehicle tracking;
- follow me / robot / hands / cap etc;
- navigation to charging station / landing pad / around area;
- contact-less measurement of rotational speed / angle to surface / shifting.

All of this on a micro-controller that costs a few dollars and a cheap CMOS camera! Real-time, up to 60 Hz update rate!
Please check the first and second versions, as I’ll skip the explanation of the basic design concept here.

  Most important features in this series of projects:

* 3. TRACKING 3-D.
* 5. TRACKING 6-D.

The features are independent, so you can start from project 1, then move on to the next stage, and so on, depending on your budget, parts availability or interest!

This version of the hardware/software system design is capable of tracking an object in

6 – D ++ space:

*   -  Linear motion along X, Y, Z coordinates (3D);
*   -  Rotation around fixed axis (6D);

I put two ++ plus signs in order to underline the capability of the hardware design to track the rotation of an object based not only on distance measurements, but also on reflectivity. As the Power Control Loop strictly holds the laser radiation under control, a simple calculation of the periodicity of the emitted power would provide information about the angular speed of a round / cylindrical object. The phase difference between the 4 signals gives the rotational center for Z. This also applies to linear motion: tracking of the object could be based on reflectivity or distance or BOTH simultaneously, which opens up an enormous number of possibilities.

Optical Magnet:
*  – Attract closest surface;
*  – Repel;
*  – Attract or repel surface with specific reflectivity (BLACK, WHITE, OR COLOR)!!!
*    * work in progress, Reflectivity math are not implemented yet       *
*    * algorithm to track rotation is not included, this is version for demonstration purposes  mainly.*

Link to download Arduino Uno sketch:  Optical_Magnet_6D

8 January, 2012.
Release notes of the version 3.2 software:

-  Video is de-interlaced, full image size 512 (active 492) lines;
-  the digitalWrite and analogWrite functions of the Arduino IDE were replaced by direct port manipulation, in order to improve the timing of the critical section of the code (the interrupt subroutine) and to avoid blocking interrupt calls (these functions have a locking mechanism in their bodies);
- minor changes in the Power Control Loop algorithm, to prevent oscillation;
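For anyone who has not used direct port manipulation before, it boils down to setting or clearing a single bit of a port register instead of calling digitalWrite. The sketch below uses a plain variable, `fake_port`, as a stand-in for a real AVR register such as PORTB so that it also compiles off-target; the macro names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for an AVR port register such as PORTB, so the sketch also
   compiles off-target; on the ATmega328P you would use PORTB itself. */
volatile uint8_t fake_port;

/* Set or clear one bit of a port register. */
#define PIN_HIGH(port, bit)  ((port) |=  (uint8_t)(1u << (bit)))
#define PIN_LOW(port, bit)   ((port) &= (uint8_t)~(1u << (bit)))
```

On the Uno, digital pin 13 is bit 5 of PORTB, so `PIN_HIGH(PORTB, 5)` would switch the on-board LED on in a couple of instructions.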

Link to download Arduino Uno sketch:  Optical_Magnet_6D_V3


Arduino Musical Note Recognition – Pushing the limits.

Second project based on Arduino Uno and FFT code.    (First one : Project 1.)
Short description (EDITED: Project stopped. I lost my inspiration and am working on visual recognition now. I decided to publish the code so someone else could find it useful and continue the research.).

The main array size is 1024 bytes, real / imaginary 512 / 512, with 256 output bins (only half of the real part; the other half is a mirror). The sampling rate is 4 kHz; the upper note B6 (1975.5 Hz) is on the yellow line (right-side LED), the lower note C3 (130.8 Hz) on the red line (left-side LED). Frequency resolution is approximately 7.8 Hz per bin. Processing and sampling run in parallel. The sampling array size is 256. After the data is captured (256 x 0.25 msec = 64 msec), it is transferred to the 1024-byte processing array, the missing 256 real samples are “zero padded”, and sampling continues w/o interruption. I did the zero padding on purpose, to get the “response time” of the LED matrix as fast as possible, so that real-time 1/16 notes can be visually distinguished at the same time as they are played.
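The zero-padding step can be sketched in a few lines: 256 captured samples go into the first half of the 512-point real array, and the rest is filled with zeros. The array names and the function `load_fft_input` are assumptions for illustration, not identifiers from the sketch.

```c
#include <stdint.h>
#include <string.h>

#define N_SAMPLES 256   /* captured samples (64 ms at 4 kHz) */
#define N_FFT     512   /* complex FFT length: 512 real + 512 imaginary bytes */

/* Copy the captured block into the real part and zero-pad the rest. */
void load_fft_input(const int8_t samples[N_SAMPLES],
                    int8_t re[N_FFT], int8_t im[N_FFT])
{
    memcpy(re, samples, N_SAMPLES);
    memset(re + N_SAMPLES, 0, N_FFT - N_SAMPLES);   /* the "missing" 256 samples */
    memset(im, 0, N_FFT);                           /* real input: no imaginary part */
}
```

Padding does not add information, but it interpolates the spectrum onto a grid of 4000 / 512 ≈ 7.8 Hz per bin while the capture window stays at 64 ms.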

After the computational cycle is completed (~36 msec), the main program executes the “cognitive core function” to differentiate between the notes / tones that have to be displayed on the LED matrix.

 In order to minimize the error rate of this process, the Masking Shadow Theory (MST) was developed. The masking shadow for each note is calculated in several steps, and the result is compared with the note’s magnitude in the current cycle. If the magnitude is less than the shadow, then the LED corresponding to this note doesn’t light up. There are five steps for now, but as I say, it is work in progress.
Step 1 (masking shadow 0): noise floor, which is common to all notes.
Step 2 (masking shadow 1): shadow from neighboring notes, which includes 8 notes on the left and right side, in inverse proportion to their distances.
Step 3 (masking shadow 2): shadow from the note located 1 octave below. In other words, a cross check that the current bin’s value isn’t the second harmonic of a sounding note 1 octave below it.
Step 4 (masking shadow 3): similar to step 3; the only difference is the cross check that the current bin (note) isn’t the third harmonic of a sounding note below it.
Step 5 (masking shadow 5): similar to steps 3 and 4; cross check that the current note isn’t the fifth harmonic of a sounding note below it.

The masking shadow is multiplied by note-specific coefficients after steps 3 – 5, to accommodate the significantly richer spectral content of the lower octaves. The formula for calculating the coefficients is the trickiest part of the whole project. What I’ve discovered is that the coefficients don’t just vary between notes / tones; they also vary dynamically during the “life-time” of the tone.
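The five masking-shadow steps can be sketched roughly as follows. In semitones, the second, third and fifth harmonics of a note land 12, 19 and 28 notes above it, so the check looks that far below the current note. All numeric values (noise floor, the 1/(2d) neighbor law, the fixed harmonic weights) and the choice to combine shadows by taking the maximum are assumptions for illustration; in particular, the note-specific and time-varying coefficients described above are replaced by constants.

```c
#include <stdint.h>

#define N_NOTES      48   /* 4 octaves x 12 notes, C3..B6 */
#define NOISE_FLOOR   8   /* step 1: hypothetical common floor */
#define NEIGHBOURS    8   /* step 2: notes checked on each side */

/* Harmonic offsets in semitones: 2nd = 12, 3rd = 19, 5th = 28 below. */
static const uint8_t harmonic_offset[3] = { 12, 19, 28 };

/* Hypothetical weight (in 1/16ths) of the shadow cast by each harmonic. */
static const uint8_t harmonic_weight[3] = { 8, 4, 2 };

static uint16_t max_u16(uint16_t a, uint16_t b) { return a > b ? a : b; }

/* Masking shadow for note i, given the magnitudes of all notes. */
uint16_t masking_shadow(const uint16_t mag[N_NOTES], uint8_t i)
{
    uint16_t shadow = NOISE_FLOOR;                       /* step 1 */

    for (uint8_t d = 1; d <= NEIGHBOURS; d++) {          /* step 2 */
        if (i >= d)          shadow = max_u16(shadow, mag[i - d] / (2 * d));
        if (i + d < N_NOTES) shadow = max_u16(shadow, mag[i + d] / (2 * d));
    }
    for (uint8_t h = 0; h < 3; h++) {                    /* steps 3..5 */
        if (i >= harmonic_offset[h]) {
            uint16_t below = mag[i - harmonic_offset[h]];
            shadow = max_u16(shadow,
                (uint16_t)(((uint32_t)below * harmonic_weight[h]) / 16));
        }
    }
    return shadow;
}

/* A note's LED lights only if its magnitude is not below its shadow. */
uint8_t note_is_lit(const uint16_t mag[N_NOTES], uint8_t i)
{
    return (uint8_t)(mag[i] >= masking_shadow(mag, i));
}
```

With a loud C4 sounding, for example, a weaker reading at C5 is treated as its second harmonic and the C5 LED stays dark unless C5 itself is strong enough to rise above the shadow.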

25 Sept. 2011 

Two variables have been considered: speed and octave range. A third one, “not technical”, is the price of the project.
 The octave range is defined by the RAM available on the chip: 2K on the UNO. As the processing array size must be a power of 2 (FFT Radix-2), the array can’t be more than 1024 bytes; the next value, 2048, is the size of all RAM, which obviously can’t be taken. The size of the array defines the maximum number of frequency bins at the output: 1024 / 4 = 256. It is divided by 4 because there are real and imaginary parts, and only half of the real part carries data. What is interesting is that a musical octave is nothing other than a doubling of the tone’s frequency, so it follows binary arithmetic rules… If I count from the high side, the upper octave occupies half of all 256 bins, from bin 256 to bin 128, simply because musical tones are spaced logarithmically (LOG_2) in frequency, while bins are spaced equally. The next octave takes half of what is left over, from bin 128 to bin 64. The third octave goes from 64 to 32, and the fourth from 32 to 16. Can I go further down? No. There are 12 notes in each octave. This means that after the fourth octave, counting down, I arrive at a location where bins and tones are spaced almost one to one: bin 16 – C3, bin 17 – C3#, bin 18 – D3. Well, there are four more (15, 14, 13, and 12), but that doesn’t change much, as it is only 1/3 of an octave and would only complicate multiplexing the LED display.
This is why the octave range variable = 4. Summing up, increasing the octave range by 1 (to 5 octaves) would require double the memory size (2K processing array, still possible with a 4K chip), and by 2 (to 6 octaves) four times more memory (4K processing array).
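The bin arithmetic above is easy to check numerically. The sketch below walks up the equal-tempered scale from C3 and reports which FFT bin each note falls into, assuming a 4 kHz sampling rate and a 512-point FFT (7.8125 Hz per bin). Truncating to the lower bin is my assumption; it happens to reproduce the bin 16 / 17 / 18 assignment quoted for C3, C3# and D3. The function name `note_bin` is hypothetical.

```c
#include <stdint.h>

#define FS_HZ     4000.0   /* sampling rate from the post */
#define N_FFT     512.0    /* FFT length from the post */
#define C3_HZ     130.8    /* lowest note, from the post */
#define SEMITONE  1.0594630943592953   /* 2^(1/12) */

/* FFT bin (truncated) that holds the note n semitones above C3. */
uint16_t note_bin(uint8_t semitones_above_c3)
{
    double f = C3_HZ;
    for (uint8_t k = 0; k < semitones_above_c3; k++) f *= SEMITONE;
    return (uint16_t)(f / (FS_HZ / N_FFT));
}
```

Going one octave lower would put twelve notes into roughly eight bins, which is why four octaves is the practical limit for this array size.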

The Arduino Mega board has 8K RAM; would it be better to design the project with it? First, it costs more money. Second, it has the same CPU performance, and as you will see below, having more memory w/o a faster CPU doesn’t make any sense. The CPU wouldn’t be able to process a bigger volume of data in time.

Now let’s have a look at the speed variable. The following math (and the design itself) greatly depends on it. In order for at least a 1/16 note to be visually “alive”, the whole cycle (sampling, pre-processing, FFT, post-processing) has to be completed in less than 64 msec. My impression is that when the timing is a little bit longer, the LED display looks like it shows something that was played last Saturday night. The invisible real-time connection between “light” and “music” becomes broken.
The octave range variable (4 octaves, to be specific), especially the low notes starting from C3, is begging for 8 Hz resolution, as that equals the distance between C3 and C3#, and consequently for a 128 msec sampling frame duration. So, there is a contradiction. To solve this, zero padding was introduced, which helps to keep the sampling window down to 64 msec while keeping the frequency resolution not very far from 8 Hz. To make a real-time live show, sampling must continue w/o interruption. Even more, it has to be “overlapped”, as pre-processing (windowing) cuts off the beginning and end of the sampling pool. All three other functions (FFT, pre- and post-processing) have to run in parallel and must be completed in the same 64 msec time frame. The Arduino platform has an 8-bit microprocessor, with a low-horsepower engine under the hood. This is why 8-bit math was selected instead of 16-bit (which would have saved me a lot of trouble). The trouble I’m talking about is the very low dynamic range when integer math and 8 bits come together in an FFT. Integer math, which gives nice time performance, puts a really hard constraint on dynamic range, just because it performs “scaling” before and after every “butterfly”. And every scaling procedure brings in a rounding error, which grows enormously, as there are 2304 butterflies (9216 rounding operations) for an N = 512 FFT. Special attention must be paid to keeping the rounding error under control without increasing the calculation time too much, or integer math would not make any sense. What I found out is that there is an excellent algorithm for “symmetrical” 1/2 bit rounding, but it almost doubles the FFT calculation time, which I obviously could not afford. So I chose another path to increase dynamic range: a compression algorithm on the input data. The easiest way to do it is “clipping”. Put a couple of lines in the code:
             if ( x[i] >  127 )  x[i] =  127; 
             if ( x[i] < -127 )  x[i] = -127; 

and all good. Not quite. It does a great job for any other signal (vibration from an accelerometer, for example), except music…..
Clipping generates a very high level of harmonics, and once again, that can’t be afforded, as it just undermines the basic idea of the project: MUSICAL note recognition.
 Summary: Scaling extends the dynamic range of the integer FFT by 24 dB (4 bits).

8-bit FFT dynamic range   +36 dB;
scaling                   +24 dB;
noise                     - 3 dB;
Overall                    57 dB.
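One simple way to scale the input instead of clipping it is to find the peak of each captured block and divide the whole block by a power of two until it fits into 8 bits. The sketch below is a plain linear block scaling under that assumption, limited to 4 bits (24 dB) to match the summary; it is not the compression code from the sketch, and the name `scale_block` is hypothetical. Inputs are assumed to be ADC-sized values, well inside the int16_t range.

```c
#include <stdint.h>

#define MAX_SHIFT 4   /* 4 bits of scaling = 24 dB, as in the summary */

/* Scale a block of samples so that every value fits in -127..127.
   Returns the number of bits removed (0..MAX_SHIFT). */
uint8_t scale_block(int16_t *x, uint16_t n)
{
    int16_t peak = 0;
    for (uint16_t i = 0; i < n; i++) {
        int16_t a = (int16_t)(x[i] < 0 ? -x[i] : x[i]);
        if (a > peak) peak = a;
    }
    uint8_t shift = 0;
    while (shift < MAX_SHIFT && (peak >> shift) > 127) shift++;
    for (uint16_t i = 0; i < n; i++) {
        x[i] = (int16_t)(x[i] / (1 << shift));   /* divide: symmetric for negatives */
    }
    return shift;
}
```

Unlike clipping, this keeps the waveform shape intact, so no extra harmonics are created; the price is that quiet passages within a loud block lose resolution.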

Link to download a sketch: