
Sound Localization.

In theory, sound localization is elementary: measure the phase difference between the signals received by two spatially separated microphones. The devil, as always, is in the details. I hadn't seen any such project for Arduino, and got curious whether it's possible at all. Long story short, here I'd like to present my project, which answers this question: YES!

Moreover, the number of electronic components doesn't differ much from what I used in my previous blog. Compare the two drawings and you will notice that only 4 resistors and 4 electret microphones were added! The whole circuit is just a few capacitors, 9 resistors, one IC and the mics. Frankly speaking, while writing the oscilloscope remix I was testing the Arduino analog inputs, keeping in mind using them in conjunction with electret microphones in other projects, like sound pressure measurement (dBA), voice recognition or something funny in the “color music” series. A “pilot” project, as they call it. The (simplest ever) oscilloscope has some issues, which I already described, when sampling at a fast rate on 4 channels (Time/Div settings 7, 8 and 9), so I slightly reduced the sampling rate to 40 kHz here.



Note: the hardware would be different for Arduino boards based on other chips, and with an ATmega328 uCPU it must include pre-amplifiers.

One more important thing to mention in this short introduction: as I used an FFT algorithm for phase calculation (I like FFT very much, as you have probably already noticed), the Arduino is capable not only of tracking a MOSQUITO flying in your room, it could also tell if it's MALE or FEMALE !!!!!

                  SOFTWARE.

  As I said above, I chose 40 kHz as the sampling rate, which is a good compromise between the accuracy of the readings and the maximum audio frequency that the Localizator can hear. As signals are taken from two mics at a time, each mic is sampled at 20 kHz and the upper limit for audio data is 10 kHz. The processing is not real-time; the “conveyor belt” includes 4 major separate stages, run once per dimension:

  • sampling X dimension;
  • FFT
  • phase calculation
  • delay time extracting
  • sampling Y dimension;
  • FFT
  • phase calculation
  • delay time extracting

The 4 mics are split into 2 groups, for the X and Y coordinates respectively. Picking up all 4 mics simultaneously is possible, but would reduce the audio range to 5 kHz, so I decided to process the two dimensions (horizontal and vertical planes) separately, in series. Removing vertical tracking from the code, if it's not necessary, would of course increase speed and accuracy in the remaining plane. For a description of the first and second stages I'd refer you to my other blogs; the FFT was brought in without any modification at all. The essential and most important parts of this project are stages 3 and 4.

Phase Calculation (3).

 I'm not any good as a teacher, so for a mathematical tutorial on the topic you'd better read somewhere else to brush up on the basic concepts. The core of the process is the arctangent function. This link gives the number of cycles it takes. In two words: too slow. A LUT (Look-Up Table) is the best way for a no-float uCPU to do complex math extremely fast and reasonably (?) precisely. The drawback of a LUT is its size: it has to be stored in FLASH memory, which in turn is also limited. This is what I did on the “resource management” side: 1 kWords (16-bit integers, 2 kBytes), a 32 x 32 (5 x 5 bits) LUT, scaled up to 512 to get better “integer” resolution. There are a few values in the top-right corner that melted together, as their differences are less than “1” (not shown on the picture on the right side). The “worst” resolution is in the top-left corner, where the “granularity” reaches 256, an unacceptable 50% of the dynamic range. To stay as far away from this corner as possible, I put in a “Rainbow Noise Canceler”, a single line with an “IF” statement, which “disqualifies” any BIN whose squared magnitude, calculated at the FFT stage, is lower than 256 (that is, magnitude below 16).

if (((sina * sina) + (cosina * cosina)) < 256) phase = -1;

 I called it “Rainbow” because of its shape: the “red line” is an arc, going from 16 on the top line to 16 on the left side. Also, the “Gain Reset” of 6 bits (it depends on the FFT size, and has to be 6 bits for 128) was reduced to 5 bits in order to get better sensitivity. These two settings, the 5-bit gain reset and the 3.5-bit magnitude limit, create a “threshold” for weak spectral peaks. Basically, depending on the application, both values can be adjusted in different proportions.

 There are two categories of tracking techniques: with the mics installed on a moving platform, and with stationary mics. The first one is a little easier to understand and build, and requires only the Relative direction to the sound source. This is what I've done. The stationary-mics approach, where the motors move a laser pointer (or a film camera) alone, would require the Absolute direction to the sound source, and must include stage #5: angle calculation from the known delay time. The math is pretty simple, an arcsine function, and at this point only one calculation per several frames would be necessary, so floating point math wouldn't be an issue at all. No LUT, scaling, rounding or truncation. Elementary school geometry is all you need.

Delay Time Extraction (4).

 Subtracting the phase value of one mic's “qualified” data pool from the other's produces the phase difference. To turn the phase difference into a delay time, it is divided by the BIN number. Let's call this operation the “Denominator” process. The denomination is necessary because all data after this step is going to be combined and processed together regardless of wavelength, which is different for every bin. Frequency and wavelength are related via a simple formula: Wavelength = Velocity / Frequency, where velocity is the speed of a sound wave in air (340 m/sec at room temperature). As the distance between two mics is constant, sounds with different wavelengths (frequencies) produce different phase offsets, and denomination makes them proportional. (Wikipedia, I'm sure, would explain this much better; mind you, I'm a Magician, not a mathematician.)

The first picture on the right side shows the “Nuisance 3: Incorrect arctan” correction. You will find two lines with “IF” statements in the code related to stage #3.

The second one gives you an idea why another correction at stage #4 is necessary. As you can see, subtracting one arctan from another generates a rectangular “pulse” (Diff. n. corr., violet line) whenever one function has changed sign but the other (delayed version) has not yet. The light blue line (DIFF(B)) doesn't have this abnormality. The math is simple, just two lines with “IF”s in the same manner, only with “double size” constants this time. On my scale 2048 corresponds to 2 x PI, 1024 to PI, and 512 to PI / 2.

Arduino has only 1 ADC, so there is always a constant delay equal to one sampling period (T = 1 / 40 kHz = 25 usec), which should also be subtracted (or added, depending on how you associate inputs 1 and 2 with the left / right side mics).
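For readers who prefer code to charts, here is a rough sketch of the stage #4 correction and the “Denominator” step. It is not copied from the Localizator sketch: the function names are made up, and I assume here that both phases arrive as signed values between -1024 and +1024 on the scale where 2048 is 2 x PI.

```c
#include <stdint.h>

#define PHASE_PI      1024   /* PI on the integer scale used in this post */
#define PHASE_TWO_PI  2048   /* 2 x PI on the same scale */

/* Hypothetical helper: difference of two phases, folded back into
   -PI..+PI so that a sign flip in only one channel does not show up
   as a 2 x PI "pulse". These are the two "IF" lines. */
int16_t phase_diff_wrapped(int16_t phase_a, int16_t phase_b)
{
    int16_t diff = phase_a - phase_b;
    if (diff >  PHASE_PI) diff -= PHASE_TWO_PI;
    if (diff < -PHASE_PI) diff += PHASE_TWO_PI;
    return diff;
}

/* "Denominator" step: dividing by the bin number puts the phase
   differences of all bins on one common delay scale.
   bin must be 1 or higher (bin 0 is DC and carries no phase). */
int16_t phase_to_delay(int16_t diff, uint8_t bin)
{
    return diff / bin;
}
```

For example, phases of +1000 and -1000 differ by 2000 before correction and by -48 after it, which is the small real offset hiding behind the “pulse”.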

Filtering.

 To fight reverberation and noise I chose a Low Pass Filter, which I'll call a “Rolling Filter” here. My research with regular LPFs shows that this class of filters is completely NOT appropriate for this type of data, due to their high susceptibility to “spikes”, or sudden jumps in magnitude. For example, when the system is getting steady readings with low values, let's say -10, from 2-3 test frequencies, the result of simple averaging (which should be -10) will be corrupted by one accidental spike (magnitude +2000) for the next 60 – 100 consecutive frames !!!  The Median Filter does well at eliminating sudden spikes, but at the same time is very hungry for CPU cycles, as it runs a “sort” algorithm each time a new sample arrives in the data pool. With 64 frequencies and a filter kernel of 5 – 8 samples, the Arduino would be buried in sorting at almost 40 ksps. Even if the data for each frequency is not processed individually and only one 64-element array is sorted, it is still a very time-consuming job. After thinking for a while, I came to the conclusion that the “Rolling Filter” is almost as efficient as the Median, but instead of “sorting” requires only 1 additive operation! In the long run, the output value will “roll” and “stick” to the middle of the pool. (Try to model it in LibreOffice.) By adjusting the “step” of the “Rolling Filter” you can easily manipulate responsiveness, which is almost impossible with Median Filters. (Things TO DO: Adaptive Filtering, real-time adjustment depending on input data “quality”.)

 To be continued… Video will follow!

(Anticipating your question, how “good” is the localization? It's about the same as the Laser TRF (tracking range finder) has; look at my other blogs for now to get an impression. In other words: ASTONISHINGLY GOOD, a few (1 – 5) angular degrees in a closed environment.)
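If you'd rather model the “Rolling Filter” in C than in a spreadsheet, one possible reading of the description is the sketch below: the estimate moves by one fixed step toward every new sample, no matter how far away that sample is. The function name and the 16-bit types are assumptions of this example, not names from the Localizator sketch.

```c
#include <stdint.h>

/* Hypothetical "rolling" update: one compare and one add (or subtract)
   per new sample. A spike of +2000 moves the estimate by the same
   single step as a sample that is just 1 above it. */
int16_t rolling_filter(int16_t estimate, int16_t sample, int16_t step)
{
    if (sample > estimate) return estimate + step;
    if (sample < estimate) return estimate - step;
    return estimate;
}
```

With a steady input of -10 and a step of 1, a single +2000 spike pushes the output only to -9, and the very next normal sample brings it back to -10. A bigger step makes the filter react faster, at the price of more jitter around the middle.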

Link to Arduino (Leonardo) sketch:  Localizator-beta-9.


Visual Navigator. Making it MOBILE !

An obstacle avoiding vehicle, continuing the “3D Laser Range Finder” series (project 1, project 2). The basic idea is the same: measuring distance using red laser pointers, a CCD analog camera and an Arduino UNO. The modification was made to the geometry. Two lasers were set up for “far field” obstacle detection, a few meters in front of the vehicle on the left or right side. Their primary mission is to trigger a left / right turn before the car gets too close to a “continuous” but not necessarily “high” object, for example a sidewalk curb. Of course, this distance depends on the vehicle speed, and the “alert” should be dispatched in the right time “window”, or there would be no space left to make a turn (proportional speed adaptation is not implemented yet). The low height of such road infrastructure makes an ultrasound based range finder useless.

 

Two additional lasers were set up in a “cross” configuration in order to detect any object that comes dangerously close to the front of the vehicle: “near field” obstacle detection, or “head on collision” avoidance. Their two beams form reflective “trip-wires” and are able to detect objects as narrow as the leg of a chair or desk, an open door frame, anything that is at least 1 mm wide. One laser, pointed to the left, also works as a sidewalk / wall following navigation system, keeping this distance constant.

Now a couple of words on the “autopilot” algorithm. The three main features of the project:

  1. wall / sidewalk following;
  2. “far field” obstacle avoidance;
  3. “near field” head on collision avoidance.

were classified into 3 priority levels: 1 – warning, 2 – major, 3 – critical.

Level 0, clear, corresponds to normal R/C radio control, that is, navigation by a “man / operator” via the remote R/C module. The operator also has the “authority” to decline a warning class navigator status. But this is not the case when the navigator's “autopilot” subroutine performs a class 2 or 3 maneuver, with status “major” or “critical”. When the vehicle performs maneuver 2, “left / right” commands from the R/C remote module are ignored; the same goes for “forward / backward” commands in status “3 – critical”, making the algorithm completely “fool-proof”.

More video will be posted. Link to Arduino UNO sketch: Visual_Navigator.
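The priority rules can be summed up in a few lines of C. This is only an illustration of the logic, not code from the Visual_Navigator sketch: all names are invented, the warning level is assumed to act on steering only, and I assume that status 3 keeps the steering override of status 2.

```c
#include <stdint.h>

enum nav_status { NAV_CLEAR = 0, NAV_WARNING = 1, NAV_MAJOR = 2, NAV_CRITICAL = 3 };

struct drive_cmd {
    int8_t steer;    /* -1 left, 0 straight, +1 right */
    int8_t motion;   /* -1 backward, 0 stop, +1 forward */
};

/* Hypothetical arbiter: decides whose command reaches the H-bridges. */
struct drive_cmd arbitrate(enum nav_status status, int operator_declines,
                           struct drive_cmd rc, struct drive_cmd autopilot)
{
    struct drive_cmd out = rc;                 /* level 0: the operator drives */

    if (status == NAV_WARNING && !operator_declines)
        out.steer = autopilot.steer;           /* level 1: can be declined */
    if (status >= NAV_MAJOR)
        out.steer = autopilot.steer;           /* R/C left / right ignored */
    if (status == NAV_CRITICAL)
        out.motion = autopilot.motion;         /* R/C forward / backward ignored */

    return out;
}
```

The point of the layout is that the operator's wish is the default, and each higher level takes away one more axis of control.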

 5 August 2012.

I'd like to publish more pictures from the “inside”, which show the interface between the Arduino and the R/C receiver module in the car. Well, not quite an Arduino: I built a “clone” using a pre-programmed ATmega328. As you can see, the receiver was left almost intact. What I did was just identify the two on-board H-bridges which supply power to the steering control motor and the vehicle's main drive motor. Then I removed the 4 resistors in series with the control lines, and routed 8 wires to the Arduino (4 inputs from the R/C receiver and 4 outputs to the H-bridges). Here you are: now the Arduino can intercept any command coming from the R/C transmitter and, based on data from the sensors, decide if it makes sense to follow it. Also, the “autopilot” function can address the two motors “directly” in order to execute an “obstacle avoiding” maneuver without asking anyone's permission! What's more, the Arduino controls the power delivered to the motors via software PWM, making 7(!) different speed levels available, like in a real vehicle. Unfortunately, the model I “hacked” doesn't use proportional steering control, but PWM power management is still helpful to save battery energy, limiting unnecessary current delivered to the motor.

 


Audio VU meter (AC microVoltmeter) with Extra wide Dynamic Range 69 dB.

OK, after having some fun with the stereo version of the VU meter I described in my previous blog post, now it's time to do some serious stuff. A studio grade VU meter !!! 24 steps, equally spaced every 3 dB, covering an Extra wide Dynamic Range from -63 up to +6 dB. Single (mono) channel this time, no messing around, absolute precision is at stake. Plus, it keeps an absolutely Top-Flat linear frequency response from 40 Hz up to 20 kHz(*).

 

 

I'm not going into the details of the RGB LED display, which has not been modified since the “Tears of Rainbow” project; only the plates are installed in one line, forming a single GIGANTIC bar-graph. There are some minor changes in the color mixing data tables, but they are intuitively understandable. The most important feature of this project is autoscaling. As you probably know, Arduino has a 10-bit ADC. Only it can't process the negative half-wave, and for this reason it has only 9 bits available for AC measurements. According to DSP theory, the maximum dynamic range is:

DR = 1.77 + 6.02 x B = 1.77 + 6.02 x 9 = 55.95 dB.

 As the input audio waveform is anything but a perfect 5 V peak-to-peak sine wave, the real dynamic range would be lower. How much? First, there are hardware limits. The op-amp (NE5532) has:

  • very low noise !!!
  •  high output-drive capability;
  •  high unity-gain and maximum-output-swing bandwidths;
  •  low distortion;
  •  high slew rate;
  •  input-protection diodes, and output short-circuit protection

 but, unfortunately, isn't a rail-to-rail type. Test results show that compression becomes noticeable (~1 dB) when the unscaled magnitude approaches a level of about 50 dB. That is in good agreement with the undistorted swing observed on the oscilloscope, 2.5 V peak-to-peak, or only half of the full 5 V range. And as theory says, half is one bit less, so the real DR = 1.77 + 6.02 x 8 = 49.93 (~50 dB). Second, audio data is processed on a “block” basis. A block average of 50 dB doesn't mean that there were no spikes in the sampling pool, which obviously would be clipped and introduce an error into the measurement results. This phenomenon is described by the Crest Factor. Different sources estimate the crest factor of musical content at between 10 and 20 dB. So, taking the direct approach, an Arduino with the op-amp mentioned above as a front-end could accurately cover only 50 – 20 = 30 dB. To get a wider dynamic range I have to scale the input amplifier gain, and this is exactly what I did, building the amplifier in two stages and selecting one cascade (by-passing the second one) or two cascades using the internal ADC multiplexer. As there is no switching IC involved in the analog signal path, the gain is defined with high stability, and could be precisely measured once and calibrated via a coefficient stored in EEPROM (a nice feature to add).

On the right side there are electrical drawings of the “slightly” modified kit, where the stereo amplifier was converted into a 2-stage mono version. The first stage, with a gain of about G1 = 1 + 10 k / 1 k = 11, is necessary to “bump up” the line-level signal and to create the DC bias required for correct operation of the ADC, and it also serves as a buffer to lower the signal source impedance as seen by the ADC input. I set the gain of the second stage amplifier at 40 dB: 20 x Log_10 ( G2 ), where G2 = 1 + 100 k / 1 k = 101.

IMHO, limiting each stage to only 30 dB, as follows from the paragraph above, is overkill, and would be justified for “real-time” radio broadcasting or audio processing for storage media, where high fidelity of the audio program must be preserved. For a visual display, “clipping” of bursts in the signal is not noticeable at all due to the high refresh rate of the display, 78 Hz. A human just can't see an LED lighting up at such speed. For steady AC amplitude measurements (micro-voltmeter mode) this is not a problem at all, and headroom as small as 3 dB would be sufficient, leaving a wide 47 dB per stage.

 Software

  Two thresholds are defined in the program, where switching between one- and two-stage amplification happens:

      if ( magn_new <=  44 ) sensitv = 1;

      if ( magn_new >= 47 ) sensitv = 0;

  44 and 47, with a hysteresis of 3 dB. The first line switches to the high sensitivity mode (overall gain 1100), and the second line does exactly the opposite. Look at the chart, I hope it will save me a million words -);

 A couple of words on using this device as a precise AC micro-voltmeter. An overall amplification of 1100 on top of the already quite sensitive Arduino ADC drives the overall sensitivity to an enormous 5 / ( 1024 x 1100 ) = 4.439 uV. Special care should be taken with grounding and shielding of the amplifier PCB; EMI suppressor ferrite chokes probably wouldn't be excessive in the power line and signal path. In my project, without any modification to the original kit's board (except a couple of jumper wires to cascade the two amplifier stages), I was of course not expecting to get to such a high sensitivity level. Moreover, in this project the Arduino is driving the LED display, “ADC noise reduction mode” is off, plus the ADC is working at double speed, with the prescaler set for a 250 kHz clock !!!  And this is why the constant 14 is subtracted in software from magn_new, just before it goes to the BarGraph “mapping” procedure:
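To see the hysteresis at work outside the sketch, the two lines can be wrapped into a small function and fed a rising and falling level. The wrapper and the constant names are only for illustration; the thresholds and the sensitv / magn_new names are the ones from the two lines quoted earlier.

```c
#include <stdint.h>

#define SWITCH_TO_HIGH_GAIN 44   /* at or below: use both amplifier stages */
#define SWITCH_TO_LOW_GAIN  47   /* at or above: use the first stage only */

/* Returns the new mode: 1 = high sensitivity, 0 = low sensitivity.
   Between the two thresholds the previous mode is simply kept. */
uint8_t update_sensitivity(uint8_t sensitv, int16_t magn_new)
{
    if (magn_new <= SWITCH_TO_HIGH_GAIN) sensitv = 1;
    if (magn_new >= SWITCH_TO_LOW_GAIN)  sensitv = 0;
    return sensitv;
}
```

A level of 45 or 46 leaves the mode untouched, whichever it was, so a signal hovering around a single threshold can not make the amplifier flip back and forth on every frame.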

      magn_new  -= 14;

Basically, 14 is the noise floor of my analog front-end. Approximately 51 microvolts AC turns on the first LED bar. Look at the table, which reflects my current hardware set-up.

* Another thing to keep in mind: there is a “gap” 78 Hz wide in the frequency range at 10 kHz. It introduces a small error, about 78 / 20,000 = 0.39%, in white noise measurement results. For musical content, which has a really low power density at 10 kHz, the error would be much lower, probably less than 0.05 %.

 Running an FFT in the code creates a great opportunity to reject any interference in the audio band. For example, if there is noticeable hum from the electrical grid in the content, the issue could easily be fixed by NOT including bin[1] in the final magnitude sum. Though to make it work more efficiently, some adjustment of the sampling period would be necessary, setting the bin[1] frequency precisely at 50/60 Hz.

 One more advantage of FFT based filtering (its primary mission is HPF; look in the stereo VU meter post to see how long the kernel of a FIR filter would have to be otherwise) is a great opportunity to create “weighting” A, B, C or D curves for audio noise measurements. (:TO DO).

 Link to Download Arduino sketch:  Audio_VU_Meter_Mono_69dB


Stereo Audio VU meter on Arduino.

This blog is a sequel to “Tears of Rainbow”. Using the same hardware set-up of the Gigantic RGB LED display, I decided to re-work the software a little bit in order to display the true RMS amplitude of musical content. Video clips on youtube: VU_Meter 640×480, VU_Meter_HD.

Objective:

  • Stereo input, processing both channels;
  • Full audio band, 40 Hz – 20 kHz;
  • Fast update rate of visual output;
  • Precision Full-Wave measurements.

To process stereo input, this time the Arduino switches the ADC multiplexer every time it finishes sampling an input data array (size = 128). The two channels are “interleaved” at a frame rate of 78 Hz, so during each frame only one channel is sampled / processed, and the update rate per channel equals 78 / 2 = 39 Hz, which is more than enough for most audio applications.

 I'm using FFT Radix-4 to extract the RMS magnitude of the audio waveform, and this is why:

1.  The sampling rate in this application is 10 kHz. How did I achieve the 20 kHz stated in the objective section, sampling at only 10 ksps?  >>>Aliasing!<<<   Considered a nightmare when we need spectral information from the FFT output, aliasing is really helpful in this project, reflecting all spectral components around the axis at 10 kHz back “to the field”. As all bins are going to be summed up there is no issue, only benefits. Thanks to aliasing I'm able to use a low sampling rate, and reduce the CPU workload to 52%.

2.  RMS is defined as the square root of the sum of squares divided by the number of samples per specified period of time:    V(rms) = √( ( ∑ Vi^2 ) / N ).  In order to get an accurate RMS magnitude, the DC offset must be subtracted from the raw input data of each sample,    Vi = Vac + Vdc   (if you remember, the ATmega328 ADC needs a DC offset to read the negative AC half-wave).  The problem here is that the DC offset value is never known with high accuracy, for a bunch of reasons: PSU voltage stability, thermal effects, resistor tolerance (+/- 1 or 5 %), ADC internal non-linearity etc. The cure for this, which works quite well for monitoring electrical grid power, is a high pass filter (HPF). Only instead of a single 50/60 Hz power line frequency, I have a wide frequency range, starting from 20 Hz and ending at 20 kHz. When I feed the HPF specification:

  • Sample Rate (Hz) ? [0 to 20000]                     ? 10000
  • Desired stop-band attenuation (dB) [10 to 200] ? 40
  • Stop-band edge frequency Fa [0 to 5000]         ? 0
  • Pass-band edge frequency Fp [0 to 5000]        ? 40

to the Parks-McClellan FIR filter design algorithm (one of the most popular, and probably the best), it gives the result:

  • …filter length: 551 …beta: 3.395321

551 coefficients to be multiplied and summed up (MAC-ed) every 100 usec! No way. I'm not sure if it could be done on a 32-bit 100 MHz platform with built-in MAC hardware, but there is no way for an 8-bit 16 MHz Arduino.

An IIR filter wouldn't make much difference here. It needs fewer multiplications, but is more sensitive to truncation and rounding errors, so I'd have to use longer (32-bit?) variables, which is not desirable at all on an 8-bit microprocessor.

And here comes FFT Radix-4, which easily fulfills these extra-tough requirements in the most efficient and elegant way. All I have to do is just NOT include bin[0] in the final sum, and all DONE! TOP-FLAT linear frequency response 40 Hz – 20 kHz ( -3 dB ), with complete suppression of DC and attenuation of low frequency rumble below 20 Hz. Linearity is better than +-1 dB between 80 – 9960 Hz.

Last thing, the audio front-end. As the VU meter was designed in a stereo version, I've built another “line-in” pre-amplifier based on this kit: Super Ear Amplifier Kit
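The “do not include bin[0]” trick looks roughly like this in code. Take it as a simplified illustration rather than the routine from the Stereo_VU_Meter sketch: the function name and the separate real / imaginary arrays are assumptions, and the scaling from this sum to volts depends on how the FFT output is normalized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum of squared bin magnitudes with the DC bin
   left out. By Parseval's theorem this sum is proportional to the
   power of the AC part of the signal, so its square root tracks RMS. */
uint32_t ac_power_sum(const int16_t *re, const int16_t *im, size_t n_bins)
{
    uint32_t sum = 0;
    for (size_t k = 1; k < n_bins; k++) {       /* k = 0 (DC) is skipped */
        sum += (uint32_t)((int32_t)re[k] * re[k]);
        sum += (uint32_t)((int32_t)im[k] * im[k]);
    }
    return sum;
}
```

Whatever DC offset the ADC bias network adds ends up in bin 0 and never reaches the sum, which is why no separate high pass filter is needed.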

Link to Download a sketch:  Stereo_VU_Meter.

 

Modified Stereo VU meter, Logarithmic scale, 8 bars per channel, spacing 6 dB.

Dynamic range: 8 x 6 = 48 dB.  Stereo_VU_Meter(Log10).
 Next blog:   Extending dynamic range to 72 dB! 

Tears of Rainbow.

Video clips on youtube, arduino is running simple demo application.

Tears of Rainbow, BarGraph HD movie, BarGraph movie (www.youtube.com/watch?v=30ELYwyy4JQ&feature=youtu.be).

 

It's time to release new updates for my first (ever) project with Arduino, “Color Light Music”. From an artistic perspective, the VU BarGraph style is (IMHO) the best one for spectral dynamic representation, and not much could be improved on this side. But this time another idea crossed my mind: “Tears of Rainbow”. This blog is about how successfully (or awfully) the idea was brought to life. And of course, the VU visual effects are still there, updated with nice peak indicators, color adjustment flexibility (this time with triple color LEDs), and the luxury of PWM-ed brightness settings. So, these are the design requirements I was following:

  •   make it as big as possible, GIGANTIC size !;
  •   Lego style, or many blocks / modules, which could be re-arranged in different pattern;
  •   extend-able,  easy to add up more blocks later on;
  •   low price on hardware, no special display driver IC.

To simplify assembly work, I decided to buy an RGB LED strip. I knew from my first project that the design would be composed of straight lines, and longer lines mean more LEDs (and consequently, more soldering work). For comparison, one line on this display consists of 6 RGB LEDs, or 24 soldering connections. Using an RGB strip, I reduced the workload from 24 to 4, or 6 times. I envy people who have the patience to build an 8x8x8 RGB LED cube (or even 10^3 !). An addressable RGB strip would make life even easier, but I couldn't find a local re-seller and was not going to wait for shipment / customs. It's summer time!

In order to easily reconfigure the style, for example from the 3 BarGraphs needed in the Color Music exposition to just 1 GIGANTIC VU meter (*), the RGB LED strip is chopped up and attached to 3 rectangular plates. I found out that for some reason the strip isn't “sticky” enough, and to keep it perfectly aligned on a plate I used tie-wraps at both ends. Luckily, it was quite easy to punch holes in the plates for the tie-wraps using just a kitchen knife. It wouldn't be so if I had used glass as a back-plate (I had such an idea initially). Something to think about if you plan to work with a strip in your design. The same is also true for wiring (32 wires per plate). Tin “cookie” plates were just made to be part of this project! And I haven't even mentioned heat dissipation: 1/3 of a 5 meter strip consumes around 12 W of power, almost like my soldering iron!

One more thing before I forget: I installed 1 cm paper pads to insulate the contacts from the metal plate in the middle and on one side. Heat shrink tube takes care of the other end.

The LEDs use 12 V as a power source, and as I need a lot of PWM channels to control their brightness, here comes the 74HC595, buffered by a ULN2803 at the outputs. Nothing special: 9 shift registers daisy-chained to produce 72 PWM outputs. The two ICs of each pair are installed in reverse on a prototype board to minimize the number of interconnections. As you can see from the picture, there is only 1(!) yellow jumper, brought from pin 15 of the shift register to pin 8 of the Darlington array. Why don't they make a shift register in a DIP-16 package? There wouldn't be any jumpers at all! Another alternative is the TPIC6B595.

* For clarity, the schematic diagram shows only two pairs of chips, and half the length of the strip lines.

Now the software part.

There are libraries available on-line to drive the 74HC595 from an Arduino. Only some of them don't use the hardware “built-in” SPI interface and are really slow in communicating with peripheral ICs (don't forget that the LED display is only the second part of the project; the first one, FFT, is very time consuming). The other libraries, nicely written and perfectly optimized for speed, have too much functionality that I don't need in my project, plus they are memory demanding. On the other hand, I need a low resolution animation function, colorful tears sliding down, that I have to create on my own. Now I'd like to present the code: very fast SPI subroutines, written completely in C! The function shifts out 9 bytes (for the 9 shift registers in this project) in less than about 36 usec, or 0.5 usec per PWM channel. One bit-set in the unrolled loop takes about 4.5 cycles.

static uint8_t brightn;

brightn++;
if ((brightn % QUANTUMS) == 0)              // refresh on every QUANTUMS-th call only
{
    bitClear(PORTB, LATCH_PB);              // latch low while data is shifted
    SPDR = 0;                               // dummy byte, so SPIF gets set for the first wait

    uint8_t *srP = &brightns[PIN_NBRS];     // one past the last channel, walked backwards
    uint8_t cmp_level = brightn;            // current PWM comparison level

    for (int8_t iSR = 0, curBt = 0; iSR < IC_COUNT; iSR++, curBt = 0) {

        // a channel is ON while its brightness is above the current level
        if ((*--srP) > cmp_level)
            curBt |= 0b00000001;
        if ((*--srP) > cmp_level)
            curBt |= 0b00000010;
        if ((*--srP) > cmp_level)
            curBt |= 0b00000100;
        if ((*--srP) > cmp_level)
            curBt |= 0b00001000;
        if ((*--srP) > cmp_level)
            curBt |= 0b00010000;
        if ((*--srP) > cmp_level)
            curBt |= 0b00100000;
        if ((*--srP) > cmp_level)
            curBt |= 0b01000000;
        if ((*--srP) > cmp_level)
            curBt |= 0b10000000;

        loop_until_bit_is_set(SPSR, SPIF);  // wait for the previous byte to finish
        SPDR = curBt;                       // start the transmission
    }
    loop_until_bit_is_set(SPSR, SPIF);      // wait for the last byte
    bitSet(PORTB, LATCH_PB);                // latch high: outputs updated
}

The FFT part of the code is completely “copy / pasted” from my “Radix-4” blog. A piece of advice: if you wish to explore the code, look there for the “pure” form of the function. What is new in this publication is the magnitude calculation subroutine, without the slow SQRT.

The Bar Graph “set position” sub-function, which maps the height of the lit area of a plate to the integral sum of the bins, was brought into this project with mild modification from the first project.

Moving on from the LED display to the audio input, I should say a couple of words on sampling. There are two functions in the project that have to be triggered periodically by a timer: “display refresh”, posted above, and “take ADC sample”. Instead of having two timers and a lot of trouble with collisions / races between them, it looks logical to scale both functions to the same time frame and execute them at once. The “display refresh” rate equals the minimum rate needed to avoid flickering (60/70 Hz) multiplied by the number of brightness levels. For example, setting the number of brightness steps to 256 (which provides an excellent 256 x 256 x 256 = 16 M colors) would require a rate of 60 x 256 = 15360 Hz. See what I'm driving at? Exactly, 15 kHz is a nice frequency to sample the audio input! Well, it's not the 44.1 kHz that the Hi-Fi audio standard would recommend, but I'm not using all the sound data in this project, as I'm only interested in the lower 2 kHz part of the spectrum. And BTW, it's almost 4 times higher than the bare minimum prescribed by the sampling theorem (Whittaker–Shannon–Kotelnikov). I settled on 15.625 kHz, to simplify the math to a binary compare with 64 (1 / 64 usec = 15.625 kHz). If there is no big difference, why not pick a “lucky” binary number and help the Timer do its job?

Initially I thought that I would just re-use the sampling sub-function from the “Pitch Shifting” project, slightly adjusting it from 8.0 to 15.6 kHz. I was surprised to discover that TIMER2 and SPI don't want to work together! Have I missed something in the data sheet? Could be; sometimes it's so hard to comprehend that I'd still be experimenting with “Blinking LED”, if not for help from this (must-have) masterpiece:

AVR Microcontroller and Embedded Systems: Using Assembly and C (Pearson Custom Electronics Technology)

O’K, there is TIMER1 available. As project already have been heavily “over-loaded” on software side, I decided to take TimerOne library and not bother myself this time with a bunch of registers, interruptions, masks, etc, leaving this out of the scope, as not related to subject.

FFT size =128 provides extra fine resolution for Color Music performance. BTW, may be it not obvious, but bigger size of FFT has LOWER CPU workload per sample. And last, after everything was melted in one  (BIG) sketch nothing happened. No, my arduino can’t catch up at 15.6 kHz. Shifting 9 bytes via SPI, as I mention earlier, takes 36 microseconds. It’s leaving 64 – 36 = 28 microseconds per sample for everything else, or 28 x 128 = 3 584 per frame.  Radix-4 (size = 128) takes 4.2 milliseconds, as I posted here.  Alright, hell with it, who need 16 M colors  on 8 x 3 led display, by the way?  So, I bring quantity of brightness steps down to:             256 / 4 = 64, which is more than enough -> 262 144 color combinations!  QUANTUMS definition sets coefficient 4 in SPI sub-function.  The same time frame rate equals to  122 Hz, which is 2x times higher, than 60 Hz I started my calculation with.
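The time budget can be double-checked with a few lines of C on a PC. This is a back-of-envelope model, not code from the sketch: it only counts the SPI refresh and ignores the ADC sampling overhead, and the function and constant names are invented for the example. The numbers (64 usec sample period, 36 usec per refresh, FFT size 128, 4.2 ms per FFT) are the ones quoted in this post.

```c
#include <stdint.h>

#define SAMPLE_PERIOD_US 64u     /* 1 / 15.625 kHz */
#define SPI_REFRESH_US   36u     /* shifting out 9 bytes */
#define FFT_SIZE         128u    /* samples per frame */
#define FFT_TIME_US      4200u   /* Radix-4, size 128 */

/* Hypothetical estimate: microseconds left in one frame for the FFT and
   everything else, when the display is refreshed on every quantums-th
   sample (quantums = 1 means on every sample). */
uint32_t frame_budget_us(uint32_t quantums)
{
    uint32_t frame_us = SAMPLE_PERIOD_US * FFT_SIZE;            /* 8192 usec */
    uint32_t spi_us   = SPI_REFRESH_US * (FFT_SIZE / quantums); /* refresh cost */
    return frame_us - spi_us;
}
```

With a refresh on every sample the budget is 3584 usec, less than the 4200 usec the FFT alone needs. With QUANTUMS set to 4 it grows to 7040 usec, and the FFT fits with room to spare.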

Default color map, or bin’s assignment to a specific plate, is shown above. This time I implemented a command in CLI to make adjustment in this map on the fly, according to music style, equalizing all three bars more or less proportionally. Automatic Gain Control loop implemented in first project, doesn’t work so great with bigger display size ( first project uses 4  lines per color ). Plus, AGC bringing noise in the visual performance in  pauses between  two songs and in quite fragments of the music.  Starting bin position for each RGB plate could not be changed using CLI  ( you still can do this modifying the scetch), but quantity of bins accumulated per plate could be adjusted simply sending “dr” for red, “dg” – for green, and “db” - for blue, where  d is a digit 0..9.  Bands could over-lap, which is not desirable, in this case red is limited too 0..3, green 0..6, and blue 0..9.
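A parser for such two-character commands fits in a dozen lines. The version below is a stand-alone sketch, not the one from Tears_of_Rainbow: the function name, the bins[] array and the choice to clamp an out-of-range digit (rather than reject it) are all assumptions of this example.

```c
#include <stdint.h>

/* Hypothetical parser for "dr", "dg" and "db", where d is a digit.
   bins[0..2] hold the number of bins for the red, green and blue plate.
   Returns 0 on success, -1 if the command is not recognized. */
int parse_band_cmd(const char *cmd, uint8_t bins[3])
{
    static const uint8_t limit[3] = { 3, 6, 9 };   /* red, green, blue */

    if (cmd[0] < '0' || cmd[0] > '9') return -1;   /* first char: digit */
    uint8_t value = (uint8_t)(cmd[0] - '0');

    int plate;
    switch (cmd[1]) {                              /* second char: color */
    case 'r': plate = 0; break;
    case 'g': plate = 1; break;
    case 'b': plate = 2; break;
    default:  return -1;
    }

    if (value > limit[plate]) value = limit[plate]; /* keep bands from overlapping */
    bins[plate] = value;
    return 0;
}
```

For example, “2r” sets red to 2 bins, while “9r” ends up as 3 because of the red limit.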

More on the audio input hardware and the sampling subroutine I post in a separate blog, as this part has carried over without much modification through a few previous blogs and doesn’t need to be restated here.

*Note: in software G.VU is not implemented yet.

Link to download a sketch:  Tears_of_Rainbow.


Voice Pitch Shifting – Scrambler.

After I made an astonishing breakthrough in the speed of the FFT algorithm based on the RADIX-4 version, I decided to create a new project that would take full advantage of Radix-4 “rocket science” performance. Reviewing my “Voice Recognition” blog, it looked logical to create exactly the opposite application: a Voice Scrambler. To make it happen, I need the FFT subroutine to run twice, in the forward and the reverse (iFFT) direction. Then, simply by manipulating the positions of individual frequencies (bins) in the array, I can scramble a voice in the well-known old-fashioned manner, inverting the spectrum of the human voice and making it sound completely unintelligible (alien’s voice). For Pitch Shifting, bins have to be “progressively” spaced apart from each other, driving the timbre up the scale. It is possible to lower the pitch as well, by shrinking and overlapping bins, but I made only the up-shifting part for now.
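Both bin manipulations can be sketched in a few lines of plain C. This is only an illustration under my own assumptions: the function names are hypothetical, only the lower half of the spectrum is handled (for a real signal the upper half would have to be rebuilt as its mirror before the iFFT), and the spreading law k * (8 + shift) / 8 is just one simple way to space bins “progressively”, not necessarily the one used in the sketch.

```c
#include <stdint.h>
#include <string.h>

/* Spectrum inversion: mirror bins 1..half-1 so low frequencies swap places
   with high ones. Bin 0 (DC) is left where it is. */
static void invert_spectrum(int16_t *re, int16_t *im, int half)
{
    for (int lo = 1, hi = half - 1; lo < hi; lo++, hi--) {
        int16_t t = re[lo]; re[lo] = re[hi]; re[hi] = t;
        t = im[lo]; im[lo] = im[hi]; im[hi] = t;
    }
}

/* Upward pitch shift: source bin k lands on bin k * (8 + shift) / 8.
   Bins that would land past the end are dropped, gaps are left at zero. */
static void shift_pitch_up(const int16_t *src_re, const int16_t *src_im,
                           int16_t *dst_re, int16_t *dst_im,
                           int half, int shift)
{
    memset(dst_re, 0, (size_t)half * sizeof dst_re[0]);
    memset(dst_im, 0, (size_t)half * sizeof dst_im[0]);

    for (int k = 0; k < half; k++) {
        int target = k * (8 + shift) / 8;
        if (target >= half) {
            break;                      /* everything above falls off the top */
        }
        dst_re[target] = src_re[k];
        dst_im[target] = src_im[k];
    }
}
```

With shift = 8 every bin moves to twice its index (one octave up); with shift = 0 the spectrum is copied unchanged.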

Making a preliminary calculation, I get: 10.1 milliseconds x 2 / 256 samples = 78.9 microseconds / sample. Or, turning it upside down, a 12.67 kHz sampling frequency. It’s a rough estimate, but regular “public phone quality”, an 8 kHz sampling rate, looks quite achievable. Next, as always, if the CPU is OK, let’s check memory management. Unfortunately, the arduino Uno (2 kB) doesn’t allow fft-256, because input buffer (256) + output buffer (256) + fft processing, real + imaginary (512), multiplied by 2 (integer size, 1024 x 2 = 2048), would occupy all available 2 kB. So I have to decrease the fft size by one level, to fft-128, or even by two levels, to fft-64, from the original fft-256 (presumably good quality, comparable with an MP3 codec). After completing a couple of sketchy tests, fft-64 showed poor speech quality and was rejected. I can’t say I did thorough research on this matter; maybe fft-64 could still be good in other circumstances, or with musical material instead of speech.
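The same CPU and memory estimate, written out as a rough sketch in plain C. The helper names are mine; the model simply assumes four 16-bit buffers (input, output, real, imaginary) and that the forward and inverse FFT must both finish within one block of samples.

```c
#include <stdint.h>

/* RAM needed for input + output + real + imaginary buffers of 16-bit ints. */
static uint32_t fft_ram_bytes(uint32_t fft_size)
{
    return fft_size * 4u * (uint32_t)sizeof(int16_t);
}

/* Highest sampling rate (Hz) at which a forward FFT plus an inverse FFT,
   each taking fft_time_us, still fit into the time of one block. */
static uint32_t max_sample_rate_hz(uint32_t fft_time_us, uint32_t fft_size)
{
    uint64_t block_samples_per_s = (uint64_t)fft_size * 1000000u;
    return (uint32_t)(block_samples_per_s / (2u * (uint64_t)fft_time_us));
}
```

For fft-256 at 10.1 milliseconds this gives 2048 bytes and about 12.67 kHz, matching the numbers above; fft-128 needs 1024 bytes.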

OK, fft-128, the compromise version, was selected by God. But my code published in the “RADIX-4” blog didn’t include a RADIX-8 section, which is required when the size of the array isn’t a power of 4. Nothing to do, the only way to bring the Scrambler project to life is to write the missing section of RADIX-8 code… So I did. Timing measurements show that Radix-4 with the new patch runs even faster than the speed of RADIX-4 without it, extrapolated from fft-256 down to fft-128: the measured result is 4.2 milliseconds (compared to the extrapolated 4.6 milliseconds). By the way, in another post I wondered whether Split-Radix would be my next adventure. It was: I now have Split-Radix C/C++ code in my toolbox, re-written from “Matters Computational” http://www.jjj.de/, which to my surprise shows practically no difference in speed from the Radix-4 algorithm, with the same 10.1 milliseconds execution time (fft-256), and is losing (!) the competition at 4.6 milliseconds (fft-128).

 
Other things to explain: as Pitch Shifting is considered to be a most complicated task even for monstrous DSP processors, I’ve skipped two important procedures, windowing of the input samples and overlap-add on the output flow, just because there is no processor time left. Luckily, since the speech spectrum is concentrated around the most noticeable “middle” area, cutting out windowing only slightly increases the noise level. On the other hand, the missing overlap-add procedure has a greater negative impact on voice quality, “robotizing” it via parasitic amplitude modulation. When frequency bins are re-mapped in the pool, running the inverse FFT doesn’t produce a continuous “match” with the previous block, as it should, because the input has no discontinuity between sample blocks (128 samples, ~16 milliseconds of voice per frame). Well, if I did everything right, there wouldn’t be any fun with arduino, would there? Btw, there are two output buffers in the software, specifically to track this issue. I was thinking of implementing some kind of distortion estimator based on this information. Meanwhile, one of them could be removed to save some memory for other parts of the program.

Hardware.

Arduino has a 10-bit ADC, which is why I decided to build a DAC for the audio output based on the TIMER0 PWM feature, with the 10 bits split equally, 5 and 5, between pins 5 and 6. One PWM pin could only play 7-bit sound (don’t forget the sign bit), which is too low. A weighted 32R-R ladder potentially allows the PWM frequency to be increased up to 250 kHz or so, and simplifies the requirements for the output filter design. Nevertheless, I left the “default” 31 kHz setting for TIMER0, just in case I need more research with 16-bit output later on. All it would take to switch to “full” 16 bits is to add one 256 k resistor and divide the integer into high and low bytes.
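The 5 + 5 bit split is easy to show in plain C. This is a sketch of the idea only: the struct and function names are hypothetical, and which physical pin carries the coarse half is an assumption on my side.

```c
#include <stdint.h>

/* Two 5-bit PWM duty values that together carry one 10-bit sample. */
typedef struct {
    uint8_t coarse;   /* upper 5 bits, goes through the R resistor (assumed) */
    uint8_t fine;     /* lower 5 bits, goes through the 32R resistor (assumed) */
} PwmPair;

/* Split a 10-bit sample (0..1023) into its two 5-bit halves. */
static PwmPair split_sample(uint16_t sample10)
{
    PwmPair p;
    sample10 &= 0x03FF;                     /* keep only 10 bits */
    p.coarse = (uint8_t)(sample10 >> 5);
    p.fine   = (uint8_t)(sample10 & 0x1F);
    return p;
}

/* What the weighted resistors add back together: coarse counts 32x fine. */
static uint16_t combine_sample(PwmPair p)
{
    return (uint16_t)(p.coarse * 32u + p.fine);
}
```

On the board the two halves would be written to the two PWM compare registers; here combine_sample() only models the summing done by the resistors.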

Please be advised that the included scrambler function was not fully tested, as I don’t have a second arduino to do the decoding in real time :-) (Probably I can record a scrambled sound and decode it in a second pass through the system: a thing TO DO.) At the same time, Pitch Shifting was successfully tested, as you can see in the posted video clip. YouTube Video

Summary:

  • Sampling rate 8 kHz. (easy to adjust via the TIMER2 variable.)
  • Output rate 8 kHz. (same as sampling, the two processes are synchronous.)
  • Delay 16 milliseconds. (1 block, or 32 milliseconds for two blocks.)
  • Input/Output resolution: 10 bits.

There is a built-in command line interface:

  • if (incomingByte == 'm') { // FREE MEMORY BYTES
  • if (incomingByte == 'x') { // PRINT OUT INCOMING SAMPLING BUFFER
  • if (incomingByte == 's') { // SWITCHES - SCRAMBLING ON / OFF
  • if (incomingByte == 'y') { // PRINT OUT OUTGOING BUFFER
  • if (incomingByte == 'f') { // DATA AFTER FFT, FREQUENCIES BINS
  • if (incomingByte == 'p') { // DATA AFTER PITCH SHIFTING - SCRAMBLING
  • if ((incomingByte >= '0') && (incomingByte <= '9')) { // DIGITS "1" to "9" - REGULATE MAGNITUDE OF SHIFTING, "0" - SPECTRUM INVERSION.
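For readers who prefer to see the command set in one place, here is a compact sketch of the same dispatch as a single switch in plain C. Only the command letters and their meanings come from the list above; the enum, the function name and the shift argument are my own illustration, not the code from the sketch.

```c
/* Hypothetical command codes, one per CLI letter listed above. */
typedef enum {
    CMD_NONE,
    CMD_FREE_MEMORY,       /* 'm' */
    CMD_DUMP_INPUT,        /* 'x' */
    CMD_TOGGLE_SCRAMBLE,   /* 's' */
    CMD_DUMP_OUTPUT,       /* 'y' */
    CMD_DUMP_FFT,          /* 'f' */
    CMD_DUMP_SHIFTED,      /* 'p' */
    CMD_INVERT_SPECTRUM,   /* '0' */
    CMD_SET_SHIFT          /* '1'..'9', magnitude stored in *shift */
} Command;

/* Decode one received byte; *shift is updated only for CMD_SET_SHIFT. */
static Command decode_command(char incomingByte, int *shift)
{
    switch (incomingByte) {
    case 'm': return CMD_FREE_MEMORY;
    case 'x': return CMD_DUMP_INPUT;
    case 's': return CMD_TOGGLE_SCRAMBLE;
    case 'y': return CMD_DUMP_OUTPUT;
    case 'f': return CMD_DUMP_FFT;
    case 'p': return CMD_DUMP_SHIFTED;
    case '0': return CMD_INVERT_SPECTRUM;
    default:
        if (incomingByte >= '1' && incomingByte <= '9') {
            *shift = incomingByte - '0';
            return CMD_SET_SHIFT;
        }
        return CMD_NONE;
    }
}
```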

Link to download an Arduino sketch: Pitch Shifting – Scrambler 


RADIX-4 FFT (integer math).

Tweaking the FFT code that I’ve published earlier in my series of blogs, I hit a “stone wall”. There is nothing that could be improved in the “musical note recognition” version of the code to make it faster. At least, nothing without completely switching to assembly language, which I’m trying to avoid for now. I’m sure it’s the fastest C algorithm. Looking around, it didn’t take long to find out that there is another option: change the RADIX-2 algorithm for a higher-order RADIX (4, 8) or a split-radix approach. Putting split-radix aside (would it be my next adventure?), RADIX-4 looks promising, with a theoretical 1/4 reduction in the number of multiplications (which I believe is the “Achilles heel”).

Googling for a while, I couldn’t find a fixed-point version in plain C or C++. There is TI’s “Autoscaling Radix-4 FFT for TMS320C6000™” application report, which I found useful, but the problem is that it’s “bound” to the hardware multiplier of TI microprocessors, and any attempt to re-write the code would probably make its performance even worse than RADIX-2. Having “tweaking” experience with the fix_fft source code from: http://www.jjj.de/ I decided to follow the same path as before, when adapting fix_fft for arduino: take their floating-point source, take it apart into pieces, and then combine all the parts back as fixed-point or integer math components. And you know what? Thank God, I succeeded!!!

I decided not to re-assemble all the parts, which is why fft_size has to be a power of 4 (16, 64, 256, 1024 etc.). Next, the software is “adjustable” for different levels of optimization. The trade-off is always the same: accuracy against speed. I’d highlight 3 levels at this point:

1. No optimization, all math operations 15-bit. The slowest version. Not tested at all.

2. Compromise version. Switches: 12-bit Sine table, regular multiplication (long) right-shifted >>12, Half-Scaling in sum_dif_I (RSL) >>1. Recorded measurement: 24 milliseconds with N = 256 fft_size.

3. Maximum optimization. Switches: 8-bit Sine table, macro assembler multiplication shortcut, no scaling in the core. Timing: 10.1 milliseconds!!!

Fastest. Best of the Best FFT code ever written for an 8-bit microprocessor. Enjoy the meal:   https://docs.google.com/open?id=0Bw4tXXvyWtFVMldRT3NFMGNTZVN0Y0d4eVRsenVZdw
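The “regular multiplication (long) right shifted >>12” of the compromise version is ordinary fixed-point math, and can be sketched like this in plain C. The function name is mine, and the sketch assumes a sine table scaled so that 4096 stands for 1.0; it also relies on the compiler doing an arithmetic right shift on negative values, which gcc and avr-gcc do.

```c
#include <stdint.h>

/* Multiply a 16-bit sample by a sine/cosine coefficient stored with
   12 fractional bits (4096 == 1.0), using a 32-bit intermediate. */
static int16_t mul_q12(int16_t sample, int16_t coeff_q12)
{
    int32_t product = (int32_t)sample * (int32_t)coeff_q12;
    return (int16_t)(product >> 12);   /* drop the 12 fractional bits */
}
```

The 32-bit intermediate is what makes this the slower option on an 8-bit CPU, and it is the part that the maximum optimization level replaces with an assembler shortcut and an 8-bit table.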

Here is a slightly modified copy, where I moved the sine table from RAM to FLASH memory using the progmem utility. For anyone curious how much slower progmem is compared to accessing data in RAM, there is an answer: 10.16 milliseconds becomes 10.28, or 120 usec slower. Divide by the 84 x 6 = 504 readings, and each progmem access costs 0.24 usec. That’s about 4 CPU cycles.

https://docs.google.com/open?id=0Bw4tXXvyWtFVQjZpZkw1c3VUZXlmaF9sOEJwMmpEUQ

Screenshot from the running application: a signal generator running on the computer feeds an audio wave to the OPA and then to analog input 0. Look for the hardware setup configuration in the “color organ” blog-post.

Link to the first version based on RADIX-2 FFT:     LINK

BTW, there is one more important thing I failed to emphasize in my short introductory paragraph: the code offers FLEXIBILITY over SNR. The basic FFT algorithm has an intrinsic “built-in” GAIN: G(in) = FFT_SIZE / 2, where (in) stands for intrinsic. That is a perfect value for fft_size = 64 (Gain = 64 / 2 = 32) and the arduino (Atmel AtMega328) 10-bit ADC (max value = 1023). FFT output would be 32 x 1023 = 32736, exactly 15 bits + sign. In other words, scaling in the algorithm core isn’t required at all! That alone improves speed and lowers rounding noise significantly. At the same time, G(in) grows too high with FFT_SIZE = 256, when G = 256 / 2 = 128 and the output of the FFT would overflow 16-bit integer math. But again, scaling doesn’t have to be 100%, as long as there is a way to keep it in balance with the ADC data. In this particular case, with a 10-bit ADC, we can keep the gain as high as 32; it’s not necessary to bring it down to exactly “1”. For a 12-bit ADC the upper G limit would be 8, still not “1”. To manipulate the gain, a division by 2 (>> 1) can be enabled in “sum_dif_I” to prevent overflow with fft_size > 64. The right-shift “gain limiter” creates a square-root adjustment, according to a new formula: G(rsl) = SQRT(FFT_SIZE) / 4, where (rsl) stands for right-shift-limiter.

  1.  G = 1 for fft_size = 16,
  2.  G = 2 for fft_size = 64,
  3.  G = 4 for fft_size = 256,
  4.  G = 8 for fft_size = 1024.

Summing up: when using RADIX-4 with the arduino ADC and FFT_SIZE <= 64, keep the division by 2 (>> 1) in “sum_dif_I” commented out. In any other circumstances (an external ADC with more than 10 bits, or fft_size > 64), uncomment it.
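The two gain formulas and the overflow rule can be checked with a few lines of plain C. This is a sketch with my own helper names, assuming fft_size is a power of 4 and that the FFT output has to fit into a signed 16-bit integer (32767 max).

```c
#include <stdint.h>

/* Intrinsic gain with no scaling in the core: G(in) = FFT_SIZE / 2. */
static uint32_t intrinsic_gain(uint32_t fft_size)
{
    return fft_size / 2u;
}

/* Gain with the >> 1 limiter enabled: G(rsl) = SQRT(FFT_SIZE) / 4.
   The loop finds the square root for power-of-4 sizes. */
static uint32_t limited_gain(uint32_t fft_size)
{
    uint32_t root = 1;
    while (root * root < fft_size) {
        root *= 2u;
    }
    return root / 4u;
}

/* 1 if full-scale ADC input times the intrinsic gain would overflow
   16-bit signed math, meaning the >> 1 in sum_dif_I should be enabled. */
static int needs_right_shift(uint32_t fft_size, unsigned adc_bits)
{
    uint32_t adc_max = (1u << adc_bits) - 1u;
    return intrinsic_gain(fft_size) * adc_max > 32767u;
}
```

For example, needs_right_shift(64, 10) is 0 because 32 x 1023 = 32736 still fits, while needs_right_shift(256, 10) and needs_right_shift(64, 12) are both 1.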

To be continued…


RADIX-4 FFT (integer math).

Updates on 30 Sept. 2014:

Everything in the original post above is correct and may be worth reading. But new code based on Split Radix Real has been published. It is faster, has lower memory demands, and versions for both UNO and DUE are available as libraries. The new algorithm makes it possible to run FFT_SIZE = 512 on an UNO board in less than 9.6 milliseconds.