
ESP32 Piano Note Detection System

A real-time piano note detection system implemented on ESP32 using I2S microphone input. This system can detect musical notes from C2 to C6 with adjustable sensitivity and visualization options.

Features

  • Real-time audio processing using I2S microphone
  • FFT-based frequency analysis
  • Note detection from C2 (65.41 Hz) to C6 (1046.50 Hz)
  • Dynamic threshold calibration
  • Multiple note detection (up to 7 simultaneous notes)
  • Harmonic filtering
  • Real-time spectrum visualization
  • Note timing and duration tracking
  • Interactive Serial commands for system tuning

Hardware Requirements

  • ESP32 development board
  • I2S MEMS microphone (e.g., INMP441, SPH0645)
  • USB connection for Serial monitoring

Pin Configuration

The system uses the following I2S pins by default (configurable in Config.h):

  • SCK (Serial Clock): GPIO 8
  • WS/LRC (Word Select/Left-Right Clock): GPIO 9
  • SD (Serial Data): GPIO 10
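
As an illustration, the defaults above could be expressed in Config.h as compile-time constants. The constant names below are hypothetical; only the GPIO numbers come from the list above.

```cpp
// Config.h excerpt (sketch). Constant names are assumptions, not the project's own.
constexpr int I2S_PIN_SCK = 8;   // Serial clock
constexpr int I2S_PIN_WS  = 9;   // Word select / left-right clock
constexpr int I2S_PIN_SD  = 10;  // Serial data from the microphone
```

If your board routes these GPIOs to other peripherals, change the three values and rebuild.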

Getting Started

  1. Connect the I2S microphone to the ESP32 according to the pin configuration
  2. Build and flash the project to your ESP32
  3. Open a Serial monitor at 115200 baud
  4. Follow the calibration process on first run

Serial Commands

The system can be controlled via Serial commands:

  • h - Display help menu
  • c - Start calibration process
  • + - Increase sensitivity (lowers the detection threshold)
  • - - Decrease sensitivity (raises the detection threshold)
  • s - Toggle spectrum visualization
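
The command set above maps naturally onto a single-character dispatcher. The following is a minimal, portable sketch rather than the project's actual handler: `TunerState` and `handleCommand` are hypothetical names, and the status strings are invented for illustration.

```cpp
#include <string>

// Hypothetical tuning state; the real firmware keeps its own state elsewhere.
struct TunerState {
    bool spectrumEnabled = false;
    int sensitivityStep = 0;            // net count of '+' presses minus '-' presses
    bool calibrationRequested = false;
};

// Applies one command character and returns a short status message.
std::string handleCommand(char cmd, TunerState& state) {
    switch (cmd) {
        case 'h':
            return "h=help c=calibrate +=more sensitive -=less sensitive s=spectrum";
        case 'c':
            state.calibrationRequested = true;
            return "calibration requested";
        case '+':
            ++state.sensitivityStep;
            return "sensitivity increased";
        case '-':
            --state.sensitivityStep;
            return "sensitivity decreased";
        case 's':
            state.spectrumEnabled = !state.spectrumEnabled;
            return state.spectrumEnabled ? "spectrum on" : "spectrum off";
        default:
            return "unknown command, press h for help";
    }
}
```

On the ESP32, a loop would typically feed this from `Serial.read()` whenever `Serial.available()` reports pending bytes, then print the returned message.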

Configuration Options

All system parameters can be adjusted in Config.h:

Audio Processing

  • SAMPLE_RATE: 8000 Hz (captures frequencies up to the 4 kHz Nyquist limit)
  • BITS_PER_SAMPLE: 16-bit resolution
  • SAMPLE_BUFFER_SIZE: 1024 samples
  • FFT_SIZE: 1024 points

Note Detection

  • NOTE_FREQ_C2: 65.41 Hz (lowest detectable note)
  • NOTE_FREQ_C6: 1046.50 Hz (highest detectable note)
  • FREQUENCY_TOLERANCE: 3.0 Hz
  • MAX_SIMULTANEOUS_NOTES: 7
  • MIN_NOTE_DURATION_MS: 50ms
  • NOTE_RELEASE_TIME_MS: 100ms
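
To show how the range limits and FREQUENCY_TOLERANCE interact, here is a sketch that maps a peak frequency to a note name using equal temperament with A4 = 440 Hz. The function names and the use of MIDI note numbers are assumptions made for this example; the project's matching code may work differently.

```cpp
#include <cmath>
#include <string>

// Equal-temperament frequency of a MIDI note number (A4 = MIDI 69 = 440 Hz).
double midiToFrequency(int midi) {
    return 440.0 * std::pow(2.0, (midi - 69) / 12.0);
}

// Returns the note name (e.g. "A4") for a peak frequency, or an empty string if
// the peak is outside C2..C6 or farther than toleranceHz from the nearest note.
std::string matchNote(double freqHz, double toleranceHz = 3.0) {
    static const char* kNames[12] = {"C", "C#", "D", "D#", "E", "F",
                                     "F#", "G", "G#", "A", "A#", "B"};
    if (freqHz <= 0.0) return "";
    const int midi =
        static_cast<int>(std::lround(69.0 + 12.0 * std::log2(freqHz / 440.0)));
    if (midi < 36 || midi > 84) return "";  // C2 is MIDI 36, C6 is MIDI 84
    if (std::fabs(freqHz - midiToFrequency(midi)) > toleranceHz) return "";
    return std::string(kNames[midi % 12]) + std::to_string(midi / 12 - 1);
}
```

Because the nearest note is chosen before the tolerance is checked, a 3.0 Hz tolerance stays unambiguous even in the lowest octave, where neighbouring notes are less than 4 Hz apart.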

Calibration

  • CALIBRATION_DURATION_MS: 5000ms
  • CALIBRATION_PEAK_PERCENTILE: 0.95 (95th percentile)

Visualization

The system provides two visualization modes:

  1. Note Display:
    Current Notes:
    A4 (440.0 Hz, Magnitude: 2500, Duration: 250ms)
    E5 (659.3 Hz, Magnitude: 1800, Duration: 150ms)
    
  2. Spectrum Display (when enabled):
    Frequency Spectrum:
    0Hz    |▄▄▄▄▄
    100Hz  |██████▄
    200Hz  |▄▄▄
    ...
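
A spectrum view in this style can be produced by scaling each bar to the largest magnitude in the frame. The sketch below is an assumption about how such rows could be built, not the SpectrumVisualizer source; `renderSpectrum` and its parameters are hypothetical, and it draws whole blocks only (no half-height characters).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Builds one text row per entry: a frequency label padded to 7 columns, '|',
// then a bar whose length is proportional to magnitude / max magnitude.
std::vector<std::string> renderSpectrum(const std::vector<double>& freqsHz,
                                        const std::vector<double>& magnitudes,
                                        std::size_t maxBarWidth = 20,
                                        const std::string& block = "█") {
    std::vector<std::string> rows;
    const std::size_t n = std::min(freqsHz.size(), magnitudes.size());
    double maxMag = 0.0;
    for (std::size_t i = 0; i < n; ++i) maxMag = std::max(maxMag, magnitudes[i]);

    for (std::size_t i = 0; i < n; ++i) {
        std::string label = std::to_string(static_cast<long>(freqsHz[i])) + "Hz";
        if (label.size() < 7) label.append(7 - label.size(), ' ');

        std::size_t len = 0;
        if (maxMag > 0.0 && magnitudes[i] > 0.0) {
            // Round to the nearest whole block.
            len = static_cast<std::size_t>(magnitudes[i] / maxMag * maxBarWidth + 0.5);
        }
        std::string row = label + "|";
        for (std::size_t k = 0; k < len; ++k) row += block;
        rows.push_back(row);
    }
    return rows;
}
```

Scaling to the frame maximum keeps the tallest bar at full width regardless of input level, at the cost of hiding absolute loudness.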

Performance Tuning

  1. Start with calibration by pressing 'c' in a quiet environment
  2. Play notes and observe the detection accuracy
  3. Use '+' and '-' to adjust sensitivity if needed
  4. Enable spectrum display with 's' to visualize frequency content
  5. Adjust Config.h parameters if needed for your specific setup

Implementation Details

  • Uses FFT for frequency analysis
  • Implements peak detection with dynamic thresholding
  • Filters out harmonics to prevent duplicate detections
  • Tracks note timing and duration
  • Uses ring buffer for real-time processing
  • Calibration collects ambient noise profile
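
Note timing can be sketched as a small state machine driven by MIN_NOTE_DURATION_MS and NOTE_RELEASE_TIME_MS. The class below is a hypothetical illustration of that idea, assuming one tracker per note and a millisecond timestamp supplied by the caller; it is not taken from the project sources.

```cpp
#include <cstdint>

// Tracks a single note. The note counts as sounding once it has been detected
// for at least minDurationMs, and it is released only after it has been absent
// for at least releaseTimeMs.
class NoteTracker {
public:
    explicit NoteTracker(std::uint32_t minDurationMs = 50,
                         std::uint32_t releaseTimeMs = 100)
        : minDurationMs_(minDurationMs), releaseTimeMs_(releaseTimeMs) {}

    // Call once per analysis frame; `detected` says whether the peak was found.
    void update(bool detected, std::uint32_t nowMs) {
        if (detected) {
            if (!active_) {
                active_ = true;
                startMs_ = nowMs;
            }
            lastSeenMs_ = nowMs;
        } else if (active_ && nowMs - lastSeenMs_ >= releaseTimeMs_) {
            active_ = false;
        }
    }

    bool isSounding() const {
        return active_ && lastSeenMs_ - startMs_ >= minDurationMs_;
    }
    std::uint32_t durationMs() const {
        return active_ ? lastSeenMs_ - startMs_ : 0;
    }

private:
    std::uint32_t minDurationMs_;
    std::uint32_t releaseTimeMs_;
    std::uint32_t startMs_ = 0;
    std::uint32_t lastSeenMs_ = 0;
    bool active_ = false;
};
```

The unsigned subtractions remain correct when a 32-bit millisecond counter such as Arduino's `millis()` wraps around.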

Troubleshooting

  1. No notes detected:

    • Check microphone connection
    • Run calibration
    • Increase sensitivity with '+'
    • Verify audio input level in spectrum display
  2. False detections:

    • Run calibration in a quiet environment
    • Decrease sensitivity with '-'
    • Adjust PEAK_RATIO_THRESHOLD in Config.h
  3. Missing notes:

    • Check if notes are within C2-C6 range
    • Increase FREQUENCY_TOLERANCE
    • Decrease MIN_MAGNITUDE_THRESHOLD

Contributing

Contributions are welcome! Please read the contributing guidelines before submitting pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Development Environment Setup

Prerequisites

  • PlatformIO IDE (recommended) or Arduino IDE
  • ESP32 board support package
  • Required libraries:
    • arduino-audio-tools
    • arduino-audio-driver
    • WiFiManager
    • AsyncTCP
    • ESPAsyncWebServer
    • arduinoFFT

Building with PlatformIO

  1. Clone the repository
  2. Open the project in PlatformIO
  3. Install dependencies (PlatformIO Core 6 or later; older releases use pio lib install):
    pio pkg install
    
  4. Build and upload:
    pio run -t upload
    

Memory Management

Memory Usage

  • Program Memory: ~800KB
  • RAM Usage: ~100KB
  • DMA Buffers: 4 x 512 bytes
  • FFT Working Buffer: 2048 bytes (1024 samples x 2 bytes)

Optimization Tips

  • Adjust DMA_BUFFER_COUNT based on available RAM
  • Reduce SAMPLE_BUFFER_SIZE for lower latency
  • Use PSRAM if available for larger buffer sizes
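
When tuning DMA_BUFFER_COUNT, it helps to know how much audio the DMA buffers hold. The arithmetic below is a sketch that assumes one 16-bit channel, so 2 bytes per sample; the constant and function names are hypothetical, and the buffer sizes are the ones listed under Memory Usage.

```cpp
#include <cstddef>

constexpr std::size_t kSampleRateHz   = 8000;  // SAMPLE_RATE
constexpr std::size_t kBytesPerSample = 2;     // 16-bit samples, one channel assumed
constexpr std::size_t kDmaBufferBytes = 512;
constexpr std::size_t kDmaBufferCount = 4;     // DMA_BUFFER_COUNT

constexpr std::size_t dmaTotalBytes() { return kDmaBufferBytes * kDmaBufferCount; }
constexpr std::size_t samplesPerDmaBuffer() { return kDmaBufferBytes / kBytesPerSample; }

// Milliseconds of audio held by all DMA buffers together.
constexpr std::size_t dmaAudioMs() {
    return samplesPerDmaBuffer() * kDmaBufferCount * 1000 / kSampleRateHz;
}

static_assert(dmaTotalBytes() == 2048, "4 buffers x 512 bytes");
static_assert(dmaAudioMs() == 128, "1024 samples at 8 kHz");
```

Under these assumptions, halving the buffer count saves 1024 bytes but leaves only 64 ms of slack before samples are overwritten.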

Advanced Configuration

Task Management

  • Audio processing task on Core 1:
    • I2S sample reading
    • Audio level tracking
    • Note detection and FFT analysis
  • Visualization task on Core 0:
    • WebSocket communication
    • Spectrum visualization
    • Serial interface
    • Network operations
  • Inter-core communication via FreeRTOS queue
  • Configurable priorities in Config.h

Audio Pipeline

  1. I2S DMA Input
  2. Sample Buffer Collection
  3. FFT Processing
  4. Peak Detection
  5. Note Identification
  6. Output Generation

Timing Parameters

  • Audio Buffer Processing: ~8ms
  • FFT Computation: ~5ms
  • Note Detection: ~2ms
  • Total Latency: ~15-20ms

Performance Optimization

CPU Usage

  • Core 1 (Audio Processing):
    • I2S DMA handling: ~15%
    • Audio analysis: ~20%
    • FFT processing: ~15%
  • Core 0 (Visualization):
    • WebSocket updates: ~5%
    • Visualization: ~5%
    • Network handling: ~5%

Memory Optimization

  1. Buffer Size Selection:
    • Larger buffers: Better frequency resolution
    • Smaller buffers: Lower latency
  2. DMA Configuration:
    • More buffers: Better continuity
    • Fewer buffers: Lower memory usage

Frequency Analysis

  • FFT Resolution: 7.8125 Hz (8000/1024)
  • Frequency Bins: 512 usable bins (0 Hz up to the 4 kHz Nyquist limit)
  • Useful Range: 65.41 Hz to 1046.50 Hz
  • Window Function: Hamming
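
The figures above follow directly from the sample rate and FFT size. This sketch shows the bin-to-frequency conversion and the standard Hamming window formula; the helper names are hypothetical and the project itself may rely on arduinoFFT's built-in windowing instead.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kSampleRateHz = 8000.0;   // SAMPLE_RATE
constexpr std::size_t kFftSize = 1024;     // FFT_SIZE

// Width of one FFT bin: 8000 / 1024 = 7.8125 Hz.
constexpr double binWidthHz() { return kSampleRateHz / kFftSize; }

double binToFrequency(std::size_t bin) { return bin * binWidthHz(); }

std::size_t frequencyToBin(double freqHz) {
    return static_cast<std::size_t>(std::lround(freqHz / binWidthHz()));
}

// Hamming window: w[i] = 0.54 - 0.46 * cos(2 * pi * i / (N - 1)).
std::vector<double> hammingWindow(std::size_t n) {
    const double kPi = 3.14159265358979323846;
    std::vector<double> w(n, 1.0);
    if (n < 2) return w;
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = 0.54 - 0.46 * std::cos(2.0 * kPi * i / (n - 1));
    }
    return w;
}
```

For example, A4 at 440 Hz lands in bin 56, whose centre frequency is 437.5 Hz.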

Technical Details

Microphone Specifications

  • Supply Voltage: 3.3V
  • Sampling Rate: 8kHz
  • Bit Depth: 16-bit
  • SNR: >65dB (typical)

Signal Processing

  1. Pre-processing:
    • DC offset removal
    • Windowing function application
  2. FFT Processing:
    • 1024-point real FFT
    • Magnitude calculation
  3. Post-processing:
    • Peak detection
    • Harmonic filtering
    • Note matching
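
Two of the steps above, DC offset removal and harmonic filtering, are easy to sketch in isolation. The harmonic rule used here (drop a peak that sits near an integer multiple of a lower, stronger peak) is an assumed heuristic chosen for illustration, and `Peak`, `removeDcOffset` and `filterHarmonics` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Peak {
    double freqHz;
    double magnitude;
};

// Subtracts the mean so the 0 Hz bin does not dominate the spectrum.
void removeDcOffset(std::vector<double>& samples) {
    if (samples.empty()) return;
    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / samples.size();
    for (double& s : samples) s -= mean;
}

// Drops peaks within toleranceHz of an integer multiple (2x..maxHarmonic) of a
// lower-frequency kept peak that is also stronger.
std::vector<Peak> filterHarmonics(std::vector<Peak> peaks,
                                  double toleranceHz = 3.0,
                                  int maxHarmonic = 6) {
    std::sort(peaks.begin(), peaks.end(),
              [](const Peak& a, const Peak& b) { return a.freqHz < b.freqHz; });
    std::vector<Peak> kept;
    for (const Peak& p : peaks) {
        bool isHarmonic = false;
        for (const Peak& f : kept) {
            for (int h = 2; h <= maxHarmonic && !isHarmonic; ++h) {
                if (std::fabs(p.freqHz - h * f.freqHz) <= toleranceHz &&
                    p.magnitude < f.magnitude) {
                    isHarmonic = true;
                }
            }
            if (isHarmonic) break;
        }
        if (!isHarmonic) kept.push_back(p);
    }
    return kept;
}
```

A rule like this also suppresses a genuinely played octave when it is quieter than the lower note, which is one reason a ratio threshold such as PEAK_RATIO_THRESHOLD may need tuning.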

Calibration Process

  1. Ambient Noise Collection (5 seconds)
  2. Frequency Bin Analysis
  3. Threshold Calculation:
    • Base threshold from 95th percentile
    • Per-bin noise floor mapping
  4. Dynamic Adjustment
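
The base threshold step can be sketched as a nearest-rank percentile over peak magnitudes gathered during the quiet period. The function names are hypothetical, and `margin` is an invented safety factor that does not correspond to a documented Config.h parameter.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest value such that at least `fraction`
// of the samples are at or below it. `fraction` is in (0, 1], e.g. 0.95.
double percentile(std::vector<double> values, double fraction) {
    if (values.empty()) return 0.0;
    std::sort(values.begin(), values.end());
    std::size_t rank =
        static_cast<std::size_t>(std::ceil(fraction * values.size()));
    rank = std::min(std::max<std::size_t>(rank, 1), values.size());
    return values[rank - 1];
}

// Base threshold from ambient peak magnitudes, scaled by a safety margin.
// `margin` is an assumption for this sketch, not a project setting.
double baseThreshold(const std::vector<double>& ambientPeaks,
                     double fraction = 0.95,
                     double margin = 1.5) {
    return percentile(ambientPeaks, fraction) * margin;
}
```

Using a high percentile rather than the maximum keeps a single click or bump during calibration from inflating the threshold.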

Error Handling

Common Issues

  1. I2S Communication Errors:
    • Check pin connections
    • Verify I2S configuration
    • Monitor serial output for error codes
  2. Memory Issues:
    • Watch heap fragmentation
    • Monitor stack usage
    • Check DMA buffer allocation

Error Recovery

  • Automatic I2S reset on communication errors
  • Dynamic threshold adjustment
  • Watchdog timer protection

Project Structure

Core Components

  1. AudioLevelTracker
    • Real-time audio level monitoring
    • Peak detection
    • Threshold management
  2. NoteDetector
    • Frequency analysis
    • Note identification
    • Harmonic filtering
  3. SpectrumVisualizer
    • Real-time spectrum display
    • Magnitude scaling
    • ASCII visualization

File Organization

  • /src: Core implementation files
  • /include: Header files and configurations
  • /data: Additional resources
  • /test: Unit tests

Inter-Core Communication

Queue Management

  • FreeRTOS queue for audio data transfer
  • 4-slot queue buffer
  • Zero-copy data passing
  • Non-blocking queue operations
  • Automatic overflow protection
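
The queue behaviour listed above (fixed number of slots, non-blocking operations, drops on overflow) can be illustrated with a portable ring buffer. This is a single-threaded sketch with hypothetical names; on the ESP32 the FreeRTOS queue supplies the cross-core synchronization that this example deliberately leaves out.

```cpp
#include <array>
#include <cstddef>

// Fixed-capacity FIFO that never blocks: tryPush fails when the queue is full
// and tryPop fails when it is empty.
template <typename T, std::size_t Capacity>
class BoundedQueue {
public:
    bool tryPush(const T& item) {
        if (count_ == Capacity) {
            ++dropped_;  // overflow protection: reject the newest item
            return false;
        }
        slots_[(head_ + count_) % Capacity] = item;
        ++count_;
        return true;
    }

    bool tryPop(T& out) {
        if (count_ == 0) return false;
        out = slots_[head_];
        head_ = (head_ + 1) % Capacity;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }
    std::size_t dropped() const { return dropped_; }

private:
    std::array<T, Capacity> slots_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
    std::size_t dropped_ = 0;
};
```

Storing pointers or buffer indices, for example `BoundedQueue<const float*, 4>`, is one way to avoid copying whole spectra between tasks. The FreeRTOS counterpart of `tryPush` is `xQueueSend` with a wait time of zero ticks.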

Data Flow

  1. Core 1 (Audio Task):
    • Processes audio samples
    • Performs FFT analysis
    • Queues processed data
  2. Core 0 (Visualization Task):
    • Receives processed data
    • Updates visualization
    • Handles network communication

Network Communication

  • Asynchronous WebSocket updates
  • JSON-formatted spectrum data
  • Configurable update rate (50ms default)
  • Automatic client cleanup
  • Efficient connection management
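
As a rough idea of what a JSON-formatted spectrum message might look like, the sketch below serializes magnitudes into a single array. The `spectrum` key and the one-decimal formatting are assumptions made for this example; the project's actual message schema is not documented here and may differ.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Serializes magnitudes as {"spectrum":[...]} with one decimal place each.
// The key name and layout are hypothetical.
std::string spectrumToJson(const std::vector<float>& magnitudes) {
    std::string json = "{\"spectrum\":[";
    char buf[32];
    for (std::size_t i = 0; i < magnitudes.size(); ++i) {
        std::snprintf(buf, sizeof(buf), "%.1f", static_cast<double>(magnitudes[i]));
        if (i > 0) json += ',';
        json += buf;
    }
    json += "]}";
    return json;
}
```

With ESPAsyncWebServer, a string like this would typically be broadcast through the WebSocket object's `textAll` method at the configured update rate.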