Voice Processor

Audio Pipeline for Content Production

A Python toolkit for professional audio processing including time-stretching, pitch preservation, and format conversion. Optimized for podcast and video production workflows.

01

Problem

Content creators need consistent audio processing across multiple files, but professional tools are expensive and consumer tools lack batch processing capabilities. Manual processing in Audacity or Premiere is repetitive and error-prone.

Expensive Tools

Professional audio suites cost $200+/year with features creators don't need

Repetitive Tasks

Manual processing of each file in Audacity or Premiere is time-consuming

Inconsistent Output

No easy way to apply identical settings across batch files

02

Architecture

[Architecture diagram. Boxes: Source (WAV / MP3 / FLAC, CLI / Library API); Input Layer (Multi-format Support, Sample Rate Detection, Batch Queue); Processing (librosa + Phase Vocoder, Time-Stretch / Pitch, Normalization); Configuration (YAML Profiles, Env-based Settings); Output Layer (Format Conversion, Metadata Handling, Quality Metrics). Edge labels: Audio Files, numpy Arrays, Settings, Processed, Metrics.]

The processing pipeline follows a modular design: input files are validated and normalized, then passed through configurable processing stages. Each stage operates on numpy arrays with librosa for high-quality spectral processing. Output is formatted according to environment profiles for different production contexts.

Input Layer

  • Multi-format support (WAV, MP3, FLAC, OGG)
  • Automatic sample rate detection
  • Audio validation and integrity checks
  • Batch file queue management
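The queueing step can be sketched with the standard library alone. This is a minimal illustration rather than the toolkit's actual code: `SUPPORTED_EXTENSIONS` and `build_queue` are hypothetical names, the extension set simply mirrors the formats listed above, and the "integrity check" here is only a cheap empty-file filter.

```python
from pathlib import Path

# Hypothetical: extensions mirroring the formats listed for the input layer.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def build_queue(input_dir: str) -> list[Path]:
    """Return supported, non-empty audio files under input_dir in a stable order."""
    root = Path(input_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {input_dir}")
    queue = [
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix.lower() in SUPPORTED_EXTENSIONS
        and path.stat().st_size > 0  # cheap sanity check: skip empty files
    ]
    # Sorting keeps batch runs reproducible across invocations.
    return sorted(queue)
```

A fuller validation pass would likely also open each file (for example with `soundfile.info`) to read its sample rate and confirm the header is readable before the file enters the queue.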

Processing Layer

  • librosa for spectral analysis
  • Phase vocoder for time-stretching
  • Pitch preservation algorithms
  • Noise gate and normalization
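The noise gate and normalization stages both operate on plain numpy arrays. The sketch below shows one simple way to write them, assuming mono float audio in the range [-1, 1]. The function names, the -50 dB default threshold, and the 1024-sample frame size are illustrative assumptions, not values taken from the project.

```python
import numpy as np


def peak_normalize(y: np.ndarray, peak: float = 1.0) -> np.ndarray:
    """Scale y so its largest absolute sample equals `peak`; silence is returned unchanged."""
    max_abs = float(np.max(np.abs(y))) if y.size else 0.0
    if max_abs == 0.0:
        return y.copy()
    return y * (peak / max_abs)


def noise_gate(y: np.ndarray, threshold_db: float = -50.0, frame: int = 1024) -> np.ndarray:
    """Zero out frames whose RMS level falls below threshold_db (relative to full scale)."""
    threshold = 10.0 ** (threshold_db / 20.0)  # convert dB to linear amplitude
    out = y.copy()
    for start in range(0, len(y), frame):
        block = y[start:start + frame]
        rms = np.sqrt(np.mean(block ** 2))
        if rms < threshold:
            out[start:start + frame] = 0.0
    return out
```

This is a hard gate: frames switch abruptly between passed and muted. A production gate would usually add attack and release smoothing to avoid audible clicks at frame boundaries.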

Configuration Layer

  • YAML-based profile definitions
  • Environment-specific settings
  • Override flags for one-off processing
  • Default fallback configuration
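The layering described above (defaults, then a profile, then one-off overrides) can be sketched as a dictionary merge. `resolve_settings` and `DEFAULTS` are hypothetical names; the default values mirror the `ProcessingProfile` fields shown later on this page, and the profile dictionary is assumed to come from a YAML file loaded with `yaml.safe_load`.

```python
from typing import Any, Optional

# Fallback values, mirroring the ProcessingProfile defaults.
DEFAULTS: dict[str, Any] = {
    "target_sr": 48000,
    "time_stretch": 1.0,
    "preserve_pitch": True,
    "normalize": True,
    "output_format": "wav",
}


def resolve_settings(
    profile: Optional[dict[str, Any]],
    overrides: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Merge settings with precedence: overrides > profile > defaults."""
    settings = dict(DEFAULTS)
    for layer in (profile or {}, overrides or {}):
        unknown = set(layer) - set(DEFAULTS)
        if unknown:
            raise KeyError(f"unknown setting(s): {sorted(unknown)}")
        # None means "not set", so an unset CLI flag does not clobber the profile.
        settings.update({k: v for k, v in layer.items() if v is not None})
    return settings
```

Rejecting unknown keys up front catches typos in a profile file before a long batch run starts.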

Output Layer

  • Format conversion (PCM, MP3, AAC)
  • Metadata preservation
  • Quality metrics logging
  • Organized directory structure
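As an illustration of the quality metrics step, the sketch below computes duration plus peak and RMS levels in dBFS for a processed signal. It assumes mono float audio in [-1, 1]; `quality_metrics` and its key names are hypothetical, and the project may log different measures.

```python
import numpy as np


def quality_metrics(y: np.ndarray, sr: int) -> dict:
    """Duration plus peak and RMS levels in dBFS for a float signal in [-1, 1]."""

    def to_db(x: float) -> float:
        # Silence has no finite level in dB, so report -inf instead of raising.
        return 20.0 * float(np.log10(x)) if x > 0 else float("-inf")

    peak = float(np.max(np.abs(y))) if y.size else 0.0
    rms = float(np.sqrt(np.mean(np.square(y)))) if y.size else 0.0
    return {
        "duration_s": len(y) / sr,
        "peak_dbfs": to_db(peak),
        "rms_dbfs": to_db(rms),
    }
```

For a sine wave at half of full scale, this reports a peak of about -6.0 dBFS and an RMS level about 3 dB lower.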
03

Technical Approach

Designed a configuration-driven pipeline using librosa and soundfile for high-quality audio processing. Implemented time-stretching algorithms with optional pitch preservation. Created environment-based profiles (dev/prod) for different processing requirements. Built a clean Python API that integrates into existing workflows.

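voiceprocessor/pipeline.py
import librosa
import soundfile as sf
import numpy as np
from dataclasses import dataclass

@dataclass(frozen=True)
class ProcessingProfile:
    """Configuration for audio processing"""
    target_sr: int = 48000
    time_stretch: float = 1.0  # >1.0 speeds up, <1.0 slows down
    preserve_pitch: bool = True
    normalize: bool = True
    output_format: str = "wav"

class AudioPipeline:
    def __init__(self, profile: ProcessingProfile):
        self.profile = profile

    def process(self, input_path: str, output_path: str) -> dict:
        # Load audio at its native sample rate (librosa downmixes to mono by default)
        y, sr = librosa.load(input_path, sr=None)
        input_duration = len(y) / sr  # measure before any processing

        # Time stretching with optional pitch preservation
        if self.profile.time_stretch != 1.0:
            if self.profile.preserve_pitch:
                # Phase vocoder: changes duration, keeps pitch
                y = librosa.effects.time_stretch(y, rate=self.profile.time_stretch)
            else:
                # Simple resampling approach: the resampled signal is still
                # treated as being at sr, so pitch shifts along with speed
                y = librosa.resample(y, orig_sr=sr,
                    target_sr=int(sr / self.profile.time_stretch))

        # Convert to the export sample rate so playback speed is correct
        if sr != self.profile.target_sr:
            y = librosa.resample(y, orig_sr=sr, target_sr=self.profile.target_sr)

        # Normalization
        if self.profile.normalize:
            y = librosa.util.normalize(y)

        # Export
        sf.write(output_path, y, self.profile.target_sr)

        # Guard against log10(0) on silent or empty output
        peak = float(np.max(np.abs(y))) if y.size else 0.0
        return {
            'input_duration': input_duration,
            'output_duration': len(y) / self.profile.target_sr,
            'peak_db': 20 * np.log10(peak) if peak > 0 else float('-inf')
        }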
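A quick way to sanity-check the durations the pipeline reports: stretching by a rate r turns a clip of T seconds into T / r seconds, whatever sample rate it is exported at. The helpers below are hypothetical and exist only to make that arithmetic explicit.

```python
def expected_output_duration(input_duration: float, rate: float) -> float:
    """Duration in seconds after stretching by `rate` (rate > 1 speeds up, rate < 1 slows down)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return input_duration / rate


def expected_sample_count(input_duration: float, rate: float, target_sr: int) -> int:
    """Approximate number of samples written at the export sample rate."""
    return round(expected_output_duration(input_duration, rate) * target_sr)
```

For example, a 60-second clip at 1.5x should come out at 40 seconds, or 1,920,000 samples at 48 kHz. A reported `output_duration` far from this value suggests a sample-rate mismatch somewhere in the chain.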

Key Technical Decisions

librosa over pydub for Spectral Processing

Chose librosa for its phase-vocoder time-stretching (librosa.effects.time_stretch), which changes duration without shifting pitch and keeps artifacts low at moderate stretch rates. pydub remains available for simple format conversions.

Configuration-Driven Profiles

Implemented YAML-based profiles for dev/test/prod environments. This allows the same codebase to handle quick preview renders in development and full-quality production exports.
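One way to pick a profile per environment is to key a profile table on an environment variable. Everything named here is an assumption for illustration: the `VOICEPROCESSOR_ENV` variable, the `select_profile` helper, and the sample values in `PROFILES` (which in the toolkit would be loaded from YAML files rather than hard-coded).

```python
import os
from typing import Mapping, Optional

# Hypothetical profile table with illustrative values only.
PROFILES = {
    "dev": {"target_sr": 22050, "normalize": False},   # quick preview renders
    "test": {"target_sr": 22050, "normalize": True},
    "prod": {"target_sr": 48000, "normalize": True},   # full-quality exports
}


def select_profile(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick a profile by the VOICEPROCESSOR_ENV variable, falling back to 'dev'."""
    environ = os.environ if env is None else env
    name = environ.get("VOICEPROCESSOR_ENV", "dev").lower()
    if name not in PROFILES:
        raise ValueError(f"unknown profile {name!r}; expected one of {sorted(PROFILES)}")
    return dict(PROFILES[name])  # copy so callers cannot mutate the table
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.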

Dataclass-Based Pipeline State

Used Python dataclasses for processing profiles. Declared with frozen=True, they are immutable, which prevents accidental state mutation during batch processing, and dataclasses.asdict makes them easy to serialize for logging.
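A short sketch of what immutability buys here. It repeats the `ProcessingProfile` fields from the pipeline listing with `frozen=True` set, since a plain `@dataclass` is mutable by default; the `preview` variant and its values are made up for the example.

```python
import json
from dataclasses import FrozenInstanceError, asdict, dataclass, replace


@dataclass(frozen=True)
class ProcessingProfile:
    """Configuration for audio processing (frozen, so instances cannot be mutated)."""
    target_sr: int = 48000
    time_stretch: float = 1.0
    preserve_pitch: bool = True
    normalize: bool = True
    output_format: str = "wav"


base = ProcessingProfile()

# Derive a variant instead of mutating the shared profile.
preview = replace(base, target_sr=22050, time_stretch=1.5)

# Attempting to assign to a field raises FrozenInstanceError.
try:
    base.target_sr = 44100
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

# asdict() gives a plain dict that serializes cleanly for log output.
log_line = json.dumps(asdict(preview), sort_keys=True)
```

Because variants are created with `replace`, a batch run can hand the same base profile to every file without one job's tweaks leaking into the next.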

04

Tech Stack

Python 3.12
librosa
soundfile
numpy
pydub
pytest
PyYAML
python-dotenv
05

Results & Outcomes

01

Batch processing of 100+ files with consistent settings

02

Time-stretching with <2% quality loss at 1.5x speed

03

Configurable profiles for different output formats

04

100% test coverage on core processing functions

05

CLI and library interfaces for flexibility

<2%
Quality Loss
100+
Files/Batch
100%
Test Coverage
1.5x
Max Speed
06

Live Demo

Explore the interactive dashboard showing batch processing progress, quality metrics, and audio analysis results. This is a read-only demonstration of the processing pipeline interface.
