Voice Processor

Audio Pipeline for Content Production

A Python toolkit for professional audio processing including time-stretching, pitch preservation, and format conversion. Optimized for podcast and video production workflows.

View Source Live Demo

Problem

Content creators need consistent audio processing across multiple files, but professional tools are expensive and consumer tools lack batch processing capabilities. Manual processing in Audacity or Premiere is repetitive and error-prone.

Expensive Tools

Professional audio suites cost $200+/year with features creators don't need

Repetitive Tasks

Manual processing of each file in Audacity or Premiere is time-consuming

Inconsistent Output

No easy way to apply identical settings across batch files

Architecture

The processing pipeline follows a modular design: input files are validated and normalized, then passed through configurable processing stages. Each stage operates on numpy arrays with librosa for high-quality spectral processing. Output is formatted according to environment profiles for different production contexts.

Input Layer

•Multi-format support (WAV, MP3, FLAC, OGG)
•Automatic sample rate detection
•Audio validation and integrity checks
•Batch file queue management

Processing Layer

•librosa for spectral analysis
•Phase vocoder for time-stretching
•Pitch preservation algorithms
•Noise gate and normalization

Configuration Layer

•YAML-based profile definitions
•Environment-specific settings
•Override flags for one-off processing
•Default fallback configuration

Output Layer

•Format conversion (PCM, MP3, AAC)
•Metadata preservation
•Quality metrics logging
•Organized directory structure

Technical Approach

Designed a configuration-driven pipeline using librosa and soundfile for high-quality audio processing. Implemented time-stretching algorithms with optional pitch preservation. Created environment-based profiles (dev/prod) for different processing requirements. Built a clean Python API that integrates into existing workflows.

voiceprocessor/pipeline.py

import librosa
import soundfile as sf
import numpy as np
from dataclasses import dataclass

@dataclass
class ProcessingProfile:
    """Configuration for audio processing"""
    target_sr: int = 48000
    time_stretch: float = 1.0
    preserve_pitch: bool = True
    normalize: bool = True
    output_format: str = "wav"

class AudioPipeline:
    def __init__(self, profile: ProcessingProfile):
        self.profile = profile
    
    def process(self, input_path: str, output_path: str) -> dict:
        # Load audio
        y, sr = librosa.load(input_path, sr=None)
        
        # Time stretching with pitch preservation
        if self.profile.time_stretch != 1.0:
            if self.profile.preserve_pitch:
                y = librosa.effects.time_stretch(y, rate=self.profile.time_stretch)
            else:
                # Simple resampling approach
                y = librosa.resample(y, orig_sr=sr, 
                    target_sr=int(sr / self.profile.time_stretch))
        
        # Normalization
        if self.profile.normalize:
            y = librosa.util.normalize(y)
        
        # Export
        sf.write(output_path, y, self.profile.target_sr)
        
        return {
            'input_duration': len(y) / sr,
            'output_duration': len(y) / self.profile.target_sr,
            'peak_db': 20 * np.log10(np.max(np.abs(y)))
        }

Key Technical Decisions

librosa over pydub for Spectral Processing

Chose librosa for its superior phase vocoder implementation, enabling high-quality time-stretching with minimal artifacts. pydub remains available for simple format conversions.

Configuration-Driven Profiles

Implemented YAML-based profiles for dev/test/prod environments. This allows the same codebase to handle quick preview renders in development and full-quality production exports.

Dataclass-Based Pipeline State

Used Python dataclasses for immutable processing profiles. This prevents accidental state mutation during batch processing and enables easy serialization for logging.

Tech Stack

Python 3.12

librosa

soundfile

numpy

pydub

pytest

PyYAML

python-dotenv

Results & Outcomes

Batch processing of 100+ files with consistent settings

Time-stretching with <2% quality loss at 1.5x speed

Configurable profiles for different output formats

100% test coverage on core processing functions

CLI and library interfaces for flexibility

<2%

Quality Loss

100+

Files/Batch

100%

Test Coverage

1.5x

Max Speed

Live Demo

Explore the interactive dashboard showing batch processing progress, quality metrics, and audio analysis results. This is a read-only demonstration of the processing pipeline interface.

Open Dashboard Demo