Audio Operations

extract_audio

Extract the audio track from a video file.

{ "type": "audio_extract", "params": { "format": "mp3", "bitrate": "192k" } }
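The bitrate parameter also determines how large the extracted file will be. As a rough illustration, the sketch below converts a bitrate string such as "192k" to bits per second and estimates the size of a constant-bitrate stream. It assumes "k" means 1,000 (the usual FFmpeg convention) and ignores container overhead and variable-bitrate encoding. parse_bitrate and estimated_size_bytes are hypothetical helpers, not part of this API.

```python
def parse_bitrate(bitrate: str) -> int:
    """Convert a bitrate string such as "192k" to bits per second (assumes k = 1000)."""
    suffixes = {"k": 1_000, "m": 1_000_000}
    b = bitrate.strip().lower()
    if b and b[-1] in suffixes:
        return int(float(b[:-1]) * suffixes[b[-1]])
    return int(b)


def estimated_size_bytes(bitrate: str, duration_s: float) -> int:
    """Approximate size of a constant-bitrate stream, ignoring container overhead."""
    return int(parse_bitrate(bitrate) * duration_s / 8)


# A 3-minute track at 192 kbps is roughly 4.3 MB.
print(estimated_size_bytes("192k", 180))  # 4320000
```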

normalize_audio

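Normalize audio loudness, measured in LUFS as defined by EBU R128. The target level is set with target_lufs: the EBU R128 broadcast target is -23 LUFS, while -14 LUFS (used in the example below) is a level commonly used by streaming platforms.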

{ "type": "audio_normalize", "params": { "target_lufs": -14 } }
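Conceptually, normalizing to target_lufs means applying a gain equal to the difference between the target and the measured loudness, since a difference of 1 LU corresponds to 1 dB. The sketch below computes that difference as a linear factor. It assumes a single static gain; the service may instead use dynamic normalization or a true-peak limiter, and normalization_gain and apply_gain are illustrative names only.

```python
def normalization_gain(measured_lufs: float, target_lufs: float = -14.0) -> float:
    """Linear gain factor that moves integrated loudness from measured to target."""
    gain_db = target_lufs - measured_lufs  # LU differences map 1:1 to dB
    return 10 ** (gain_db / 20)


def apply_gain(samples, gain):
    """Scale samples; results may exceed [-1, 1], which is why real normalizers add a limiter."""
    return [s * gain for s in samples]


g = normalization_gain(-20.0)  # a 6 dB boost
print(round(g, 3))  # 1.995
```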

noise_remove

Remove background noise from audio.

{ "type": "audio_noise-remove" }
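The operation takes no parameters, and the algorithm it uses is not documented here. For intuition only, the sketch below shows a naive noise gate, a far simpler technique than spectral noise reduction: it silences short windows whose peak amplitude stays below a threshold, so it removes hiss during pauses but not noise underneath speech. noise_gate and its threshold and window values are arbitrary, hypothetical choices.

```python
def noise_gate(samples, threshold=0.02, window=4):
    """Silence each window of samples whose peak amplitude is below threshold."""
    out = list(samples)
    for start in range(0, len(out), window):
        block = out[start:start + window]
        if max(abs(s) for s in block) < threshold:
            out[start:start + window] = [0.0] * len(block)
    return out


# The first window is quiet and gets zeroed; the second contains signal and is kept.
print(noise_gate([0.01, -0.005, 0.01, 0.0, 0.5, -0.4, 0.01, 0.3]))
```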

fade_audio

Add fade in/out effects to audio.

{ "type": "audio_fade", "params": { "fade_in": 2.0, "fade_out": 3.0 } }
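Assuming fade_in and fade_out are durations in seconds and both fades are linear (neither is stated above), the gain applied at time t of a clip can be sketched as follows. fade_gain is a hypothetical helper; the service may use a different fade curve.

```python
def fade_gain(t: float, duration: float, fade_in: float = 2.0, fade_out: float = 3.0) -> float:
    """Linear envelope: rises from 0 to 1 over fade_in, falls to 0 over the last fade_out seconds."""
    gain = 1.0
    if fade_in > 0 and t < fade_in:
        gain = min(gain, t / fade_in)
    remaining = duration - t
    if fade_out > 0 and remaining < fade_out:
        gain = min(gain, remaining / fade_out)
    return max(0.0, gain)


# Gain at a few points in a 10-second clip.
for t in (0.0, 1.0, 5.0, 8.5, 10.0):
    print(t, fade_gain(t, 10.0))
```

Each output sample would then be multiplied by fade_gain at its timestamp.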

mix_audio

Mix multiple audio tracks together. Pass an array of URLs as input; the volumes array sets one gain per track, in the same order as the URLs.

{
  "input": ["https://example.com/voice.mp3", "https://example.com/music.mp3"],
  "operations": [{ "type": "audio_mix", "params": { "volumes": [1.0, 0.3] } }]
}
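The mix can be pictured as a per-sample weighted sum of the inputs. The sketch below assumes the tracks are already decoded to float samples at a common sample rate, pads shorter tracks with silence, and hard-clips the result to [-1, 1]. The service's actual alignment, resampling, and limiting behavior is not documented here, and mix is a hypothetical helper.

```python
from itertools import zip_longest


def mix(tracks, volumes):
    """Sum tracks sample by sample, each scaled by its volume, then clip to [-1, 1]."""
    if len(tracks) != len(volumes):
        raise ValueError("need exactly one volume per track")
    mixed = []
    for frame in zip_longest(*tracks, fillvalue=0.0):  # pad shorter tracks with silence
        total = sum(s * v for s, v in zip(frame, volumes))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


voice = [0.5, 0.5, -0.5]
music = [1.0, -1.0]
print(mix([voice, music], [1.0, 0.3]))  # music ducked to 30%, as in the example request
```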

pitch_shift

Shift audio pitch up or down.

{ "type": "audio_pitch-shift", "params": { "semitones": 2 } }
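In equal temperament, a shift of n semitones multiplies every frequency by 2^(n/12), so "semitones": 2 raises the pitch by a factor of about 1.12 and -12 lowers it by one octave. Assuming the parameter follows that standard definition, the ratio can be computed as below (pitch_ratio is a hypothetical helper). A dedicated pitch shifter normally keeps the duration unchanged; simply resampling by this ratio would change the speed as well.

```python
def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of the given number of equal-tempered semitones."""
    return 2 ** (semitones / 12)


print(round(pitch_ratio(2), 4))        # 1.1225
print(round(440 * pitch_ratio(2), 2))  # 493.88 (A4 shifted up to B4)
print(pitch_ratio(-12))                # 0.5 (one octave down)
```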

loudness_analyze

Analyze audio loudness levels (LUFS, peak, dynamic range).

{ "type": "audio_loudness-analyze" }

Returns loudness metrics in the output without modifying the file.
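The exact fields returned are not listed here. To show what peak and level figures mean, the sketch below computes sample peak and unweighted RMS level in dBFS for float samples. These are simplifications: LUFS (ITU-R BS.1770) adds K-weighting and gating, true peak requires oversampling, and peak minus RMS (the crest factor) is only a crude stand-in for dynamic range. peak_dbfs and rms_dbfs are hypothetical helpers.

```python
import math


def peak_dbfs(samples):
    """Sample peak in dBFS (0 dBFS = full scale). Not true peak, which needs oversampling."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0 else 20 * math.log10(peak)


def rms_dbfs(samples):
    """Unweighted RMS level in dBFS. LUFS additionally applies K-weighting and gating."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return float("-inf") if rms == 0 else 20 * math.log10(rms)


click = [1.0, 0.0, 0.0, 0.0]  # one full-scale sample followed by silence
print(peak_dbfs(click))                                  # 0.0
print(round(rms_dbfs(click), 2))                         # -6.02
print(round(peak_dbfs(click) - rms_dbfs(click), 2))      # crest factor: 6.02 dB
```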

transcribe

Transcribe speech to text using OpenAI Whisper.

{ "type": "audio_transcribe", "params": { "language": "en" } }

Param	Type	Description
language	string	Language code (en, es, fr, etc.) or “auto”
format	string	Output format: text, srt, vtt, json
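With format set to srt or vtt, the transcript is split into timed cues. The two formats differ mainly in the millisecond separator: a comma for SubRip (SRT) and a period for WebVTT, which also requires a WEBVTT header line. The sketch below formats timestamps and a single SRT cue under standard SubRip conventions; it does not reproduce how the service segments speech, and timestamp and srt_cue are hypothetical helpers.

```python
def timestamp(seconds: float, fmt: str = "srt") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """One SRT cue: sequence number, time range, caption text, then a line break."""
    return f"{index}\n{timestamp(start)} --> {timestamp(end)}\n{text}\n"


print(srt_cue(1, 0.0, 2.5, "Hello and welcome."))
print(timestamp(3723.456, "vtt"))  # 01:02:03.456
```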

waveform

Generate a visual waveform image from audio.

{ "type": "audio_waveform", "params": { "width": 1920, "height": 200, "color": "0x00ff00" } }
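Assuming color is an 0xRRGGBB hex value and width and height are pixel dimensions, a waveform image can be derived by taking the peak amplitude of each pixel column's slice of samples. The sketch below computes those bar heights and parses the color; actual drawing is left out, and the service may render differently (for example, mirrored around a center line). parse_color and column_heights are hypothetical helpers.

```python
def parse_color(color: str) -> tuple[int, int, int]:
    """Turn an "0xRRGGBB" string, as in the example, into an (R, G, B) tuple."""
    value = int(color, 16)  # int() accepts the 0x prefix when base is 16
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def column_heights(samples, width, height):
    """Peak amplitude of each pixel column's slice of samples, scaled to a bar height in pixels."""
    heights = []
    for x in range(width):
        start = x * len(samples) // width
        end = (x + 1) * len(samples) // width
        chunk = samples[start:end] or [0.0]  # columns with no samples draw nothing
        peak = min(1.0, max(abs(s) for s in chunk))
        heights.append(round(peak * height))
    return heights


print(parse_color("0x00ff00"))                                        # (0, 255, 0)
print(column_heights([0.1, -0.5, 1.0, 0.25], width=2, height=200))   # [100, 200]
```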
