AI semantic video editor · open source · 100% Free

Edit by listening,
not by scrubbing.

Cadence Lab drops your OBS recording into an AI pipeline: it transcribes the audio, asks Claude to judge every pause and filler word in context, and renders a tight, YouTube-ready cut. Then you just ask it to refine the edit.

View on GitHub · See how it works

100% FREE · No subscription, ever
Open source · MIT licensed on GitHub
Local-first · Your footage stays put

Free & open source · run it from source on macOS & Linux · you bring your own API keys (~$0.60–$2 / video)

my-channel · ep 12 · 18:42

MIC

“um” · filler · cut
breath · trim 150ms
retake · drop take 1

remove every sniffle and pull a 60-second highlight from the demo

Ask Cadence: Found 7 sniffles and a demo segment at 14:02. Proposing 7 cuts + 1 highlight clip.
Apply all · Review

Powered by Whisper large-v3 / Claude Opus / FFmpeg / CLIP / DeepFilterNet / Tauri

Why it's different

Most “auto-edit” tools are just a regex over the waveform.

They cut anything below −30 dB and call it done — butchering intentional pauses, flattening breaths into something robotic, and keeping both takes when you flub a line. Cadence Lab reads the transcript, not just the volume, so it edits like a human who was actually paying attention.

Features

Editing decisions made by something that understands the words.

Pauses, classified seven ways

Every gap is labeled filler, hesitation, breath, emphasis, pre-laughter, transition or listening — each with its own behavior. Breaths get trimmed to 150 ms, not deleted. That's the line between natural and robotic.
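
As a rough illustration of "each with its own behavior", the seven labels can be modeled as an enum with a small policy function. This is a minimal sketch, not Cadence Lab's actual code: the names `PauseKind` and `cut_for_pause` are hypothetical, and only the 150 ms breath trim comes from the text above. The behavior assigned to the other six labels is an assumption.

```python
from enum import Enum


class PauseKind(Enum):
    # The seven labels named in the text above.
    FILLER = "filler"
    HESITATION = "hesitation"
    BREATH = "breath"
    EMPHASIS = "emphasis"
    PRE_LAUGHTER = "pre-laughter"
    TRANSITION = "transition"
    LISTENING = "listening"


BREATH_KEEP_MS = 150  # from the text: breaths are trimmed to 150 ms


def cut_for_pause(kind, start_ms, end_ms):
    """Return the (start_ms, end_ms) span to remove, or None to keep the gap whole."""
    if kind is PauseKind.BREATH:
        # Keep the first 150 ms of the breath and drop the remainder.
        keep_until = start_ms + BREATH_KEEP_MS
        return (keep_until, end_ms) if keep_until < end_ms else None
    if kind in (PauseKind.FILLER, PauseKind.HESITATION):
        # Assumption: these gaps are removed outright.
        return (start_ms, end_ms)
    # Assumption: emphasis, pre-laughter, transition and listening gaps are kept.
    return None
```

Working in integer milliseconds keeps the arithmetic exact, so a 500 ms breath starting at 10,000 ms yields a cut of 10,150 to 10,500 ms.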

Context-aware filler removal

“Like” used as filler gets cut. “Nothing else like it” gets kept. The classifier reads the surrounding words — no blunt keyword deletion.

Retake detection

Flub a sentence and start over? Cadence spots the second attempt — or a “let me try that again” — and flags the worse take, the way an editor would.
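
To make the idea concrete, here is a deliberately simple stand-in. It is not how Cadence Lab detects retakes (the pipeline description further down says Claude judges them in context). It only flags adjacent sentences that are near-duplicates, using `difflib` from the Python standard library. The function name and the 0.6 threshold are assumptions, and the sketch does not decide which take is worse.

```python
from difflib import SequenceMatcher


def find_retake_candidates(sentences, threshold=0.6):
    """Return index pairs (i, i + 1) of adjacent sentences that look like retakes.

    Swapped-in heuristic: character-level similarity between neighbours.
    The threshold is an illustrative guess, not a tuned value.
    """
    pairs = []
    for i in range(len(sentences) - 1):
        first = sentences[i].lower()
        second = sentences[i + 1].lower()
        if SequenceMatcher(None, first, second).ratio() >= threshold:
            pairs.append((i, i + 1))
    return pairs
```

A surface-similarity check like this misses rephrased retakes and cannot tell a deliberate repetition from a flub, which is the gap a transcript-reading classifier is meant to close.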

Ask Cadence

“Cut every sniffle.” “Find when the walnut table is on screen.” “Pull a 60-second highlight.” Claude executes against your video and proposes edits — you accept each with one click.

Semantic visual search

CLIP frame embeddings indexed at 1 fps. Ask “find the part where the dog appears” and get ranked timestamps — without scrubbing the whole tape.
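
The ranking step can be sketched without any model code. The example below assumes the frame embeddings and the query embedding have already been computed (by CLIP in the real tool) and only shows how a 1 fps index maps scores back to timestamps. The function names and the toy 3-dimensional vectors in the usage note are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_frames(query_vec, frame_vecs, fps=1.0, top_k=3):
    """Return the top_k (timestamp_seconds, score) pairs, best match first.

    frame_vecs[i] is the embedding of the frame sampled at i / fps seconds.
    """
    scored = [(cosine(query_vec, vec), i / fps) for i, vec in enumerate(frame_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(timestamp, score) for score, timestamp in scored[:top_k]]
```

With frames `[[1, 0, 0], [0, 1, 0], [0.9, 0.1, 0]]` and query `[1, 0, 0]`, the best matches are the frames at 0 s and 2 s, in that order.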

Sound-event cleanup

An AudioSet-trained model spots sniffles, coughs, throat-clears and sneezes. Pair it with “remove all sniffles” and Cadence proposes a precise cut for each.

Neural denoise

DeepFilterNet — trained on ~100k hours of speech — clears fan hum, keyboard clicks and room noise far better than a classical filter, near real-time on CPU.

Splicing timeline

Extract highlight clips, rearrange them, drop black between, and render an assembled cut — a dedicated path separate from the main pacing edit.

Hardware-fast render

Hardware H.264 encoding by default: 5–15× faster than a CPU encode, with output YouTube can't tell apart. An opt-in archival libx264 mode is there when you want the master.
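
A sketch of how the two render modes might translate into FFmpeg arguments. The page does not say which hardware encoder or settings Cadence Lab uses, so everything here is an assumption: `h264_videotoolbox` is FFmpeg's hardware H.264 encoder on macOS, and the bitrate, preset and CRF values are placeholders.

```python
def encoder_args(archival=False):
    """Video encoder flags for the two modes described above."""
    if archival:
        # Software encode with libx264; slower, quality-targeted (CRF).
        return ["-c:v", "libx264", "-preset", "slow", "-crf", "18"]
    # Assumed default: macOS hardware encoder with a fixed target bitrate.
    return ["-c:v", "h264_videotoolbox", "-b:v", "12M"]


def render_command(src, dst, archival=False):
    """Build an ffmpeg argument list; pass it to subprocess.run to execute."""
    return ["ffmpeg", "-y", "-i", src, *encoder_args(archival), "-c:a", "aac", dst]
```

Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.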

How it works

A typed pipeline, from raw recording to finished cut.

Each stage writes a structured file the next one reads. Stop anywhere, tweak, and resume.
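
One way to picture "stop anywhere, tweak, and resume" is a runner that skips any stage whose output file already exists. This is a minimal sketch under assumed names: the stage list, the `.json` file names and `run_pipeline` are hypothetical, not Cadence Lab's actual layout.

```python
import json
from pathlib import Path

# Hypothetical stage names; each stage writes <name>.json into the work directory.
STAGES = ["transcript", "decisions", "plan"]


def run_pipeline(workdir, stage_fns):
    """Run stages in order, reusing any output already on disk.

    stage_fns maps a stage name to a function taking the previous stage's
    data (None for the first stage) and returning JSON-serializable data.
    Returns the names of the stages that actually ran.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    previous, ran = None, []
    for name in STAGES:
        out = workdir / f"{name}.json"
        if out.exists():
            # Resume: trust the file on disk, including any hand edits.
            previous = json.loads(out.read_text())
            continue
        previous = stage_fns[name](previous)
        out.write_text(json.dumps(previous, indent=2))
        ran.append(name)
    return ran
```

Deleting one stage's file and re-running recomputes only that stage, using whatever the earlier files currently contain. A fuller version would also invalidate later stages when an earlier file changes.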

Ingest & transcribe

Cadence extracts your mic track alone, then transcribes with Whisper large-v3 (~30× realtime via Groq, or fully local). Word-level timestamps; desktop audio never masks the speech.

Classify in context

A single Claude Opus call reads the whole transcript and judges every pause and filler candidate with schema-constrained output — plus detected retakes. No regex, no fragile parsing.
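
The consuming side of schema-constrained output can be sketched as a typed parse with validation. The field names, the `action` values and the top-level `decisions` key below are assumptions made for illustration; the actual schema lives in the repository. No API call is shown.

```python
import json
from dataclasses import dataclass

PAUSE_LABELS = {
    "filler", "hesitation", "breath", "emphasis",
    "pre-laughter", "transition", "listening",
}
ACTIONS = {"cut", "trim", "keep"}  # hypothetical action vocabulary


@dataclass(frozen=True)
class Decision:
    start_ms: int
    end_ms: int
    label: str
    action: str


def parse_decisions(raw_json):
    """Parse a model response into typed decisions, rejecting anything off-schema."""
    decisions = []
    for item in json.loads(raw_json)["decisions"]:
        decision = Decision(
            int(item["start_ms"]), int(item["end_ms"]), item["label"], item["action"]
        )
        if (
            decision.label not in PAUSE_LABELS
            or decision.action not in ACTIONS
            or decision.start_ms >= decision.end_ms
        ):
            raise ValueError(f"invalid decision: {item}")
        decisions.append(decision)
    return decisions
```

Because the response is structured data rather than free text, a malformed item fails loudly here instead of silently producing a bad cut later.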

Plan the cuts

Pure interval algebra turns those decisions — plus any cuts you or Ask Cadence add — into clean keep-segments. Fully local, deterministic, with an audit log of the original intent.
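
The core of that interval algebra is small: merge overlapping cuts, then take the complement over the video's duration. The sketch below is one plausible shape for it, with hypothetical function names and times in integer milliseconds. It leaves out the audit log.

```python
def merge(spans):
    """Merge overlapping or touching (start, end) spans into a sorted list."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def keep_segments(duration_ms, cuts):
    """Return the (start, end) segments that survive after removing all cuts."""
    # Clamp cuts to the video bounds and drop empty spans.
    clamped = [(max(0, s), min(duration_ms, e)) for s, e in cuts]
    clamped = [(s, e) for s, e in clamped if s < e]
    keeps, cursor = [], 0
    for start, end in merge(clamped):
        if start > cursor:
            keeps.append((cursor, start))
        cursor = end
    if cursor < duration_ms:
        keeps.append((cursor, duration_ms))
    return keeps
```

For a 10-second video with cuts at 1.0–2.0 s, 1.5–3.0 s and 9.0–12.0 s, `keep_segments(10000, [(1000, 2000), (1500, 3000), (9000, 12000)])` returns `[(0, 1000), (3000, 9000)]`. The same inputs always give the same output, which is what makes re-planning after an override cheap.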

Review & render

Listen to a 3-second clip around each cut, override with a click, re-plan instantly. Then FFmpeg renders a frame-aligned, loudness-normalized MP4 — ready to upload.
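
Two small calculations sit behind this step: choosing the preview window around a cut, and snapping a cut time to a frame boundary. The sketch below shows one way to do each. The function names are hypothetical, and centering the window on the cut is an assumption (the page only says the clip is "around" it).

```python
from fractions import Fraction


def preview_window(cut_start, cut_end, duration, length=3.0):
    """Return a (start, end) window in seconds around the cut, clamped to the video."""
    center = (cut_start + cut_end) / 2
    start = max(0.0, center - length / 2)
    end = min(duration, start + length)
    return (start, end)


def snap_to_frame(t_seconds, fps):
    """Snap a time to the nearest frame; returns (frame_index, snapped_seconds).

    fps may be an int or a Fraction such as Fraction(30000, 1001) for 29.97 fps.
    """
    fps = Fraction(fps)
    frame = round(Fraction(t_seconds) * fps)
    return frame, float(frame / fps)
```

Using `Fraction` for the frame rate avoids drift with NTSC-style rates, where a float approximation of 29.97 would slowly misalign cuts over a long video.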

Ask Cadence

Talk to your timeline.

A dedicated Claude agent sits on top of your edit with read tools and action tools. Ask in plain English; it queries your transcript, audio events and visual index, then proposes typed edits you apply one at a time. Kick off a long scan and it picks the conversation back up the moment it finishes.

remove the um at 1:23
cut every sniffle in the intro
find when the walnut table is on screen
pull a 60-second highlight from the demo

Every action is a proposal. Nothing touches your video until you click Apply.
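
The proposal-first rule can be sketched as a session object where the agent's action tools only ever append to a pending list, and the edit itself changes only on an explicit apply. The class and field names below are hypothetical, not Cadence Lab's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    kind: str        # e.g. "cut" or "highlight" (illustrative values)
    start_ms: int
    end_ms: int
    reason: str
    applied: bool = False


@dataclass
class EditSession:
    cuts: list = field(default_factory=list)     # spans the user has accepted
    pending: list = field(default_factory=list)  # everything the agent proposed

    def propose(self, proposal):
        """Called by the agent's action tools; never changes the edit itself."""
        self.pending.append(proposal)
        return len(self.pending) - 1

    def apply(self, index):
        """Called only when the user clicks Apply; safe to call twice."""
        proposal = self.pending[index]
        if not proposal.applied:
            proposal.applied = True
            self.cuts.append((proposal.start_ms, proposal.end_ms))
        return proposal
```

For "remove the um at 1:23", the agent would call `propose(Proposal("cut", 83000, 83400, "um at 1:23"))`, and `cuts` stays empty until `apply` runs with the returned index.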

cut every sniffle in the intro

Ask Cadence: Scanning for non-speech events… found 4 sniffles between 0:00–2:10. Proposing a custom cut for each.

also find where I show the table

Ask Cadence: Indexing frames… best match for “table on screen” is 14:02–14:48. Want a highlight clip from that range?
Create clip · Not now

Open source

Yours to run, read, fork and ship.

MIT licensed. Your media and API keys stay on your machine — there's no Cadence Lab cloud in the loop.

Local-first & private

The app runs on your computer. Bring your own Groq and Anthropic keys, or keep transcription on-device with local Whisper. Your footage never leaves the device unless you pick a cloud transcription backend; the transcript text is what goes to Claude for classification.

Honest pricing

The app is free. You pay your own model costs — about $0.60–$2.00 for a 30-minute video, mostly transcription and one big classification call. No subscription, no markup.

A real reference build

Structured outputs, prompt caching, agentic tool use, a typed multi-stage data contract, a Tauri + React shell over a Python FastAPI sidecar. Read the code — it's a working example of all of it.

View the source on GitHub →

Get the source

Run it from source today.

Cadence Lab is in early open-source development. Clone the repo and run it from the command line — one-click installers are on the roadmap.

Build from source

Clone & run from the CLI


Star on GitHub

Follow development


macOS · quick start

# 1 — prerequisites
brew install ffmpeg uv

# 2 — clone & run
git clone https://github.com/JosephLeon/Cadence-Lab
cd Cadence-Lab
uv sync && cp .env.example .env
uv run cadence-lab server

Pre-release. macOS & Windows installers are on the roadmap (donations welcome). Until then, developers can clone the repo and run from source; non-developers, hang tight.

Full setup — API keys, the desktop app, all of it — is in the install guide. Requires ffmpeg and your own API keys.
