<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>AI on Victor Salles</title><link>https://victorsalles.com/en/tags/ai/</link><description>Recent content in AI on Victor Salles</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 22 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://victorsalles.com/en/tags/ai/index.xml" rel="self" type="application/rss+xml"/><item><title>Building an Offline Voice System for macOS</title><link>https://victorsalles.com/en/posts/building-offline-voice-system-macos/</link><pubDate>Sun, 22 Mar 2026 00:00:00 +0000</pubDate><guid>https://victorsalles.com/en/posts/building-offline-voice-system-macos/</guid><description>How I built a fully offline text-to-speech and speech-to-text system for macOS using Kokoro-82M and mlx-whisper — no cloud APIs, streaming audio in under a second.</description></item><item><title>Multi-Agent Dev Team</title><link>https://victorsalles.com/en/projects/multi-agent-dev-team/</link><pubDate>Tue, 10 Mar 2026 00:00:00 +0000</pubDate><guid>https://victorsalles.com/en/projects/multi-agent-dev-team/</guid><description>&lt;h2 id="structured-ai-agent-teams-with-memory-and-coordination">Structured AI agent teams with memory and coordination&lt;/h2>
&lt;p>Most people interact with AI assistants one prompt at a time. I wanted to see what happens when you design a full development team of AI agents — each with a defined role, personality, diagnostic methodology, and persistent memory — and have them collaborate on real engineering work.&lt;/p>
&lt;p>&lt;strong>The setup:&lt;/strong>&lt;/p>
&lt;p>Five specialized agents, each with a distinct archetype and working style:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Leader&lt;/strong> — Delegation, scoping, planning. Operates through a 5-field briefing format and dual quality gates.&lt;/li>
&lt;li>&lt;strong>Backend&lt;/strong> — Django/DRF/Postgres specialist with a diagnostic loop that has a hard stop: after 2 failed hypotheses, escalate instead of brute-forcing.&lt;/li>
&lt;li>&lt;strong>Frontend&lt;/strong> — Next.js/TypeScript, 4-state rendering rule, and a QA checklist that runs before any PR.&lt;/li>
&lt;li>&lt;strong>Designer&lt;/strong> — Full state specifications and a feasibility gate that forces consideration of edge cases before committing to designs.&lt;/li>
&lt;li>&lt;strong>Security&lt;/strong> — 7-category Django-aware review that runs after every Backend or Frontend task, before the second quality gate.&lt;/li>
&lt;/ul>
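&lt;p>As a minimal sketch of how the Backend agent&amp;rsquo;s hard stop could work (the function and names below are illustrative, not the project&amp;rsquo;s actual code): the diagnostic loop tests each hypothesis in turn and escalates once two have failed, instead of brute-forcing through the rest.&lt;/p>

```python
# Illustrative sketch of a diagnostic loop with a hard stop:
# after 2 failed hypotheses, escalate instead of brute-forcing.
# All names here are hypothetical, not the project's actual code.

MAX_FAILED_HYPOTHESES = 2

def diagnose(hypotheses, test):
    """Try each hypothesis in order; escalate after the hard stop."""
    failures = 0
    for hypothesis in hypotheses:
        if test(hypothesis):
            return ("solved", hypothesis)
        failures += 1
        if failures >= MAX_FAILED_HYPOTHESES:
            # Hand the problem off (e.g. to the Leader agent) rather
            # than continuing to guess.
            return ("escalate", None)
    return ("escalate", None)

# Example: only the third hypothesis is correct, but the hard stop
# fires after the second failure, so the agent escalates.
result = diagnose(["cache", "index", "lock"], lambda h: h == "lock")
# result == ("escalate", None)
```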
&lt;p>&lt;strong>Memory architecture:&lt;/strong>&lt;/p></description></item><item><title>Daily AI Briefs</title><link>https://victorsalles.com/en/projects/daily-ai-briefs/</link><pubDate>Sun, 01 Mar 2026 00:00:00 +0000</pubDate><guid>https://victorsalles.com/en/projects/daily-ai-briefs/</guid><description>Automated daily summaries of AI news and developments, generated by AI agents and delivered on schedule.</description></item><item><title>Voice Automation</title><link>https://victorsalles.com/en/projects/voice-automation/</link><pubDate>Sun, 01 Mar 2026 00:00:00 +0000</pubDate><guid>https://victorsalles.com/en/projects/voice-automation/</guid><description>&lt;h2 id="offline-voice-io-for-macos">Offline voice I/O for macOS&lt;/h2>
&lt;p>A complete text-to-speech and speech-to-text system for macOS that runs entirely on-device. No cloud APIs, no subscriptions — just local AI models doing real work.&lt;/p>
&lt;p>&lt;strong>The problem:&lt;/strong> I spend hours reading and writing text on screen. I wanted a way to have my Mac read anything to me with a single hotkey, and transcribe audio without sending data to external servers.&lt;/p>
&lt;p>&lt;strong>What it does:&lt;/strong>&lt;/p>
&lt;p>Press &lt;code>⌥S&lt;/code> and whatever text you&amp;rsquo;ve selected (or copied) gets read aloud using Kokoro-82M, an 82-million-parameter TTS model running locally on Apple Silicon. The system automatically detects whether the text is Portuguese or English and picks the right voice. Audio starts streaming in under a second — no temp files, no waiting for the full synthesis to finish.&lt;/p></description></item></channel></rss>