# The Voice AI Index

> The living index of voice & speech AI tooling — text-to-speech, speech recognition,
> voice cloning & conversion, realtime voice agents, toolkits — ranked daily by GitHub momentum.

Updated: 2026-06-13T11:26:37.602843+00:00
Tools indexed: 297

## Top voice & speech tools by momentum

- [ggml-org/whisper.cpp](https://github.com/ggml-org/whisper.cpp) — momentum 83, ⭐50683 — Speech-to-Text — Port of OpenAI's Whisper model in C/C++
- [fishaudio/fish-speech](https://github.com/fishaudio/fish-speech) — momentum 81, ⭐30792 — Text-to-Speech — SOTA Open Source TTS
- [OpenBMB/VoxCPM](https://github.com/OpenBMB/VoxCPM) — momentum 80, ⭐28747 — Text-to-Speech — VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-L
- [cjpais/Handy](https://github.com/cjpais/Handy) — momentum 80, ⭐23611 — Speech-to-Text — A free, open source, and extensible speech-to-text application that works completely offline.
- [debpalash/OmniVoice-Studio](https://github.com/debpalash/OmniVoice-Studio) — momentum 80, ⭐6909 — Voice Cloning & Conversion — The open-source ElevenLabs alternative for local voice cloning, design, create, dubbing and dictatio
- [MisoLabsAI/MisoTTS](https://github.com/MisoLabsAI/MisoTTS) — momentum 80, ⭐2770 — Text-to-Speech — Miso TTS is an 8 billion, highly emotive text-to-speech model
- [index-tts/index-tts](https://github.com/index-tts/index-tts) — momentum 79, ⭐21102 — Text-to-Speech — An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
- [DrewThomasson/ebook2audiobook](https://github.com/DrewThomasson/ebook2audiobook) — momentum 79, ⭐19255 — Voice Cloning & Conversion — Generate audiobooks from e-books, voice cloning & 1158+ languages!
- [Huanshere/VideoLingo](https://github.com/Huanshere/VideoLingo) — momentum 79, ⭐17450 — Speech-to-Text — Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated
- [k2-fsa/OmniVoice](https://github.com/k2-fsa/OmniVoice) — momentum 79, ⭐7377 — Voice Cloning & Conversion — High-Quality Voice Cloning TTS for 600+ Languages
- [m-bain/whisperX](https://github.com/m-bain/whisperX) — momentum 78, ⭐22450 — Enhancement & Analysis — WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
- [jianchang512/pyvideotrans](https://github.com/jianchang512/pyvideotrans) — momentum 78, ⭐17934 — Speech-to-Text — Translate the video from one language to another and embed dubbing & subtitles.
- [modelscope/FunASR](https://github.com/modelscope/FunASR) — momentum 78, ⭐17924 — Toolkits & Frameworks — Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emot
- [NVIDIA-NeMo/NeMo](https://github.com/NVIDIA-NeMo/NeMo) — momentum 78, ⭐17368 — Toolkits & Frameworks — A scalable generative AI framework built for researchers and developers working on Large Language Mo
- [openai/whisper](https://github.com/openai/whisper) — momentum 77, ⭐102609 — Speech-to-Text — Robust Speech Recognition via Large-Scale Weak Supervision
- [leon-ai/leon](https://github.com/leon-ai/leon) — momentum 77, ⭐17309 — Voice Agents & Realtime — 🧠 Leon is your open-source personal assistant.
- [k2-fsa/sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) — momentum 77, ⭐12952 — Toolkits & Frameworks — Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD 
- [pipecat-ai/pipecat](https://github.com/pipecat-ai/pipecat) — momentum 77, ⭐12803 — Toolkits & Frameworks — Open Source framework for voice and multimodal conversational AI
- [PaddlePaddle/PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech) — momentum 77, ⭐12613 — Toolkits & Frameworks — Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctua
- [Open-Less/openless](https://github.com/Open-Less/openless) — momentum 77, ⭐2297 — Speech-to-Text — Hold a key, speak, release — AI-polished text appears at your cursor in any app. Open-source voice i
- [RVC-Boss/GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS) — momentum 76, ⭐58637 — Voice Cloning & Conversion — 1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
- [FunAudioLLM/CosyVoice](https://github.com/FunAudioLLM/CosyVoice) — momentum 76, ⭐21628 — Text-to-Speech — Multi-lingual large voice generation model, providing inference, training and deployment full-stack 
- [alphacep/vosk-api](https://github.com/alphacep/vosk-api) — momentum 76, ⭐14845 — Toolkits & Frameworks — Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and 
- [Zackriya-Solutions/meetily](https://github.com/Zackriya-Solutions/meetily) — momentum 76, ⭐12710 — Enhancement & Analysis — Privacy first, AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diar
- [livekit/agents](https://github.com/livekit/agents) — momentum 76, ⭐10962 — Toolkits & Frameworks — A framework for building realtime voice AI agents 🤖🎙️📹
- [TEN-framework/ten-framework](https://github.com/TEN-framework/ten-framework) — momentum 76, ⭐10669 — Toolkits & Frameworks — Open-source framework for conversational voice AI agents
- [QuentinFuxa/WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) — momentum 76, ⭐10446 — Voice Agents & Realtime — Simultaneous speech-to-text models
- [pyannote/pyannote-audio](https://github.com/pyannote/pyannote-audio) — momentum 76, ⭐10113 — Enhancement & Analysis — Neural building blocks for speaker diarization: speech activity detection, speaker change detection,
- [KoljaB/RealtimeSTT](https://github.com/KoljaB/RealtimeSTT) — momentum 76, ⭐9896 — Voice Agents & Realtime — A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake
- [espnet/espnet](https://github.com/espnet/espnet) — momentum 76, ⭐9859 — Toolkits & Frameworks — End-to-End Speech Processing Toolkit
- [krillinai/KrillinAI](https://github.com/krillinai/KrillinAI) — momentum 75, ⭐10285 — Voice Agents & Realtime — AI video translation & dubbing tool for humans and AI Agents, powered by LLMs. Full pipeline: downlo
- [voicepaw/so-vits-svc-fork](https://github.com/voicepaw/so-vits-svc-fork) — momentum 75, ⭐9309 — Voice Cloning & Conversion — so-vits-svc fork with realtime support, improved interface and more features.
- [Uberi/speech_recognition](https://github.com/Uberi/speech_recognition) — momentum 75, ⭐8970 — Speech-to-Text — Speech recognition module for Python, supporting several engines and APIs, online and offline.
- [GetStream/Vision-Agents](https://github.com/GetStream/Vision-Agents) — momentum 75, ⭐7918 — Voice Agents & Realtime — Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider
- [speechbrain/speechbrain](https://github.com/speechbrain/speechbrain) — momentum 74, ⭐11615 — Toolkits & Frameworks — A PyTorch-based Speech Toolkit
- [fishaudio/Bert-VITS2](https://github.com/fishaudio/Bert-VITS2) — momentum 74, ⭐8762 — Voice Agents & Realtime — vits2 backbone with multilingual-bert
- [FunAudioLLM/SenseVoice](https://github.com/FunAudioLLM/SenseVoice) — momentum 74, ⭐8554 — Enhancement & Analysis — Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages,
- [TalAter/annyang](https://github.com/TalAter/annyang) — momentum 74, ⭐6814 — Speech-to-Text — 💬 Speech recognition for your site
- [jamiepine/voicebox](https://github.com/jamiepine/voicebox) — momentum 73, ⭐29861 — Speech-to-Text — The open-source AI voice studio. Clone, dictate, create.
- [supertone-inc/supertonic](https://github.com/supertone-inc/supertonic) — momentum 73, ⭐11599 — Text-to-Speech — Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.