austin-bowen / voicebox
Python text-to-speech library with built-in voice effects and support for multiple TTS engines
☆17 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for voicebox
- Fine-tuning Mistral-7B with PEFT (Parameter-Efficient Fine-Tuning) and LoRA (Low-Rank Adaptation) on the Puffin dataset (multi-turn conversation… ☆12 · Updated 11 months ago
- Open Server is an OpenAI-API-compatible server for generating text, images, and embeddings, and storing them in vector databases. It also inc… ☆15 · Updated 11 months ago
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆11 · Updated last month
- An open-source NLP-as-a-service project focused on providing state-of-the-art systems with ease. Training and inference by simple Docker … ☆20 · Updated 2 months ago
- Supervoice Speaker Separation Network ☆13 · Updated 5 months ago
- Generate audio datasets for training text-to-speech models through smart audio splitting with silence detection, and transcription using… ☆28 · Updated last year
- Speech-to-speech conversation using the OpenAI Realtime API in Python 🐍 ☆19 · Updated this week
- Babylon.cpp is a C and C++ library for grapheme-to-phoneme conversion and text-to-speech synthesis. For phonemization an ONNX Runtime port… ☆12 · Updated 2 months ago
- An open-source replication of the Strawberry method that leverages Monte Carlo search with PPO and/or DPO ☆22 · Updated this week
- Provides custom Gradio components to make the diarization-based audio labeling process easier and faster. ☆45 · Updated 2 weeks ago
- ☆11 · Updated 6 months ago
- HuggingChat-like UI in Gradio ☆65 · Updated last year
- Jupyter Notebooks and an R Notebook for encoding Pokémon embeddings and creating data visualizations. ☆16 · Updated 4 months ago
- PyTorch implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 · Updated last week
- Multi-modal language modeling with image, audio, and text integration, including multiple images and multiple audio inputs in a single multi-turn conversation. ☆14 · Updated 9 months ago
- Rust bindings for CTranslate2 ☆13 · Updated last year
- One Line To Build Zero-Data Classifiers in Minutes ☆33 · Updated last month
- Multivoice: Enhance your foreign-language movie and TV show experience with personalized dubbed versions. Our project uses voice cloning … ☆24 · Updated last year
- Easily convert Hugging Face models to GGUF format for llama.cpp ☆19 · Updated 3 months ago
- A crowdsourcing platform for storing voice data from various speakers, which will serve as the input dataset for speech… ☆17 · Updated last year
- ☆20 · Updated 9 months ago
- Convert a saved PyTorch model to GGUF and generate as much corresponding GGML C code as possible ☆13 · Updated 11 months ago
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech ☆22 · Updated 3 months ago
- ViSpeR: Multilingual Audio-Visual Speech Recognition ☆26 · Updated 5 months ago
- 🚀 Scale your RAG pipeline using Ragswift: a scalable, centralized embeddings management platform ☆36 · Updated 9 months ago
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning ☆33 · Updated last year
- Demo Python script app to interact with a llama.cpp server using the Whisper API, microphone, and webcam devices. ☆45 · Updated last year
- A list of language models with permissive licenses such as MIT or Apache 2.0 ☆24 · Updated 2 weeks ago
- A public implementation of the ReLoRA pretraining method, built on Lightning AI's PyTorch Lightning suite. ☆33 · Updated 8 months ago