Rakam-AI / rakam_systems
☆11 Updated this week
Related projects
Alternatives and complementary repositories for rakam_systems
- Implementation of 'Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis', in MLX ☆13 Updated last week
- Implementation of "Audio xLSTMs: Learning Self-supervised audio representations with xLSTMs" in PyTorch ☆16 Updated this week
- Notebook and Scripts that showcase running quantized diffusion models on consumer GPUs ☆33 Updated 2 weeks ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆45 Updated this week
- Collection of scripts from mHuBERT-147. ☆22 Updated 4 months ago
- Audio tokenization, in the fastest way possible! ☆45 Updated 2 months ago
- babyLM WhisBERT code ☆17 Updated 5 months ago
- Final training script from HuggingFace Whisper Fine tuning event - to get best results on finetuned model. ☆12 Updated last year
- ☆61 Updated 3 months ago
- Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax. ☆34 Updated last year
- ☆9 Updated last month
- [ICASSP 2023] Tempo vs. Pitch: understanding self-supervised tempo estimation ☆13 Updated last year
- ☆84 Updated 7 months ago
- This is the official repository of ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional R… ☆40 Updated last month
- ☆52 Updated 2 weeks ago
- The demo page of UniAudio ☆34 Updated 9 months ago
- Supervoice diffusion enhance ☆25 Updated 3 months ago
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆28 Updated 3 weeks ago
- This repository contains the implementation of the paper: "Span Classification with Structured Information for Disfluency Detection in Sp… ☆12 Updated last year
- GPT for FACodec ☆13 Updated 7 months ago
- ☆23 Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆77 Updated 3 months ago
- Interface Design for Self-Supervised Speech Models, Accepted to Interspeech2024 ☆15 Updated 3 months ago
- ☆59 Updated last year
- Here we will track the latest Audio AI Agent, including speech, music, sound effects, etc. ☆11 Updated 11 months ago
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models ☆17 Updated 3 months ago
- Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI… ☆19 Updated 2 months ago
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ☆12 Updated 5 months ago
- GlotCC Dataset and Pipeline -- NeurIPS 2024 ☆16 Updated last week
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech. ☆13 Updated last year