voidful/MMLM


voidful / MMLM

Toward Multi Modality Language Model - implementation of GPT-4o/Project Astra

☆16

Alternatives and similar repositories for MMLM

Users who are interested in MMLM are comparing it to the libraries listed below.


ntucllab / CLImage_Dataset
The dataset repo of "CLCIFAR: CIFAR-Derived Benchmark Datasets with Human Annotated Complementary Labels" paper
☆17 · May 11, 2026 · Updated 2 months ago
smatthewenglish / trst
☆12 · Jan 15, 2015 · Updated 11 years ago
nervjack2 / Speech2Unit
☆13 · Sep 25, 2024 · Updated last year
TMMMU-Benchmark / evaluation
Evaluation code for benchmarking VLMs in Traditional Chinese understanding
☆14 · Dec 22, 2025 · Updated 6 months ago
kehanlu / DeSTA2
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
☆127 · Jul 15, 2025 · Updated last year
andybi7676 / reborn-uasr
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
☆15 · Dec 11, 2024 · Updated last year
lwang114 / GraphUnsupASR
☆10 · Apr 17, 2024 · Updated 2 years ago
hhhaaahhhaa / ASR-TTA
☆16 · Nov 4, 2025 · Updated 8 months ago
ga642381 / Taiwanese-Speech-Synthesis
Taiwanese Speech Synthesis with Tacotron2
☆26 · Oct 2, 2022 · Updated 3 years ago
DanielLin94144 / DUAL-textless-SQA
Textless (ASR-transcript free) Spoken Question Answering. The official release of NMSQA dataset and the implementation of "DUAL: Textless…
☆35 · Aug 10, 2023 · Updated 2 years ago
voidful / asrp
ASR text preprocessing utility
☆21 · Aug 5, 2024 · Updated last year
George0828Zhang / torch_cif
A fast parallel PyTorch implementation of the "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/ab…
☆37 · Feb 10, 2024 · Updated 2 years ago
voidful / llm-codec
LLM-Codec: Neural Audio Codec Meets Language Model Objectives
☆23 · May 3, 2026 · Updated 2 months ago
bartlomiej-pluta / android-tts-server
An Android application that provides a REST-based interface to Android's built-in TTS engine. The web service is highly c…
☆11 · Jul 28, 2020 · Updated 5 years ago
UDICatNCHU / QAbot_Scoring
Scoring system for question-answering bots
☆11 · Dec 4, 2022 · Updated 3 years ago
ga642381 / FlappyBird
Super Flappy Bird in p5.js
☆10 · Mar 8, 2021 · Updated 5 years ago
MiscellaneousStuff / PhoneLM
(R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.
☆48 · Sep 4, 2023 · Updated 2 years ago
Speech-Lab-IITM / CCC-wav2vec-2.0
Code for the method proposed in the paper: ccc-wav2vec 2.0: Clustering aided Cross-Contrastive learning of Self-Supervised speech repres…
☆23 · Mar 18, 2024 · Updated 2 years ago
ga642381 / Spoken-Dialogue-Model-Survey
A survey of spoken dialogue models (SDMs) with speech input and speech output. Focus on their Intermediate Representation and Generation …
☆30 · Mar 24, 2026 · Updated 3 months ago
MingLunHan / CIF-ColDec
[ICASSP 2022] Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection
☆25 · Updated this week
voidful / vall-e-encodec
☆41 · May 15, 2023 · Updated 3 years ago
ga642381 / Taiwanese-Translation
Taiwanese translation with a BERT-based model and an RNN. Collection of Taiwanese text corpora
☆13 · Oct 15, 2022 · Updated 3 years ago
ckyang1124 / SAKURA
Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Informa…
☆24 · Aug 14, 2025 · Updated 11 months ago
shengcanxu / canoSpeech
text to speech
☆10 · Mar 19, 2024 · Updated 2 years ago
jeffeuxMartin / meta-learning-hlp
A publishing website of a table collecting meta-learning-related papers in the area of human language processing.
☆17 · Aug 2, 2021 · Updated 4 years ago
Roytsai27 / GIRCSE
Official implementation of ICLR 2026: Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
☆15 · May 24, 2026 · Updated last month
grtzsohalf / buy_vs_rent_and_invest
☆15 · Sep 9, 2021 · Updated 4 years ago
KotRikD / romajitable
Convert English/translit words to katakana
☆13 · Sep 1, 2018 · Updated 7 years ago
shkim816 / acnn_speaker_recog
acnn for text-independent speaker recognition
☆10 · Feb 8, 2022 · Updated 4 years ago
yoongi43 / VRVQ
Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"
☆11 · Apr 10, 2025 · Updated last year
George0828Zhang / simulst
PyTorch toolkit for streaming speech recognition, speech translation and simultaneous translation based on fairseq.
☆25 · Oct 3, 2022 · Updated 3 years ago
voidful / ipa2
Tools for converting text to IPA in Python
☆19 · Feb 11, 2023 · Updated 3 years ago
microsoft / Interactive-Summarization
The official repo of our research work "Interactive Editing for Text Summarization".
☆23 · Jun 3, 2023 · Updated 3 years ago
voidful / aidev
Revolutionize your development workflow with AI-powered code assistance, automating mock tests, suggestions, and unit test generation in …
☆33 · Feb 27, 2025 · Updated last year
voidful / Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
☆308 · Jul 4, 2026 · Updated 2 weeks ago
JSALT-2022-SSL / superb-prosody
☆31 · Jul 13, 2023 · Updated 3 years ago
lstrgar / ss-phoneme-seg
Code for "Phoneme Segmentation Using Self-Supervised Speech Models", Strgar & Harwath, Proceedings of the IEEE Spoken Language Technology…
☆55 · Nov 4, 2022 · Updated 3 years ago
voidful / nlp2
⚙️ Tool for NLP - handle file and text
☆15 · Feb 16, 2025 · Updated last year
ga642381 / SpeechPrompt
**Interspeech 2022** 《SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks》Speec…
☆102 · Apr 10, 2025 · Updated last year