Multi-Modal Language Modeling with Image, Audio and Text Integration, including multiple images and multiple audio inputs in a single multi-turn conversation.
☆18Feb 20, 2024Updated 2 years ago
Alternatives and similar repositories for multimodal-LLM
Users who are interested in multimodal-LLM are comparing it to the libraries listed below.
- ☆44Sep 15, 2025Updated 7 months ago
- Repository contains code to fine-tune WhisperASR model☆23Dec 16, 2022Updated 3 years ago
- Content related to Microsoft AI build events☆13Apr 29, 2024Updated last year
- Audio-Visual Speech Recognition☆21Jul 7, 2025Updated 9 months ago
- ☆27May 30, 2025Updated 10 months ago
- Code for paper "Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI"☆13Jan 19, 2024Updated 2 years ago
- A minimal re-implementation of orthogonal fine-tuning (OFT), a method originally proposed for diffusion models, for LLMs. Based on nanoGPT and minLoRA.☆14Nov 17, 2023Updated 2 years ago
- Scripts for KGIRNet model for ESWC☆10Jul 6, 2023Updated 2 years ago
- Tacotron 2 training notebook supporting Japanese, French, and Mandarin☆11Nov 19, 2022Updated 3 years ago
- Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens…☆47Jun 12, 2025Updated 10 months ago
- replacement of AdamW and Lion optimizer for LLMs☆13May 28, 2023Updated 2 years ago
- fast opus bindings for node and browsers☆15Feb 11, 2024Updated 2 years ago
- [ICLR 2022 Spotlight] Multi-Stage Episodic Control for Strategic Exploration in Text Games☆15Feb 8, 2026Updated 2 months ago
- This project is from the Airbnb Recruitment Challenge on Kaggle. The challenge is to solve a multi-class classification problem of predic…☆11Feb 22, 2022Updated 4 years ago
- Collection of English ASR models in TFLite format for faster inference.☆14Feb 21, 2022Updated 4 years ago
- Sarjana is an open source desktop application which is used to assist in reading information materials, be it research papers or technica…☆24Jul 22, 2024Updated last year
- Official implementation for "Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts"☆22Jun 28, 2025Updated 9 months ago
- 😜Contrastive Learning of Sentence Embedding using LoRA (EECS487 final project)☆13Apr 19, 2023Updated 2 years ago
- Visual Speech Recognition☆20Dec 24, 2024Updated last year
- [ICON 2020] TensorFlow Code for "End-to-End Automatic Speech Recognition System for Gujarati"☆13Jul 26, 2021Updated 4 years ago
- ☆20Aug 28, 2024Updated last year
- Algorithms for Policy Evaluation, Estimation of Action Values, Policy Improvement, Policy Iteration, Truncated Policy Evaluation, Truncat…☆11Apr 3, 2019Updated 7 years ago
- ☆16Nov 24, 2025Updated 4 months ago
- Ollama with RAG and Chainlit is a chatbot project leveraging Ollama, RAG, and Chainlit. It uses Chromadb for vector storage, gpt4all for …☆14Feb 15, 2024Updated 2 years ago
- An extension of VirtualHome for generating and augmenting knowledge graphs☆15Oct 24, 2024Updated last year
- [Computer Speech & Language] A transformer-based spelling error correction framework for Bangla and resource scarce Indic languages☆14Aug 9, 2024Updated last year
- Syntexmex plugin for blender☆16Mar 28, 2020Updated 6 years ago
- CHiME-9 Task 1 - MCoRec baseline☆27Jan 13, 2026Updated 3 months ago
- Flocon - A next-generation, feature-rich online session tool for TRPGs that you can host on your own server for free.☆16Updated this week
- Examples to finetune encoder-only and encoder-decoder transformers for Japanese language in Hugging Face (Oct 2022)☆16Oct 6, 2023Updated 2 years ago
- An SVM model for multi-class classification of Thyroid data.☆11Dec 9, 2019Updated 6 years ago
- Fork of RecurrentGPT with modifications☆10Sep 18, 2024Updated last year
- [NeurIPS 2024] PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications☆20Nov 4, 2024Updated last year
- RAG-Fusion implementation using Langchain, Weaviate and OpenAI☆13Oct 31, 2023Updated 2 years ago
- LightRAG with Neo4j Example Project☆17May 19, 2025Updated 10 months ago
- Finetuning Whisper ASR model for Belarusian language☆17Feb 16, 2025Updated last year
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]☆48Jul 22, 2025Updated 8 months ago
- A Workshop on EEG with Drone Control Demos☆18Dec 16, 2024Updated last year
- ☆12Jan 27, 2025Updated last year