Code release for "LLMs can see and hear without any training"
☆461 · May 8, 2025 · Updated last year
Alternatives and similar repositories for MILS
Users interested in MILS are comparing it to the repositories listed below.
- A Simple Scenes Based Movie Generation App ☆52 · Nov 8, 2024 · Updated last year
- ☆198 · May 5, 2025 · Updated last year
- (Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators ☆642 · Jun 1, 2026 · Updated last month
- [ICLR 2026] TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆875 · Jan 28, 2026 · Updated 5 months ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆308 · May 21, 2025 · Updated last year
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation ☆21 · Feb 14, 2025 · Updated last year
- A browser-based, WebGL2 implementation of GPT-2 with transform block and attention matrix visualization ☆346 · Oct 24, 2025 · Updated 8 months ago
- ☆13 · Jul 10, 2024 · Updated last year
- Music production for silent film clips. ☆32 · Apr 30, 2025 · Updated last year
- Fully neural approach for text chunking ☆415 · Oct 23, 2025 · Updated 8 months ago
- Make your LLM agent and chat with it simple and fast! ☆72 · Nov 22, 2025 · Updated 7 months ago
- Everything about the SmolLM and SmolVLM family of models ☆3,826 · May 26, 2026 · Updated last month
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆359 · Oct 22, 2024 · Updated last year
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral) ☆34 · Feb 11, 2026 · Updated 4 months ago
- A universal RPC layer for AI agents. Connect to any function, any language, any framework, in minutes. ☆133 · Jun 22, 2026 · Updated last week
- [ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding] ☆16 · Jul 15, 2025 · Updated 11 months ago
- [IJCV 2026] HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts ☆26 · Feb 28, 2025 · Updated last year
- Rewriting Principia Mathematica in Lean ☆137 · Feb 5, 2026 · Updated 5 months ago
- Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit … ☆361 · May 21, 2025 · Updated last year
- ☆14 · Jan 22, 2025 · Updated last year
- Next-Token Prediction is All You Need ☆2,423 · Jan 12, 2026 · Updated 5 months ago
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025 ☆280 · May 26, 2025 · Updated last year
- Curated collection of the latest open-source AI agent projects—Weekly updated! ☆89 · Apr 8, 2025 · Updated last year
- 🤖 An open-source AI assistant answering questions using your docs ☆255 · Mar 4, 2026 · Updated 4 months ago
- Code for "Goal-Guided Neural Cellular Automata: Learning to Control Self-Organising Systems" ☆56 · May 31, 2022 · Updated 4 years ago
- A project for tri-modal LLM benchmarking and instruction tuning. ☆61 · Mar 27, 2025 · Updated last year
- ☆278 · Mar 6, 2025 · Updated last year
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆10,475 · May 16, 2026 · Updated last month
- ☆18 · Sep 22, 2024 · Updated last year
- [ICML2026] From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors ☆92 · Apr 30, 2026 · Updated 2 months ago
- Transductive regular expressions ☆257 · Sep 25, 2025 · Updated 9 months ago
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆17,755 · Feb 1, 2025 · Updated last year
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality ☆22 · Oct 8, 2024 · Updated last year
- Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI ☆1,375 · Jan 27, 2026 · Updated 5 months ago
- 🍃 MINT-1T: A one trillion token multimodal interleaved dataset. ☆834 · Jul 31, 2024 · Updated last year
- code for "TVG: A Training-free Transition Video Generation Method with Diffusion Models" ☆50 · Aug 19, 2024 · Updated last year
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,935 · Mar 3, 2026 · Updated 4 months ago
- Official repository for our work on micro-budget training of large-scale diffusion models. ☆1,582 · Jan 12, 2025 · Updated last year
- Live-bending a foundation model’s output at neural network level. ☆273 · Apr 7, 2025 · Updated last year