[NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
☆23Oct 15, 2024Updated last year
Alternatives and similar repositories for ima-lmms
Users that are interested in ima-lmms are comparing it to the libraries listed below.
- An exploration of LLM steering☆26Jun 15, 2024Updated last year
- Mitigating Open-Vocabulary Caption Hallucinations (EMNLP 2024)☆18Oct 18, 2024Updated last year
- ☆84Nov 5, 2024Updated last year
- Audio Entailment: Deductive Reasoning for Audio Understanding☆17Dec 10, 2024Updated last year
- [arXiv 2024] FairVision: Equitable Deep Learning for Eye Disease Screening via Fair Identity Scaling☆16Apr 15, 2026Updated last month
- Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning☆24Jun 25, 2025Updated 11 months ago
- Official code of "RoboOmni: Proactive Robot Manipulation in Omni-modal Context"☆108Mar 28, 2026Updated 2 months ago
- Applies ROME and MEMIT on Mamba-S4 models☆15Apr 5, 2024Updated 2 years ago
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024)☆34Oct 16, 2024Updated last year
- Fully Open Framework for Democratized Multimodal Reinforcement Learning.☆50Dec 19, 2025Updated 5 months ago
- ☆18Mar 20, 2022Updated 4 years ago
- ☆22Apr 22, 2025Updated last year
- Audio-Visual Lip Synthesis via Intermediate Landmark Representation☆19May 16, 2023Updated 3 years ago
- ENACT is a benchmark that evaluates embodied cognition through world modeling from egocentric interaction. It is designed to be simple an…☆50Nov 27, 2025Updated 6 months ago
- Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" Neurips 23 Spotlight☆37May 23, 2023Updated 3 years ago
- ☆14Feb 24, 2025Updated last year
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning.☆17Apr 22, 2025Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆17Apr 2, 2025Updated last year
- Agentic Keyframe Search for Video Question Answering☆18Apr 7, 2025Updated last year
- 📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.☆387Nov 5, 2025Updated 6 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆56Mar 9, 2025Updated last year
- ☆16Apr 8, 2026Updated last month
- Repo for our work "Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence"☆20Jun 2, 2025Updated 11 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation.☆39Jun 20, 2024Updated last year
- Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks☆74May 7, 2026Updated 3 weeks ago
- This branch of Asteroid contains code for the vocal harmony and chamber ensemble separation related papers.☆12Nov 7, 2024Updated last year
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by …☆76Jan 27, 2024Updated 2 years ago
- Lightweight control environment for Franka robot☆12Mar 16, 2022Updated 4 years ago
- [ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models"☆24Mar 8, 2026Updated 2 months ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"☆35Jun 23, 2025Updated 11 months ago
- Rationale-enhanced language models are better continual relation learners (EMNLP 2023 Main Conference)☆12Oct 11, 2023Updated 2 years ago
- Simulation of a seven-axis robotic arm☆13Jun 7, 2022Updated 3 years ago
- Awesome Jailbreak, red teaming arXiv papers (automatically updated every 12 hours)☆109May 18, 2026Updated last week
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs)☆46Nov 17, 2023Updated 2 years ago
- THOUGHTSCULPT, a general reasoning and search method for complex tasks☆13Dec 13, 2024Updated last year
- Evaluate Multimodal LLMs as Embodied Agents☆57Feb 14, 2025Updated last year
- A simple iOS app that records the BlendShapes feature with timestamps provided by ARKit.☆15Nov 9, 2018Updated 7 years ago
- MineRL DDPG Agent to Obtain Diamond in Minecraft☆14Jan 21, 2020Updated 6 years ago
- ☆14Jan 26, 2021Updated 5 years ago