mshukor / ima-lmms
[NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
☆23Oct 15, 2024Updated last year
Alternatives and similar repositories for ima-lmms
Users who are interested in ima-lmms are comparing it to the repositories listed below.
- Mitigating Open-Vocabulary Caption Hallucinations (EMNLP 2024)☆18Oct 18, 2024Updated last year
- Official code of "RoboOmni: Proactive Robot Manipulation in Omni-modal Context"☆81Nov 17, 2025Updated 2 months ago
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024)☆34Oct 16, 2024Updated last year
- SurgLaVi: Large-Scale Hierarchical Datasets for Surgical Vision–Language Representation Learning☆23Feb 2, 2026Updated last week
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation.☆39Jun 20, 2024Updated last year
- [CVPR 2023] Improving Zero-shot Generalization and Robustness of Multi-modal Models☆35Jul 16, 2023Updated 2 years ago
- Repo for our work "Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence"☆19Jun 2, 2025Updated 8 months ago
- Frequency tracking in time-frequency representations☆13Jan 19, 2021Updated 5 years ago
- Simulation of a seven-axis robotic arm☆13Jun 7, 2022Updated 3 years ago
- This branch of Asteroid contains code for the vocal harmony and chamber ensemble separation related papers.☆12Nov 7, 2024Updated last year
- LongAttn: Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 6 months ago
- Imshow - Flexible and Customizable Image Display with Python☆13Dec 27, 2025Updated last month
- Code and data recipes for the paper: Optimal Condition Training for Target Source Separation by Efthymios Tzinis, Gordon Wichern, Paris S…☆14Feb 15, 2023Updated 2 years ago
- This repository contains the speaker labeled information of VoxCeleb2 and LRS3 audio-visual datasets. (AAAI 2025)☆12Sep 6, 2024Updated last year
- Synthesize bio-plausible neural networks for cognitive tasks, mimicking brain architecture☆11Apr 14, 2021Updated 4 years ago
- ☆11Sep 27, 2023Updated 2 years ago
- Agentic Keyframe Search for Video Question Answering☆15Apr 7, 2025Updated 10 months ago
- ☆12Oct 17, 2024Updated last year
- Today I Learned 🐢☆10Aug 17, 2023Updated 2 years ago
- Time frequency ridge detection based on relevant ridge portions☆11Aug 17, 2023Updated 2 years ago
- Code for the paper BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues (EMNLP20)☆11Jun 16, 2025Updated 7 months ago
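- This project provides XLNet pre-trained models for Chinese, aiming to enrich Chinese natural language processing resources and offer a more diverse choice of Chinese pre-trained models. We welcome experts and scholars to download and use them, and to jointly promote and develop Chinese language resources.☆11May 30, 2023Updated 2 years ago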
- Official code repository for Findings of EMNLP 2022 paper: PseudoReasoner: Leveraging Pseudo Labels for Commonsense Knowledge Base Popula…☆11Oct 18, 2022Updated 3 years ago
- The Gradient Icon package is a powerful Flutter package that enables creating gradient icons effortlessly.☆20May 3, 2024Updated last year
- Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views (with visual imitation learning for robots)☆30Updated this week
- Generate images of Chinese license plates☆11Feb 8, 2021Updated 5 years ago
- Rationale-enhanced language models are better continual relation learners (EMNLP 2023 Main Conference)☆12Oct 11, 2023Updated 2 years ago
- BFloat16 Fused Adam Operator for PyTorch☆16Nov 16, 2024Updated last year
- ☆26Oct 16, 2025Updated 3 months ago
- ICNet in TensorFlow, Real-Time Segmentation☆10Aug 17, 2018Updated 7 years ago
- ☆13Aug 1, 2024Updated last year
- Ling-Coder-Lite is a MoE LLM provided and open-sourced by CodeFuse and InclusionAI.☆14Apr 22, 2025Updated 9 months ago
- [ACL 2025 Main] (🏆 Outstanding Paper Award) Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Proba…☆15Aug 15, 2025Updated 6 months ago
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM☆48May 24, 2024Updated last year
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs)☆46Nov 17, 2023Updated 2 years ago
- Official Repository of LatentSeek☆76Jun 6, 2025Updated 8 months ago
- Multimodal RewardBench☆61Feb 21, 2025Updated 11 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆54Mar 9, 2025Updated 11 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.☆352Nov 5, 2025Updated 3 months ago