ByungKwanLee / TroL
[EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagation operation to get super vision language performances.
☆94Updated 9 months ago
Alternatives and similar repositories for TroL:
Users that are interested in TroL are comparing it to the libraries listed below
- [NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im…☆110Updated 9 months ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…☆52Updated 5 months ago
- [ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve perfor…☆319Updated 11 months ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?"☆128Updated 9 months ago
- Matryoshka Multimodal Models☆98Updated 2 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆154Updated 2 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models☆234Updated 3 months ago
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and inexpensive to use. Specific…☆66Updated 6 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆130Updated 4 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆199Updated 2 months ago
- ☆111Updated 7 months ago
- A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo☆33Updated 7 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆44Updated 2 months ago
- 【NeurIPS 2024】Dense Connector for MLLMs☆157Updated 5 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆149Updated 6 months ago
- [ACL 2024 Findings] Official PyTorch Implementation code for realizing the technical part of CoLLaVO: Crayon Large Language and Vision mO…☆95Updated 8 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR2024]☆208Updated this week
- Official implementation of the Law of Vision Representation in MLLMs☆151Updated 4 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.☆120Updated 2 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆57Updated last month
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision☆59Updated 8 months ago
- [ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant☆234Updated 7 months ago
- a family of highly capabale yet efficient large multimodal models☆178Updated 7 months ago
- ☆73Updated last year
- [AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues☆53Updated 3 months ago
- ☆133Updated last year
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆96Updated last month
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale☆205Updated last year
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆73Updated last month
- Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆145Updated last month