stanfordmlgroup / ManyICL
☆135 · Updated 8 months ago
Alternatives and similar repositories for ManyICL:
Users who are interested in ManyICL are comparing it to the libraries listed below:
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆89 · Updated last month
- Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆161 · Updated 3 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆104 · Updated 3 weeks ago
- ☆166 · Updated last year
- A Survey on Benchmarks of Multimodal Large Language Models ☆87 · Updated last month
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆127 · Updated 3 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆190 · Updated last month
- ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ☆100 · Updated 7 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆63 · Updated 8 months ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E… ☆391 · Updated last month
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆96 · Updated last month
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability. ☆89 · Updated 2 months ago
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆79 · Updated 9 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆47 · Updated this week
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents ☆168 · Updated this week
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge. ☆63 · Updated this week
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆118 · Updated 6 months ago
- An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation ☆111 · Updated last year
- MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities. ☆86 · Updated 4 months ago
- [ICLR 2025] SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights ☆53 · Updated last week
- Rethinking Step-by-step Visual Reasoning in LLMs ☆247 · Updated 3 weeks ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆144 · Updated 4 months ago
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆131 · Updated 2 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR2024] ☆199 · Updated this week
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆141 · Updated 8 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆64 · Updated 2 months ago
- ☆79 · Updated 2 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆98 · Updated 4 months ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆141 · Updated last month
- InstructionGPT-4 ☆39 · Updated last year