stanfordmlgroup / ManyICL
☆142 Updated last year
Alternatives and similar repositories for ManyICL
Users interested in ManyICL are comparing it to the repositories listed below.
- Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆230 Updated 7 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆137 Updated 5 months ago
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge. ☆70 Updated 4 months ago
- ☆183 Updated last year
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆123 Updated 2 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆256 Updated 6 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆91 Updated last month
- ☆207 Updated 4 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆104 Updated 3 weeks ago
- The All-in-one Judge Models introduced by OpenCompass ☆93 Updated 4 months ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆119 Updated 2 months ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆302 Updated last month
- [ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ☆111 Updated 11 months ago
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(… ☆286 Updated 7 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria ☆69 Updated 8 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆138 Updated 7 months ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆219 Updated last month
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆46 Updated last month
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆143 Updated 7 months ago
- ☆292 Updated last week
- ☆62 Updated last month
- ☆109 Updated 3 months ago
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents ☆254 Updated 3 weeks ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆115 Updated last year
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆86 Updated last year
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆68 Updated last year
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆161 Updated 3 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs" ☆54 Updated 7 months ago
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities. ☆107 Updated last month
- Official code for paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024] ☆218 Updated 3 months ago