stanfordmlgroup / ManyICL
☆140 · Updated 11 months ago
Alternatives and similar repositories for ManyICL
Users who are interested in ManyICL are comparing it to the repositories listed below.
- [ICML 2024 Oral] Official code repository for MLLM-as-a-Judge. ☆67 · Updated 2 months ago
- ☆177 · Updated last year
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆206 · Updated 6 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆83 · Updated this week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆100 · Updated 2 months ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆111 · Updated 2 weeks ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆104 · Updated 2 weeks ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆128 · Updated 3 months ago
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents ☆219 · Updated last week
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆248 · Updated 4 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs" ☆53 · Updated 6 months ago
- ☆97 · Updated 2 months ago
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities. ☆104 · Updated 2 weeks ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆409 · Updated last year
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆63 · Updated 2 months ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆152 · Updated last month
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆188 · Updated last week
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆135 · Updated 5 months ago
- ☆93 · Updated 3 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria ☆69 · Updated 6 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆133 · Updated 5 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆198 · Updated this week
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆120 · Updated 8 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆116 · Updated 5 months ago
- [ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ☆109 · Updated 9 months ago
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(… ☆281 · Updated 5 months ago
- [CVPR 2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆190 · Updated last month
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆114 · Updated 9 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆51 · Updated 5 months ago
- "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents" ☆70 · Updated last month