SalesforceAIResearch / TACO

☆59

Alternatives and similar repositories for TACO

Users who are interested in TACO are comparing it to the repositories listed below.

yihedeng9 / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆88 Updated 2 weeks ago
EvolvingLMMs-Lab / multimodal-search-r1
☆102 Updated last month
orrzohar / Video-STaR
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆63 Updated 10 months ago
JieyuZ2 / TaskMeAnything
[NeurIPS 2024] A task generation and model evaluation system for multimodal language models.
☆71 Updated 6 months ago
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"
☆103 Updated 2 weeks ago
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL
☆92 Updated this week
shulin16 / MMInA
Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"
☆43 Updated 3 months ago
shiqichen17 / VLM_Merging
GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)
☆51 Updated last week
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆57 Updated 7 months ago
Yushi-Hu / VisualSketchpad
Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
☆222 Updated 7 months ago
open-compass / ProSA
[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
☆25 Updated 2 weeks ago
EvolvingLMMs-Lab / VideoMMMU
☆46 Updated last month
showlab / VideoGUI
[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
☆35 Updated last month
THU-KEG / LongWriter-V
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
☆17 Updated 2 months ago
NUS-TRAIL / NoisyRollout
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆64 Updated 2 weeks ago
xufangzhi / Genius
[ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework
☆60 Updated this week
kokolerk / TON
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
☆36 Updated 2 weeks ago
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆67 Updated last year
EvolvingLMMs-Lab / multimodal-sae
Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
☆132 Updated 4 months ago
om-ai-lab / ZoomEye
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆37 Updated 5 months ago
shawnricecake / Heima
Code for Heima
☆43 Updated last month
hewei2001 / ReachQA
Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"
☆53 Updated 7 months ago
DAMO-NLP-SG / multimodal_textbook
The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"
☆158 Updated 2 months ago
MingLiiii / Layer_Gradient
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
☆64 Updated 3 months ago
TIGER-AI-Lab / MEGA-Bench
This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
☆66 Updated last month
TIGER-AI-Lab / QuickVideo
Quick Long Video Understanding
☆38 Updated last week
Dongping-Chen / ISG
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
☆22 Updated 4 months ago
zeyofu / ReFocus_Code
Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
☆32 Updated last month
xlang-ai / OSWorld-G
Scaling Computer-Use Grounding via UI Decomposition and Synthesis
☆49 Updated last week
RifleZhang / LLaVA-Reasoner-DPO
☆77 Updated 4 months ago