SalesforceAIResearch/LATTE


☆70

Alternatives and similar repositories for LATTE

Users that are interested in LATTE are comparing it to the libraries listed below.


JieyuZ2 / ProVision
An instruction data generation system for multimodal language models.
☆37 · Jan 31, 2025 · Updated last year
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆75 · Jul 13, 2025 · Updated last year
Linzwcs / AutoMusicTheoryQA
☆22 · Nov 21, 2025 · Updated 8 months ago
STARE-bench / STARE
☆19 · Oct 12, 2025 · Updated 9 months ago
JieyuZ2 / TaskMeAnything
[NeurIPS 2024] A task generation and model evaluation system for multimodal language models.
☆71 · Nov 27, 2024 · Updated last year
RAIVNLab / sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
☆93 · Feb 13, 2024 · Updated 2 years ago
zeyofu / ReFocus_Code
Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]
☆50 · Jul 22, 2025 · Updated last year
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆35 · Apr 27, 2023 · Updated 3 years ago
hychaochao / Chat-Models-Backdoor-Attacking
Code for the paper "Exploring Backdoor Vulnerabilities of Chat Models"
☆19 · Apr 13, 2024 · Updated 2 years ago
ssmisya / PRMBench
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆93 · Feb 15, 2025 · Updated last year
hkust-nlp / Laser
[ICLR 2026] Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆66 · May 22, 2025 · Updated last year
TAU-VAILab / hierarcaps
Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)
☆34 · Aug 12, 2024 · Updated last year
zhaochen0110 / Timo
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
☆26 · Oct 23, 2024 · Updated last year
ssmisya / AdaReasoner
[ICLR 2026] The official repository for the paper "AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning".
☆83 · Feb 27, 2026 · Updated 4 months ago
mbzuai-oryx / Agent-X
[ICLR 2026] Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
☆43 · Apr 28, 2026 · Updated 2 months ago
Dongping-Chen / ISG
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
☆31 · Aug 7, 2025 · Updated 11 months ago
ssmisya / VLMLT
[CVPR' 25] Official repo for From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Cal…
☆22 · Jun 6, 2025 · Updated last year
zhaochen0110 / LMLM
Code and data for "Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change" (EMNLP 2022)
☆17 · Dec 8, 2022 · Updated 3 years ago
VincentDENGP / 3D-LR
Can 3D Vision-Language Models Truly Understand Natural Language?
☆20 · Mar 28, 2024 · Updated 2 years ago
yu-rp / VisualPerceptionToken
☆136 · Mar 22, 2025 · Updated last year
zzzhr97 / SpecBench
Code repository for the ICML 2026 paper "Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation".
☆24 · Jun 14, 2026 · Updated last month
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆46 · Mar 11, 2025 · Updated last year
uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13 · Sep 30, 2023 · Updated 2 years ago
OpenSparseLLMs / Linearization
☆71 · Jul 8, 2025 · Updated last year
mlvlab / ST-VLM
☆13 · Mar 28, 2025 · Updated last year
lcqysl / VideoSSR
[CVPR 2026] Official repo for "VideoSSR: Video Self-Supervised Reinforcement Learning"
☆41 · Nov 11, 2025 · Updated 8 months ago
OpenSparseLLMs / Skip-DiT
[ICCV 2025] Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints
☆80 · Jul 10, 2025 · Updated last year
uclanlp / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆155 · May 25, 2026 · Updated 2 months ago
facebookresearch / SIEVE
SIEVE: Multimodal Dataset Pruning using Image-Captioning Models (CVPR 2024)
☆21 · Apr 28, 2024 · Updated 2 years ago
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 · May 24, 2023 · Updated 3 years ago
ys-zong / MIRB
Benchmarking Multi-Image Understanding in Vision and Language Models
☆11 · Jul 29, 2024 · Updated last year
FYYDCC / IVT-LR
Official repository for “Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space”
☆18 · Jan 27, 2026 · Updated 5 months ago
qizekun / OmniSpatial
[ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
☆88 · Jan 21, 2026 · Updated 6 months ago
YYJMJC / LOUPE
☆45 · Aug 14, 2023 · Updated 2 years ago
pointarena / pointarena
☆37 · Aug 25, 2025 · Updated 11 months ago
InternRobotics / MMSI-Video-Bench
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
☆60 · Mar 11, 2026 · Updated 4 months ago
zhaochen0110 / OpenThinkIMG
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
☆399 · Jun 1, 2025 · Updated last year
adobe-research / llava-score
☆11 · Oct 2, 2024 · Updated last year
zhangzef / COOPER
The official implementation of COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence.
☆38 · Jul 1, 2026 · Updated 3 weeks ago