YWenxi/think-with-images-through-self-calling

official repo for `thinking with images through-self-calling`

☆26

Alternatives and similar repositories for think-with-images-through-self-calling

Users that are interested in think-with-images-through-self-calling are comparing it to the libraries listed below.

mingrui-wu / OSI-Bench
Official repo of From Indoor to Open World: Revealing the Spatial Reasoning Gap in MLLMs
☆24 · Jun 23, 2026 · Updated last month
callsys / GMPO
[ICLR 2026] Geometric-Mean Policy Optimization
☆104 · Jan 26, 2026 · Updated 6 months ago
qiujihao19 / Artemis
[NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos
☆27 · Apr 8, 2025 · Updated last year
MzeroMiko / XDLM
[ICML 2026 Spotlight] Code for miXed Discrete Diffusion Language Model
☆27 · Mar 16, 2026 · Updated 4 months ago
AkitsukiM / VMamba-DOTA
☆31 · Sep 24, 2024 · Updated last year
AZZMM / CC-Diff
Implementation of paper "CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis"
☆28 · Dec 19, 2025 · Updated 7 months ago
GAIR-NLP / self-improvement-reversal
☆13 · Jul 14, 2024 · Updated 2 years ago
martian422 / MaskGRPO
The official implementation of MaskGRPO: Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models. (ICLR 2026, arxiv…
☆19 · Jan 27, 2026 · Updated 6 months ago
callsys / DynRefer
[CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
☆59 · Mar 4, 2025 · Updated last year
Mosi-AI / M2RL
☆16 · May 15, 2026 · Updated 2 months ago
ByteDance-BandAI / CodeVision
[CVPR 2026] Thinking with Programming Vision: Towards a Unified View for Thinking with Images
☆71 · Jan 23, 2026 · Updated 6 months ago
Visual-Agent / DeepEyes
☆1,253 · Nov 20, 2025 · Updated 8 months ago
lmsdss / MetaModulation
MetaModulation: Learning Variational Feature Hierarchies for Few-Shot Learning with Fewer Tasks (ICML 2023)
☆11 · Aug 15, 2023 · Updated 2 years ago
SPORT-Agents / SPORT-Agents
☆22 · Dec 18, 2025 · Updated 7 months ago
MzeroMiko / vHeat
vHeat: Building Vision Models upon Heat Conduction
☆284 · Jun 12, 2025 · Updated last year
ltpo2025 / LTPO
[ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization
☆32 · Mar 6, 2026 · Updated 4 months ago
ncTimTang / AKS
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
☆228 · Dec 19, 2025 · Updated 7 months ago
AllanYangZhou / generative-invariance-transfer
☆26 · Feb 27, 2022 · Updated 4 years ago
seanzhuh / Awesome-Open-Vocabulary-Detection-and-Segmentation
Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
☆219 · Apr 3, 2025 · Updated last year
LaVi-Lab / Rethink_CoT_Video
Official code for "Rethinking Chain-of-Thought Reasoning for Videos"
☆21 · Dec 14, 2025 · Updated 7 months ago
sunsmarterjie / iTPN
(CVPR2023/TPAMI2024) Integrally Pre-Trained Transformer Pyramid Networks -- A Hierarchical Vision Transformer for Masked Image Modeling
☆216 · Jul 28, 2024 · Updated 2 years ago
thu-ml / MLA-Trust
A toolbox for benchmarking Multimodal LLM Agents trustworthiness across truthfulness, controllability, safety and privacy dimensions thro…
☆63 · Jan 9, 2026 · Updated 6 months ago
callsys / GenPromp
[ICCV 2023] Generative Prompt Model for Weakly Supervised Object Localization
☆57 · Nov 10, 2023 · Updated 2 years ago
sunsmarterjie / ChatterBox
[AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues
☆61 · May 2, 2025 · Updated last year
tpoisonooo / open-r1
Fully open reproduction of DeepSeek-R1
☆11 · Mar 24, 2025 · Updated last year
showlab / Tune-An-Ellipse
[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want
☆14 · Jan 5, 2025 · Updated last year
matsuolab / multibanana
[CVPR 2026 Main] MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation
☆29 · Jul 6, 2026 · Updated 3 weeks ago
EnnengYang / RepresentationSurgery
Representation Surgery for Multi-Task Model Merging. ICML, 2024.
☆49 · Oct 10, 2024 · Updated last year
KangarooGroup / Kangaroo
Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input
☆67 · Aug 30, 2024 · Updated last year
liuanji / CoDD
Official implementation of "Breaking the Factorization Barrier in Diffusion Language Models"
☆17 · Mar 27, 2026 · Updated 4 months ago
roymiles / Simple-Recipe-Distillation
[AAAI 2024] Understanding the Role of the Projector in Knowledge Distillation
☆20 · Feb 13, 2024 · Updated 2 years ago
showlab / DIM
[ICLR 2026] Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
☆28 · May 11, 2026 · Updated 2 months ago
baopj / E3M
[ECCV 2024] The first zero-shot setting for spatio-temporal video grounding.
☆11 · Jul 16, 2024 · Updated 2 years ago
LiBingyu01 / U3M
[Pattern Recognition 2025 🌟] Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation
☆10 · Jun 12, 2024 · Updated 2 years ago
PeterYYZhang / LayerCraft
Official Repo for LayerCraft
☆18 · May 3, 2026 · Updated 2 months ago
ls-kelvin / REVPT
Code for paper: Reinforced Vision Perception with Tools
☆74 · Oct 3, 2025 · Updated 9 months ago
ischlag / Fast-Weight-Memory-public
Official code repository of the paper Learning Associative Inference Using Fast Weight Memory by Schlag et al.
☆30 · Feb 25, 2021 · Updated 5 years ago
kaistAI / How-Well-Do-LLMs-Truly-Ground
☆11 · Sep 19, 2025 · Updated 10 months ago
NobuoTsukamoto / jax_examples
Jax, Flax, examples (ImageClassification, SemanticSegmentation, and more...)
☆10 · May 10, 2025 · Updated last year