xlyu0106/VisMem

xlyu0106 / VisMem

☆91

Alternatives and similar repositories for VisMem

Users that are interested in VisMem are comparing it to the libraries listed below.

xlyu0106 / MACT
☆19 · Jul 31, 2025 · Updated 11 months ago
lyrig / TokenAR
TokenAR: Multiple Subject Generation via Autoregressive Token-level enhancement
☆22 · Mar 4, 2026 · Updated 4 months ago
Yuan-Hou / Human-MME
Official repository for "Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models"
☆22 · Dec 2, 2025 · Updated 7 months ago
LsmnBmnc / Med-CMR
Official code repository for Med-CMR: "A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multi…
☆26 · Dec 10, 2025 · Updated 7 months ago
xlyu0106 / ViF
[ICLR 26] Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
☆44 · Oct 3, 2025 · Updated 9 months ago
latentcraft / replay
[CVPR 2026] Boosting Reasoning in Large Multimodal Models via Activation Replay
☆24 · May 7, 2026 · Updated 2 months ago
YinBo0927 / FeRA
[ICML 2026] The official code of FeRA: Frequency–Energy Constrained Routing for Effective Diffusion Adaptation Fine-Tuning
☆29 · Dec 27, 2025 · Updated 6 months ago
HUuxiaobin / VTBench
☆22 · May 26, 2025 · Updated last year
rain152 / LFA-Video-Generation
From Large Angles to Consistent Faces: Identity-Preserving Video Generation via Mixture of Facial Experts
☆27 · Jan 12, 2026 · Updated 6 months ago
HaoxuanXU1024 / IRPO
☆30 · Nov 28, 2025 · Updated 7 months ago
NUS-Project / Landmark-of-medical-agent
☆181 · Jun 8, 2026 · Updated last month
bingreeky / MemGen
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
☆406 · Jun 10, 2026 · Updated last month
zhangzjn / T3-Video
[ICML 2026] Transform Trained Transformer for Accelerating Native 4K Video Generation
☆41 · Dec 16, 2025 · Updated 7 months ago
NOVAglow646 / Monet
[CVPR 2026] Official codes of "Monet: Reasoning in Latent Visual Space Beyond Image and Language"
☆207 · Mar 19, 2026 · Updated 4 months ago
xlyu0106 / Awesome-Latent-Space
A paper list of Awesome Latent Space.
☆946 · Jul 13, 2026 · Updated last week
bingreeky / opd-evolver
☆37 · Jun 17, 2026 · Updated last month
NUS-Project / MedMASLab
☆30 · Mar 22, 2026 · Updated 4 months ago
UCSB-AI / DMLR
[CVPR 2026] Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space"
☆84 · May 12, 2026 · Updated 2 months ago
UMass-Embodied-AGI / Mirage
[CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
☆293 · Aug 2, 2025 · Updated 11 months ago
Svardfox / LaViT
Official codebase for the paper LaViT
☆34 · Feb 15, 2026 · Updated 5 months ago
zhangzjn / Soul
[CVPR 2026] Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation
☆64 · Dec 16, 2025 · Updated 7 months ago
VincentLeebang / lvr
Official codebase for the paper Latent Visual Reasoning
☆170 · Oct 22, 2025 · Updated 8 months ago
ybb6 / laser
☆34 · Apr 22, 2026 · Updated 2 months ago
Yang011013 / Awesome-Streaming-Video-Understanding
Awesome latest models, datasets and benchmarks on streaming/online video understanding.
☆31 · Oct 19, 2025 · Updated 9 months ago
liushulinle / MarsRL
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
☆18 · Nov 18, 2025 · Updated 8 months ago
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆140 · Jul 28, 2025 · Updated 11 months ago
VisionOPD / Vision-OPD
Vision-OPD is a regional-to-global on-policy self-distillation framework that transfers a model's own privileged crop-conditioned percept…
☆197 · Updated this week
TencentARC / Video-Holmes
[ECCV 2026] Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆95 · Jul 13, 2025 · Updated last year
HJYao00 / MMReason
[ICCV 2025] MMReason, MLLMs, step by step, reasoning benchmark, AGI
☆15 · Apr 25, 2026 · Updated 2 months ago
zhangquanchen / 4DThinker
4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding
☆77 · May 26, 2026 · Updated last month
silent-commit / CLEAR
CLEAR: Context-Aware Learning with End-to-End Mask-Free Inference for Adaptive Video Subtitle Removal
☆20 · May 25, 2026 · Updated last month
LzVv123456 / VISTA
☆86 · Jul 28, 2025 · Updated 11 months ago
marinero4972 / CyberV
☆20 · Jun 10, 2025 · Updated last year
BRZ911 / ViTCoT
[ACM MM 2025] ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
☆18 · Jul 15, 2025 · Updated last year
vivo / DiMo-GUI
[EMNLP 2025] Repository for paper "DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning"
☆30 · Jul 2, 2025 · Updated last year
jins7 / LatentEvolve
☆27 · Oct 9, 2025 · Updated 9 months ago
zhangquanchen / SIFThinker
[AAAI 2026] SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
☆22 · Dec 2, 2025 · Updated 7 months ago
zoezheng126 / Spatio-Temporal-LLM
☆19 · Aug 7, 2025 · Updated 11 months ago
xinyan-cxy / MINT-CoT
[NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
☆107 · Sep 19, 2025 · Updated 10 months ago