penghao-wu/visual_jigsaw

☆78

Alternatives and similar repositories for visual_jigsaw

Users interested in visual_jigsaw are comparing it to the repositories listed below.

penghao-wu / ProxyV
[ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
☆20 · May 22, 2025 · Updated last year
lcqysl / VideoSSR
[CVPR 2026] Official repo for "VideoSSR: Video Self-Supervised Reinforcement Learning"
☆41 · Nov 11, 2025 · Updated 8 months ago
InternLM / Spatial-SSRL
[CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
☆133 · Apr 7, 2026 · Updated 3 months ago
synvo-ai / local-cocoa
A local AI assistant running on your device. It turns your files into actionable memory.
☆55 · Mar 24, 2026 · Updated 4 months ago
EvolvingLMMs-Lab / EASI
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
☆118 · Jul 1, 2026 · Updated 3 weeks ago
FrankYang-17 / RealUnify
☆27 · Oct 10, 2025 · Updated 9 months ago
pufanyi / syphus
Syphus: Automatic Instruction-Response Generation Pipeline
☆14 · Dec 14, 2023 · Updated 2 years ago
Visual-AI / SCD
[CVPRW2024] What’s in a Name? Beyond Class Indices for Image Recognition
☆17 · Aug 30, 2024 · Updated last year
EvolvingLMMs-Lab / engram
Privacy-first AI memory layer - Signal for AI Memory. E2EE, local-first, works with Claude, Cursor, and any MCP-compatible AI.
☆23 · Jun 12, 2026 · Updated last month
DripNowhy / Octopus
[ICML 2026] Official implementation for paper: Learning Self-Correction in Vision–Language Models via Rollout Augmentation
☆16 · Jun 4, 2026 · Updated last month
minghangz / OnVTG
Online video temporal grounding
☆16 · Oct 20, 2025 · Updated 9 months ago
EvolvingLMMs-Lab / sae
A framework that allows you to apply Sparse AutoEncoder on any models
☆53 · Jul 11, 2025 · Updated last year
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆407 · Mar 18, 2025 · Updated last year
LaVi-Lab / Visual-Table
[EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
☆20 · Oct 17, 2024 · Updated last year
zlab-princeton / vero
Vero: An Open RL Recipe for General Visual Reasoning
☆135 · Jun 19, 2026 · Updated last month
UMass-Embodied-AGI / Mirage
[CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
☆294 · Aug 2, 2025 · Updated 11 months ago
Vchitect / Uni-MMMU
[ACL2026 oral] Uni-MMMU : A Massive Multi-discipline Multimodal Unified Benchmark
☆25 · Apr 13, 2026 · Updated 3 months ago
Luodian / nano-hevc
A minimal, educational HEVC (H.265) encoder written in Python.
☆53 · Feb 23, 2026 · Updated 5 months ago
IVUL-KAUST / VideoAuto-R1
[CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
☆88 · Feb 27, 2026 · Updated 4 months ago
penghao-wu / GUI_Reflection
☆34 · Sep 19, 2025 · Updated 10 months ago
EvolvingLMMs-Lab / OpenMMReasoner
[CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
☆164 · Mar 30, 2026 · Updated 3 months ago
EvolvingLMMs-Lab / OneVision-Encoder
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
☆386 · Jun 20, 2026 · Updated last month
Visual-Agent / DeepEyes
☆1,251 · Nov 20, 2025 · Updated 8 months ago
EvolvingLMMs-Lab / Aero-1
☆79 · May 4, 2025 · Updated last year
Li-Hao-yuan / GeoThinker
☆69 · Feb 12, 2026 · Updated 5 months ago
geshang777 / pix2cap
Official Implementation of "Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning"
☆28 · Dec 16, 2025 · Updated 7 months ago
Visual-AI / JoVA
JoVA: Unified Multimodal Learning for Joint Video-Audio Generation
☆33 · Dec 22, 2025 · Updated 7 months ago
MCG-NJU / RGE
Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval
☆15 · Nov 29, 2025 · Updated 7 months ago
Ropedia / S-Agent
S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
☆78 · Updated this week
Luodian / GenBench
Benchmarking and Analyzing Generative Data for Visual Recognition
☆26 · Jul 25, 2023 · Updated 3 years ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆727 · Sep 24, 2025 · Updated 10 months ago
EvolvingLMMs-Lab / LLaVA-OneVision-2
Fully Open Framework for Democratized Multimodal Training
☆1,152 · Updated this week
InternLM / ARC-VL
[CVPR 2026] An official implementation of "Think Visually, Reason Textually: Vision-Language Synergy in ARC"
☆46 · Nov 26, 2025 · Updated 8 months ago
ULMEvalKit / ULMEvalKit
ULMEvalKit: One-Stop Eval ToolKit for Image Generation
☆56 · Dec 17, 2025 · Updated 7 months ago
GeWu-Lab / Patch-Matters
[CVPR2025] Code Release of Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
☆25 · Jun 17, 2025 · Updated last year
RifleZhang / LLaVA-Hound-DPO
☆158 · Oct 31, 2024 · Updated last year
Visual-AI / Dissect-OOD-OSR
[IJCV 2024] Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks
☆15 · Aug 30, 2024 · Updated last year
Visual-AI / Fin3R
[NeurIPS 2025] Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models via Monocular Knowledge Distillation
☆64 · Dec 18, 2025 · Updated 7 months ago
Lliar-liar / Daily-Omni
Official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
☆42 · Apr 28, 2026 · Updated 2 months ago