Mountchicken / Resophy
🎯 Read research papers faster with AI. Resophy is an HTML-based AI paper reader with:
- 🤖 AI Translation & Analysis – instantly understand structure, contributions, and results
- 📰 Daily arXiv Recommendations – discover relevant papers with less noise
- 🛠️ Vibe Coding Oriented – agent-friendly and easy to customize
☆161 · Updated last month
Alternatives and similar repositories for Resophy
Users interested in Resophy are comparing it to the repositories listed below.
- [ICCV 2025] A Token-level Text Image Foundation Model for Document Understanding – ☆130 · Updated 5 months ago
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab. – ☆284 · Updated 4 months ago
- ☆74 · Updated 8 months ago
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning – ☆142 · Updated 7 months ago
- [NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning – ☆285 · Updated 6 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines – ☆130 · Updated last year
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training … – ☆68 · Updated 9 months ago
- ☆85 · Updated 5 months ago
- Build a daily academic subscription pipeline! Get daily arXiv papers and corresponding ChatGPT summaries with pre-defined keywords. It is… – ☆46 · Updated 2 years ago
- 🎮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025) – ☆228 · Updated last month
- [ACL 2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models – ☆90 · Updated 8 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… – ☆29 · Updated last year
- Vision Manus: Your versatile Visual AI assistant – ☆318 · Updated this week
- [arXiv 25] Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR – ☆248 · Updated 5 months ago
- Official repo of the Griffon series, including v1 (ECCV 2024), v2 (ICCV 2025), G, and R, plus the RL tool Vision-R1. – ☆249 · Updated 5 months ago
- Code for the paper: Reinforced Vision Perception with Tools – ☆69 · Updated 4 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models – ☆65 · Updated last year
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step – ☆159 · Updated 6 months ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… – ☆392 · Updated 5 months ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding – ☆210 · Updated 3 months ago
- New generation of CLIP with fine-grained discrimination capability, ICML 2025 – ☆545 · Updated 3 months ago
- ZO2 (Zeroth-Order Offloading): Full-Parameter Fine-Tuning of 175B LLMs with 18GB GPU Memory [COLM 2025] – ☆200 · Updated 6 months ago
- (ICLR 2026) An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning" – ☆186 · Updated this week
- Margin-based Vision Transformer – ☆64 · Updated 2 months ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS 2025] – ☆273 · Updated 3 months ago
- DELT: Data Efficacy for Language Model Training – ☆43 · Updated 2 weeks ago
- ☆43 · Updated 6 months ago
- [ICLR 2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want – ☆93 · Updated 2 months ago
- ☆34 · Updated 11 months ago
- Awesome-RAG-Vision: a curated list of advanced retrieval-augmented generation (RAG) for Computer Vision – ☆316 · Updated 2 weeks ago