AutoLab-SAI-SJTU / AutoPage
This is the official implementation of "Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1".
☆155 · Updated 3 months ago
Alternatives and similar repositories for AutoPage
Users who are interested in AutoPage are comparing it to the libraries listed below.
- ☆507 · Updated last week
- Official Repository for PosterGen · ☆211 · Updated this week
- This repository collects and organises state‑of‑the‑art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs). · ☆275 · Updated 2 weeks ago
- A Scientific Multimodal Foundation Model · ☆706 · Updated this week
- A reproduction of the Deepseek-OCR model including training · ☆209 · Updated 2 months ago
- OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images. · ☆351 · Updated 8 months ago
- Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression" · ☆558 · Updated 3 months ago
- ☆83 · Updated last month
- 🚀 ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum—cold-start pre-training, multimodal rei… · ☆196 · Updated last month
- Next paradigm for LLM Agent. Unify plan and action through recursive code generation for adaptive, human-like decision-making. · ☆535 · Updated 2 months ago
- The development and future prospects of large multimodal reasoning models. · ☆582 · Updated 3 weeks ago
- OpenCUA: Open Foundations for Computer-Use Agents · ☆672 · Updated this week
- AgentEvolver: Towards Efficient Self-Evolving Agent System · ☆1,125 · Updated last week
- 📖 This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning. · ☆406 · Updated this week
- Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision · ☆316 · Updated 2 weeks ago
- 🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026) · ☆305 · Updated last week
- The paper list of "Memory in the Age of AI Agents: A Survey" · ☆1,078 · Updated last week
- Fully Open Framework for Democratized Multimodal Training · ☆710 · Updated last month
- Step-DeepResearch · ☆491 · Updated last week
- [NeurIPS 2025] Thinkless: LLM Learns When to Think · ☆250 · Updated 4 months ago
- ☆139 · Updated 2 weeks ago
- Survey and paper list on efficiency-guided LLM agents (memory, tool learning, planning). · ☆154 · Updated last week
- Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcemen… · ☆577 · Updated 4 months ago
- Official repository for the paper "Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning" and the SciEvo benchmark. · ☆48 · Updated 3 weeks ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction · ☆166 · Updated 10 months ago
- AgentFlow: In-the-Flow Agentic System Optimization · ☆1,543 · Updated last week
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… · ☆387 · Updated 5 months ago
- 🔥🔥🔥 ICLR 2025 Oral. Automating Agentic Workflow Generation. · ☆424 · Updated last month
- [CVPR 2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models · ☆233 · Updated 3 months ago
- ✨✨ [NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi… · ☆395 · Updated 3 weeks ago