InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models
☆105 · Apr 20, 2026 · Updated last week
Alternatives and similar repositories for InfiniteVL
Users that are interested in InfiniteVL are comparing it to the libraries listed below.
- The first decoder-only multimodal state space model ☆104 · May 19, 2025 · Updated 11 months ago
- [ICCV 2025] GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding ☆76 · Jun 26, 2025 · Updated 10 months ago
- ☆61 · May 13, 2025 · Updated 11 months ago
- ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Unde… ☆63 · Mar 3, 2026 · Updated last month
- The official repository of the first version of ACE-Brain foundation model. ☆75 · Mar 13, 2026 · Updated last month
- DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data ☆48 · Dec 12, 2025 · Updated 4 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training. ☆88 · Dec 24, 2025 · Updated 4 months ago
- The training codes of Jasper-Token-Compression-600M ☆19 · Nov 19, 2025 · Updated 5 months ago
- Official repository of the paper "High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation" ☆46 · Mar 25, 2025 · Updated last year
- [ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision ☆12 · Sep 17, 2023 · Updated 2 years ago
- Official code of "ViTGaze: Gaze Following with Interaction Features in Vision Transformers" ☆63 · Mar 3, 2025 · Updated last year
- Official implementation of T-PAMI25 paper "M²Diffuser: Diffusion-based Trajectory Optimization for Mobile Manipulation in 3D Scenes" ☆114 · Jun 17, 2025 · Updated 10 months ago
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models ☆50 · Jan 8, 2025 · Updated last year
- ☆36 · Jun 3, 2025 · Updated 10 months ago
- [NeurIPS 2025] RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning ☆235 · Apr 17, 2026 · Updated 2 weeks ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆17 · Apr 2, 2025 · Updated last year
- ☆32 · Dec 31, 2025 · Updated 4 months ago
- Official Implementation for ACM MM2024 paper "VrdONE: One-stage Video Visual Relation Detection". ☆12 · Nov 13, 2024 · Updated last year
- [CVPR 2025] GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding ☆212 · Jan 5, 2026 · Updated 3 months ago
- ☆38 · Dec 16, 2025 · Updated 4 months ago
- [ICLR 2026] MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning ☆29 · Jan 14, 2026 · Updated 3 months ago
- [CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation" ☆130 · Oct 23, 2025 · Updated 6 months ago
- [NeurIPS 2023] CircuitFormer: Circuit as Set of Points ☆38 · Nov 22, 2023 · Updated 2 years ago
- ☆17 · Dec 13, 2023 · Updated 2 years ago
- [ACMMM 2025] ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies ☆22 · Jun 20, 2025 · Updated 10 months ago
- OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models ☆123 · Apr 25, 2025 · Updated last year
- ☆14 · Aug 1, 2025 · Updated 9 months ago
- ☆143 · Feb 13, 2026 · Updated 2 months ago
- Revisiting End-to-End Speech-to-Text Translation From Scratch ☆13 · Feb 21, 2023 · Updated 3 years ago
- [CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models ☆74 · Apr 20, 2026 · Updated last week
- Official implementation of "PyVision-RL: Forging Open Agentic Vision Models via RL." ☆70 · Feb 25, 2026 · Updated 2 months ago
- ☆12 · Feb 13, 2025 · Updated last year
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards ☆36 · Oct 3, 2025 · Updated 6 months ago
- [ICCV 2025] Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction ☆53 · Sep 22, 2025 · Updated 7 months ago
- Visual Instruction Tuning for Qwen2 Base Model ☆43 · Jun 29, 2024 · Updated last year
- Official implementation of Log-linear Sparse Attention (LLSA). ☆70 · Feb 2, 2026 · Updated 3 months ago
- Rethinking the Trust Region in LLM Reinforcement Learning ☆52 · Mar 2, 2026 · Updated 2 months ago
- ☆21 · Dec 3, 2025 · Updated 4 months ago
- ☆19 · Oct 22, 2023 · Updated 2 years ago