AutoGeo-Official / AutoGeo
Code for AutoGeo.
☆16 Updated last year
Alternatives and similar repositories for AutoGeo
Users who are interested in AutoGeo are comparing it to the repositories listed below.
- ☆38 Updated 2 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models ☆145 Updated last week
- ☆19 Updated last year
- Image Tokenizer Needs Post-Training ☆24 Updated 3 months ago
- OpenThinkIMG is an end-to-end open-source framework that empowers Large Vision-Language Models to think with images. ☆108 Updated 5 months ago
- ☆48 Updated last week
- [CVPR 2025] Test-Time Visual In-Context Tuning ☆25 Updated last week
- Visual Spatial Tuning ☆161 Updated this week
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆77 Updated 2 years ago
- Simple script to parallelize download and extract files for SA-1B Dataset. ☆37 Updated 6 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆40 Updated 10 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆99 Updated last year
- ☆64 Updated 2 weeks ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆29 Updated last year
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆51 Updated 5 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆94 Updated 10 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ☆61 Updated last year
- Sora Generates Videos with Stunning Geometrical Consistency ☆51 Updated last year
- OpenMMLab Detection Toolbox and Benchmark for V3Det ☆15 Updated last year
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆62 Updated 10 months ago
- ImaginaryNet: Learning Object Detectors without Real Images and Annotations ☆26 Updated 2 years ago
- ☆20 Updated 2 years ago
- (ICLR 2024, CVPR 2024) SparseFormer ☆75 Updated last year
- ☆180 Updated 2 weeks ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆86 Updated last year
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆51 Updated 6 months ago
- ☆42 Updated 6 months ago
- Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer" ☆76 Updated 2 months ago
- Unifying Specialized Visual Encoders for Video Language Models ☆24 Updated last month
- ☆24 Updated 6 months ago