PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight]
☆180 · May 1, 2025 · Updated 11 months ago
Alternatives and similar repositories for nxtp
Users who are interested in nxtp are comparing it to the libraries listed below.
- [ECCV 2024] WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation ☆112 · Feb 6, 2025 · Updated last year
- Code of the paper "Efficient Object Detection in Autonomous Driving using Spiking Neural Networks: Performance, Energy Consumption Analys… ☆27 · Dec 13, 2023 · Updated 2 years ago
- ☆35 · Jan 23, 2024 · Updated 2 years ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆100 · Jul 15, 2024 · Updated last year
- Code for this paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork" ☆33 · Nov 29, 2023 · Updated 2 years ago
- (CVPR2023) CAPE: Camera View Position Embedding for Multi-View 3D Object Detection ☆109 · May 5, 2023 · Updated 2 years ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆37 · Jan 3, 2024 · Updated 2 years ago
- Adapting LLaMA Decoder to Vision Transformer ☆30 · May 20, 2024 · Updated last year
- ☆27 · Aug 28, 2023 · Updated 2 years ago
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ☆897 · Aug 13, 2024 · Updated last year
- (ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation ☆48 · Jul 18, 2024 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆951 · Aug 5, 2025 · Updated 8 months ago
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" ☆245 · Jan 17, 2024 · Updated 2 years ago
- [Pattern Recognition 2024] Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models, Dong Li, Jiandon… ☆18 · Jan 18, 2025 · Updated last year
- A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating… ☆137 · Mar 20, 2024 · Updated 2 years ago
- Official Pytorch Implementation of Self-emerging Token Labeling ☆35 · Mar 27, 2024 · Updated 2 years ago
- NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024 ☆1,832 · Nov 27, 2025 · Updated 4 months ago
- High-performance Image Tokenizers for VAR and AR ☆305 · Apr 25, 2025 · Updated 11 months ago
- Code release for "Language-conditioned Detection Transformer" ☆87 · Jun 17, 2024 · Updated last year
- ☆37 · Jan 20, 2024 · Updated 2 years ago
- ☆12 · May 26, 2022 · Updated 3 years ago
- ☆21 · Nov 9, 2025 · Updated 5 months ago
- Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations… ☆29 · Dec 5, 2023 · Updated 2 years ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 · Sep 24, 2023 · Updated 2 years ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆96 · Dec 1, 2025 · Updated 4 months ago
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest ☆555 · Jun 3, 2025 · Updated 10 months ago
- AeDet: Azimuth-invariant Multi-view 3D Object Detection, CVPR2023 ☆75 · Jun 17, 2023 · Updated 2 years ago
- [ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface" ☆362 · Jan 14, 2025 · Updated last year
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆131 · Aug 21, 2024 · Updated last year
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ☆523 · Jan 27, 2024 · Updated 2 years ago
- Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024] ☆14 · Jul 11, 2024 · Updated last year
- DALI Multi Agent System Framework ☆42 · Mar 24, 2026 · Updated 3 weeks ago
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆2,249 · Updated this week
- Structured Video Comprehension of Real-World Shorts ☆237 · Sep 21, 2025 · Updated 6 months ago
- [CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception ☆608 · May 8, 2024 · Updated last year
- ☆645 · Feb 15, 2024 · Updated 2 years ago
- [ECCV 2024] Tokenize Anything via Prompting ☆601 · Dec 11, 2024 · Updated last year
- ☆38 · Feb 8, 2024 · Updated 2 years ago
- [ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model ☆433 · Nov 10, 2024 · Updated last year