kaiyuyue / nxtp
PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight]
☆182 · May 1, 2025 · Updated 9 months ago
Alternatives and similar repositories for nxtp
Users interested in nxtp are comparing it to the repositories listed below.
- Code for this paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork" ☆33 · Nov 29, 2023 · Updated 2 years ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆37 · Jan 3, 2024 · Updated 2 years ago
- Code of the paper "Efficient Object Detection in Autonomous Driving using Spiking Neural Networks: Performance, Energy Consumption Analys… ☆27 · Dec 13, 2023 · Updated 2 years ago
- [Pattern Recognition 2024] Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models, Dong Li, Jiandon… ☆18 · Jan 18, 2025 · Updated last year
- [ECCV 2024] WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation ☆112 · Feb 6, 2025 · Updated last year
- ☆34 · Jan 23, 2024 · Updated 2 years ago
- (CVPR 2023) CAPE: Camera View Position Embedding for Multi-View 3D Object Detection ☆110 · May 5, 2023 · Updated 2 years ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆99 · Jul 15, 2024 · Updated last year
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" ☆247 · Jan 17, 2024 · Updated 2 years ago
- 2D road segmentation using lidar data during training ☆43 · Dec 21, 2023 · Updated 2 years ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 · Sep 24, 2023 · Updated 2 years ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆945 · Aug 5, 2025 · Updated 6 months ago
- Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations… ☆29 · Dec 5, 2023 · Updated 2 years ago
- [ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ☆889 · Aug 13, 2024 · Updated last year
- Official PyTorch Implementation of Self-emerging Token Labeling ☆35 · Mar 27, 2024 · Updated last year
- Adapting LLaMA Decoder to Vision Transformer ☆30 · May 20, 2024 · Updated last year
- (ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation ☆48 · Jul 18, 2024 · Updated last year
- ☆21 · Nov 9, 2025 · Updated 3 months ago
- A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating… ☆137 · Mar 20, 2024 · Updated last year
- ☆37 · Jan 20, 2024 · Updated 2 years ago
- DALI Multi Agent System Framework ☆42 · Jan 30, 2026 · Updated 2 weeks ago
- ☆38 · Feb 8, 2024 · Updated 2 years ago
- [ECCV 2024] Tokenize Anything via Prompting ☆603 · Dec 11, 2024 · Updated last year
- Backtracing: Retrieving the Cause of the Query, EACL 2024 Long Paper, Findings. ☆92 · Jul 21, 2024 · Updated last year
- PyTorch code for the paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models ☆206 · Jan 8, 2025 · Updated last year
- NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024 ☆1,812 · Nov 27, 2025 · Updated 2 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆129 · Aug 21, 2024 · Updated last year
- Code release for "Language-conditioned Detection Transformer" ☆88 · Jun 17, 2024 · Updated last year
- [AAAI 2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues ☆60 · May 2, 2025 · Updated 9 months ago
- ☆401 · Dec 12, 2024 · Updated last year
- [ICLR 2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆93 · Dec 1, 2025 · Updated 2 months ago
- ☆17 · Oct 18, 2022 · Updated 3 years ago
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning. ☆60 · Mar 20, 2024 · Updated last year
- ☆643 · Feb 15, 2024 · Updated last year
- [CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception ☆607 · May 8, 2024 · Updated last year
- [ICLR 2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆246 · Aug 14, 2024 · Updated last year
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆2,159 · Updated this week
- This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects. ☆1,397 · Aug 4, 2025 · Updated 6 months ago
- Structured Video Comprehension of Real-World Shorts ☆230 · Sep 21, 2025 · Updated 4 months ago