bytedance / OmniScient-Model
This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model
☆99 · Jul 15, 2024 · Updated last year
Alternatives and similar repositories for OmniScient-Model
Users who are interested in OmniScient-Model are comparing it to the libraries listed below:
- [ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction ☆201 · Feb 5, 2024 · Updated 2 years ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆211 · Jun 9, 2024 · Updated last year
- (NeurIPS2023) CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection ☆123 · Apr 26, 2024 · Updated last year
- Pytorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models ☆206 · Jan 8, 2025 · Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of … ☆504 · Aug 9, 2024 · Updated last year
- [NeurIPS 2023] This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convoluti… ☆337 · Feb 5, 2024 · Updated 2 years ago
- This repo contains the code for our TMLR paper: A Simple Video Segmenter by Tracking Objects Along Axial Trajectories ☆27 · Mar 20, 2025 · Updated 10 months ago
- a PyTorch re-implementation of ECCV 2022 paper based on Detectron2: k-means mask Transformer. ☆81 · Jul 28, 2023 · Updated 2 years ago
- Code release for "Language-conditioned Detection Transformer" ☆88 · Jun 17, 2024 · Updated last year
- ☆201 · May 19, 2025 · Updated 8 months ago
- Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024] ☆14 · Jul 11, 2024 · Updated last year
- Official repo for our ICML 23 paper: "Multi-Modal Classifiers for Open-Vocabulary Object Detection" ☆95 · Jun 22, 2023 · Updated 2 years ago
- ☆120 · Jun 11, 2024 · Updated last year
- ☆32 · Mar 25, 2024 · Updated last year
- ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation (CVPR'25) ☆18 · Apr 2, 2025 · Updated 10 months ago
- Large-Vocabulary Video Instance Segmentation dataset ☆96 · Jul 5, 2024 · Updated last year
- ☆19 · Dec 6, 2023 · Updated 2 years ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆943 · Aug 5, 2025 · Updated 6 months ago
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest ☆551 · Jun 3, 2025 · Updated 8 months ago
- Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des… ☆55 · Aug 27, 2025 · Updated 5 months ago
- [Under preparation] Code repo for "Open-Vocabulary DETR with Conditional Matching" (ECCV 2022) ☆238 · Aug 3, 2022 · Updated 3 years ago
- Code release for the CVPR'23 paper titled "PartDistillation Learning part from Instance Segmentation" ☆60 · Dec 17, 2023 · Updated 2 years ago
- Recognize Any Regions ☆123 · Dec 18, 2024 · Updated last year
- ☆60 · Aug 12, 2024 · Updated last year
- [CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection ☆184 · Oct 25, 2023 · Updated 2 years ago
- Official Pytorch implementation of the paper: "Locally Shifted Attention With Early Global Integration" ☆15 · Dec 20, 2021 · Updated 4 years ago
- [NeurIPS 2023] FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models ☆131 · Dec 3, 2023 · Updated 2 years ago
- [CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception ☆607 · May 8, 2024 · Updated last year
- ☆28 · Apr 4, 2025 · Updated 10 months ago
- PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight] ☆182 · May 1, 2025 · Updated 9 months ago
- [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want ☆866 · Jul 20, 2025 · Updated 6 months ago
- ☆16 · May 26, 2023 · Updated 2 years ago
- A Data Source for Reasoning Embodied Agents ☆19 · Sep 18, 2023 · Updated 2 years ago
- [ICLR 2025 oral] RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything ☆268 · Apr 11, 2025 · Updated 10 months ago
- Code for the paper "Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation", ECCV 2024 ☆47 · Sep 28, 2024 · Updated last year
- [ICCV2023] VLPart: Going Denser with Open-Vocabulary Part Segmentation ☆393 · Sep 19, 2023 · Updated 2 years ago
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" ☆529 · Apr 8, 2024 · Updated last year
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati… ☆72 · Jun 3, 2024 · Updated last year
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ☆253 · Feb 11, 2025 · Updated last year