bytedance/OmniScient-Model

This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model

☆102

Alternatives and similar repositories for OmniScient-Model

Users who are interested in OmniScient-Model are comparing it to the libraries listed below.

TACJu / Axial-VS
This repo contains the code for our TMLR paper: A Simple Video Segmenter by Tracking Objects Along Axial Trajectories
☆27 · Mar 20, 2025 · Updated last year
Beckschen / ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
☆211 · Jun 9, 2024 · Updated 2 years ago
bytedance / kmax-deeplab
a PyTorch re-implementation of ECCV 2022 paper based on Detectron2: k-means mask Transformer.
☆80 · Jul 28, 2023 · Updated 2 years ago
prannaykaul / mm-ovod
Official repo for our ICML 23 paper: "Multi-Modal Classifiers for Open-Vocabulary Object Detection"
☆95 · Jun 22, 2023 · Updated 3 years ago
bytedance / fc-clip
[NeurIPS 2023] This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convoluti…
☆345 · Feb 5, 2024 · Updated 2 years ago
wusize / CLIPSelf
[ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
☆207 · Feb 5, 2024 · Updated 2 years ago
bytedance / coconut_cvpr2024
☆206 · Apr 21, 2026 · Updated 3 months ago
OpenGVLab / all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆508 · Aug 9, 2024 · Updated last year
Ali2500 / ViCaS
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation (CVPR'25)
☆21 · Apr 2, 2025 · Updated last year
CVMI-Lab / CoDet
(NeurIPS2023) CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
☆123 · Apr 26, 2024 · Updated 2 years ago
V3Det / V3Det
☆121 · Jun 11, 2024 · Updated 2 years ago
SkyworkAI / DAQ-VS
Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024]
☆15 · Jul 11, 2024 · Updated 2 years ago
YuchenLiu98 / COMM
Pytorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
☆211 · Jan 8, 2025 · Updated last year
janghyuncho / DECOLA
Code release for "Language-conditioned Detection Transformer"
☆86 · Jun 17, 2024 · Updated 2 years ago
dlsrbgg33 / Video-3DGS
☆28 · Apr 4, 2025 · Updated last year
jshilong / GPT4RoI
(ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556 · Jun 3, 2025 · Updated last year
jiaosiyu1999 / MAFT
☆60 · Aug 12, 2024 · Updated last year
mightyzau / InfMLLM
☆19 · Dec 6, 2023 · Updated 2 years ago
jiyt17 / IDA-VLM
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
☆37 · Nov 27, 2024 · Updated last year
facebookresearch / PartDistillation
Code release for the CVPR'23 paper titled "PartDistillation Learning part from Instance Segmentation"
☆60 · Dec 17, 2023 · Updated 2 years ago
MaverickRen / PixelLM
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
☆273 · Feb 11, 2025 · Updated last year
haochenheheda / LVVIS
Large-Vocabulary Video Instance Segmentation dataset
☆99 · Jul 5, 2024 · Updated 2 years ago
RangiLyu / llama.mmengine
Training LLaMA language model with MMEngine! It supports LoRA fine-tuning!
☆40 · Apr 2, 2023 · Updated 3 years ago
LiheYoung / FreeMask
[NeurIPS 2023] FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
☆133 · Dec 3, 2023 · Updated 2 years ago
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆964 · Aug 5, 2025 · Updated 11 months ago
openseg-group / RankSeg
[ECCV2022] This is an official implementation of paper "RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentati…
☆78 · Feb 12, 2023 · Updated 3 years ago
lambert-x / ProLab
Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des…
☆55 · Aug 27, 2025 · Updated 10 months ago
lslrh / DMA
Official code of DMA: Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding, ECCV 2024
☆32 · Jul 18, 2024 · Updated 2 years ago
qihao067 / DiMR
[NeurIPS 24] Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
☆44 · Sep 30, 2024 · Updated last year
OliverRensu / FreqFlow
The official implementation of "Frequency-Aware Flow Matching for High-Quality Image Generation"
☆29 · Apr 20, 2026 · Updated 3 months ago
facebookresearch / VLPart
[ICCV2023] VLPart: Going Denser with Open-Vocabulary Part Segmentation
☆395 · Sep 19, 2023 · Updated 2 years ago
wusize / ovdet
[CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection
☆187 · Oct 25, 2023 · Updated 2 years ago
yuhangzang / OV-DETR
[Under preparation] Code repo for "Open-Vocabulary DETR with Conditional Matching" (ECCV 2022)
☆240 · Aug 3, 2022 · Updated 3 years ago
Surrey-UP-Lab / RegionSpot
Recognize Any Regions
☆123 · Dec 18, 2024 · Updated last year
shenyunhang / APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
☆608 · May 8, 2024 · Updated 2 years ago
mc-lan / ProxyCLIP
[ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
☆120 · Mar 26, 2025 · Updated last year
kingsj0405 / DIFE
Dense Interspecies Face Embedding (NeurIPS 2022)
☆25 · May 16, 2023 · Updated 3 years ago
kaiyuyue / nxtp
PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight]
☆180 · May 1, 2025 · Updated last year
aim-uofa / SegPrompt
Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning
☆112 · May 28, 2025 · Updated last year