yuzhms / Streaming-Video-Model
[CVPR2023] Code for "Streaming Video Model"
☆78 Updated last year
Alternatives and similar repositories for Streaming-Video-Model:
Users interested in Streaming-Video-Model are comparing it to the repositories listed below.
- ☆58 Updated last year
- ☆52 Updated last year
- ☆48 Updated 8 months ago
- Official Pytorch codebase for Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023] ☆49 Updated last month
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆121 Updated 6 months ago
- ☆104 Updated 8 months ago
- Code for our paper "Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers" ☆35 Updated last year
- [T-PAMI 2023] Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection ☆35 Updated last year
- 「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation ☆74 Updated 7 months ago
- ☆47 Updated 2 years ago
- Large-Vocabulary Video Instance Segmentation dataset ☆80 Updated 7 months ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning ☆39 Updated last year
- Recognize Any Regions ☆122 Updated 2 months ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆97 Updated 9 months ago
- ☆171 Updated 2 years ago
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆75 Updated last year
- Official repository of paper "Subobject-level Image Tokenization" ☆65 Updated 10 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 Updated 8 months ago
- (ICLR 2024, CVPR 2024) SparseFormer ☆71 Updated 3 months ago
- ☆28 Updated last year
- ☆58 Updated last year
- ☆106 Updated 7 months ago
- Official repo for our ICML 23 paper: "Multi-Modal Classifiers for Open-Vocabulary Object Detection" ☆88 Updated last year
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆93 Updated 7 months ago
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" ☆99 Updated last year
- [ECCV 2024] Official PyTorch implementation of TC-CLIP "Leveraging Temporal Contextualization for Video Action Recognition" ☆49 Updated 4 months ago
- [arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day" ☆122 Updated 6 months ago
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model ☆41 Updated 2 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆32 Updated 8 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆67 Updated 4 months ago