Yutong-Zhou-cv / Awesome-Survey-Papers
A curated list of Survey Papers on Deep Learning.
☆10Updated last year
Alternatives and similar repositories for Awesome-Survey-Papers:
Users who are interested in Awesome-Survey-Papers are comparing it to the repositories listed below.
- Masked Vision-Language Transformer in Fashion☆33Updated last year
- Code for CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers"☆18Updated last year
- [CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations"☆52Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆25Updated last year
- Official repository for the General Robust Image Task (GRIT) Benchmark☆53Updated 2 years ago
- A curated list of papers and resources for text-to-image evaluation.☆28Updated last year
- [CVPR 2023] HierVL: Learning Hierarchical Video-Language Embeddings☆45Updated last year
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity"☆56Updated last year
- ☆52Updated 2 years ago
- ☆57Updated 11 months ago
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding☆75Updated last year
- ☆25Updated last year
- ☆23Updated 5 months ago
- A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or v…☆36Updated last year
- ☆59Updated last year
- ☆59Updated last year
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation☆29Updated 4 months ago
- [CVPR 2024] The official implementation of paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"☆40Updated last month
- ☆33Updated last year
- Code release for the CVPR'23 paper titled "PartDistillation: Learning Parts from Instance Segmentation"☆58Updated last year
- ☆30Updated 2 years ago
- Code for the paper titled "CiT: Curation in Training for Effective Vision-Language Data".☆78Updated 2 years ago
- Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"☆77Updated 10 months ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones".☆27Updated last year
- Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding"☆21Updated 2 months ago
- ☆19Updated 11 months ago
- An interactive demo based on Segment-Anything for stroke-based painting which enables human-like painting.☆34Updated last year
- ☆14Updated 10 months ago
- Code for Point-Level Region Contrast (https://arxiv.org/abs/2202.04639)☆35Updated 2 years ago
- ☆24Updated last year