[BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment"
☆55 Oct 20, 2022 Updated 3 years ago
Alternatives and similar repositories for ViCHA
Users that are interested in ViCHA are comparing it to the libraries listed below
- Code for paper: Unified Text-to-Image Generation and Retrieval ☆16 Jul 6, 2024 Updated last year
- Official implementation of the paper The Hidden Language of Diffusion Models ☆77 Jan 24, 2024 Updated 2 years ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking ☆13 Apr 12, 2023 Updated 2 years ago
- PyTorch implementation of Data2Vec self-supervised approach for vision use cases. ☆18 Oct 7, 2022 Updated 3 years ago
- ☆27 Aug 1, 2024 Updated last year
- ☆87 Apr 15, 2022 Updated 3 years ago
- ☆10 Jan 9, 2025 Updated last year
- A large scale dataset for Video Captioning in Italian ☆13 May 16, 2023 Updated 2 years ago
- TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment ☆10 Mar 1, 2025 Updated last year
- Segmenting a given document using recursive xy-cut algorithm. ☆12 Oct 9, 2018 Updated 7 years ago
- ☆73 Jun 3, 2022 Updated 3 years ago
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models. ☆27 Oct 27, 2023 Updated 2 years ago
- PyTorch implementation of HANet: Hierarchical Alignment Networks for Video-Text Retrieval (ACM MM 2021). ☆47 Aug 19, 2021 Updated 4 years ago
- GAN(TK)²: GAN Neural Tangent Kernel ToolKit ☆13 Jul 12, 2022 Updated 3 years ago
- Official Code for MIMETIC^2 ☆13 Nov 19, 2024 Updated last year
- image perceptual hash experiment using convolutional neural net models ☆12 Sep 22, 2021 Updated 4 years ago
- Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022) ☆85 Nov 2, 2022 Updated 3 years ago
- Context-Aware Multi-View Summarization Network for Image-Text Matching. ACM MM'20 ☆29 May 26, 2022 Updated 3 years ago
- This is the official implementation of ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos ☆43 Nov 5, 2025 Updated 3 months ago
- [NeurIPS 2025] PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer ☆28 Oct 2, 2025 Updated 5 months ago
- Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents ☆12 May 5, 2022 Updated 3 years ago
- WACV 2022 Paper - Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching ☆16 Dec 10, 2021 Updated 4 years ago
- [WACV 2024] Instruct Me More! Random Prompting for Visual In-Context Learning ☆18 May 7, 2025 Updated 9 months ago
- ☆37 Oct 7, 2023 Updated 2 years ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆41 Jan 26, 2026 Updated last month
- ☆13 Apr 7, 2022 Updated 3 years ago
- Vision-Language Pretraining & Efficient Transformer Papers. ☆15 Nov 30, 2021 Updated 4 years ago
- Survey of small language models ☆18 Jul 23, 2024 Updated last year
- A framework for index based similarity search. ☆20 May 10, 2019 Updated 6 years ago
- BISON: Binary Image SelectiON ☆49 Sep 15, 2021 Updated 4 years ago
- [ICLR 23] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning ☆40 Jul 29, 2023 Updated 2 years ago
- ☆17 Sep 16, 2018 Updated 7 years ago
- [CVPR2025] Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation ☆18 May 2, 2025 Updated 10 months ago
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H… ☆16 May 21, 2024 Updated last year
- PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models ☆34 Jan 14, 2026 Updated last month
- ☆16 May 26, 2023 Updated 2 years ago
- ☆16 May 19, 2022 Updated 3 years ago
- ☆16 Apr 4, 2025 Updated 10 months ago
- ☆33 Jun 4, 2022 Updated 3 years ago