☆19 · Mar 24, 2025 · Updated 11 months ago
Alternatives and similar repositories for vision-datasets
Users who are interested in vision-datasets compare it to the libraries listed below.
- AML Command Transfer. A lightweight tool to transfer any command line to Azure Machine Learning Services ☆20 · May 23, 2024 · Updated last year
- State-of-the-art pretrained vision model from Bing Multimedia ☆19 · Oct 2, 2023 · Updated 2 years ago
- ☆11 · Jul 31, 2022 · Updated 3 years ago
- TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral) ☆72 · May 22, 2023 · Updated 2 years ago
- YFCC100M Downloader ☆24 · May 14, 2018 · Updated 7 years ago
- Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone ☆131 · Oct 10, 2023 · Updated 2 years ago
- Codes for our ACM MM 2019 paper: "Exploiting Temporal Relationships in Video Moment Localization with Natural Language" ☆16 · Oct 22, 2022 · Updated 3 years ago
- Source code for "Importance-based Neuron Allocation for Multilingual Neural Machine Translation" ☆12 · Sep 15, 2021 · Updated 4 years ago
- PyTorch implementation for our NeurIPS 2019 paper "TAB-VCR: Tags and Attributes based VCR Baselines" https://arxiv.org/abs/1910.14671 ☆19 · May 6, 2021 · Updated 4 years ago
- Code for IterInpaint model, presented in Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation (CVPR 2024 work… ☆25 · Jul 21, 2024 · Updated last year
- Project for SNARE benchmark ☆11 · Jun 5, 2024 · Updated last year
- ☆14 · Jun 16, 2023 · Updated 2 years ago
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data ☆13 · Sep 30, 2023 · Updated 2 years ago
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding" ☆13 · Jun 11, 2023 · Updated 2 years ago
- Code for ACL 2019 paper "Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs" ☆18 · Feb 9, 2020 · Updated 6 years ago
- Official Implementation of Attentive Mask CLIP (ICCV2023, https://arxiv.org/abs/2212.08653) ☆36 · May 29, 2024 · Updated last year
- ☆44 · Mar 12, 2026 · Updated last week
- Researchers who published code, models (in some cases), and demo apps (in few cases) along with their SOTA paper ☆12 · Oct 19, 2023 · Updated 2 years ago
- [Findings of ACL-2023] This is the official implementation of On the Difference of BERT-style and CLIP-style Text Encoders. ☆14 · Jun 7, 2023 · Updated 2 years ago
- Official code of *Towards Event-oriented Long Video Understanding* ☆12 · Jul 26, 2024 · Updated last year
- Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models ☆47 · Sep 25, 2023 · Updated 2 years ago
- Official Repository for CVPR 2022 paper "REX: Reasoning-aware and Grounded Explanation" ☆22 · Nov 21, 2023 · Updated 2 years ago
- Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model ☆13 · Feb 15, 2024 · Updated 2 years ago
- semantic tokenizer for speech and music ☆21 · Jul 6, 2025 · Updated 8 months ago
- Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? ☆17 · Jun 3, 2025 · Updated 9 months ago
- This is the official repo for "MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment" ☆17 · May 27, 2019 · Updated 6 years ago
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon! ☆11 · May 24, 2023 · Updated 2 years ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition ☆12 · Mar 14, 2025 · Updated last year
- Benchmarking Multi-Image Understanding in Vision and Language Models ☆12 · Jul 29, 2024 · Updated last year
- ☆11 · Oct 2, 2024 · Updated last year
- An example integration between Flask and the Preact front end library. ☆13 · Jun 20, 2022 · Updated 3 years ago
- Ranking-Consistent Language-Image Pretraining ☆12 · Oct 24, 2025 · Updated 4 months ago
- AlignCLIP: Improving Cross-Modal Alignment in CLIP (ICLR 2025) ☆60 · Mar 1, 2025 · Updated last year
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval ☆13 · Jun 27, 2025 · Updated 8 months ago
- KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a… ☆12 · Mar 24, 2023 · Updated 2 years ago
- SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech ☆11 · Jun 30, 2023 · Updated 2 years ago
- 👄🇧🇷 Forced phonetic alignment in Brazilian Portuguese ☆13 · Jul 18, 2025 · Updated 8 months ago
- Vision Longformer For Object Detection ☆34 · May 17, 2021 · Updated 4 years ago
- A simple command line tool to calculate WER for ASR. ☆14 · Oct 14, 2024 · Updated last year