awsaf49 / flickr-dataset
Download flickr8k, flickr30k image caption datasets
☆40 Updated 2 years ago
Alternatives and similar repositories for flickr-dataset
Users who are interested in flickr-dataset are comparing it to the repositories listed below.
- NoisyNN: Exploring the impact of information entropy change in learning systems ☆22 Updated last year
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023 ☆16 Updated last year
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08… ☆37 Updated 8 months ago
- 1st Place Solution in Google Universal Image Embedding ☆67 Updated 2 years ago
- Deploy Swin Transformer using TorchServe ☆27 Updated 4 years ago
- Masked Vision-Language Transformer in Fashion ☆38 Updated 2 years ago
- An open-source implementation for fine-tuning SmolVLM. ☆62 Updated 4 months ago
- Fully Open Framework for Democratized Multimodal Reinforcement Learning. ☆39 Updated last month
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆83 Updated 3 years ago
- Clipora is a powerful toolkit for fine-tuning OpenCLIP models using Low Rank Adapters (LoRA). ☆24 Updated last year
- [ICME 2022] Code for the paper "SimViT: Exploring a Simple Vision Transformer with Sliding Windows". ☆68 Updated 3 years ago
- Implementation for the CVPR 2023 paper "Improving Selective Visual Question Answering by Learning from Your Peers" (https://arxiv.org/abs… ☆25 Updated 2 years ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆102 Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆28 Updated 2 years ago
- TensorFlow implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?" ☆36 Updated 4 years ago
- ☆36 Updated 2 years ago
- Code for the Video Similarity Challenge. ☆80 Updated 2 years ago
- [WACV2025 Oral] DeepMIM: Deep Supervision for Masked Image Modeling ☆56 Updated 9 months ago
- [CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents ☆58 Updated 8 months ago
- [ICCV2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance ☆125 Updated last year
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆211 Updated last year
- Official Pytorch Implementation of Self-emerging Token Labeling ☆35 Updated last year
- [ICME 2023] FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation ☆13 Updated 2 years ago
- State-of-the-art data augmentation search algorithms in PyTorch ☆47 Updated 2 years ago
- ☆11 Updated last year
- DAA: A Delta Age AdaIN operation for age estimation via binary code transformer (CVPR2023) ☆37 Updated 11 months ago
- Code for You Only Cut Once: Boosting Data Augmentation with a Single Cut, ICML 2022. ☆106 Updated 2 years ago
- [NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi… ☆86 Updated 2 years ago
- ☆45 Updated last year
- Test different pooling method used in CNN for Computer Vision Task ☆35 Updated 5 years ago