awsaf49 / flickr-dataset
Download flickr8k, flickr30k image caption datasets
☆39Updated last year
Alternatives and similar repositories for flickr-dataset
Users who are interested in flickr-dataset are comparing it to the repositories listed below
- An open-source implementation for fine-tuning SmolVLM.☆62Updated 4 months ago
- 1st Place Solution in Google Universal Image Embedding☆67Updated 2 years ago
- [ICCV2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance☆124Updated last year
- Fully Open Framework for Democratized Multimodal Reinforcement Learning.☆38Updated last month
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08…☆36Updated 7 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Updated last year
- [CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding☆23Updated last month
- NoisyNN: Exploring the impact of information entropy change in learning systems☆22Updated last year
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"☆102Updated last year
- Zero-label image classification via OpenCLIP knowledge distillation☆141Updated 2 years ago
- Official Pytorch Implementation of Self-emerging Token Labeling☆35Updated last year
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.☆149Updated 11 months ago
- ☆36Updated 2 years ago
- The most impactful papers related to contrastive pretraining for multimodal models!☆76Updated last year
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.☆98Updated last year
- Finetuning CLIP on a small image/text dataset using huggingface libs☆52Updated 3 years ago
- ☆40Updated 2 years ago
- EfficientViT is a new family of vision models for efficient high-resolution vision.☆30Updated 2 years ago
- [NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi…☆85Updated 2 years ago
- State-of-the-art data augmentation search algorithms in PyTorch☆47Updated 2 years ago
- Timm model explorer☆42Updated last year
- Clipora is a powerful toolkit for fine-tuning OpenCLIP models using Low Rank Adapters (LoRA).☆24Updated last year
- Official implementation of "Active Image Indexing"☆60Updated 2 years ago
- 1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundatio…☆226Updated last year
- Official PyTorch implementation of DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models (CVPR'2024)☆132Updated 2 weeks ago
- 4th place solution for the Google Universal Image Embedding Kaggle Challenge. Instance-Level Recognition workshop at ECCV 2022☆43Updated 2 years ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"☆211Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆28Updated 2 years ago
- TensorFlow implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"☆36Updated 4 years ago
- Finetuning CLIP for Few Shot Learning☆46Updated 4 years ago