awsaf49 / flickr-dataset
Download flickr8k, flickr30k image caption datasets
☆25Updated last year
Alternatives and similar repositories for flickr-dataset
Users who are interested in flickr-dataset are comparing it to the repositories listed below.
- TensorFlow implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"☆35Updated 3 years ago
- EfficientViT is a new family of vision models for efficient high-resolution vision.☆26Updated last year
- Official PyTorch Implementation of Self-emerging Token Labeling☆34Updated last year
- Exploring and mitigating semantic hallucinations in scene text perception and reasoning☆11Updated last month
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆14Updated 7 months ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types☆24Updated this week
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08…☆14Updated last month
- [CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents☆32Updated last month
- ☆34Updated last year
- Clipora is a powerful toolkit for fine-tuning OpenCLIP models using Low Rank Adapters (LoRA).☆22Updated 11 months ago
- ☆16Updated 2 years ago
- Task Agnostic Unsupervised Learning☆15Updated 3 years ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"☆101Updated 10 months ago
- ViT trained on COYO-Labeled-300M dataset☆32Updated 2 years ago
- 🔥MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer [Official, ICLR 2023]☆21Updated last year
- 4th place solution for the Google Universal Image Embedding Kaggle Challenge. Instance-Level Recognition workshop at ECCV 2022☆42Updated last year
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…☆81Updated 2 years ago
- An open-source implementation for fine-tuning SmolVLM.☆41Updated 2 months ago
- PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024.☆30Updated 9 months ago
- [ICME 2022] code for the paper, SimVit: Exploring a simple vision transformer with sliding windows.☆68Updated 2 years ago
- [NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi…☆85Updated last year
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆56Updated 8 months ago
- [WACV2025 Oral] DeepMIM: Deep Supervision for Masked Image Modeling☆53Updated 2 months ago
- State-of-the-art data augmentation search algorithms in PyTorch☆47Updated last year
- [ICME 2023] FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation☆12Updated 2 years ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models☆32Updated 9 months ago
- ☆39Updated 11 months ago
- [ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"☆81Updated 2 weeks ago
- Official PyTorch implementation of ResFormer: Scaling ViTs with Multi-Resolution Training, CVPR2023☆29Updated 2 years ago
- Masked Vision-Language Transformer in Fashion☆34Updated last year