gregor-ge / Babel-ImageNet
☆22Updated 8 months ago
Alternatives and similar repositories for Babel-ImageNet:
Users who are interested in Babel-ImageNet are comparing it to the repositories listed below:
- ☆89Updated last year
- M4 experiment logbook☆56Updated last year
- Index of URLs to pdf files all over the internet and scripts☆21Updated last year
- A huge dataset for Document Visual Question Answering☆15Updated 6 months ago
- Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week.☆34Updated 3 years ago
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specific…☆62Updated 4 months ago
- Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models, EMNLP 2023☆44Updated 7 months ago
- SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)☆85Updated last year
- ☆36Updated 8 months ago
- ☆64Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆45Updated last year
- ☆44Updated 3 years ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.☆155Updated 9 months ago
- Dataset introduced in PlotQA: Reasoning over Scientific Plots☆72Updated last year
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…☆51Updated last month
- Big-Interleaved-Dataset☆58Updated 2 years ago
- Code for "Open Vocabulary Extreme Classification Using Generative Models"☆24Updated 2 years ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability in information retrieval models. Our foc…☆31Updated 7 months ago
- Code for ACL paper "Zero-Shot Text Classification via Self-Supervised Tuning"☆27Updated last year
- Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023☆104Updated last year
- Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text☆24Updated 2 years ago
- ☆24Updated 5 months ago
- ☆13Updated 2 years ago
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…☆81Updated 4 months ago
- An Image/Text Retrieval Test Collection to Support Multimedia Content Creation☆20Updated last year
- Transformers at any scale☆41Updated last year
- Code for "Merging Text Transformers from Different Initializations"☆19Updated 5 months ago
- TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning☆22Updated 4 months ago
- The code related to the baselines from NeurIPS 2021 paper "DUE: End-to-End Document Understanding Benchmark."☆36Updated last year
- ☆26Updated 10 months ago