gregor-ge / Babel-ImageNet
☆20 Updated 3 months ago
Related projects:
- ☆84 Updated 8 months ago
- Index of URLs to pdf files all over the internet and scripts ☆20 Updated last year
- ☆64 Updated 11 months ago
- Holistic evaluation of multimodal foundation models ☆36 Updated last month
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆42 Updated 10 months ago
- M4 experiment logbook ☆56 Updated last year
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B… ☆77 Updated last week
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆92 Updated 2 months ago
- ☆19 Updated 4 months ago
- [BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment" ☆52 Updated last year
- LL3M: Large Language and Multi-Modal Model in JAX ☆62 Updated 4 months ago
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and inexpensive to use. Specific… ☆51 Updated last week
- Official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 23) ☆52 Updated last year
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua… ☆33 Updated last week
- Matryoshka Multimodal Models ☆67 Updated 3 weeks ago
- Course materials for 11-767 ☆13 Updated last year
- Big-Interleaved-Dataset ☆57 Updated last year
- ☆44 Updated 3 years ago
- An Image/Text Retrieval Test Collection to Support Multimedia Content Creation ☆18 Updated 11 months ago
- Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir… ☆29 Updated last month
- Video descriptions of research papers relating to foundation models and scaling ☆29 Updated last year
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in PyTorch ☆97 Updated 11 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat… ☆41 Updated last month
- Code for ACL paper "Zero-Shot Text Classification via Self-Supervised Tuning" ☆22 Updated 11 months ago
- NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings ☆52 Updated 3 months ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆91 Updated last year
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆40 Updated 3 months ago
- ☆45 Updated 7 months ago
- ☆38 Updated 5 months ago
- [ICML 2022] Code and data for our paper "IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages" ☆49 Updated last year