neulab / Pangea
This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"
Related projects
Alternatives and complementary repositories for Pangea
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…
- Parameter-efficient fine-tuning script for Phi-3-vision, the strong multimodal language model by Microsoft.
- A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing ability.
- This is the official repository for Inheritune.
- Code for the paper: Harnessing Webpage UIs for Text-Rich Visual Understanding
- E5-V: Universal Embeddings with Multimodal Large Language Models
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen2-VL on Hugging Face datasets.
- From-scratch implementation of a vision-language model in pure PyTorch
- Expert Specialized Fine-Tuning
- Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F…
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
- A family of highly capable yet efficient large multimodal models
- Quick exploration into fine-tuning Florence-2
- Framework-agnostic computer vision inference. Run 1000+ models by changing only one line of code. Supports models from transformers, timm…
- Python library to evaluate VLMs' robustness across diverse benchmarks
- Reproduction of LLaVA-v1.5 based on the Llama-3-8B LLM backbone.
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio…
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning
- A curated list of the role of small models in the LLM era
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.