RobertBiehl / multimodal-instruct
Instruction tuning dataset generation inspired by LLaVA-Instruct-158k via any LLM, also for commercial use.
☆12Updated 8 months ago
Related projects ⓘ
Alternatives and complementary repositories for multimodal-instruct
- Code for "Merging Text Transformers from Different Initializations"☆19Updated 3 months ago
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"☆74Updated last week
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference…☆18Updated last year
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…☆45Updated last month
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆35Updated 10 months ago
- Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis"☆49Updated 2 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆62Updated last month
- ☆38Updated 3 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)☆110Updated last month
- A huge dataset for Document Visual Question Answering☆14Updated 3 months ago
- ☆22Updated 2 weeks ago
- This repo is based on https://github.com/jiaweizzhao/GaLore☆19Updated 2 months ago
- Official implementation of Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs (ICLR 2024).☆32Updated 3 months ago
- Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models☆28Updated 8 months ago
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…☆81Updated 2 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"☆78Updated 8 months ago
- Official code and data for NeurIPS 2023 paper "ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial …☆37Updated 11 months ago
- Official code for infimm-hd☆15Updated 2 months ago
- Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Lan…☆34Updated 4 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models☆175Updated 4 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆92Updated last month
- LL3M: Large Language and Multi-Modal Model in Jax☆65Updated 7 months ago
- Holistic evaluation of multimodal foundation models☆41Updated 3 months ago
- ☆36Updated last year
- Code accompanying the paper "Massive Activations in Large Language Models"☆123Updated 8 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆67Updated 4 months ago
- PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurlPS 2024.☆20Updated last month
- ☆84Updated 11 months ago
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context☆17Updated 3 months ago
- This repository is maintained to release dataset and models for multimodal puzzle reasoning.☆44Updated 3 months ago