haoliuhl / language-quantized-autoencoders
Language Quantized AutoEncoders
★107 · Updated 2 years ago
Alternatives and similar repositories for language-quantized-autoencoders
Users interested in language-quantized-autoencoders are comparing it to the repositories listed below.
- Implementation of Mirasol, SOTA Multimodal Autoregressive model out of Google DeepMind, in PyTorch ★88 · Updated last year
- ★59 · Updated last year
- Matryoshka Multimodal Models ★107 · Updated 4 months ago
- The official code for the paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" ★73 · Updated 6 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… ★51 · Updated last year
- PyTorch implementation of LIMoE ★53 · Updated last year
- Code for the paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning" ★78 · Updated last year
- ★118 · Updated 2 years ago
- Toolkit for the Elevater Benchmark ★72 · Updated last year
- https://arxiv.org/abs/2209.15162 ★50 · Updated 2 years ago
- ★97 · Updated 2 years ago
- Source code for the paper "Prefix Language Models are Unified Modal Learners" ★42 · Updated 2 years ago
- Online Adaptation of Language Models with a Memory of Amortized Contexts (NeurIPS 2024) ★63 · Updated 10 months ago
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023 ★33 · Updated last year
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ★86 · Updated last year
- Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning ★156 · Updated 2 years ago
- ★128 · Updated 2 years ago
- ★51 · Updated last year
- Compress conventional Vision-Language Pre-training data ★51 · Updated last year
- The official repo for Debiasing Large Visual Language Models, including a post-hoc debias method and Visual Debias Decoding strat… ★78 · Updated 3 months ago
- Visual Language Transformer Interpreter, an interactive visualization tool for interpreting vision-language transformers ★92 · Updated last year
- ★48 · Updated last year
- [NeurIPS 2023] A faithful benchmark for vision-language compositionality ★79 · Updated last year
- ★54 · Updated 2 years ago
- [ACL 2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models ★77 · Updated this week
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ★44 · Updated 11 months ago
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning ★134 · Updated last year
- Code for "Multitask Vision-Language Prompt Tuning" (https://arxiv.org/abs/2211.11720) ★56 · Updated 11 months ago
- ★74 · Updated 11 months ago
- Repository for the paper "Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models" ★27 · Updated last year