haoliuhl / language-quantized-autoencoders
Language Quantized AutoEncoders
★95, updated last year
Alternatives and similar repositories for language-quantized-autoencoders:
Users interested in language-quantized-autoencoders are comparing it to the repositories listed below.
- (★117, updated last year)
- https://arxiv.org/abs/2209.15162 (★48, updated last year)
- Implementation of Mirasol, SOTA Multimodal Autoregressive model out of Google DeepMind, in PyTorch (★88, updated last year)
- Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning" (★65, updated 11 months ago)
- Patching open-vocabulary models by interpolating weights (★91, updated last year)
- Code for the paper "Hyperbolic Image-Text Representations", Desai et al., ICML 2023 (★146, updated last year)
- (★67, updated 6 months ago)
- (★78, updated last year)
- (★47, updated last year)
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… (★112, updated 6 months ago)
- (★51, updated last year)
- Implementation of Bitune: Bidirectional Instruction-Tuning (★16, updated 7 months ago)
- Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning (★143, updated 2 years ago)
- (★127, updated 2 years ago)
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning (★78, updated 8 months ago)
- DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models (★71, updated last month)
- Matryoshka Multimodal Models (★90, updated last month)
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning". (★38, updated 10 months ago)
- Implementation of Discrete Key / Value Bottleneck, in PyTorch (★87, updated last year)
- Holistic evaluation of multimodal foundation models (★42, updated 5 months ago)
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely… (★49, updated last year)
- Code for the paper titled "CiT: Curation in Training for Effective Vision-Language Data". (★78, updated 2 years ago)
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) (★54, updated last year)
- Release of ImageNet-Captions (★45, updated last year)
- Code for "Multitask Vision-Language Prompt Tuning" https://arxiv.org/abs/2211.11720 (★55, updated 7 months ago)
- Command-line tool for downloading and extending the RedCaps dataset. (★46, updated last year)
- M4 experiment logbook (★56, updated last year)
- Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models", NeurIPS 23 Spotlight (★37, updated last year)
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" (★73, updated last month)
- Official repo for StableLLAVA (★94, updated last year)