kyegomez / qformer
Implementation of Qformer from BLIP2 in Zeta Lego blocks.
☆39Updated 6 months ago
Alternatives and similar repositories for qformer
Users who are interested in qformer compare it to the libraries listed below.
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model☆22Updated 10 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation.☆37Updated 11 months ago
- ☆31Updated last month
- Keras implementation of Finite Scalar Quantization☆73Updated last year
- MIO: A Foundation Model on Multimodal Tokens☆25Updated 5 months ago
- ☆43Updated 2 weeks ago
- PyTorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆26Updated 4 months ago
- Explore the Limits of Omni-modal Pretraining at Scale☆102Updated 9 months ago
- LMM solved catastrophic forgetting, AAAI2025☆43Updated last month
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆45Updated 4 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence☆48Updated 10 months ago
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities☆99Updated last year
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆25Updated 5 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501☆55Updated 10 months ago
- PyTorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆24Updated this week
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea…☆51Updated last week
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer☆60Updated last year
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…☆51Updated last year
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows☆65Updated last week
- [ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling☆79Updated last year
- The official implementation of MAGVLT: Masked Generative Vision-and-Language Transformer (CVPR'23)☆26Updated last year
- Data-Efficient Multimodal Fusion on a Single GPU☆61Updated last year
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by …☆74Updated last year
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"☆25Updated last month
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆163Updated last week
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization☆70Updated this week
- ☆28Updated 3 weeks ago
- PyTorch implementation of StableMask (ICML'24)☆13Updated 11 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token ids from a codebook; supports both image and video…☆34Updated 11 months ago
- ☆18Updated last year