kyegomez / MMCA
The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"
☆12 · Updated 8 months ago
Related projects
Alternatives and complementary repositories for MMCA
- DPO, but faster ☆21 · Updated 2 weeks ago
- PyTorch implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆23 · Updated last week
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆25 · Updated 3 months ago
- ☆13 · Updated last year
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆16 · Updated this week
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated 8 months ago
- A big_vision inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆29 · Updated 4 months ago
- A Data Source for Reasoning Embodied Agents ☆19 · Updated last year
- Official implementation of ECCV24 paper: POA ☆24 · Updated 3 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆37 · Updated 6 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆23 · Updated 4 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆35 · Updated 10 months ago
- A repository for research on medium-sized language models. ☆74 · Updated 5 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆18 · Updated 11 months ago
- ☆21 · Updated last week
- Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model" ☆50 · Updated last month
- My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models" ☆14 · Updated this week
- Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated 7 months ago
- Visual RAG using less than 300 lines of code. ☆23 · Updated 8 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆30 · Updated 9 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆24 · Updated last week
- Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-language models" ☆21 · Updated this week
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆33 · Updated 2 months ago
- ☆16 · Updated 2 months ago
- Open source community's implementation of the model from "LANGUAGE MODEL BEATS DIFFUSION - TOKENIZER IS KEY TO VISUAL GENERATION" ☆15 · Updated last week
- The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters" ☆25 · Updated this week
- ☆38 · Updated last year
- The repository contains code for Adaptive Data Optimization ☆18 · Updated 3 weeks ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated 11 months ago