kyegomez / MMCALinks
The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"
☆12Updated last year
Alternatives and similar repositories for MMCA
Users that are interested in MMCA are comparing it to the libraries listed below
Sorting:
- Transformers + Mambas + LSTMS All in One Model☆9Updated 3 weeks ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆20Updated 2 months ago
- ☆14Updated 2 years ago
- Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models☆28Updated last year
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin…☆21Updated 3 weeks ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's Pytorch Lightning suite.☆33Updated last year
- Pytorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆24Updated 3 weeks ago
- A Data Source for Reasoning Embodied Agents☆19Updated last year
- research work on multimodal cognitive ai☆63Updated last week
- A testbed for agents and environments that can automatically improve models through data generation.☆24Updated 3 months ago
- ☆13Updated 9 months ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)☆34Updated 11 months ago
- Open source community's implementation of the model from "LANGUAGE MODEL BEATS DIFFUSION — TOKENIZER IS KEY TO VISUAL GENERATION"☆16Updated 7 months ago
- Autoregressive Image Generation☆32Updated last week
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …☆16Updated last year
- Implementation of Metaformer, but in an autoregressive manner☆25Updated 3 years ago
- Tiktok is an advanced multimedia recommender system that fuses the generative modality-aware collaborative self-augmentation and contrast…☆13Updated last year
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges☆30Updated last year
- My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"☆15Updated 7 months ago
- ☆32Updated 5 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆36Updated last year
- Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆16Updated 7 months ago
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta☆16Updated 7 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆34Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"☆24Updated 3 weeks ago
- Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-language models"☆25Updated this week
- ☆8Updated last year
- Resa: Transparent Reasoning Models via SAEs☆30Updated last week
- Official implementation of the paper The Hidden Language of Diffusion Models☆73Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"☆38Updated 2 weeks ago