Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch
⭐ 92 · Dec 22, 2023 · Updated 2 years ago
Alternatives and similar repositories for mirasol-pytorch
Users that are interested in mirasol-pytorch are comparing it to the libraries listed below.
- Implementation of GateLoop Transformer in Pytorch and Jax · ⭐ 92 · Jun 18, 2024 · Updated last year
- Implementation of the Llama architecture with RLHF + Q-learning · ⭐ 170 · Feb 1, 2025 · Updated last year
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk · ⭐ 47 · Jul 16, 2023 · Updated 2 years ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings · ⭐ 46 · May 23, 2023 · Updated 2 years ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" · ⭐ 59 · Oct 22, 2023 · Updated 2 years ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts · ⭐ 122 · Oct 17, 2024 · Updated last year
- Explorations into whether a transformer with RL can direct a genetic algorithm to converge faster · ⭐ 71 · May 18, 2025 · Updated 11 months ago
- Implementation of Agent Attention in Pytorch · ⭐ 93 · Jul 10, 2024 · Updated last year
- Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new… · ⭐ 126 · Jul 26, 2024 · Updated last year
- Usable implementation of Mogrifier, a circuit for enhancing LSTMs and potentially other networks, from Deepmind · ⭐ 22 · Jun 9, 2024 · Updated last year
- A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in Pytorch · ⭐ 127 · Aug 25, 2025 · Updated 7 months ago
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT · ⭐ 227 · Mar 25, 2026 · Updated 3 weeks ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind · ⭐ 179 · Sep 12, 2024 · Updated last year
- A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering · ⭐ 43 · Nov 8, 2020 · Updated 5 years ago
- {DeepL, Google, WMT-Best, davinci-003, turbo, gpt-4} × {En-De, En-Cs, En-Ru, En-Zh, De-Fr, En-Ja, Uk-En, Uk-Cs, En-Hr, En-Ha, En-Is} · ⭐ 14 · Jun 18, 2023 · Updated 2 years ago
- An optimized pipeline for working with Whole Slide Image (WSI) data in Tensorflow · ⭐ 14 · Apr 30, 2021 · Updated 4 years ago
- Implementation of the transformer proposed in "Building Blocks for a Complex-Valued Transformer Architecture" · ⭐ 88 · Oct 13, 2023 · Updated 2 years ago
- A vast array of Multi-Modal Embodied Robotic Foundation Models! · ⭐ 28 · Mar 18, 2024 · Updated 2 years ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… · ⭐ 53 · Oct 22, 2023 · Updated 2 years ago
- Implementation of the algorithm detailed in paper "Evolutionary design of molecules based on deep learning and a genetic algorithm" · ⭐ 24 · Dec 15, 2023 · Updated 2 years ago
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" in Pytorch and Zeta · ⭐ 13 · Nov 11, 2024 · Updated last year
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch · ⭐ 136 · Oct 15, 2025 · Updated 6 months ago
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens) · ⭐ 55 · Mar 25, 2025 · Updated last year
- Implementation of a holodeck, written in Pytorch · ⭐ 19 · Nov 1, 2023 · Updated 2 years ago
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation" · ⭐ 182 · Jun 20, 2024 · Updated last year
- Implementation of Uformer, Attention-based Unet, in Pytorch · ⭐ 96 · Oct 26, 2021 · Updated 4 years ago
- Implementation of Nvidia's NeuralPlexer, for end-to-end differentiable design of functional small-molecules and ligand-binding proteins, … · ⭐ 52 · Nov 20, 2023 · Updated 2 years ago
- Pytorch reimplementation of Molecule Attention Transformer, which uses a transformer to tackle the graph-like structure of molecules · ⭐ 58 · Dec 2, 2020 · Updated 5 years ago
- Fine-tune copilot based on your codebase · ⭐ 12 · Mar 26, 2024 · Updated 2 years ago
- Implementation of a U-net complete with efficient attention as well as the latest research findings · ⭐ 291 · May 3, 2024 · Updated last year
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing · ⭐ 49 · Jan 27, 2022 · Updated 4 years ago
- A repository to house some personal attempts to beat some state-of-the-art for medical datasets · ⭐ 101 · Nov 20, 2023 · Updated 2 years ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" · ⭐ 104 · Dec 22, 2024 · Updated last year
- [ICML2023] Instant Soup Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models. Ajay Jaiswal, Shiwei Liu, Ti… · ⭐ 11 · Nov 28, 2023 · Updated 2 years ago
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences" · ⭐ 70 · Apr 10, 2023 · Updated 3 years ago
- Implementation of Denoising Diffusion for protein design, but using the new Equiformer (successor to SE3 Transformers) with some addition… · ⭐ 57 · Dec 27, 2022 · Updated 3 years ago
- Implementation of MetNet-3, SOTA neural weather model out of Google Deepmind, in Pytorch · ⭐ 238 · Nov 16, 2023 · Updated 2 years ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion · ⭐ 56 · Jul 1, 2025 · Updated 9 months ago
- My explorations into editing the knowledge and memories of an attention network · ⭐ 35 · Dec 8, 2022 · Updated 3 years ago