An efficient and scalable attention module designed to reduce memory usage and improve inference speed in large language models. It implements Multi-Head Latent Attention (MLA) as a drop-in replacement for traditional multi-head attention (MHA).
☆22 · Jun 25, 2025 · Updated 11 months ago
Alternatives and similar repositories for MiniGPT-and-DeepSeek-MLA-Multi-Head-Latent-Attention
Users that are interested in MiniGPT-and-DeepSeek-MLA-Multi-Head-Latent-Attention are comparing it to the libraries listed below.
- Use Muon optimizer instead of AdamW. ☆49 · Mar 2, 2026 · Updated 2 months ago
- Training a BERT model from scratch. ☆11 · Oct 15, 2023 · Updated 2 years ago
- Long Context Research ☆32 · Jan 26, 2026 · Updated 4 months ago
- ☆16 · Jul 7, 2025 · Updated 10 months ago
- Merge LLMs that are split into parts ☆26 · Mar 18, 2026 · Updated 2 months ago
- Handwritten digit classification web app using Streamlit ☆10 · Jan 15, 2024 · Updated 2 years ago
- A JavaScript implementation of Richard Dawkins' Biomorph, a simulation that demonstrates the power of natural selection. ☆13 · Dec 11, 2012 · Updated 13 years ago
- ☆14 · Aug 31, 2022 · Updated 3 years ago
- ☆29 · Oct 21, 2025 · Updated 7 months ago
- Official implementation of the ECCV2024 paper: Generalizable Facial Expression Recognition ☆21 · Sep 20, 2024 · Updated last year
- [NeurIPS 2025] HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models ☆29 · Feb 19, 2026 · Updated 3 months ago
- Template repo for Python projects, especially those focusing on machine learning and/or deep learning. ☆15 · Jan 14, 2026 · Updated 4 months ago
- A Bigram Language Model from scratch with no smoothing and add-one smoothing. Outputs bigram counts, bigram probabilities and probability… ☆15 · Jan 12, 2021 · Updated 5 years ago
- Like cookiecutter_pypackage, but for just a module. ☆14 · Jul 27, 2016 · Updated 9 years ago
- Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementations from scratch in Python ☆18 · Jan 30, 2023 · Updated 3 years ago
- Access to Piwik API in Python + django app. ☆19 · Apr 15, 2012 · Updated 14 years ago
- JavaScript wrapper bindings for diamond types ☆13 · Sep 13, 2021 · Updated 4 years ago
- Modding the LOOΠΔ light stick with a custom PCB/firmware, rechargeable battery and a companion Android app for wireless control. ☆13 · Sep 16, 2022 · Updated 3 years ago
- Learn how Transformer models are implemented from scratch. ☆23 · Jun 3, 2024 · Updated last year
- ☆12 · Jun 7, 2024 · Updated last year
- ☆17 · Mar 17, 2021 · Updated 5 years ago
- ロボットシステム入門 / Let's learn how to create intelligent robot systems with Roomba! ☆23 · Aug 4, 2023 · Updated 2 years ago
- A Cookie Cutter template for a Pyramid package ☆10 · Jun 2, 2016 · Updated 9 years ago
- ☆10 · Feb 17, 2026 · Updated 3 months ago
- Mount python — it's fun, not a typo, and next to pointless! ☆50 · Jun 30, 2014 · Updated 11 years ago
- An AI to compete live in Ghana’s National Science and Maths Quiz competition ☆34 · Jul 23, 2025 · Updated 10 months ago
- Real-time speech-to-text translation over WebSocket. Streams Opus or raw PCM audio from client to server for live transcription and optio… ☆16 · Mar 11, 2026 · Updated 2 months ago
- Model-based time series clustering using variational inference. ☆12 · Oct 28, 2018 · Updated 7 years ago
- When you really need a Tomek decorator ☆13 · May 4, 2022 · Updated 4 years ago
- ☆17 · Oct 8, 2021 · Updated 4 years ago
- Implementation of BERT-based Language Models ☆28 · Mar 12, 2026 · Updated 2 months ago
- Graphics engine for games, set on top of bun.js. ☆22 · Apr 2, 2025 · Updated last year
- Bundle up Python deployment packages for AWS Lambda ☆14 · Apr 20, 2021 · Updated 5 years ago
- Convenient access to `pynvml` (the library behind `nvidia-smi`) ☆23 · Oct 18, 2024 · Updated last year
- 6th Position Solution Code for Kaggle - LLM Science Exam Competition ☆24 · Jul 8, 2024 · Updated last year
- Kylie maps between Model objects and JSON data structures. ☆12 · Dec 26, 2022 · Updated 3 years ago
- Keyboard Shortcuts for your Django Admin Backend. ☆13 · Sep 14, 2015 · Updated 10 years ago
- Implements a lightweight workflow for Codex inspired by Recursive Language Models (MIT). Now known as 'recursive-mode' ☆57 · Apr 10, 2026 · Updated last month
- A curated list of Awesome 3D Vision, including 3D Gaussian Splatting, SLAM, Neural Radiance Fields. ☆22 · Jun 23, 2024 · Updated last year