An efficient and scalable attention module designed to reduce memory usage and improve inference speed in large language models. Implements Multi-Head Latent Attention (MLA) as a drop-in replacement for traditional multi-head attention (MHA).
☆21 · Jun 25, 2025 · Updated 9 months ago
Alternatives and similar repositories for MiniGPT-and-DeepSeek-MLA-Multi-Head-Latent-Attention
Users that are interested in MiniGPT-and-DeepSeek-MLA-Multi-Head-Latent-Attention are comparing it to the libraries listed below.
- Use Muon optimizer instead of AdamW. ☆48 · Mar 2, 2026 · Updated 3 weeks ago
- Training a BERT model from scratch. ☆11 · Oct 15, 2023 · Updated 2 years ago
- Long Context Research ☆31 · Jan 26, 2026 · Updated 2 months ago
- ☆16 · Jul 7, 2025 · Updated 8 months ago
- Merge LLMs that are split into parts ☆27 · Mar 18, 2026 · Updated last week
- Handwritten digit classification web app using Streamlit ☆10 · Jan 15, 2024 · Updated 2 years ago
- A JavaScript implementation of Richard Dawkins' Biomorph, a simulation that demonstrates the power of natural selection. ☆13 · Dec 11, 2012 · Updated 13 years ago
- ☆14 · Aug 31, 2022 · Updated 3 years ago
- ☆26 · Oct 21, 2025 · Updated 5 months ago
- [NeurIPS 2025] HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models ☆29 · Feb 19, 2026 · Updated last month
- Official implementation of the ECCV2024 paper: Generalizable Facial Expression Recognition ☆20 · Sep 20, 2024 · Updated last year
- Template repo for Python projects, especially those focusing on machine learning and/or deep learning. ☆15 · Jan 14, 2026 · Updated 2 months ago
- A Bigram Language Model from scratch with no-smoothing and add-one smoothing. Outputs bigram counts, bigram probabilities and probability… ☆15 · Jan 12, 2021 · Updated 5 years ago
- Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementations from scratch with Python ☆18 · Jan 30, 2023 · Updated 3 years ago
- Like cookiecutter_pypackage, but for just a module. ☆14 · Jul 27, 2016 · Updated 9 years ago
- Access to Piwik API in Python + django app. ☆19 · Apr 15, 2012 · Updated 13 years ago
- Javascript wrapper bindings for diamond types ☆13 · Sep 13, 2021 · Updated 4 years ago
- Learn how Transformer models are implemented from scratch. ☆18 · Jun 3, 2024 · Updated last year
- Modding the LOOΠΔ light stick with a custom PCB/firmware, rechargeable battery and a companion Android app for wireless control. ☆13 · Sep 16, 2022 · Updated 3 years ago
- ☆11 · Jun 7, 2024 · Updated last year
- Introduction to Robot Systems / Let's learn how to create intelligent robot systems with Roomba! ☆23 · Aug 4, 2023 · Updated 2 years ago
- ☆17 · Mar 17, 2021 · Updated 5 years ago
- A Cookie Cutter template for a Pyramid package ☆10 · Jun 2, 2016 · Updated 9 years ago
- ☆11 · Feb 17, 2026 · Updated last month
- Mount python — it's fun, not a typo, and next to pointless! ☆50 · Jun 30, 2014 · Updated 11 years ago
- Real-time speech-to-text translation over WebSocket. Streams Opus or raw PCM audio from client to server for live transcription and optio… ☆13 · Mar 11, 2026 · Updated 2 weeks ago
- An AI to compete live in Ghana’s National Science and Maths Quiz competition ☆32 · Jul 23, 2025 · Updated 8 months ago
- Model-based time series clustering using variational inference. ☆12 · Oct 28, 2018 · Updated 7 years ago
- When you really need a Tomek decorator ☆12 · May 4, 2022 · Updated 3 years ago
- Implementation of BERT-based Language Models ☆26 · Mar 12, 2026 · Updated 2 weeks ago
- ☆17 · Oct 8, 2021 · Updated 4 years ago
- Graphics engine for games, set on top of bun.js. ☆20 · Apr 2, 2025 · Updated 11 months ago
- Bundle up Python deployment packages for AWS Lambda ☆14 · Apr 20, 2021 · Updated 4 years ago
- Convenient access to `pynvml` (the library behind `nvidia-smi`) ☆23 · Oct 18, 2024 · Updated last year
- 6th Position Solution Code for Kaggle - LLM Science Exam Competition ☆24 · Jul 8, 2024 · Updated last year
- Kylie maps between Model objects and JSON data structures. ☆12 · Dec 26, 2022 · Updated 3 years ago
- Keyboard Shortcuts for your Django Admin Backend. ☆13 · Sep 14, 2015 · Updated 10 years ago
- SystemVerilog implementation of QEMU PCI edu device ☆13 · May 22, 2023 · Updated 2 years ago
- A curated list of Awesome 3D Vision, including 3D Gaussian Splatting, SLAM, Neural Radiance Fields. ☆22 · Jun 23, 2024 · Updated last year