Working implementation of DeepSeek MLA
☆44 · Jan 8, 2025 · Updated last year
Alternatives and similar repositories for Multi-Head-Latent-Attention-MLA-
Users who are interested in Multi-Head-Latent-Attention-MLA- are comparing it to the libraries listed below.
- ☆31 · Sep 22, 2025 · Updated 8 months ago
- Procedural data generators suite for synthetic pretraining and formal reasoning ☆40 · May 24, 2026 · Updated last week
- This contains Matlab implementation of Johannes Kopf's image processing paper which deals with the adaptive downsampling of images. It gi… ☆23 · Nov 1, 2017 · Updated 8 years ago
- 🤖 Complete reproduction of 'AlphaGo Moment for Model Architecture Discovery' using MLX-LM instead of GPT-4. Autonomous neural architectu… ☆29 · Jul 27, 2025 · Updated 10 months ago
- Minimal implementations of some diffusion models, using as little code and as simple an architecture as possible ☆21 · Jan 10, 2025 · Updated last year
- This package introduces a perceptual loss implementation based on the modern ConvNeXt architecture. ☆30 · Nov 14, 2024 · Updated last year
- A comprehensive codebase for training and finetuning Image <> Latent models. ☆50 · Mar 1, 2025 · Updated last year
- Understand Human Behavior to Align True Needs ☆25 · Jul 11, 2024 · Updated last year
- An AI-powered git commit message generator written in python. ☆21 · Feb 16, 2023 · Updated 3 years ago
- ☆45 · Mar 31, 2025 · Updated last year
- ☆25 · May 23, 2025 · Updated last year
- Collection of autoregressive model implementation ☆85 · Feb 23, 2026 · Updated 3 months ago
- DeMo: Decoupled Momentum Optimization ☆201 · Dec 2, 2024 · Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers ☆74 · Apr 22, 2025 · Updated last year
- Entropy Based Sampling and Parallel CoT Decoding ☆17 · Oct 9, 2024 · Updated last year
- Approximating the joint distribution of language models via MCTS ☆22 · Nov 3, 2024 · Updated last year
- Free rhythm game for electronic drummers ☆44 · May 21, 2026 · Updated last week
- A demo for the Direct Ascent Synthesis: Hidden Generative Capabilities in Discriminative Models paper (https://arxiv.org/abs/2502.07753) ☆41 · Mar 5, 2025 · Updated last year
- A simple LLaMA implementation using MLX. ☆15 · Apr 22, 2024 · Updated 2 years ago
- Simple GRPO scripts and configurations. ☆59 · Feb 6, 2025 · Updated last year
- OpenGML - Open Geometric Music Language ☆21 · Jan 28, 2025 · Updated last year
- Hypernetworks for kohya's sd-scripts ☆17 · May 29, 2023 · Updated 3 years ago
- A lightweight cluster manager that turns your small fleet of nodes into one powerful computer, using Docker for environment consistency w… ☆61 · May 8, 2026 · Updated 3 weeks ago
- Search on your images via text or image to image search. Uses OpenAI CLIP embedding and LanceDB ☆15 · Mar 27, 2024 · Updated 2 years ago
- 📺 A cyberpunk aesthetic theme based on the command line interface of Blade Runner 2049, inspired by the film's dystopian and high-tech v… ☆27 · Jul 12, 2025 · Updated 10 months ago
- the famous wireless network detector, sniffer, and intrusion detection system ☆21 · Nov 28, 2012 · Updated 13 years ago
- Serverless LLM Inference: Deploy DeepSeek R1 & LLaMA Models on AWS Lambda with Ultra-Fast Cold Starts ☆13 · Feb 3, 2026 · Updated 3 months ago
- Complete Python bindings for the OptiX host API ☆60 · May 12, 2026 · Updated 2 weeks ago
- A graph visualization of attention ☆56 · May 20, 2025 · Updated last year
- KV cache compression via sparse coding ☆17 · Oct 26, 2025 · Updated 7 months ago
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" ☆15 · Apr 30, 2025 · Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7) ☆66 · Mar 24, 2025 · Updated last year
- Reasoning-based Evaluation and Ranking of Translations. ☆20 · Jul 18, 2025 · Updated 10 months ago
- manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices ☆11 · Jan 12, 2021 · Updated 5 years ago
- Instruct-tune LLaMA on consumer hardware ☆72 · Jun 3, 2023 · Updated 2 years ago
- 🤖 A robotic pick-and-place solution for the Flipkart GRID 5.0 Finals. Features real-time object detection (YOLO), inverse kinematics, an… ☆10 · Jun 23, 2025 · Updated 11 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆92 · Oct 30, 2024 · Updated last year
- Corner plots -- now with ads! ☆23 · Mar 31, 2025 · Updated last year
- ☆45 · Jun 2, 2023 · Updated 2 years ago