Working implementation of DeepSeek MLA
☆44Jan 8, 2025Updated last year
Alternatives and similar repositories for Multi-Head-Latent-Attention-MLA-
Users that are interested in Multi-Head-Latent-Attention-MLA- are comparing it to the libraries listed below.
- A simple Python package for neural networks based on NumPy☆13Sep 6, 2021Updated 4 years ago
- Procedural data generators suite for synthetic pretraining and formal reasoning☆36Updated this week
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆142May 8, 2025Updated 11 months ago
- ☆19Jul 31, 2024Updated last year
- 🤖 Complete reproduction of 'AlphaGo Moment for Model Architecture Discovery' using MLX-LM instead of GPT-4. Autonomous neural architectu…☆29Jul 27, 2025Updated 8 months ago
- Minimal implementations of several diffusion models, using as little code and as simple an architecture as possible☆21Jan 10, 2025Updated last year
- LoRA for convolution layer☆21Mar 9, 2023Updated 3 years ago
- This package introduces a perceptual loss implementation based on the modern ConvNeXt architecture.☆30Nov 14, 2024Updated last year
- A comprehensive codebase for training and finetuning Image <> Latent models.☆50Mar 1, 2025Updated last year
- Understand Human Behavior to Align True Needs☆25Jul 11, 2024Updated last year
- ☆19Aug 19, 2024Updated last year
- An AI-powered Git commit message generator written in Python.☆22Feb 16, 2023Updated 3 years ago
- browser extension to scroll a page with j and k and a little bit more☆16Apr 6, 2026Updated 2 weeks ago
- ☆46Mar 31, 2025Updated last year
- Collection of autoregressive model implementations☆85Feb 23, 2026Updated last month
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and JAX☆16Jun 16, 2024Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers☆73Apr 22, 2025Updated 11 months ago
- Entropy Based Sampling and Parallel CoT Decoding☆17Oct 9, 2024Updated last year
- A package for constructing sparse tensors from CSV-like data sources.☆11Dec 24, 2017Updated 8 years ago
- Approximating the joint distribution of language models via MCTS☆22Nov 3, 2024Updated last year
- A demo for the Direct Ascent Synthesis: Hidden Generative Capabilities in Discriminative Models paper (https://arxiv.org/abs/2502.07753)☆41Mar 5, 2025Updated last year
- Simple GRPO scripts and configurations.☆59Feb 6, 2025Updated last year
- ☆35Mar 12, 2025Updated last year
- Hypernetworks for kohya's sd-scripts☆17May 29, 2023Updated 2 years ago
- Image Gaussian Splatting☆25Jul 21, 2025Updated 8 months ago
- Complete Python bindings for the OptiX host API☆59Updated this week
- ☆12Dec 15, 2025Updated 4 months ago
- The official repository of BFSR: "Boosting Flow-based Generative Super-Resolution Models via Learned Prior" [CVPR 2024]☆85Jun 13, 2024Updated last year
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free☆234Oct 31, 2024Updated last year
- A graph visualization of attention☆56May 20, 2025Updated 11 months ago
- The Official PyTorch Implementation of "Brain-like Variational Inference" (NeurIPS 2025 Paper)☆71Feb 9, 2026Updated 2 months ago
- Official Implementation of wd1☆28Sep 25, 2025Updated 6 months ago
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"☆15Apr 30, 2025Updated 11 months ago
- Reasoning-based Evaluation and Ranking of Translations.☆20Jul 18, 2025Updated 9 months ago
- manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices☆10Jan 12, 2021Updated 5 years ago
- Leo optimizer, variation of Muon that runs faster☆59Sep 6, 2025Updated 7 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"☆92Oct 30, 2024Updated last year
- Corner plots -- now with ads!☆23Mar 31, 2025Updated last year
- An AI accelerator implementation with Xilinx FPGA☆86Jan 29, 2025Updated last year