computerhistory / AlexNet-Source-Code
This package contains the original 2012 AlexNet code.
☆2,658 · Updated 3 months ago
Alternatives and similar repositories for AlexNet-Source-Code
Users interested in AlexNet-Source-Code are comparing it to the repositories listed below
- Efficient Triton Kernels for LLM Training ☆5,275 · Updated this week
- NanoGPT (124M) in 3 minutes ☆2,721 · Updated last week
- New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos ☆8,028 · Updated 3 weeks ago
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,451 · Updated 2 months ago
- The official repository for The Hundred-Page Language Models Book by Andriy Burkov ☆1,806 · Updated last month
- s1: Simple test-time scaling ☆6,468 · Updated this week
- Democratizing Reinforcement Learning for LLMs ☆3,411 · Updated last month
- Sky-T1: Train your own O1 preview model within $450 ☆3,286 · Updated last month
- Muon is Scalable for LLM Training ☆1,087 · Updated 3 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆5,483 · Updated last week
- nanoGPT-style version of Llama 3.1 ☆1,389 · Updated 10 months ago
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on linear attention ☆2,957 · Updated 2 weeks ago
- DataComp for Language Models ☆1,315 · Updated 3 months ago
- Simple RL training for reasoning ☆3,650 · Updated 2 months ago
- Nano vLLM ☆4,678 · Updated this week
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training ☆2,815 · Updated 3 months ago
- Official PyTorch implementation for "Large Language Diffusion Models" ☆2,435 · Updated 2 weeks ago
- The n-gram Language Model ☆1,428 · Updated 10 months ago
- FlashMLA: Efficient MLA decoding kernels ☆11,631 · Updated 2 months ago
- Muon: An optimizer for hidden layers in neural networks ☆939 · Updated last week
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models ☆2,790 · Updated last year
- ☆575 · Updated 3 weeks ago
- DeepEP: an efficient expert-parallel communication library ☆8,225 · Updated this week
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆1,808 · Updated 2 months ago
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models ☆1,735 · Updated last year
- Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud ☆11,207 · Updated last month
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆1,554 · Updated last month
- Analyze computation-communication overlap in V3/R1 ☆1,068 · Updated 3 months ago
- ☆1,159 · Updated 2 months ago
- Code for the BLT research paper ☆1,690 · Updated last month