computerhistory / AlexNet-Source-Code
This package contains the original 2012 AlexNet code.
☆2,782 · Updated 8 months ago
Alternatives and similar repositories for AlexNet-Source-Code
Users interested in AlexNet-Source-Code are comparing it to the repositories listed below.
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆5,921 · Updated last week
- ☆1,223 · Updated 4 months ago
- The simplest, fastest repository for training/finetuning small-sized VLMs ☆4,351 · Updated last month
- A bidirectional pipeline-parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training ☆2,884 · Updated 8 months ago
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,682 · Updated 7 months ago
- FlashMLA: Efficient Multi-head Latent Attention Kernels ☆11,886 · Updated 2 months ago
- DeepEP: an efficient expert-parallel communication library ☆8,771 · Updated 2 weeks ago
- Expert Parallelism Load Balancer ☆1,315 · Updated 8 months ago
- NanoGPT (124M) in 3 minutes ☆3,922 · Updated this week
- Code release for DynamicTanh (DyT) ☆1,026 · Updated 8 months ago
- A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen ☆3,466 · Updated last month
- Nano vLLM ☆9,459 · Updated last month
- Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im… ☆3,018 · Updated last month
- Sky-T1: Train your own O1 preview model within $450 ☆3,358 · Updated 4 months ago
- ☆1,586 · Updated last year
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation ☆7,937 · Updated 6 months ago
- ☆1,416 · Updated last week
- 🚀 Efficient implementations of state-of-the-art linear attention models ☆3,996 · Updated this week
- Muon is Scalable for LLM Training ☆1,372 · Updated 4 months ago
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on linear attention ☆3,248 · Updated 5 months ago
- Analyze computation-communication overlap in V3/R1 ☆1,124 · Updated 8 months ago
- A minimalistic 4D-parallelism distributed training framework for education purposes ☆1,911 · Updated 3 months ago
- Code for the BLT research paper ☆2,013 · Updated last month
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆2,007 · Updated 8 months ago
- Muon is an optimizer for the hidden layers of neural networks ☆2,075 · Updated 2 weeks ago
- DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding ☆5,143 · Updated 9 months ago
- A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels ☆4,054 · Updated last week
- ☆5,566 · Updated 10 months ago
- ☆2,193 · Updated this week
- Democratizing Reinforcement Learning for LLMs ☆4,792 · Updated last week
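Of the repositories above, DynamicTanh (DyT) is simple enough to sketch in a few lines: the DyT paper proposes replacing normalization layers in Transformers with the element-wise map γ · tanh(αx) + β, where α is a learnable scalar and γ, β are learnable per-channel vectors. A minimal NumPy sketch (variable names and toy values are mine, not from the repo):

```python
import numpy as np

def dyt(x, alpha, gamma, beta):
    """Dynamic Tanh (DyT): an element-wise, normalization-free layer.

    DyT(x) = gamma * tanh(alpha * x) + beta, where alpha is a learnable
    scalar and gamma/beta are learnable per-channel vectors, proposed as
    a drop-in replacement for LayerNorm in Transformers.
    """
    return gamma * np.tanh(alpha * x) + beta

# Toy forward pass: a batch of 2 tokens with 4 channels (illustrative values).
x = np.array([[0.5, -1.0, 2.0, 0.0],
              [3.0, -3.0, 0.1, -0.1]])
alpha = 0.5            # the paper initializes alpha near 0.5
gamma = np.ones(4)     # per-channel scale, initialized to 1
beta = np.zeros(4)     # per-channel shift, initialized to 0

y = dyt(x, alpha, gamma, beta)
print(y.shape)  # (2, 4) — shape is preserved, as with LayerNorm
```

With γ = 1 and β = 0, outputs stay inside (-1, 1) because tanh saturates; unlike LayerNorm, no per-token statistics are computed, which is the point of the method.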