computerhistory / AlexNet-Source-Code
This package contains the original 2012 AlexNet code.
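As a quick reference for what this codebase implements: the sketch below computes the parameter count of the original two-GPU AlexNet from the layer shapes given in Krizhevsky et al. (2012). The shapes come from the paper, not from this repository's code, and the helper names are illustrative.

```python
# Parameter count of the original two-GPU AlexNet (Krizhevsky et al., 2012).
# Layer shapes are taken from the paper; helper names are illustrative.

def conv_params(out_ch, k, in_ch, groups=1):
    """Weights + biases for a (possibly grouped) k x k conv layer."""
    return out_ch * k * k * (in_ch // groups) + out_ch

def fc_params(out_features, in_features):
    """Weights + biases for a fully connected layer."""
    return out_features * in_features + out_features

layers = {
    "conv1": conv_params(96, 11, 3),              # RGB input
    "conv2": conv_params(256, 5, 96, groups=2),   # split across the two GPUs
    "conv3": conv_params(384, 3, 256),            # the one cross-GPU conv layer
    "conv4": conv_params(384, 3, 384, groups=2),
    "conv5": conv_params(256, 3, 384, groups=2),
    "fc6":   fc_params(4096, 6 * 6 * 256),        # flattened conv5 output
    "fc7":   fc_params(4096, 4096),
    "fc8":   fc_params(1000, 4096),               # 1000 ImageNet classes
}

total = sum(layers.values())
print(f"{total:,}")  # ~61M, matching the paper's "60 million parameters"
```

The two fully connected layers fc6 and fc7 account for roughly 90% of the parameters, which is why the paper leans so heavily on dropout there.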
☆2,809 · Updated 9 months ago
Alternatives and similar repositories for AlexNet-Source-Code
Users interested in AlexNet-Source-Code are comparing it to the repositories listed below.
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,494 · Updated 2 months ago
- ☆1,228 · Updated 5 months ago
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training. ☆2,900 · Updated 10 months ago
- Code for the BLT research paper ☆2,024 · Updated 2 months ago
- Muon is Scalable for LLM Training ☆1,397 · Updated 5 months ago
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation ☆7,951 · Updated 7 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆6,026 · Updated 2 weeks ago
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,729 · Updated 8 months ago
- Minimal reproduction of DeepSeek R1-Zero ☆12,571 · Updated 8 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆2,030 · Updated 9 months ago
- Democratizing Reinforcement Learning for LLMs ☆4,942 · Updated last week
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on linear attention ☆3,290 · Updated 6 months ago
- PyTorch code and models for VJEPA2 self-supervised learning from video ☆2,686 · Updated 4 months ago
- The official repository for The Hundred-Page Language Models Book by Andriy Burkov ☆2,069 · Updated 3 weeks ago
- Large Concept Models: Language modeling in a sentence representation space ☆2,324 · Updated 11 months ago
- ☆3,468 · Updated 10 months ago
- DeepEP: an efficient expert-parallel communication library ☆8,862 · Updated last week
- ☆2,344 · Updated last month
- Expert Parallelism Load Balancer ☆1,329 · Updated 9 months ago
- Everything about the SmolLM and SmolVLM family of models ☆3,539 · Updated last month
- MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining ☆1,896 · Updated 7 months ago
- s1: Simple test-time scaling ☆6,623 · Updated 6 months ago
- NanoGPT (124M) in 3 minutes ☆4,085 · Updated last week
- Analyze computation-communication overlap in V3/R1 ☆1,130 · Updated 9 months ago
- ☆1,588 · Updated last year
- Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch ☆1,829 · Updated 3 weeks ago
- Textbook on reinforcement learning from human feedback ☆1,382 · Updated last week
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆1,939 · Updated 4 months ago
- Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,868 · Updated 6 months ago
- Muon is an optimizer for hidden layers in neural networks ☆2,179 · Updated last month