computerhistory / AlexNet-Source-Code
This package contains the original 2012 AlexNet code.
☆2,766 · Updated 8 months ago
Alternatives and similar repositories for AlexNet-Source-Code
Users interested in AlexNet-Source-Code are comparing it to the repositories listed below.
- ☆1,199 · Updated 3 months ago
- Muon is Scalable for LLM Training ☆1,354 · Updated 3 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆5,874 · Updated 3 weeks ago
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,229 · Updated 2 weeks ago
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,662 · Updated 6 months ago
- MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining ☆1,606 · Updated 5 months ago
- Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,770 · Updated 5 months ago
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training. ☆2,877 · Updated 8 months ago
- Everything about the SmolLM and SmolVLM family of models ☆3,397 · Updated last month
- ☆2,053 · Updated 2 weeks ago
- Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im… ☆2,858 · Updated last month
- A Datacenter Scale Distributed Inference Serving Framework ☆5,450 · Updated this week
- Text-audio foundation model from Boson AI ☆7,620 · Updated last month
- Nano vLLM ☆8,748 · Updated last week
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation ☆7,931 · Updated 5 months ago
- PyTorch code and models for VJEPA2 self-supervised learning from video. ☆2,399 · Updated 2 months ago
- DeepEP: an efficient expert-parallel communication library ☆8,712 · Updated last week
- MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. ☆2,981 · Updated 4 months ago
- FlashMLA: Efficient Multi-head Latent Attention Kernels ☆11,857 · Updated last month
- A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels ☆3,891 · Updated this week
- NanoGPT (124M) in 3 minutes ☆3,785 · Updated last week
- Code for the BLT research paper ☆2,006 · Updated last week
- DataComp for Language Models ☆1,385 · Updated 2 months ago
- Minimalistic 4D-parallelism distributed training framework for education purposes ☆1,890 · Updated 2 months ago
- GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning ☆1,736 · Updated 2 weeks ago
- Democratizing Reinforcement Learning for LLMs ☆4,699 · Updated this week
- Video+code lecture on building nanoGPT from scratch ☆4,524 · Updated last year
- Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud. ☆16,124 · Updated 2 weeks ago
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and vision-language model based on linear attention ☆3,236 · Updated 4 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆1,969 · Updated 7 months ago