computerhistory / AlexNet-Source-Code
This package contains the original 2012 AlexNet code.
☆2,823Updated 10 months ago
Alternatives and similar repositories for AlexNet-Source-Code
Users who are interested in AlexNet-Source-Code are comparing it to the repositories listed below.
- Implementing DeepSeek R1's GRPO algorithm from scratch☆1,759Updated 9 months ago
- Muon is Scalable for LLM Training☆1,421Updated 6 months ago
- The simplest, fastest repository for training/finetuning small-sized VLMs.☆4,625Updated 3 months ago
- NanoGPT (124M) in 2 minutes☆4,515Updated last week
- This is the official repository for The Hundred-Page Language Models Book by Andriy Burkov☆2,087Updated last month
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention☆3,328Updated 7 months ago
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models☆3,545Updated 3 weeks ago
- Democratizing Reinforcement Learning for LLMs☆5,060Updated this week
- Sky-T1: Train your own O1 preview model within $450☆3,370Updated 6 months ago
- Solve Visual Understanding with Reinforced VLMs☆5,823Updated 3 months ago
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…☆3,914Updated 7 months ago
- Simple RL training for reasoning☆3,829Updated last month
- Witness the aha moment of VLM with less than $3.☆4,027Updated 8 months ago
- ☆1,233Updated 6 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs☆2,038Updated 10 months ago
- Muon is an optimizer for hidden layers in neural networks☆2,267Updated 2 weeks ago
- PyTorch code and models for VJEPA2 self-supervised learning from video.☆2,903Updated 5 months ago
- MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining☆1,924Updated 8 months ago
- Official PyTorch implementation for "Large Language Diffusion Models"☆3,538Updated 2 months ago
- Minimalistic 4D-parallelism distributed training framework for education purpose☆2,058Updated 5 months ago
- ☆3,466Updated 11 months ago
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels☆4,863Updated this week
- ☆1,543Updated 2 months ago
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.☆2,916Updated 3 weeks ago
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.☆1,406Updated 9 months ago
- Textbook on reinforcement learning from human feedback☆1,478Updated last week
- Minimal reproduction of DeepSeek R1-Zero☆12,646Updated 9 months ago
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…☆3,366Updated 3 weeks ago
- DataComp for Language Models☆1,413Updated 4 months ago
- s1: Simple test-time scaling☆6,635Updated 7 months ago