fangpin / llm-from-scratch
Build LLM from scratch
☆85 Nov 19, 2025 Updated 2 months ago
Alternatives and similar repositories for llm-from-scratch
Users who are interested in llm-from-scratch are comparing it to the libraries listed below
- GEMV implementation with CUTLASS ☆19 Aug 21, 2025 Updated 5 months ago
- Efficient GPU communication over multiple NICs. ☆22 Nov 20, 2025 Updated 2 months ago
- ☆32 Jul 2, 2025 Updated 7 months ago
- A Streaming-Native Serving Engine for TTS/STS Models ☆48 Updated this week
- ☆88 May 31, 2025 Updated 8 months ago
- ☆24 Jul 7, 2024 Updated last year
- Gensis is a lightweight deep learning framework written from scratch in Python, with Triton as its backend for high-performance computing… ☆37 Jan 15, 2026 Updated 3 weeks ago
- An experimental communicating attention kernel based on DeepEP. ☆35 Jul 29, 2025 Updated 6 months ago
- Nex Venus Communication Library ☆72 Nov 17, 2025 Updated 2 months ago
- ☆54 May 5, 2025 Updated 9 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆163 Updated this week
- ☆10 Jun 28, 2025 Updated 7 months ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning ☆10 Apr 28, 2023 Updated 2 years ago
- Fine-Grained Knowledge Fusion for Retrieval-Augmented Medical Visual Question ☆11 Jul 18, 2024 Updated last year
- ☆11 Jun 11, 2020 Updated 5 years ago
- Code accompanying the NeurIPS 2019 paper AutoAssist: A Framework to Accelerate Training of Deep Neural Networks. ☆14 Oct 3, 2022 Updated 3 years ago
- netbeacon - monitoring your network capture, NIDS or network analysis process ☆19 Oct 26, 2013 Updated 12 years ago
- ☆54 Mar 15, 2025 Updated 10 months ago
- Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression. ☆13 Sep 29, 2024 Updated last year
- ☆11 Sep 22, 2017 Updated 8 years ago
- Open-source audio embedding models, submitted to the HEAR 2021 challenge ☆11 Updated this week
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 Aug 9, 2025 Updated 6 months ago
- Assignment codes for CS736 Algorithms for Medical Image Processing. ☆10 Aug 10, 2016 Updated 9 years ago
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA. ☆29 Updated this week
- ☆11 Apr 3, 2023 Updated 2 years ago
- How to plot for papers, slides, demos, etc. ☆10 Apr 7, 2022 Updated 3 years ago
- ☆11 Mar 13, 2023 Updated 2 years ago
- Chaitin-Briggs register-allocation algorithm (LLVM back-end) ☆12 Jan 6, 2016 Updated 10 years ago
- ☆12 May 18, 2024 Updated last year
- For our ISSTA'23 paper ACETest: Automated Constraint Extraction for Testing Deep Learning Operators ☆13 Mar 30, 2024 Updated last year
- A Coq framework to support structural design and proof of hardware cache-coherence protocols ☆14 May 7, 2022 Updated 3 years ago
- 🛠 Robust SSH: auto-reconnect SSH session that preserves your running shell and command. Intuitive, no server-side setup, aimed at simplic… ☆13 Nov 14, 2025 Updated 3 months ago
- Ring network model test to demonstrate the use of CoreNEURON ☆11 Aug 19, 2025 Updated 5 months ago
- ☆15 Jul 18, 2023 Updated 2 years ago
- Genome free discovery and classification of miRNAs from small RNA-Seq with random forests ☆11 Jun 12, 2018 Updated 7 years ago
- FPGA-based HyperLogLog Accelerator ☆12 Jul 13, 2020 Updated 5 years ago
- ☆10 May 16, 2021 Updated 4 years ago
- PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class. ☆10 Feb 10, 2022 Updated 4 years ago
- ☆11 Sep 16, 2025 Updated 4 months ago