An implementation of the transformer architecture as an NVIDIA CUDA kernel
☆201 · Sep 24, 2023 · Updated 2 years ago
Alternatives and similar repositories for Transformer-CUDA
Users that are interested in Transformer-CUDA are comparing it to the libraries listed below.
- Fast and low-memory attention layer written in CUDA ☆20 · Jul 14, 2023 · Updated 2 years ago
- learn TensorRT from scratch🥰 ☆17 · Sep 29, 2024 · Updated last year
- HunyuanDiT with TensorRT and libtorch ☆18 · May 22, 2024 · Updated last year
- This repo is my attempt at a rough implementation of nanoGPT trained on a dataset of 30,000 unique Twitter usernames ☆23 · Apr 7, 2024 · Updated last year
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip. ☆44 · Sep 6, 2023 · Updated 2 years ago
- Implementation of algorithms for memory optimized deep neural network training ☆10 · Jul 23, 2020 · Updated 5 years ago
- Solve puzzles. Learn CUDA. ☆62 · Dec 13, 2023 · Updated 2 years ago
- llama INT4 cuda inference with AWQ ☆54 · Jan 20, 2025 · Updated last year
- Barebones Rust EVM Implementation ☆12 · Feb 9, 2022 · Updated 4 years ago
- Swiss tournament manager in solidity ☆24 · Sep 11, 2022 · Updated 3 years ago
- Simplex Random Feature attention, in PyTorch ☆76 · Oct 10, 2023 · Updated 2 years ago
- C++ TensorRT Implementation of NanoSAM ☆51 · Dec 28, 2023 · Updated 2 years ago
- This is a repository to practice multi-thread programming in C++ ☆28 · Feb 21, 2024 · Updated 2 years ago
- Einsum-like high-level array sharding API for JAX ☆34 · Jul 16, 2024 · Updated last year
- ☆17 · Aug 29, 2022 · Updated 3 years ago
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆1,098 · Dec 30, 2024 · Updated last year
- GPU programming related news and material links ☆2,060 · Mar 8, 2026 · Updated 2 weeks ago
- [WIP] Transformer to embed Danbooru labelsets ☆13 · Mar 31, 2024 · Updated last year
- GOO(Gradual Ownership Optimization) issuance implementation using Huff ☆34 · Oct 1, 2022 · Updated 3 years ago
- Rust Primitives, Learnings, & Frameworks ☆17 · Mar 29, 2022 · Updated 3 years ago
- Inference for GOT-OCR2.0 using mnn-llm ☆14 · Oct 2, 2024 · Updated last year
- Port of OpenAI's Whisper model in C/C++ ☆10 · Jul 12, 2023 · Updated 2 years ago
- Iterated prisoner's dilemma tournaments implemented with Cairo ☆25 · Jul 10, 2022 · Updated 3 years ago
- This was a university group project supported by the HSBC Artificial Intelligence team. It involved applying machine learning algorithms … ☆14 · Nov 13, 2023 · Updated 2 years ago
- YOLOv12 TensorRT end-to-end accelerated model inference and INT8 quantization implementation ☆12 · Mar 5, 2025 · Updated last year
- Eth mempool history ☆31 · Sep 2, 2022 · Updated 3 years ago
- Variable Rate Gradual Dutch Auctions with Martingale Price Correction. ☆37 · Sep 9, 2022 · Updated 3 years ago
- Imperative deep learning framework with customized GPU and CPU backend ☆29 · Jul 25, 2023 · Updated 2 years ago
- ☆27 · Jul 9, 2024 · Updated last year
- Neural Networks for JAX ☆84 · Sep 24, 2024 · Updated last year
- Things that make me feel productive ☆15 · Oct 9, 2022 · Updated 3 years ago
- LLM plugin for models hosted by Anyscale Endpoints ☆35 · Apr 22, 2024 · Updated last year
- Step-by-step optimization of CUDA SGEMM ☆448 · Mar 30, 2022 · Updated 3 years ago
- Projects for the ECPiX-5 - an ECP5 FPGA board. ☆14 · Jul 5, 2020 · Updated 5 years ago
- ffmpeg+cuvid+tensorrt+multicamera ☆12 · Dec 31, 2024 · Updated last year
- A really tiny autograd engine ☆100 · May 26, 2025 · Updated 9 months ago
- ☆27 · Oct 29, 2021 · Updated 4 years ago
- Inference Llama 2 in C++ ☆42 · Apr 29, 2024 · Updated last year
- This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered … ☆17 · Apr 1, 2025 · Updated 11 months ago