aryagxr / cudaLinks
coding CUDA everyday!
☆31 Updated last month
Alternatives and similar repositories for cuda
Users that are interested in cuda are comparing it to the libraries listed below
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆184 Updated last week
- in this repository, i'm going to implement increasingly complex llm inference optimizations ☆58 Updated last week
- NanoGPT-speedrunning for the poor T4 enjoyers ☆66 Updated last month
- This repo has all the basic things you'll need in order to understand complete vision transformer architecture and its various implementa… ☆218 Updated 5 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆67 Updated 2 months ago
- ☆46 Updated 2 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆357 Updated 2 months ago
- "LLM from Zero to Hero: An End-to-End Large Language Model Journey from Data to Application!" ☆29 Updated last month
- ☆188 Updated 3 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆67 Updated 2 months ago
- An extension of the nanoGPT repository for training small MOE models. ☆147 Updated 2 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 Updated this week
- A really tiny autograd engine ☆94 Updated last week
- ☆35 Updated last week
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆100 Updated 2 months ago
- ☆157 Updated last year
- small auto-grad engine inspired from Karpathy's micrograd and PyTorch ☆268 Updated 6 months ago
- Learning about CUDA by writing PTX code. ☆131 Updated last year
- ☆126 Updated 2 months ago
- GPU Kernels ☆178 Updated last month
- Question paper of courses taught at IISC as part of MTech AI curriculum ☆65 Updated 6 months ago
- working implementation of deepseek MLA ☆41 Updated 4 months ago
- rl from zero pretrain, can it be done? we'll see. ☆24 Updated this week
- ☆39 Updated last month
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆76 Updated last month
- Compiling useful links, papers, benchmarks, ideas, etc. ☆46 Updated 2 months ago
- Learnings and programs related to CUDA ☆402 Updated 3 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆64 Updated 6 months ago
- A repository consisting of paper/architecture replications of classic/SOTA AI/ML papers in pytorch ☆196 Updated last month
- ☆328 Updated last month