SwayamInSync / pytorch-cpp-cuda-starter
Setting up VS Code to work with PyTorch in C/C++ with CUDA support
☆25 Updated 6 months ago
Alternatives and similar repositories for pytorch-cpp-cuda-starter
Users that are interested in pytorch-cpp-cuda-starter are comparing it to the libraries listed below
- ☆64 Updated this week
- ☆46 Updated 4 months ago
- Learnings and programs related to CUDA ☆415 Updated last month
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 Updated 5 months ago
- PyTorch from scratch in pure C/CUDA and Python ☆40 Updated 10 months ago
- This repo has all the basic things you'll need in order to understand the complete vision transformer architecture and its various implementa… ☆227 Updated 7 months ago
- In this repository, I'm going to implement increasingly complex LLM inference optimizations ☆66 Updated 3 months ago
- ☆238 Updated last week
- My submission for the GPUMODE/AMD fp8 mm challenge ☆27 Updated 2 months ago
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆178 Updated 2 weeks ago
- Learning about CUDA by writing PTX code. ☆134 Updated last year
- Complete solutions to Programming Massively Parallel Processors, 4th edition ☆471 Updated 2 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆190 Updated 2 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆97 Updated 3 weeks ago
- Coding CUDA every day! ☆56 Updated 4 months ago
- Learning & making kernels in CUDA / Triton ☆21 Updated 2 months ago
- Low memory full parameter finetuning of LLMs ☆52 Updated last month
- ☆44 Updated 3 months ago
- Andrej Karpathy's micrograd implemented in C ☆29 Updated last year
- Here's all my Python/Numba (CUDA) code for the encoder block I made :) ☆66 Updated 4 months ago
- Notes on "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj (4th ed.) ☆53 Updated last year
- Canny edge detector implemented in CUDA C/C++ ☆27 Updated 6 months ago
- GPU Kernels ☆193 Updated 4 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆69 Updated 4 months ago
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ☆83 Updated last year
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆48 Updated 5 months ago
- Block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. Additionally, this repo includes codes for … ☆15 Updated this week
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆391 Updated 5 months ago
- A curated list of awesome mobile machine learning resources. ☆141 Updated 6 years ago
- Small autograd engine inspired by Karpathy's micrograd and PyTorch ☆278 Updated 9 months ago