flin3500 / Cuda-Google-Colab
CUDA code mainly targets NVIDIA hardware. This repo shows how to run CUDA C or CUDA C++ code on the Google Colab platform for free.
☆21 Updated last year
Related projects
Alternatives and complementary repositories for Cuda-Google-Colab
- C API for MLX ☆79 Updated this week
- AMD-related optimizations for transformer models ☆57 Updated 2 weeks ago
- Example ML projects that use the Determined library. ☆24 Updated 2 months ago
- GGUF parser in Python ☆21 Updated 3 months ago
- Deep learning library implemented from scratch in NumPy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆48 Updated 7 months ago
- An implementation of the transformer architecture in an NVIDIA CUDA kernel ☆157 Updated last year
- Minimal C implementation of speculative decoding based on llama2.c ☆16 Updated 4 months ago
- A really tiny autograd engine ☆87 Updated 7 months ago
- Simple and fast low-bit matmul kernels in CUDA / Triton ☆145 Updated this week
- LLM training in simple, raw C/CUDA ☆86 Updated 6 months ago
- Course Project for COMP4471 on RWKV ☆16 Updated 9 months ago
- GPT2 implementation in C++ using Ort ☆25 Updated 3 years ago
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆41 Updated last month
- Collection of kernels written in Triton language ☆68 Updated 3 weeks ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆107 Updated last year
- Generate Python ctypes classes from C headers. Requires LLVM clang ☆15 Updated 3 months ago
- Python bindings for ggml ☆132 Updated 2 months ago
- Cerule - A Tiny Mighty Vision Model ☆67 Updated 2 months ago
- Visualization of cache-optimized matrix multiplication ☆53 Updated 5 years ago
- Testing LLM reasoning abilities with family relationship quizzes. ☆42 Updated this week
- Collection of autoregressive model implementations ☆67 Updated this week
- ring-attention experiments ☆97 Updated last month
- Notes on "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj (4th ed.) ☆51 Updated 3 months ago
- Mixed precision training from scratch with Tensors and CUDA ☆20 Updated 6 months ago
- LLM training in simple, raw C/CUDA ☆17 Updated 6 months ago
- Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios. ☆86 Updated 3 months ago