A lightweight deep learning training framework implemented from scratch in C++, featuring a PyTorch-style API.
☆168Apr 4, 2026Updated 2 weeks ago
Alternatives and similar repositories for TinyTorch
Users who are interested in TinyTorch are comparing it to the libraries listed below.
- Tiny C++ LLM inference implementation from scratch☆110Apr 4, 2026Updated 2 weeks ago
- Render Spine Animation Using OpenGL/OpenGL ES on Mac/Android/iOS☆15Apr 7, 2022Updated 4 years ago
- A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding Pytorch☆27Mar 12, 2026Updated last month
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…☆78Jan 21, 2021Updated 5 years ago
- Notes on understanding the tensorRT_Pro open-source project☆22Feb 23, 2023Updated 3 years ago
- Handy tools & graphics API abstraction for blazing fast prototyping☆10Jan 17, 2024Updated 2 years ago
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA.☆30Apr 9, 2026Updated last week
- Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library☆51Aug 20, 2025Updated 7 months ago
- ☆44Mar 31, 2026Updated 2 weeks ago
- Gensis is a lightweight deep learning framework written from scratch in Python, with Triton as its backend for high-performance computing…☆36Jan 15, 2026Updated 3 months ago
- ☆27May 27, 2024Updated last year
- Large-scale Auto-Distributed Training/Inference Unified Framework | Memory-Compute-Control Decoupled Architecture | Multi-language SDK & …☆55Jan 30, 2026Updated 2 months ago
- An expression template based linear algebra library running completely on the GPU using CUDA☆25Jun 24, 2021Updated 4 years ago
- A llama model inference framework implemented in CUDA C++☆65Nov 8, 2024Updated last year
- Tiny-Megatron, a minimalistic re-implementation of the Megatron library☆25Sep 1, 2025Updated 7 months ago
- flash attention tutorial written in python, triton, cuda, cutlass☆502Jan 20, 2026Updated 2 months ago
- Collections of RLxLM experiments using minimal code☆14Feb 17, 2025Updated last year
- No-dependency OpenGL support library, which abstracts the processes of creating buffers and shaders☆14Apr 28, 2023Updated 2 years ago
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.☆59Feb 6, 2026Updated 2 months ago
- An object detection codebase based on MegEngine.☆28Dec 14, 2022Updated 3 years ago
- Don't scroll your phone after work in the evening; learn something instead. Series 1: the CUDA compute framework CUFX (Cuda Framework eXtended).☆16Dec 15, 2024Updated last year
- This is our Compiler Design project for 6th semester.☆12May 15, 2022Updated 3 years ago
- Cross-platform GPU LLM inference with WebGPU and wgmath.☆26May 4, 2025Updated 11 months ago
- compiling DSLs to high-level hardware instructions☆23Nov 8, 2022Updated 3 years ago
- ☆19Nov 7, 2024Updated last year
- Source code and exercise solutions for An Introduction to Parallel Programming☆20Jul 9, 2021Updated 4 years ago
- Light Map Baker is a C++ library that bakes lightmaps.☆10Jan 6, 2021Updated 5 years ago
- A homebrew 3D game engine, written in C++. Scalable multithreading, modular, data-driven, Lua scripting, deferred rendering, multithreade…☆13Feb 19, 2025Updated last year
- ☆25May 7, 2021Updated 4 years ago
- DeeperGEMM: crazy optimized version☆86May 5, 2025Updated 11 months ago
- Android demo for dabnn☆20Oct 18, 2019Updated 6 years ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …☆31Dec 21, 2024Updated last year
- ☆63Dec 5, 2021Updated 4 years ago
- GPU-Accelerated Software Rasterizer☆11Jun 8, 2017Updated 8 years ago
- Voxel Cone Tracing Implementation☆16Nov 18, 2021Updated 4 years ago
- OpenGL C++ 3D game engine☆12Mar 19, 2026Updated last month
- Unity Mirror Transport, using relay server☆10Jun 19, 2020Updated 5 years ago
- [EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization☆23Updated this week
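- CLIP inference implemented in C++. The model has minor modifications; the changes and the model export code are in the README, and the model files, including those for AX650, are in Releases. ChineseCLIP support has been added.☆31Jun 19, 2025Updated 10 months ago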