InfiniTensor / TinyInfiniTrain
Training-camp project for the training track
☆26 · Updated last week
Alternatives and similar repositories for TinyInfiniTrain
Users who are interested in TinyInfiniTrain are comparing it to the libraries listed below.
- ☆19 · Updated 8 months ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆14 · Updated last year
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling ☆53 · Updated 3 weeks ago
- Gensis is a lightweight deep learning framework written from scratch in Python, with Triton as its backend for high-performance computing… ☆37 · Updated 3 weeks ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆92 · Updated last week
- A Triton-only attention backend for vLLM ☆23 · Updated this week
- ☆25 · Updated 3 months ago
- ☆14 · Updated 2 months ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆34 · Updated 11 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆25 · Updated 8 months ago
- My Paper Reading Lists and Notes. ☆21 · Updated 2 months ago
- WaferLLM: Large Language Model Inference at Wafer Scale ☆87 · Updated 3 weeks ago
- ☆16 · Updated 9 months ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆56 · Updated last year
- Canvas: End-to-End Kernel Architecture Search in Neural Networks ☆27 · Updated last year
- Course materials for MIT 6.5940: TinyML and Efficient Deep Learning Computing ☆68 · Updated last year
- ☆37 · Updated 3 months ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆32 · Updated last year
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆17 · Updated 2 years ago
- ☆47 · Updated 6 months ago
- An implementation of parallelism techniques such as AMP, DDP, PP, and TP, for learning purposes ☆14 · Updated 2 years ago
- Handwritten GEMM using Intel AMX (Advanced Matrix Extensions) ☆17 · Updated last year
- From Minimal GEMM to Everything ☆101 · Updated last month
- An annotated nano_vllm repository that also adds MiniCPM4 support and the ability to register new models ☆155 · Updated 5 months ago
- Paper list for acceleration of transformers ☆13 · Updated 2 years ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24) ☆52 · Updated last year
- A minimum demo for PyTorch distributed extension functionality for collectives. ☆15 · Updated last year
- A practical way of learning Swizzle ☆36 · Updated last year
- NVIDIA cuTile learn ☆154 · Updated last month
- CUDA SGEMM optimization note ☆15 · Updated 2 years ago