determined-ai / determined-examples
Example ML projects that use the Determined library.
☆24 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for determined-examples
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference ☆61 · Updated last month
- Memory Optimizations for Deep Learning (ICML 2023) ☆60 · Updated 8 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆50 · Updated this week
- ☆47 · Updated 2 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆56 · Updated last month
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆66 · Updated 5 months ago
- LLM KV cache compression made easy ☆64 · Updated last week
- ☆55 · Updated 5 months ago
- Example of applying CUDA graphs to LLaMA-v2 ☆10 · Updated last year
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated last month
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆51 · Updated 2 months ago
- ☆24 · Updated last year
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆19 · Updated this week
- ☆45 · Updated 2 weeks ago
- Extensible collectives library in Triton ☆72 · Updated last month
- Odysseus: Playground of LLM Sequence Parallelism ☆57 · Updated 5 months ago
- Personal solutions to the Triton Puzzles ☆16 · Updated 4 months ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆38 · Updated 10 months ago
- A block-oriented training approach for inference-time optimization. ☆30 · Updated 3 months ago
- Make Triton easier ☆41 · Updated 5 months ago
- ☆88 · Updated 2 months ago
- Ring-attention experiments ☆97 · Updated last month
- PyTorch bindings for CUTLASS grouped GEMM. ☆53 · Updated 3 weeks ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆34 · Updated 8 months ago
- A minimal implementation of vLLM. ☆30 · Updated 3 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆61 · Updated 7 months ago
- Simple and fast low-bit matmul kernels in CUDA / Triton ☆145 · Updated this week
- ☆74 · Updated 11 months ago
- Code for Palu: Compressing KV-Cache with Low-Rank Projection ☆57 · Updated this week
- ☆12 · Updated last month