facebookresearch / RLCompOpt
Learning Compiler Pass Orders using Coreset and Normalized Value Prediction. (ICML 2023)
☆19 · Updated last year
Alternatives and similar repositories for RLCompOpt
Users interested in RLCompOpt are comparing it to the libraries listed below.
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 · Updated last week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆44 · Updated 10 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code (a conceptual prefix-sharing sketch appears after this list). ☆41 · Updated last month
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆127 · Updated this week
- [WIP] Better (FP8) attention for Hopper ☆30 · Updated 3 months ago
- Quantized Attention on GPU ☆44 · Updated 6 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 · Updated last year
- Odysseus: Playground of LLM Sequence Parallelism ☆70 · Updated 11 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated 2 years ago
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators ☆55 · Updated 3 months ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆138 · Updated 2 years ago
- FlexAttention w/ FlashAttention3 Support (a minimal FlexAttention usage sketch appears after this list) ☆26 · Updated 8 months ago
- Extensible collectives library in Triton ☆87 · Updated 2 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆88 · Updated this week
- A block-oriented training approach for inference-time optimization. ☆33 · Updated 9 months ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Updated last year
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆45 · Updated 2 weeks ago
- ring-attention experiments (a self-contained online-softmax ring sketch appears after this list) ☆145 · Updated 7 months ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆45 · Updated 10 months ago
- High-speed GEMV kernels, with up to 2.7× speedup over the PyTorch baseline (a minimal Triton GEMV sketch appears after this list). ☆109 · Updated 10 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆39 · Updated last year
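For the prefix-sharing entry above: in preference tuning, the chosen and rejected responses share one prompt, so the prompt's keys/values can be computed once and reused for both continuations. The sketch below is a conceptual, single-head reconstruction in plain PyTorch under that assumption; none of the names here are the linked repo's API.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D = 16                          # head dimension
P, C = 8, 4                     # shared-prefix length, continuation length

# Toy projections standing in for one attention head of a transformer.
Wq, Wk, Wv = (torch.randn(D, D) for _ in range(3))

prefix = torch.randn(P, D)      # hidden states of the shared prompt
chosen = torch.randn(C, D)      # "chosen" continuation
rejected = torch.randn(C, D)    # "rejected" continuation

# Encode the shared prefix ONCE and cache its keys/values.
k_cache, v_cache = prefix @ Wk, prefix @ Wv

def attend(cont):
    """Continuation queries attend over [cached prefix KV + own KV]."""
    q = cont @ Wq
    k = torch.cat([k_cache, cont @ Wk])       # reuse the prefix cache
    v = torch.cat([v_cache, cont @ Wv])
    # Causal mask: continuation token i sees the prefix plus tokens <= i.
    mask = torch.ones(C, P + C).tril(diagonal=P).bool()
    return F.scaled_dot_product_attention(
        q.unsqueeze(0), k.unsqueeze(0), v.unsqueeze(0),
        attn_mask=mask.unsqueeze(0))[0]

# Prefix work is not repeated for the second continuation.
out_chosen, out_rejected = attend(chosen), attend(rejected)

# Sanity check against recomputing the full concatenated sequence.
full = torch.cat([prefix, chosen])
causal = torch.ones(P + C, P + C).tril().bool()
ref = F.scaled_dot_product_attention(
    (full @ Wq).unsqueeze(0), (full @ Wk).unsqueeze(0), (full @ Wv).unsqueeze(0),
    attn_mask=causal.unsqueeze(0))[0, P:]
assert torch.allclose(out_chosen, ref, atol=1e-4)
```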
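For the FlexAttention entry above, a minimal usage sketch of PyTorch's public torch.nn.attention.flex_attention API (available since PyTorch 2.5). The ALiBi-style score_mod and the shapes are illustrative assumptions; the linked repo's FlashAttention-3 backend is not shown.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 2, 4, 256, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))

# mask_mod returns True where attention is allowed; here: causal.
def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# A block mask lets the kernel skip fully masked tiles entirely.
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S)

# score_mod rewrites individual attention scores; here an ALiBi-style
# linear bias (the per-head slope (h + 1) * 0.1 is an illustrative choice).
def alibi(score, b, h, q_idx, kv_idx):
    return score - (h + 1) * 0.1 * (q_idx - kv_idx)

# torch.compile(flex_attention) is the intended fast path; eager also works.
out = flex_attention(q, k, v, score_mod=alibi, block_mask=block_mask)
print(out.shape)  # torch.Size([2, 4, 256, 64])
```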
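For the ring-attention entry above, a self-contained sketch of the core idea: K/V are split into blocks that would normally live on different ranks and be rotated around a ring, and each step folds one block into the result with an online (running) softmax, so the full attention matrix is never materialized. This simulates the ring in-process and is not the repo's code.

```python
import torch

torch.manual_seed(0)
S, D, BLOCK = 512, 64, 128        # sequence length, head dim, KV block size
scale = D ** -0.5
q, k, v = (torch.randn(S, D) for _ in range(3))

# In real ring attention each rank holds one (k, v) block and passes it to
# its neighbor every step; here the "ring" is just an in-process list.
kv_blocks = list(zip(k.split(BLOCK), v.split(BLOCK)))

row_max = torch.full((S, 1), float("-inf"))   # running softmax max
denom = torch.zeros(S, 1)                     # running softmax denominator
acc = torch.zeros(S, D)                       # running weighted-value sum

for k_blk, v_blk in kv_blocks:                # one "ring step" per block
    s = (q @ k_blk.T) * scale                 # scores for this block only
    new_max = torch.maximum(row_max, s.max(dim=-1, keepdim=True).values)
    rescale = torch.exp(row_max - new_max)    # rescale old stats to new max
    p = torch.exp(s - new_max)
    denom = denom * rescale + p.sum(dim=-1, keepdim=True)
    acc = acc * rescale + p @ v_blk
    row_max = new_max

out = acc / denom

# Sanity check against ordinary full attention.
ref = torch.softmax((q @ k.T) * scale, dim=-1) @ v
assert torch.allclose(out, ref, atol=1e-5)
```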
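For the GEMV entry above, a minimal Triton matrix-vector kernel with one program per output row, as a hedged illustration of what such kernels look like; the block size and launcher are illustrative assumptions, not the linked repo's tuned kernels.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def gemv_kernel(A, x, y, N, stride_am, BLOCK_N: tl.constexpr):
    row = tl.program_id(0)                      # one program per output row
    offs = tl.arange(0, BLOCK_N)
    acc = tl.zeros((BLOCK_N,), dtype=tl.float32)
    for n in range(0, N, BLOCK_N):              # sweep the row in blocks
        cols = n + offs
        mask = cols < N
        a = tl.load(A + row * stride_am + cols, mask=mask, other=0.0)
        xv = tl.load(x + cols, mask=mask, other=0.0)
        acc += a.to(tl.float32) * xv.to(tl.float32)
    tl.store(y + row, tl.sum(acc, axis=0))      # reduce the block to a scalar

def gemv(A: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    M, N = A.shape
    y = torch.empty(M, device=A.device, dtype=torch.float32)
    gemv_kernel[(M,)](A, x, y, N, A.stride(0), BLOCK_N=256)
    return y

A = torch.randn(1024, 4096, device="cuda")
x = torch.randn(4096, device="cuda")
assert torch.allclose(gemv(A, x), A @ x, atol=1e-3)
```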