facebookresearch / RLCompOpt

Learning Compiler Pass Orders using Coreset and Normalized Value Prediction. (ICML 2023)

☆19

Alternatives and similar repositories for RLCompOpt

Users who are interested in RLCompOpt are comparing it to the repositories listed below.

facebookresearch / adaptive_scheduling
Experimental scripts for researching data adaptive learning rate scheduling.
☆23 Updated last year
habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…
☆24 Updated 3 weeks ago
tridao / flash-attention-wheels
☆51 Updated last year
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆40 Updated last year
gpu-mode / triton-tutorials
☆13 Updated 2 months ago
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆42 Updated last year
facebookresearch / MODel_opt
Memory Optimizations for Deep Learning (ICML 2023)
☆64 Updated last year
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆46 Updated last year
Dao-AILab / gemm-cublas
☆21 Updated 2 months ago
IST-DASLab / QIGen
Repository for CPU Kernel Generation for LLM Inference
☆26 Updated 2 years ago
Ryu1845 / hyena-jax
Implementation of Hyena Hierarchy in JAX
☆10 Updated 2 years ago
facebookresearch / macta
MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection
☆46 Updated 2 years ago
deepspeedai / DeepSpeed-Kernels
☆74 Updated 3 months ago
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆70 Updated last year
megvii-research / IntLLaMA
IntLLaMA: A fast and light quantization solution for LLaMA
☆18 Updated last year
thuml / learn_torch.compile
torch.compile artifacts for common deep learning models, can be used as a learning resource for torch.compile
☆17 Updated last year
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆26 Updated 9 months ago
facebookresearch / Ternary_Binary_Transformer
ACL 2023
☆39 Updated 2 years ago
lucidrains / autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, with all the latest research findings
☆44 Updated 2 years ago
softmax1 / Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆70 Updated last year
gregorbachmann / scaling_mlps
☆51 Updated last year
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆42 Updated 2 weeks ago
feifeibear / ChituAttention
Quantized Attention on GPU
☆44 Updated 7 months ago
NVlabs / EfficientDL
☆33 Updated last month
Deep-Learning-Profiling-Tools / triton-samples
☆13 Updated 4 months ago
Dao-AILab / grouped-latent-attention
☆119 Updated last month
lucidrains / simplicial-attention
Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro…
☆34 Updated this week
facebookresearch / NasRec
NASRec Weight Sharing Neural Architecture Search for Recommender Systems
☆30 Updated last year
FrancescoSaverioZuppichini / pytorch-2.0-benchmark
Benchmarking PyTorch 2.0 different models
☆21 Updated 2 years ago
ai-compiler-study / triton-kernels
Triton kernels for Flux
☆20 Updated last week