guolinke / fused_ops
☆10 · Updated 2 years ago
Alternatives and similar repositories for fused_ops
Users interested in fused_ops are comparing it to the libraries listed below.
- ☆20 · Updated 4 years ago
- Best practices for testing advanced Mixtral, DeepSeek, and Qwen series MoE models using Megatron Core MoE. ☆17 · Updated this week
- Efficient Neural Interaction Functions Search for Collaborative Filtering ☆18 · Updated 5 years ago
- Unofficially implements https://arxiv.org/abs/2112.05682 to get linear memory cost for attention in PyTorch ☆12 · Updated 3 years ago
- (ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models. ☆21 · Updated 2 years ago
- ☆31 · Updated last year
- A toolkit for training, tracking, and saving models and syncing results ☆61 · Updated 5 years ago
- lanmt ebm ☆12 · Updated 4 years ago
- ☆13 · Updated 3 years ago
- Python pdb for multiple processes ☆45 · Updated 2 weeks ago
- Triton version of GQA FlashAttention, based on the tutorial ☆11 · Updated 10 months ago
- Code for SegTree Transformer (ICLR-RLGM 2019). ☆27 · Updated 5 years ago
- Contextual Position Encoding, with some custom CUDA kernels https://arxiv.org/abs/2405.18719 ☆22 · Updated last year
- ☆11 · Updated last year
- ☆22 · Updated last year
- ☆38 · Updated last year
- Code for the EMNLP 2020 paper CoDIR ☆41 · Updated 2 years ago
- Source code for "Efficient Training of BERT by Progressively Stacking" ☆112 · Updated 5 years ago
- The implementation of the multi-branch attentive Transformer (MAT). ☆33 · Updated 4 years ago
- This package implements THOR: Transformer with Stochastic Experts. ☆63 · Updated 3 years ago
- Differentiable Product Quantization for End-to-End Embedding Compression. ☆62 · Updated 2 years ago
- A plug-in for Microsoft DeepSpeed that fixes a bug in the DeepSpeed pipeline ☆26 · Updated 4 years ago
- ☆20 · Updated last month
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm… ☆31 · Updated 2 weeks ago
- Transformers at any scale ☆41 · Updated last year
- Fork of diux-dev/imagenet18 ☆16 · Updated 6 years ago
- Using FlexAttention to compute attention with different masking patterns ☆43 · Updated 8 months ago
- Code for the paper "A Theoretical Analysis of the Repetition Problem in Text Generation" (AAAI 2021) ☆54 · Updated 2 years ago
- Implementation of the Triangle Multiplicative module, used in AlphaFold2 as an efficient way to mix rows or columns of a 2D feature map, … ☆29 · Updated 3 years ago
- ☆10 · Updated 5 years ago