microsoft / MoPQ
☆12 · Updated 4 years ago
Alternatives and similar repositories for MoPQ
Users interested in MoPQ are comparing it to the libraries listed below.
- Inference framework for MoE layers based on TensorRT with Python bindings ☆41 · Updated 4 years ago
- This package implements THOR: Transformer with Stochastic Experts ☆65 · Updated 4 years ago
- AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers ☆48 · Updated 3 years ago
- Retrieval with Learned Similarities (http://arxiv.org/abs/2407.15462, WWW'25 Oral) ☆50 · Updated 7 months ago
- Manages the vllm-nccl dependency ☆17 · Updated last year
- An LLM inference engine, written in C++ ☆17 · Updated 5 months ago
- ☆19 · Updated last year
- A memory-efficient DLRM training solution using ColossalAI ☆106 · Updated 3 years ago
- Official code for "Binary embedding based retrieval at Tencent" ☆44 · Updated last year
- BANG is a new pretraining model to bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation. AR and NAR generat… ☆28 · Updated 3 years ago
- [KDD'22] Learned Token Pruning for Transformers ☆101 · Updated 2 years ago
- Odysseus: Playground of LLM Sequence Parallelism ☆78 · Updated last year
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024) ☆25 · Updated 4 months ago
- ☆71 · Updated 8 months ago
- ☆20 · Updated last month
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode; faster than ZeRO/ZeRO++/FSDP ☆98 · Updated last year
- Repository of the LV-Eval benchmark ☆71 · Updated last year
- The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonatha… ☆70 · Updated 4 years ago
- Ongoing research training transformer language models at scale, including BERT & GPT-2 ☆69 · Updated 2 years ago
- ☆65 · Updated last year
- Distributed DataLoader for PyTorch based on Ray ☆24 · Updated 4 years ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆40 · Updated last year
- ☆24 · Updated 2 years ago
- Examples for the MS-AMP package ☆30 · Updated 4 months ago
- A plug-in for Microsoft DeepSpeed that fixes a bug in the DeepSpeed pipeline ☆25 · Updated 4 years ago
- Summary of systems papers/frameworks/code/tools for training or serving large models ☆57 · Updated last year
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆112 · Updated 8 months ago
- ☆122 · Updated last year
- [ACL 2022] Structured Pruning Learns Compact and Accurate Models (https://arxiv.org/abs/2204.00408) ☆198 · Updated 2 years ago
- ☆74 · Updated 2 years ago