microsoft / MoPQ
☆12 · Updated 3 years ago
Alternatives and similar repositories for MoPQ
Users interested in MoPQ are comparing it to the libraries listed below.
- Retrieval with Learned Similarities (http://arxiv.org/abs/2407.15462, WWW'25 Oral) ☆51 · Updated 5 months ago
- Inference framework for MoE layers based on TensorRT with Python binding ☆41 · Updated 4 years ago
- Official code for "Binary embedding based retrieval at Tencent" ☆43 · Updated last year
- ☆19 · Updated last year
- Summary of system papers/frameworks/codes/tools on training or serving large models ☆57 · Updated last year
- [KDD'22] Learned Token Pruning for Transformers ☆100 · Updated 2 years ago
- This package implements THOR: Transformer with Stochastic Experts ☆65 · Updated 4 years ago
- A memory-efficient DLRM training solution using ColossalAI ☆106 · Updated 2 years ago
- ☆74 · Updated 2 years ago
- Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?" ☆46 · Updated 2 months ago
- Official codebase for the paper "A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone" ☆22 · Updated 4 months ago
- Dynamic Context Selection for Efficient Long-Context LLMs ☆40 · Updated 4 months ago
- Ongoing research training transformer language models at scale, including BERT & GPT-2 ☆68 · Updated 2 years ago
- Repository of the LV-Eval benchmark ☆70 · Updated last year
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆40 · Updated last year
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode; faster than ZeRO/ZeRO++/FSDP ☆98 · Updated last year
- Odysseus: Playground of LLM Sequence Parallelism ☆77 · Updated last year
- [ICLR 2024] Official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models" ☆30 · Updated last year
- Block-sparse movement pruning ☆81 · Updated 4 years ago
- Implementation of the NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆148 · Updated 7 months ago
- A MoE implementation for PyTorch, [ATC'23] SmartMoE ☆71 · Updated 2 years ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆110 · Updated 6 months ago
- Implementation of "IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs" (ICLR 2024) ☆25 · Updated 3 months ago
- Official code for "Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM" ☆14 · Updated last year
- Manages the vllm-nccl dependency ☆17 · Updated last year
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆120 · Updated last year
- [NeurIPS 2024] Official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting" ☆60 · Updated last year
- ☆121 · Updated last year
- ☆21 · Updated last year
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆19 · Updated last year