microsoft / MoPQ
☆13 · Updated 4 years ago
Alternatives and similar repositories for MoPQ
Users interested in MoPQ are comparing it to the libraries listed below.
- Retrieval with Learned Similarities (http://arxiv.org/abs/2407.15462, WWW'25 Oral) ☆52 · Updated 8 months ago
- Official code for "Binary embedding based retrieval at Tencent" ☆44 · Updated last year
- AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers ☆48 · Updated 3 years ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆40 · Updated last year
- Inference framework for MoE layers based on TensorRT with Python binding ☆41 · Updated 4 years ago
- ☆19 · Updated last year
- An LLM inference engine, written in C++ ☆18 · Updated 6 months ago
- This package implements THOR: Transformer with Stochastic Experts. ☆65 · Updated 4 years ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Updated last year
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆25 · Updated 5 months ago
- Asynchronous Stochastic Gradient Descent with Delay Compensation ☆22 · Updated 8 years ago
- Linear Attention Sequence Parallelism (LASP) ☆88 · Updated last year
- ☆21 · Updated 8 months ago
- Summary of system papers/frameworks/codes/tools on training or serving large models ☆57 · Updated 2 years ago
- [KDD'22] Learned Token Pruning for Transformers ☆102 · Updated 2 years ago
- Odysseus: Playground of LLM Sequence Parallelism ☆79 · Updated last year
- Vocabulary Parallelism ☆24 · Updated 10 months ago
- ☆74 · Updated 2 years ago
- ☆31 · Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆30 · Updated last year
- [SIGMOD2026] Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search with Task-Centric Benchmarks ☆18 · Updated last week
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 · Updated last year
- Official repository for FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models ☆31 · Updated 3 months ago
- ☆12 · Updated 2 years ago
- Manages vllm-nccl dependency ☆17 · Updated last year
- An Experiment on Dynamic NTK Scaling RoPE ☆64 · Updated 2 years ago
- Train llm (bloom, llama, baichuan2-7b, chatglm3-6b) with deepspeed pipeline mode. Faster than zero/zero++/fsdp. ☆98 · Updated last year
- Sparse Backpropagation for Mixture-of-Expert Training ☆29 · Updated last year
- [ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators ☆26 · Updated 2 years ago
- Repository of LV-Eval Benchmark ☆73 · Updated last year