microsoft / MoPQ
☆12 Updated 3 years ago
Alternatives and similar repositories for MoPQ:
Users who are interested in MoPQ are comparing it to the libraries listed below.
- Retrieval with Learned Similarities (http://arxiv.org/abs/2407.15462, WWW'25 Oral) ☆41 Updated last month
- ☆74 Updated 2 years ago
- Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval ☆15 Updated 3 years ago
- ☆24 Updated last year
- Official code for "Binary embedding based retrieval at Tencent" ☆42 Updated last year
- Inference framework for MoE layers based on TensorRT with Python binding ☆41 Updated 3 years ago
- ☆18 Updated last year
- Official PyTorch implementation of IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact ☆43 Updated 10 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆90 Updated last week
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆153 Updated 9 months ago
- Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings) ☆39 Updated last year
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆44 Updated 5 months ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆40 Updated last year
- This package implements THOR: Transformer with Stochastic Experts. ☆62 Updated 3 years ago
- The official repository for "Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation", Shen… ☆119 Updated last year
- Repository of LV-Eval Benchmark ☆61 Updated 7 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆68 Updated 9 months ago
- Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method; GKD: A General Knowledge Distillation… ☆32 Updated last year
- CIKM'21: JPQ substantially improves the efficiency of Dense Retrieval with 30x compression ratio, 10x CPU speedup and 2x GPU speedup. ☆52 Updated 3 years ago
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin… ☆51 Updated 9 months ago
- ☆104 Updated last year
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆76 Updated last year
- ☆124 Updated 8 months ago
- An Experiment on Dynamic NTK Scaling RoPE ☆62 Updated last year
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆121 Updated 2 months ago
- Source code for the paper "LongGenBench: Long-context Generation Benchmark" ☆15 Updated 5 months ago
- ☆19 Updated 10 months ago
- ☆98 Updated 6 months ago
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆158 Updated 9 months ago
- ☆66 Updated 2 years ago