microsoft / GRIN-MoE
☆130 · Updated this week
Related projects:
- ☆40 · Updated 2 months ago
- Demonstration that fine-tuning a RoPE model on sequences longer than those seen in pre-training adapts the model's context limit ☆62 · Updated last year
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆55 · Updated this week
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆17 · Updated 8 months ago
- ☆29 · Updated 2 weeks ago
- ☆45 · Updated 7 months ago
- Simple and fast low-bit matmul kernels in CUDA ☆48 · Updated this week
- ☆26 · Updated this week
- A toolkit for fine-tuning, running inference with, and evaluating GreenBitAI's LLMs. ☆68 · Updated 2 months ago
- ☆37 · Updated 5 months ago
- ☆40 · Updated 4 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆30 · Updated last month
- 📜 [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswa… ☆36 · Updated 10 months ago
- ☆50 · Updated last month
- ☆61 · Updated 2 months ago
- Repository for sparse fine-tuning of LLMs via a modified version of the MosaicML llmfoundry ☆36 · Updated 8 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆57 · Updated 5 months ago
- A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks. ☆24 · Updated 2 months ago
- ☆117 · Updated 7 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆73 · Updated last month
- ☆22 · Updated 3 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆34 · Updated 10 months ago
- QuIP quantization ☆41 · Updated 6 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆87 · Updated 8 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore, paper coming soon ☆18 · Updated this week
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆28 · Updated 4 months ago
- Code repository for the paper "AdANNS: A Framework for Adaptive Semantic Search" ☆57 · Updated 11 months ago
- ☆30 · Updated 4 months ago
- A repository for research on medium-sized language models. ☆71 · Updated 3 months ago
- [ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluating ☆26 · Updated last week
- Latent Large Language Models ☆16 · Updated 3 weeks ago