UbiquitousLearning / PhoneLM
☆56 · Updated 6 months ago
Alternatives and similar repositories for PhoneLM
Users interested in PhoneLM are comparing it to the repositories listed below.
- ☆92 · Updated 7 months ago
- A simple extension on vLLM to help you speed up reasoning models without training. ☆150 · Updated 2 weeks ago
- FuseAI Project. ☆86 · Updated 3 months ago
- Self-host LLMs with LMDeploy and BentoML. ☆18 · Updated 2 months ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models. ☆56 · Updated 7 months ago
- ☆46 · Updated 9 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”. ☆121 · Updated 4 months ago
- KV cache compression for high-throughput LLM inference. ☆127 · Updated 3 months ago
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding. ☆116 · Updated 5 months ago
- ☆37 · Updated 7 months ago
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models. ☆164 · Updated 4 months ago
- Experiments on speculative sampling with Llama models. ☆126 · Updated last year
- A toolkit for fine-tuning, inference, and evaluation of GreenBitAI's LLMs. ☆84 · Updated 2 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, … ☆47 · Updated 3 weeks ago
- The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression". ☆127 · Updated 5 months ago
- ☆76 · Updated last month
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models". ☆158 · Updated 10 months ago
- EvaByte: Efficient Byte-level Language Models at Scale. ☆97 · Updated 3 weeks ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks. ☆143 · Updated 7 months ago
- Official Repository for Task-Circuit Quantization. ☆20 · Updated 2 weeks ago
- A repository aimed at pruning DeepSeek V3, R1, and R1-Zero to a usable size. ☆52 · Updated last month
- Data preparation code for CrystalCoder 7B LLM. ☆44 · Updated last year
- Scaling Data for SWE-agents. ☆160 · Updated this week
- ☆215 · Updated last week
- Work in progress. ☆62 · Updated last month
- EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs). ☆62 · Updated 11 months ago
- ☆78 · Updated 4 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline". ☆116 · Updated 11 months ago
- ☆53 · Updated 11 months ago
- [ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation. ☆169 · Updated last year