UbiquitousLearning / PhoneLM
☆57 · Updated 7 months ago
Alternatives and similar repositories for PhoneLM
Users who are interested in PhoneLM are comparing it to the repositories listed below.
- FuseAI Project ☆87 · Updated 5 months ago
- ☆95 · Updated 8 months ago
- Simple extension on vLLM to help you speed up reasoning models without training. ☆161 · Updated 3 weeks ago
- LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation ☆62 · Updated 10 months ago
- Verifiers for LLM Reinforcement Learning ☆60 · Updated 2 months ago
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models ☆168 · Updated 5 months ago
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆80 · Updated last month
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model ☆138 · Updated 3 weeks ago
- KV cache compression for high-throughput LLM inference ☆131 · Updated 4 months ago
- Official Repository for Task-Circuit Quantization ☆20 · Updated 3 weeks ago
- ☆34 · Updated last month
- Data preparation code for CrystalCoder 7B LLM ☆45 · Updated last year
- ☆36 · Updated 2 years ago
- ☆37 · Updated 8 months ago
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆116 · Updated 6 months ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆63 · Updated 9 months ago
- Efficient Agent Training for Computer Use ☆106 · Updated 3 weeks ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆62 · Updated 10 months ago
- ☆97 · Updated last month
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆121 · Updated 5 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆256 · Updated last week
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆163 · Updated last year
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆151 · Updated this week
- llama.cpp tutorial on Android phone ☆110 · Updated last month
- ☆26 · Updated 4 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆36 · Updated last year
- ☆121 · Updated 10 months ago
- ☆28 · Updated 4 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆144 · Updated 9 months ago
- How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆36 · Updated 2 months ago