REAP: Router-weighted Expert Activation Pruning for SMoE compression
☆320Apr 8, 2026Updated this week
Alternatives and similar repositories for reap
Users that are interested in reap are comparing it to the libraries listed below.
- Train and run transformers directly on Apple's Neural Engine in Swift☆92Updated this week
- REAP expert pruning for MoE LLMs on Apple Silicon via MLX☆53Mar 16, 2026Updated 3 weeks ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs☆753Updated this week
- Socratic-Zero is a fully autonomous framework that generates high-quality training data for mathematical reasoning☆36Oct 26, 2025Updated 5 months ago
- KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44…☆245Updated this week
- A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp☆21Updated this week
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark"☆30Jun 30, 2025Updated 9 months ago
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs☆23Nov 11, 2025Updated 4 months ago
- A comprehensive and efficient long-context model evaluation framework☆31Feb 25, 2026Updated last month
- Fused Qwen3 MoE layer for faster training, compatible with Transformers, LoRA, bnb 4-bit quant, Unsloth. Also possible to train LoRA over…☆248Feb 19, 2026Updated last month
- Model souping for LLMs☆73Nov 18, 2025Updated 4 months ago
- Mini Model Daemon☆12Nov 9, 2024Updated last year
- Official repository of Sparse ISO-FLOP Transformations for Maximizing Training Efficiency☆25Jul 31, 2024Updated last year
- A minimal CLI tool for piping anything into an LLM.☆21Jan 1, 2026Updated 3 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"☆33Nov 4, 2024Updated last year
- LLMProxy is an intelligent large language model backend routing proxy service.☆24Dec 6, 2025Updated 4 months ago
- Use winsqlite3.dll (the SQLite DLL that ships with Windows 10) in PowerShell☆13Jan 12, 2025Updated last year
- A tool for adding function calling to an LLM API, available as a service by following the link☆22Aug 11, 2025Updated 7 months ago
- Long-term Research Assistants with Self-Scheduling☆53Mar 22, 2026Updated 2 weeks ago
- A lightweight adjustment tool for smoothing token probabilities in the Qwen models to encourage balanced multilingual generation.☆104Jul 9, 2025Updated 9 months ago
- The Active Reliability Layer for AI Agents. Catch failures, teach fixes, and automate reliability☆130Jan 19, 2026Updated 2 months ago
- SOTA rounding-based quantization for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype supp…☆957Updated this week
- NextCoder: Robust Adaptation of Code LMs to Diverse Code Edits (ICML'25)☆43Jul 9, 2025Updated 9 months ago
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a…☆35Jan 18, 2026Updated 2 months ago
- Your AI Soul Companion. Self-hosted AI agent across 30+ messaging channels. It can not only serve as an emotional companion in daily life …☆42Mar 27, 2026Updated 2 weeks ago
- Direct Preference Optimization for RWKV, aiming for RWKV-5 and 6.☆11Mar 1, 2024Updated 2 years ago
- A fully autonomous agent that accesses the browser and performs tasks.☆18Apr 25, 2025Updated 11 months ago
- Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)☆55Mar 14, 2025Updated last year
- [ICLR 2026] RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling☆37Feb 25, 2026Updated last month
- A Knowledge-grounded framework for Autonomous ML/AI Program Synthesis and Optimization☆89Feb 20, 2026Updated last month
- Memory Agent monorepo☆85Oct 9, 2025Updated 6 months ago
- Personal voice assistant, with voice interruption and Twilio support☆18Feb 24, 2025Updated last year
- Official Chinese documentation for RWKV | RWKV官方中文文档☆15Mar 27, 2026Updated last week
- Synthetic dataset generation workflow using local file resources for fine-tuning LLMs.☆82Oct 9, 2025Updated 6 months ago
- A simple frontend page to interact with an OpenAI like API☆16Jan 31, 2025Updated last year