CerebrasResearch / reap
REAP: Router-weighted Expert Activation Pruning for SMoE compression
☆242 · Dec 9, 2025 · Updated 2 months ago
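The listing gives only the project title, so the following is a hedged illustration rather than the repository's code. It shows one plausible reading of "router-weighted expert activation" pruning, which is an assumption based on the name. For each expert, it averages the router gate weight times the norm of that expert's output over the calibration tokens routed to it, then marks the lowest-scoring experts for removal. The function names `expert_saliency` and `experts_to_prune` and the nested-list input layout are hypothetical.

```python
from typing import List


def expert_saliency(gates: List[List[float]],
                    out_norms: List[List[float]],
                    routed: List[List[bool]]) -> List[float]:
    """Hypothetical per-expert saliency (assumed criterion, not REAP's code).

    gates[t][e]     -- router gate weight of expert e on calibration token t
    out_norms[t][e] -- L2 norm of expert e's output on token t
    routed[t][e]    -- True if expert e was among token t's top-k experts
    Score = mean over routed tokens of gate * output norm.
    """
    n_tokens, n_experts = len(gates), len(gates[0])
    scores = []
    for e in range(n_experts):
        contribs = [gates[t][e] * out_norms[t][e]
                    for t in range(n_tokens) if routed[t][e]]
        # An expert never selected on the calibration set contributes nothing.
        scores.append(sum(contribs) / len(contribs) if contribs else 0.0)
    return scores


def experts_to_prune(scores: List[float], n_prune: int) -> List[int]:
    """Return the indices of the n_prune lowest-saliency experts, sorted."""
    order = sorted(range(len(scores)), key=lambda e: scores[e])
    return sorted(order[:n_prune])


# Toy example: 3 calibration tokens, 4 experts, top-2 routing.
gates = [[0.6, 0.4, 0.0, 0.0],
         [0.0, 0.7, 0.3, 0.0],
         [0.5, 0.0, 0.0, 0.5]]
out_norms = [[2.0, 1.0, 0.0, 0.0],
             [0.0, 3.0, 1.0, 0.0],
             [4.0, 0.0, 0.0, 0.2]]
# Treat any expert with a nonzero gate as routed (top-k selected).
routed = [[g > 0 for g in row] for row in gates]

scores = expert_saliency(gates, out_norms, routed)
print(experts_to_prune(scores, n_prune=2))  # prints [2, 3]
```

Averaging only over routed tokens (rather than all tokens) keeps rarely selected but high-impact experts from being penalized purely for low selection frequency; whether the actual project makes this choice is not stated in the listing.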
Alternatives and similar repositories for reap
Users that are interested in reap are comparing it to the libraries listed below
- A thin Cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp ☆16 · Updated this week
- A prompt enhancer for FLUX.1 in ComfyUI ☆12 · Jan 11, 2026 · Updated last month
- Get aid from local LLMs right in your PowerShell ☆15 · May 2, 2025 · Updated 9 months ago
- ☆18 · Dec 9, 2025 · Updated 2 months ago
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a… ☆35 · Jan 18, 2026 · Updated 3 weeks ago
- Official and third-party plugins and themes for DankMaterialShell ☆34 · Updated this week
- LLMProxy is an intelligent routing proxy service for large language model backends. ☆22 · Dec 6, 2025 · Updated 2 months ago
- 🔍📃 LLM-powered PDF table extractor ☆19 · Jun 26, 2025 · Updated 7 months ago
- Simple Node proxy for llama-server that enables MCP use ☆17 · May 10, 2025 · Updated 9 months ago
- Extension for AUTOMATIC1111/stable-diffusion-webui that pastes images from the clipboard into any WebUI form. ☆16 · Nov 22, 2023 · Updated 2 years ago
- Offline LLM chatbot with personalized memory; works on CPU with multi-session memory support. ☆22 · Jan 10, 2026 · Updated last month
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆633 · Updated this week
- Socratic-Zero is a fully autonomous framework that generates high-quality training data for mathematical reasoning ☆35 · Oct 26, 2025 · Updated 3 months ago
- (ICLR'26 + Netflix) Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning ☆37 · Nov 17, 2025 · Updated 2 months ago
- A fully autonomous agent that accesses the browser and performs tasks. ☆17 · Apr 25, 2025 · Updated 9 months ago
- Personal voice assistant, with voice interruption and Twilio support ☆18 · Feb 24, 2025 · Updated 11 months ago
- Memory Agent monorepo ☆81 · Oct 9, 2025 · Updated 4 months ago
- A tool for adding function calling to LLM APIs, also available as a hosted service ☆22 · Aug 11, 2025 · Updated 6 months ago
- CompChomper is a framework for measuring how LLMs perform at code completion. ☆19 · Apr 29, 2025 · Updated 9 months ago
- A backup of SmokelessRuntimeEFIPatcher ☆27 · Jun 19, 2024 · Updated last year
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning ☆30 · May 18, 2025 · Updated 8 months ago
- A utility for generating conversational podcasts with AI text-to-speech, inspired by Google's NotebookLM. ☆20 · Sep 16, 2024 · Updated last year
- Implementation of layer diffuse inference using refiners ☆25 · Apr 25, 2024 · Updated last year
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs ☆23 · Nov 11, 2025 · Updated 3 months ago
- Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical user interface. Demo: https://huggingface.co/spaces/seanpedri… ☆37 · Feb 6, 2026 · Updated last week
- RePlan: Reasoning-Guided Region Planning for Complex Instruction-Based Image Editing ☆58 · Dec 26, 2025 · Updated last month
- A TypeScript example showcasing the integration of Ollama with the Model Context Protocol (MCP) servers. This project provides an interac… ☆27 · Aug 21, 2025 · Updated 5 months ago
- A pipeline parallel training script for LLMs. ☆166 · Apr 30, 2025 · Updated 9 months ago
- 🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantiza… ☆845 · Feb 6, 2026 · Updated last week
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark" ☆29 · Jun 30, 2025 · Updated 7 months ago
- Create text chunks which end at natural stopping points without using a tokenizer ☆26 · Nov 26, 2025 · Updated 2 months ago
- A simple no-install web UI for Ollama and OAI-Compatible APIs! ☆31 · Jan 30, 2025 · Updated last year
- ☆23 · Apr 22, 2021 · Updated 4 years ago
- (ICLR 2026) Unveiling Super Experts in Mixture-of-Experts Large Language Models ☆36 · Sep 25, 2025 · Updated 4 months ago
- This repo provides a simple Gradio UI to run Qwen2 VL 72B AWQ in venv and have both image and video inferencing work. ☆33 · Oct 3, 2024 · Updated last year
- ☆52 · Oct 10, 2025 · Updated 4 months ago
- ☆112 · Jun 19, 2025 · Updated 7 months ago
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,129 · Updated this week
- Generates control vectors for use with llama.cpp in GGUF format. ☆38 · Mar 19, 2025 · Updated 10 months ago