NickL77/BaldEagle

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/NickL77/BaldEagle)

NickL77 / BaldEagle

3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding

☆85

Alternatives and similar repositories for BaldEagle

Users that are interested in BaldEagle are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

HArmonizedSS / HASS
View on GitHub
Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)
☆56Mar 14, 2025Updated last year
SafeAILab / EAGLE
View on GitHub
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
☆2,481Feb 20, 2026Updated 5 months ago
sgl-project / SpecForge
View on GitHub
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆1,027Updated this week
sgl-project / sgl-flash-attn
View on GitHub
Fast and memory-efficient exact attention
☆22Jun 26, 2026Updated last month
hemingkx / Spec-Bench
View on GitHub
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆402Apr 22, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
code-forge-temple / scribe-pal
View on GitHub
ScribePal is an Open Source intelligent browser extension that leverages AI to empower your web experience by providing contextual insigh…
☆22Apr 6, 2026Updated 3 months ago
AniZpZ / AutoSmoothQuant
View on GitHub
An easy-to-use package for implementing SmoothQuant for LLMs
☆111Apr 7, 2025Updated last year
AaronFeng753 / Qwen3-Gemini2.5
View on GitHub
Make Qwen3 Think like Gemini 2.5 Pro | Open webui function
☆25May 10, 2025Updated last year
axolotl-ai-cloud / grpo_code
View on GitHub
A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.
☆41Apr 4, 2025Updated last year
haiduo / Jakiro
View on GitHub
This repository is the official implementation of "Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE" [ACL 2026 Mai…
☆37Oct 5, 2025Updated 9 months ago
thunlp / FR-Spec
View on GitHub
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
☆55Jul 15, 2025Updated last year
FarFetchd / sleepyllama
View on GitHub
an auto-sleeping and -waking framework around llama.cpp
☆13Feb 8, 2025Updated last year
smart-lty / ParallelSpeculativeDecoding
View on GitHub
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆170Dec 23, 2025Updated 7 months ago
XMUDeepLIT / DAMAML
View on GitHub
Code for "Domain Adaptive Meta-learning for Dialogue State Tracking"(TASLP2021)
☆10Sep 14, 2021Updated 4 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
angrysky56 / llada_gui
View on GitHub
GUI for LLaDA Diffusion LLM with Quantization for low end GPU and CPU options.
☆25Mar 7, 2025Updated last year
tengxiaoliu / RLET
View on GitHub
[EMNLP 2022] RLET: A Reinforcement Learning Based Approach for Explainable QA with Entailment Trees
☆11Jul 15, 2023Updated 3 years ago
catid / cuda_float_compress
View on GitHub
Python package for compressing floating-point PyTorch tensors
☆13Jul 22, 2024Updated 2 years ago
foundation-model-stack / fms-fsdp
View on GitHub
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆288Nov 24, 2025Updated 8 months ago
arcee-ai / PruneMe
View on GitHub
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆267Apr 23, 2024Updated 2 years ago
hasaranga / NativeChat
View on GitHub
win32 native frontend for llama-cli
☆14Nov 2, 2024Updated last year
XMUDeepLIT / QGC
View on GitHub
Code for "Retaining Key Information under High Compression Rates: Query-Guided Compressor for LLMs" (ACL 2024)
☆19Jun 12, 2024Updated 2 years ago
premAI-io / benchmarks
View on GitHub
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
☆142Jul 25, 2024Updated 2 years ago
UNITES-Lab / MoE-Quantization
View on GitHub
Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark"
☆31Jun 30, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
BY571 / DistRL-LLM
View on GitHub
Distributed Reinforcement Learning for LLM Fine-Tuning with multi-GPU utilization
☆22Mar 12, 2025Updated last year
natnew / awesome-ai-scientists
View on GitHub
A curated collection of resources for building “AI Scientist” systems: AI that assists scientific discovery through literature intelligen…
☆14Updated this week
tolitius / towel
View on GitHub
"a towel is about the most massively useful thing an interstellar AI hitchhiker can have"
☆48Oct 9, 2024Updated last year
hemingkx / SpeculativeDecodingPapers
View on GitHub
📰 Must-read papers and blogs on Speculative Decoding ⚡️
☆1,283Jun 27, 2026Updated last month
charmandercha / ArchiDoc
View on GitHub
☆16Dec 16, 2024Updated last year
rabbidave / ZeroDay.Tools
View on GitHub
Gen AI Hardening x Attack Suite
☆17Jul 9, 2025Updated last year
ljcleo / agent_sense
View on GitHub
Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
☆13Jan 4, 2025Updated last year
fourMs / MGT-matlab
View on GitHub
Musical Gestures Toolbox for Matlab
☆10Dec 21, 2020Updated 5 years ago
j4ys0n / llm-proxy
View on GitHub
TLS & API keys for your LLM APIs
☆20Dec 17, 2025Updated 7 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
OpenPPL / ppl.llm.serving
View on GitHub
☆128Dec 24, 2024Updated last year
tilde-research / nsa-release
View on GitHub
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆133Jun 24, 2025Updated last year
XMUDeepLIT / CG-ASED
View on GitHub
Code for "An AST Structure Enhanced Decoder for Code Generation"
☆15Oct 14, 2021Updated 4 years ago
flashinfer-ai / flashinfer
View on GitHub
FlashInfer: Kernel Library for LLM Serving
☆6,062Updated this week
pany8125 / ShareGPTQAExtractor-mnbvc
View on GitHub
MNBVC项目-ShareGPT语料清洗
☆16Oct 4, 2023Updated 2 years ago
NVIDIA / Model-Optimizer
View on GitHub
A unified library of SOTA model optimization techniques like quantization, distillation, pruning, neural architecture search, speculative…
☆3,341Updated this week
THUDM / paper-source-trace
View on GitHub
☆19Sep 29, 2024Updated last year