[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
☆52 Jun 17, 2025 Updated 11 months ago
Alternatives and similar repositories for DeFT
Users that are interested in DeFT are comparing it to the libraries listed below.
- ☆30 Mar 24, 2025 Updated last year
- ☆19 Aug 10, 2024 Updated last year
- [NeurIPS 2024] Activating Self-Attention for Multi-Scene Absolute Pose Regression ☆14 Feb 24, 2025 Updated last year
- An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification. ☆28 Apr 15, 2025 Updated last year
- Image Tokenizer Needs Post-Training ☆24 Oct 4, 2025 Updated 8 months ago
- ☆37 Aug 7, 2025 Updated 10 months ago
- ☆13 Sep 2, 2023 Updated 2 years ago
- An Attention Superoptimizer ☆22 Jan 20, 2025 Updated last year
- MMoE: Multimodal Mixture-of-Experts (EMNLP 2024) ☆16 Nov 14, 2024 Updated last year
- Efficient GPU communication over multiple NICs. ☆29 Nov 20, 2025 Updated 6 months ago
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆138 Nov 10, 2025 Updated 7 months ago
- ☆248 Nov 19, 2025 Updated 6 months ago
- Aligntune: A Modular Toolkit for Post Training Alignment of LLMs ☆36 Updated this week
- ☆13 Mar 18, 2022 Updated 4 years ago
- A case for representing data collections and objects in the LLVM IR ☆25 Jan 29, 2026 Updated 4 months ago
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 Jul 19, 2024 Updated last year
- Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation ☆17 Apr 3, 2024 Updated 2 years ago
- ☆15 May 4, 2025 Updated last year
- A backup repository of scanned review notes for CS foundation courses and some major courses from four years of university ☆12 Jun 29, 2019 Updated 6 years ago
- ☆12 Sep 4, 2021 Updated 4 years ago
- Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp… ☆21 Mar 7, 2024 Updated 2 years ago
- A recommendation model kernel optimizing system ☆12 Jun 5, 2025 Updated last year
- ☆20 Dec 24, 2024 Updated last year
- ☆14 Dec 13, 2022 Updated 3 years ago
- Generate a class-schedule ics file with one click, ready to import directly into the iOS Calendar ☆20 Mar 18, 2023 Updated 3 years ago
- levelDB key/value database in Rust. ☆11 Nov 13, 2021 Updated 4 years ago
- A Streaming-Native Serving Engine for TTS/STS Models ☆68 Updated this week
- ☆13 May 23, 2021 Updated 5 years ago
- Implement some method of LLM KV Cache Sparsity ☆41 Jun 6, 2024 Updated 2 years ago
- ☆84 Jun 2, 2026 Updated last week
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆1,026 May 30, 2026 Updated last week
- [CVPR 2025] Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning ☆57 Apr 1, 2025 Updated last year
- A comprehensive tool that allows for system-level performance estimation of chiplet-based In-Memory computing (IMC) architectures. ☆25 Jun 27, 2024 Updated last year
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus) ☆19 Feb 11, 2025 Updated last year
- SlayTheCli: A console client for the game Slay The Spire ☆17 Jul 12, 2020 Updated 5 years ago
- Resa: Transparent Reasoning Models via SAEs ☆49 Sep 23, 2025 Updated 8 months ago
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025] ☆15 Dec 9, 2024 Updated last year
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25] ☆45 May 13, 2025 Updated last year
- ☆22 Sep 26, 2024 Updated last year