[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
☆50 · Jun 17, 2025 · Updated 9 months ago
Alternatives and similar repositories for DeFT
Users that are interested in DeFT are comparing it to the libraries listed below.
- Legacy Code of ZJU Campus App for iOS · ☆11 · Jan 31, 2024 · Updated 2 years ago
- Collect papers related to personalized text generation · ☆18 · Sep 6, 2021 · Updated 4 years ago
- [NeurIPS 2024] Activating Self-Attention for Multi-Scene Absolute Pose Regression · ☆14 · Feb 24, 2025 · Updated last year
- An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification. · ☆28 · Apr 15, 2025 · Updated 11 months ago
- Course website for Systems Verification Fall 2024 · ☆14 · Jul 10, 2025 · Updated 9 months ago
- ☆38 · Aug 7, 2025 · Updated 8 months ago
- ☆13 · Sep 2, 2023 · Updated 2 years ago
- An Attention Superoptimizer · ☆22 · Jan 20, 2025 · Updated last year
- [ACL'25 Main] Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs · ☆41 · May 26, 2025 · Updated 10 months ago
- Handwritten GEMM using Intel AMX (Advanced Matrix Extension) · ☆17 · Jan 11, 2025 · Updated last year
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling · ☆54 · Jul 15, 2025 · Updated 8 months ago
- ☆239 · Nov 19, 2025 · Updated 4 months ago
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… · ☆134 · Nov 10, 2025 · Updated 5 months ago
- ☆14 · Mar 18, 2022 · Updated 4 years ago
- Beyond KV Caching: Shared Attention for Efficient LLMs · ☆20 · Jul 19, 2024 · Updated last year
- Time-R1: Framework and resources for endowing LLMs with comprehensive temporal reasoning (understanding, prediction, creative generation)… · ☆72 · Jun 11, 2025 · Updated 10 months ago
- GenDB, an LLM-Powered Generative Query Engine Built for the Future · ☆54 · Mar 27, 2026 · Updated 2 weeks ago
- This project is my attempt at automating work in Notion. · ☆17 · Aug 28, 2025 · Updated 7 months ago
- Python Script to Open SJTU Dormitory Smart Lock · ☆10 · Sep 12, 2022 · Updated 3 years ago
- Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation · ☆17 · Apr 3, 2024 · Updated 2 years ago
- A backup repository of scanned review notes for core CS courses and some major courses from four years of university · ☆12 · Jun 29, 2019 · Updated 6 years ago
- A recommendation model kernel optimizing system · ☆12 · Jun 5, 2025 · Updated 10 months ago
- This repository contains code to evaluate various multimodal large language models using different instructions across multiple multimoda… · ☆31 · Mar 9, 2025 · Updated last year
- ☆19 · Dec 24, 2024 · Updated last year
- Generate a class-schedule .ics file in one click for direct import into iOS Calendar · ☆20 · Mar 18, 2023 · Updated 3 years ago
- ☆21 · Mar 29, 2026 · Updated last week
- levelDB key/value database in Rust. · ☆11 · Nov 13, 2021 · Updated 4 years ago
- [Neurips 2024] Video Diffusion Models are Training-free Motion Interpreter and Controller · ☆50 · Aug 5, 2025 · Updated 8 months ago
- ☆13 · May 23, 2021 · Updated 4 years ago
- Implement some method of LLM KV Cache Sparsity · ☆41 · Jun 6, 2024 · Updated last year
- ☆84 · Nov 10, 2025 · Updated 5 months ago
- [CVPR 2025] Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning · ☆56 · Apr 1, 2025 · Updated last year
- ☆25 · Oct 9, 2025 · Updated 6 months ago
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025] · ☆14 · Dec 9, 2024 · Updated last year
- Resa: Transparent Reasoning Models via SAEs · ☆48 · Sep 23, 2025 · Updated 6 months ago
- Github repository for CLAPACK (fork of CLAPACK 3.2.1 patched for our needs) · ☆10 · Aug 15, 2018 · Updated 7 years ago
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25] · ☆44 · May 13, 2025 · Updated 10 months ago
- SlayTheCli: A console client for the game Slay The Spire · ☆17 · Jul 12, 2020 · Updated 5 years ago
- ☆22 · Sep 26, 2024 · Updated last year