[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
☆51 · Jun 17, 2025 · Updated 11 months ago
Alternatives and similar repositories for DeFT
Users who are interested in DeFT are comparing it to the libraries listed below.
- Legacy Code of ZJU Campus App for iOS ☆11 · Jan 31, 2024 · Updated 2 years ago
- ☆19 · Aug 10, 2024 · Updated last year
- An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification. ☆28 · Apr 15, 2025 · Updated last year
- Course website for Systems Verification Fall 2024 ☆14 · Jul 10, 2025 · Updated 10 months ago
- Image Tokenizer Needs Post-Training ☆24 · Oct 4, 2025 · Updated 7 months ago
- [ACL'25 Main] Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs ☆42 · May 26, 2025 · Updated 11 months ago
- MMoE: Multimodal Mixture-of-Experts (EMNLP 2024) ☆16 · Nov 14, 2024 · Updated last year
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling ☆54 · Jul 15, 2025 · Updated 10 months ago
- Efficient GPU communication over multiple NICs. ☆28 · Nov 20, 2025 · Updated 6 months ago
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆137 · Nov 10, 2025 · Updated 6 months ago
- ☆248 · Nov 19, 2025 · Updated 6 months ago
- Aligntune: A Modular Toolkit for Post Training Alignment of LLMs ☆36 · Apr 29, 2026 · Updated 3 weeks ago
- A case for representing data collections and objects in the LLVM IR ☆25 · Jan 29, 2026 · Updated 3 months ago
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 · Jul 19, 2024 · Updated last year
- Python Script to Open SJTU Dormitory Smart Lock ☆10 · Sep 12, 2022 · Updated 3 years ago
- GenDB, an LLM-Powered Generative Query Engine Built for the Future ☆58 · Apr 10, 2026 · Updated last month
- ☆12 · Sep 4, 2021 · Updated 4 years ago
- This repository contains code to evaluate various multimodal large language models using different instructions across multiple multimoda… ☆32 · Mar 9, 2025 · Updated last year
- ☆20 · Dec 24, 2024 · Updated last year
- ☆14 · Dec 13, 2022 · Updated 3 years ago
- One-click generation of a course schedule .ics file that can be imported directly into the iOS Calendar ☆20 · Mar 18, 2023 · Updated 3 years ago
- [NeurIPS 2024] Video Diffusion Models are Training-free Motion Interpreter and Controller ☆50 · Aug 5, 2025 · Updated 9 months ago
- ☆13 · May 23, 2021 · Updated 4 years ago
- ☆29 · Aug 30, 2024 · Updated last year
- ☆84 · Nov 10, 2025 · Updated 6 months ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆973 · Updated this week
- [CVPR 2025] Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning ☆57 · Apr 1, 2025 · Updated last year
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus) ☆19 · Feb 11, 2025 · Updated last year
- Resa: Transparent Reasoning Models via SAEs ☆49 · Sep 23, 2025 · Updated 7 months ago
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025] ☆15 · Dec 9, 2024 · Updated last year
- GitHub repository for CLAPACK (fork of CLAPACK 3.2.1 patched for our needs) ☆10 · Aug 15, 2018 · Updated 7 years ago
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25] ☆45 · May 13, 2025 · Updated last year
- SlayTheCli: A console client for the game Slay The Spire ☆17 · Jul 12, 2020 · Updated 5 years ago
- ☆22 · Sep 26, 2024 · Updated last year
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆53 · Aug 6, 2025 · Updated 9 months ago
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆297 · May 1, 2025 · Updated last year
- SQLBench Runners ☆13 · Dec 17, 2023 · Updated 2 years ago
- [ICLR 2025] Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs ☆19 · Mar 20, 2025 · Updated last year
- A gitbook named studying-containerd-notes ☆10 · Dec 17, 2018 · Updated 7 years ago