ifromeast / AI_analysis
Analyse problems of AI with math and code
☆10 · Updated last week
Alternatives and similar repositories for AI_analysis:
Users interested in AI_analysis are comparing it to the repositories listed below.
- ATC23 AE ☆44 · Updated last year
- ☆71 · Updated 5 months ago
- The official code for the paper "Parallel Speculative Decoding with Adaptive Draft Length" ☆32 · Updated 4 months ago
- ☆40 · Updated last month
- Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference ☆27 · Updated 2 months ago
- ☆51 · Updated 9 months ago
- The official implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference ☆54 · Updated last month
- Estimate MFU for DeepSeekV3 ☆14 · Updated 2 weeks ago
- ☆59 · Updated last month
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth Is Rarely Pure and Never Simple ☆19 · Updated 10 months ago
- Official repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆51 · Updated last month
- A sparse attention kernel supporting mixed sparse patterns ☆94 · Updated 3 months ago
- Multi-Candidate Speculative Decoding ☆34 · Updated 9 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆85 · Updated 3 months ago
- [ACL 2024] A novel QAT with self-distillation framework to enhance ultra-low-bit LLMs ☆98 · Updated 8 months ago
- SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models ☆13 · Updated 3 months ago
- Source code for the paper "LongGenBench: Long-context Generation Benchmark" ☆14 · Updated 3 months ago
- Implementations of several LLM KV cache sparsity methods ☆30 · Updated 7 months ago
- ☆139 · Updated last year
- Exploring inter-layer expert affinity in MoE model inference ☆6 · Updated 8 months ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆151 · Updated 7 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLMs ☆152 · Updated 6 months ago
- Official PyTorch implementation of IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact ☆38 · Updated 7 months ago
- Sequence-level 1F1B schedule for LLMs ☆17 · Updated 7 months ago
- 16-fold memory access reduction with nearly no loss ☆63 · Updated 2 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆234 · Updated 2 months ago
- ☆35 · Updated last month
- QAQ: Quality Adaptive Quantization for LLM KV Cache