☆102Feb 11, 2026Updated 4 months ago
Alternatives and similar repositories for infllmv2_cuda_impl
Users that are interested in infllmv2_cuda_impl are comparing it to the libraries listed below.
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring☆279Jul 6, 2025Updated 11 months ago
- Ongoing research project for code&math LLMs☆31Jul 4, 2025Updated 11 months ago
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec…☆241Jan 14, 2026Updated 4 months ago
- Distributed IO-aware Attention algorithm☆24Sep 24, 2025Updated 8 months ago
- Efficient triton implementation of Native Sparse Attention.☆279May 23, 2025Updated last year
- Official Implementation of APB (ACL 2025 main Oral) and Spava (ACL 2026 main).☆37Apr 6, 2026Updated 2 months ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs.☆21Jan 24, 2025Updated last year
- Sequence-level 1F1B schedule for LLMs.☆19Jun 4, 2024Updated 2 years ago
- qwen-nsa☆87Oct 14, 2025Updated 7 months ago
- ☆37Aug 7, 2025Updated 10 months ago
- A Triton JIT runtime and ffi provider in C++☆35May 27, 2026Updated 2 weeks ago
- Fast and memory-efficient exact attention☆21Jun 3, 2026Updated last week
- Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference☆168Oct 13, 2025Updated 7 months ago
- ☆66Apr 26, 2025Updated last year
- ☆13Oct 19, 2023Updated 2 years ago
- QRHead: Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking☆39Jan 20, 2026Updated 4 months ago
- ☆18Jun 3, 2024Updated 2 years ago
- Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs☆16Feb 10, 2026Updated 4 months ago
- Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.☆47Jun 11, 2025Updated last year
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models☆345Feb 23, 2025Updated last year
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention…☆1,219Apr 8, 2026Updated 2 months ago
- Heuristic filtering framework for RefineCode☆85Mar 13, 2025Updated last year
- ☆12Apr 25, 2024Updated 2 years ago
- [ICLR24] The open-source repo of THU-KEG's KoLA benchmark.☆57Sep 28, 2023Updated 2 years ago
- ☆248Nov 19, 2025Updated 6 months ago
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs☆65Mar 25, 2025Updated last year
- LongAttn: Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 10 months ago
- Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend☆125May 18, 2026Updated 3 weeks ago
- ☆22Jun 5, 2025Updated last year
- ☆47Sep 15, 2025Updated 8 months ago
- ☆121May 19, 2025Updated last year
- from MHA, MQA, GQA to MLA by 苏剑林, with code☆49Feb 19, 2025Updated last year
- DICE: Detecting In-distribution Data Contamination with LLM's Internal State☆12Sep 21, 2024Updated last year
- A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention☆298Dec 1, 2025Updated 6 months ago
- Some mini-tools for Linux, written in Python☆13Jun 20, 2017Updated 8 years ago
- KACC: A Multi-task Benchmark for Knowledge Abstraction, Concretization and Completion☆12Oct 21, 2021Updated 4 years ago
- THUIR website☆10Feb 23, 2026Updated 3 months ago
- 📚 LaTeX templates and tools for creating beautiful, structured documents 📝☆14Oct 24, 2025Updated 7 months ago
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)☆179Aug 15, 2025Updated 9 months ago