wdlctc / headinfer
☆64 May 16, 2025 Updated 8 months ago
Alternatives and similar repositories for headinfer
Users that are interested in headinfer are comparing it to the libraries listed below
- ☆12 Jan 9, 2024 Updated 2 years ago
- ☆19 Mar 11, 2025 Updated 11 months ago
- An experimentation platform for LLM inference optimisation ☆35 Sep 19, 2024 Updated last year
- The code for LaRA Benchmark ☆47 May 28, 2025 Updated 8 months ago
- PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing ☆21 Mar 18, 2025 Updated 10 months ago
- ROSA+: RWKV's ROSA implementation with fallback statistical predictor ☆32 Oct 13, 2025 Updated 4 months ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆23 Feb 9, 2025 Updated last year
- [ACL 2025] Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms ☆35 Jun 4, 2025 Updated 8 months ago
- ☆20 Mar 3, 2025 Updated 11 months ago
- ☆22 Mar 7, 2025 Updated 11 months ago
- ☆27 Mar 24, 2025 Updated 10 months ago
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719 ☆22 Jun 5, 2024 Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆88 Jun 4, 2024 Updated last year
- ☆94 Jul 7, 2025 Updated 7 months ago
- Distributed IO-aware Attention algorithm ☆24 Sep 24, 2025 Updated 4 months ago
- ☆96 Dec 6, 2024 Updated last year
- ☆28 May 24, 2025 Updated 8 months ago
- A user-friendly Command & Control (C&C) web platform for remote monitoring, management, and task automation across multiple devices. ☆14 Dec 15, 2024 Updated last year
- quick playground to animate pippin ☆14 Nov 11, 2024 Updated last year
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆356 Nov 20, 2025 Updated 2 months ago
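- A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the "24 game" as an example. It uses zero-RL, SFT, and SFT+RL to elicit the LLM's ability to verify and reflect on its own work. About Clean, minimal, accessible reproduction of Dee… ☆33 Apr 5, 2025 Updated 10 months ago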
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 Aug 14, 2024 Updated last year
- ☆31 Mar 23, 2024 Updated last year
- Repo for the paper "AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction". ☆70 Jul 24, 2024 Updated last year
- [AAAI 2026] The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants ☆46 Dec 11, 2025 Updated 2 months ago
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆469 May 17, 2025 Updated 8 months ago
- DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference ☆603 Nov 24, 2025 Updated 2 months ago
- Lower Precision Floating Point Operations ☆66 Updated this week
- Create and deploy virtual-experiments - co-processing computational workflows ☆10 Jan 28, 2026 Updated 2 weeks ago
- ☆38 Nov 13, 2025 Updated 3 months ago
- xKV: Cross-Layer SVD for KV-Cache Compression ☆44 Nov 30, 2025 Updated 2 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆35 Mar 7, 2025 Updated 11 months ago
- Build Web Datasets with Ease ☆33 Jun 23, 2024 Updated last year
- ☆85 Apr 18, 2025 Updated 9 months ago
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ☆78 Nov 25, 2024 Updated last year
- Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?" ☆47 Jul 29, 2025 Updated 6 months ago
- homework in SCUT_SE ☆12 Nov 9, 2021 Updated 4 years ago
- Memory Topology for GPUs ☆17 Dec 9, 2025 Updated 2 months ago
- ext_mpi_collectives ☆11 Apr 1, 2025 Updated 10 months ago