☆63 May 16, 2025 Updated 9 months ago
Alternatives and similar repositories for headinfer
Users who are interested in headinfer are comparing it to the libraries listed below
- UnitEval is a benchmarking and evaluation tool for AutoDev Coder. ☆13 Jan 2, 2024 Updated 2 years ago
- ☆18 Jun 14, 2025 Updated 8 months ago
- ☆24 Jan 30, 2025 Updated last year
- ☆18 Mar 11, 2025 Updated 11 months ago
- An experimentation platform for LLM inference optimisation ☆36 Sep 19, 2024 Updated last year
- The code for LaRA Benchmark ☆47 May 28, 2025 Updated 9 months ago
- ROSA+: RWKV's ROSA implementation with fallback statistical predictor ☆34 Oct 13, 2025 Updated 4 months ago
- PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing ☆21 Mar 18, 2025 Updated 11 months ago
- [ACL 2025] Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms ☆36 Jun 4, 2025 Updated 9 months ago
- Janus-Series: Unified Multimodal Understanding and Generation Models forked from deepseek-ai/Janus ☆17 Jan 27, 2025 Updated last year
- ☆54 May 19, 2025 Updated 9 months ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆23 Feb 9, 2025 Updated last year
- ☆20 Mar 3, 2025 Updated last year
- [ICML 2024] Adaptive decoding balances the diversity and coherence of open-ended text generation. ☆19 Jun 2, 2024 Updated last year
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719 ☆22 Jun 5, 2024 Updated last year
- ☆28 Mar 24, 2025 Updated 11 months ago
- ☆54 Oct 29, 2024 Updated last year
- ☆95 Jul 7, 2025 Updated 7 months ago
- Distributed IO-aware Attention algorithm ☆24 Sep 24, 2025 Updated 5 months ago
- ☆95 Dec 6, 2024 Updated last year
- A user-friendly Command & Control (C&C) web platform for remote monitoring, management, and task automation across multiple devices. ☆14 Dec 15, 2024 Updated last year
- quick playground to animate pippin ☆14 Nov 11, 2024 Updated last year
- ☆28 May 24, 2025 Updated 9 months ago
- ☆57 Feb 10, 2025 Updated last year
- Source code of paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing" ☆31 Oct 24, 2024 Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 Aug 14, 2024 Updated last year
- ☆31 Mar 23, 2024 Updated last year
- Repo for the paper "AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction". ☆70 Jul 24, 2024 Updated last year
- [AAAI 2026] The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants ☆46 Dec 11, 2025 Updated 2 months ago
- Codebase for Instruction Following without Instruction Tuning ☆36 Sep 24, 2024 Updated last year
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆472 May 17, 2025 Updated 9 months ago
- DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference ☆608 Nov 24, 2025 Updated 3 months ago
- An implementation of HPL-AI Mixed-Precision Benchmark based on hpl-2.3 ☆29 May 30, 2021 Updated 4 years ago
- Official implementation of "BERTs are Generative In-Context Learners" ☆32 Mar 14, 2025 Updated 11 months ago
- Create and deploy virtual-experiments - co-processing computational workflows ☆10 Jan 28, 2026 Updated last month
- ☆38 Nov 13, 2025 Updated 3 months ago
- Try out HallOumi, a state-of-the-art claim verification model in a simple UI! ☆42 Apr 2, 2025 Updated 11 months ago
- ☆58 Oct 15, 2025 Updated 4 months ago
- xKV: Cross-Layer SVD for KV-Cache Compression ☆45 Nov 30, 2025 Updated 3 months ago