HaoKang-Timmy / LatencySensitiveBench
First Latency-Aware Competitive LLM Agent Benchmark
☆26 · Jun 3, 2025 · Updated 8 months ago
Alternatives and similar repositories for LatencySensitiveBench
Users that are interested in LatencySensitiveBench are comparing it to the libraries listed below
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration · ☆20 · Jun 27, 2025 · Updated 7 months ago
- [ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination · ☆13 · Apr 29, 2025 · Updated 9 months ago
- ☆35 · Dec 22, 2025 · Updated last month
- [NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer · ☆30 · Dec 6, 2023 · Updated 2 years ago
- raytracer · ☆10 · Jul 18, 2022 · Updated 3 years ago
- ☆10 · Apr 24, 2024 · Updated last year
- OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents · ☆21 · Jan 6, 2026 · Updated last month
- [CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation · ☆25 · Jun 16, 2025 · Updated 7 months ago
- [ICML 2025] Official PyTorch implementation of "NegMerge: Sign-Consensual Weight Merging for Machine Unlearning" · ☆14 · Nov 25, 2025 · Updated 2 months ago
- [CVPR 2025] QuartDepth · ☆16 · Mar 24, 2025 · Updated 10 months ago
- [ICML 2025] MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design · ☆22 · Jul 4, 2025 · Updated 7 months ago
- Source code of our TNNLS paper "Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution" · ☆12 · Apr 14, 2023 · Updated 2 years ago
- Official implementation of "Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent" · ☆21 · May 23, 2025 · Updated 8 months ago
- ☆11 · Apr 5, 2023 · Updated 2 years ago
- The official code for [ECCV2020] "HALO: Hardware-aware Learning to Optimize" · ☆10 · Mar 22, 2023 · Updated 2 years ago
- EdgeRag is a program that runs large language models and vector databases on your local device · ☆14 · May 29, 2024 · Updated last year
- ☆13 · Jul 14, 2025 · Updated 7 months ago
- Official Implementation of Robustifying and Boosting Training-Free Neural Architecture Search · ☆10 · Mar 12, 2024 · Updated last year
- The official code for "Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation" | [MM2… · ☆14 · Dec 7, 2024 · Updated last year
- EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation · ☆27 · Jul 30, 2025 · Updated 6 months ago
- A curated list for Efficient Large Language Models · ☆11 · Mar 25, 2024 · Updated last year
- A reading group for system verification papers · ☆10 · Sep 28, 2023 · Updated 2 years ago
- ☆15 · Jan 12, 2026 · Updated last month
- My solution code to parallel architecture and programming Spring 2016 · ☆12 · Aug 15, 2016 · Updated 9 years ago
- IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents (NeurIPS 2024) · ☆14 · Jul 14, 2025 · Updated 7 months ago
- The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions (EMNLP 2023) · ☆13 · Dec 21, 2023 · Updated 2 years ago
- ☆12 · Aug 18, 2023 · Updated 2 years ago
- ☆17 · Dec 16, 2025 · Updated last month
- The implementation for FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts (ICCV25) · ☆14 · Jun 26, 2025 · Updated 7 months ago
- ☆54 · Oct 8, 2024 · Updated last year
- ☆13 · Jul 25, 2024 · Updated last year
- Official PyTorch implementation of the paper entitled 'Self Attentive Pooling for Efficient Deep Learning' · ☆13 · May 3, 2024 · Updated last year
- Luminance-Contrast-Aware Foveated Rendering · ☆13 · Jun 19, 2025 · Updated 7 months ago
- File System in User Space · ☆13 · Oct 31, 2019 · Updated 6 years ago
- ☆11 · Oct 27, 2022 · Updated 3 years ago
- ☆10 · Sep 23, 2025 · Updated 4 months ago
- 🎓 Automatically update circult-eda-mlsys-tinyml papers daily using GitHub Actions (updated every 8 hours) · ☆10 · Updated this week
- [ICCAD 2025] Squant · ☆15 · Jul 3, 2025 · Updated 7 months ago
- ☆17 · Mar 10, 2025 · Updated 11 months ago