Qcompiler / MixQ_Tensorrt_LLM

Mixed-precision inference with TensorRT-LLM

☆81

Alternatives and similar repositories for MixQ_Tensorrt_LLM

Users who are interested in MixQ_Tensorrt_LLM are comparing it to the libraries listed below.

Qcompiler / vllm-mixed-precision
Support mixed-precision inference with vLLM
☆85Updated 3 weeks ago
MrYxJ / enhance_long
This tool (enhance_long) aims to enhance the Llama 2 long-context extrapolation capability at the lowest possible cost, preferably without …
☆45Updated last year
xiexi51 / ICCAD-Accel-GCN
Official Implementation of "Accel-GNN: High-Performance GPU Accelerator Design for Graph Neural Networks"
☆52Updated 4 months ago
alienet1109 / RolePersonality
Collecting personality-indicative data for role-playing agents.
☆23Updated 5 months ago
kaizizzzzzz / Bitnet-C-benchmark
Single-thread, end-to-end C++ implementation of the Bitnet (1.58-bit weight) model
☆13Updated 8 months ago
Gunale0926 / SORSA
SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models
☆40Updated 5 months ago
yileijin / PayAttn
Official Implementation of "Pay Attention to What You Need"
☆43Updated 5 months ago
bird-bench / livesqlbench
☆100Updated 3 weeks ago
ByteDance-Seed / SDP4Bit
Official implementation of the paper "SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training"
☆38Updated 7 months ago
kaizizzzzzz / LLM-accelerator
☆20Updated last year
xytpai / kfunca
KFunca: A minimalist, high-performance GPU-based automatic differentiation framework
☆25Updated this week
yewentao256 / Sicpy
Typeless programming language `sicpy` and its compiler
☆33Updated last year
Ablustrund / MPLSandbox
MPLSandbox is an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler a…
☆178Updated 3 months ago
bird-bench / BIRD-Interact
[BIRD-INTERACT] Re-imagines Text-to-SQL evaluation via lens of dynamic interactions.
☆111Updated 3 weeks ago
YecanLee / Adaptive-Contrastive-Search
[EMNLP 2024 Findings] Official PyTorch Implementation of "Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Ge…
☆40Updated 5 months ago
4real3000 / EasyJudge
[COLING Demos 2025] an Easy-to-use Tool for Comprehensive Response Evaluation of LLMs
☆36Updated 5 months ago
jincan333 / MAS-TTS
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
☆76Updated 3 months ago
Ptolemy-DL / Ptolemy
☆96Updated 4 years ago
Phoenix8215 / BuildCudaNeuralNetworkFromScratch
Build CUDA Neural Network From Scratch
☆21Updated 11 months ago
Haijian06 / EartAgent
A multimodal, multi-agent development framework
☆45Updated 2 months ago
Ledzy / StreamBP
Official code of "StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs".
☆68Updated last month
UCSC-REAL / TokenCleaning
[ICML 2025] Official implementation of paper "Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning"
☆49Updated last month
JerryYin777 / NanoGPT-Pytorch2.0-Implementation
A repo for my NanoGPT implementation in PyTorch 2.0, written shortly after torch 2.0 was released; faster and simpler, and a good tutorial for learning GPT.
☆52Updated last year
duguodong7 / Awesome-Knowledge-Fusion
A collection of papers related to knowledge fusion
☆57Updated 9 months ago
lxrzlyr / GAL-DAWN
GAL-DAWN: A Novel High-Performance Computing Library of Graph Algorithms based on DAWN, CUDA/C++
☆87Updated 4 months ago
dvlab-research / Logits-Based-Finetuning
Official Code of Logits-Based-Finetuning
☆87Updated last month
zhuang-li / SCAR
[ACL 2025 main] SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Mode…
☆36Updated last week
Chen-speculation / auto-MCP-client
A Go library implementation of the Model Context Protocol (MCP). This library allows developers to easily parse MCP service configurat…
☆48Updated 3 months ago
SHUMKASHUN / Plots
This repo contains my custom-styled, Python-based plots for NLP papers, including my reproductions of plots from my favourite papers
☆39Updated last year
heng840 / AMIG
Code of Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Ne…
☆26Updated last year