Summary of the Specs of Commonly Used GPUs for Training and Inference of LLMs
☆77Aug 12, 2025Updated 8 months ago
Alternatives and similar repositories for GPUs-Specs
Users that are interested in GPUs-Specs are comparing it to the libraries listed below.
- 🎓 Automatically updates LLM inference systems papers daily using GitHub Actions (updates every 12 hours)☆12Updated this week
- High-performance RMSNorm implementation using SM core storage (registers and shared memory)☆30Jan 22, 2026Updated 3 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks.☆119Mar 13, 2024Updated 2 years ago
- Repository for the MICCAI 2022 AutoPET challenge☆14Sep 19, 2022Updated 3 years ago
- Distributed Compiler based on Triton for Parallel Systems☆1,421Apr 22, 2026Updated 2 weeks ago
- ☆16Jul 28, 2021Updated 4 years ago
- ☆49Apr 15, 2024Updated 2 years ago
- High-performance RDMA-based distributed feature collection component for training GNN models on extremely large graphs☆55Jul 3, 2022Updated 3 years ago
- [ICML 2022] "Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets" by Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wa…☆33Apr 9, 2023Updated 3 years ago
- Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention]☆56Mar 5, 2025Updated last year
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap…☆282Mar 6, 2025Updated last year
- A fast communication-overlapping library for tensor/expert parallelism on GPUs.☆1,297Aug 28, 2025Updated 8 months ago
- Draw neural network architecture diagrams using Python and LaTeX.☆20Jun 2, 2022Updated 3 years ago
- ☆362Jan 28, 2026Updated 3 months ago
- Model explanation provides the ability to interpret the effect of the predictors on the composition of an individual score.☆13Jan 21, 2021Updated 5 years ago
- [HPCA 2026] A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.☆85Dec 18, 2025Updated 4 months ago
- Decoupled from hardware and used to learn some NCCL mechanisms☆26Apr 19, 2024Updated 2 years ago
- A variant of Ahash written in C++.☆10Mar 20, 2023Updated 3 years ago
- My learning notes about AI, including Machine Learning and Deep Learning.☆18Jun 30, 2019Updated 6 years ago
- Implementation of the Modbus protocol in .NET; containing ASCII, RTU and TCP.☆10Jan 12, 2026Updated 3 months ago
- ☆12May 23, 2018Updated 7 years ago
- Materials for learning SGLang☆808Jan 5, 2026Updated 4 months ago
- Implementation for IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).☆25Feb 22, 2026Updated 2 months ago
- Evaluation for 3D reconstruction, includes monocular depth, video depth, relative camera pose & multi-view point map estimation.☆20Aug 26, 2025Updated 8 months ago
- ☆19Feb 28, 2022Updated 4 years ago
- To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and …☆11Jun 18, 2024Updated last year
- PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems.☆32Apr 1, 2026Updated last month
- ArchExplorer: Microarchitecture Exploration Via Bottleneck Analysis☆33Feb 20, 2024Updated 2 years ago
- Research about dataflow architecture☆12Nov 30, 2023Updated 2 years ago
- ☆13Jan 23, 2021Updated 5 years ago
- [DAC2024] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning☆15Jan 13, 2024Updated 2 years ago
- Cycle-accurate C++ & SystemC simulator for the RISC-V GPGPU Ventus☆33Updated this week
- Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing☆76Jan 8, 2025Updated last year
- An LLM inference engine, written in C++☆19Mar 30, 2026Updated last month
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.☆483Updated this week
- Asynchronous pipeline parallel optimization☆21Feb 2, 2026Updated 3 months ago
- Online monitoring of Shadowsocks/ShadowsocksR accounts☆12Nov 25, 2018Updated 7 years ago
- A benchmark suite for evaluating FaaS schedulers.☆23Nov 5, 2022Updated 3 years ago
- ☆20Aug 26, 2021Updated 4 years ago