CUDA kernels that leverage LLM sparsity to improve throughput and reduce memory requirements during inference and training.
☆245 · May 14, 2026 · Updated last month
Alternatives and similar repositories for sparser-faster-llms
Users that are interested in sparser-faster-llms are comparing it to the libraries listed below.
- ☆16 · Sep 22, 2024 · Updated last year
- TensorRT for RefineNet Segmentation ☆12 · Apr 27, 2021 · Updated 5 years ago
- ☆38 · Nov 19, 2025 · Updated 7 months ago
- A practical way of learning Swizzle ☆42 · Feb 3, 2025 · Updated last year
- An extension to the GaLore paper that performs Natural Gradient Descent in a low-rank subspace ☆19 · Oct 21, 2024 · Updated last year
- ☆19 · Apr 16, 2025 · Updated last year
- ☆13 · May 4, 2026 · Updated last month
- Recursive Self-Aggregation evals on ARC-AGI ☆36 · Jan 26, 2026 · Updated 5 months ago
- ☆19 · Aug 23, 2025 · Updated 10 months ago
- awesome LLM papers! 🚀 🚀 🚀 ☆45 · Jul 3, 2025 · Updated 11 months ago
- A plug-and-play compiler that delivers free-lunch optimizations for both inference and training. ☆315 · Updated this week
- vTPM with SGX protection ☆12 · May 30, 2019 · Updated 7 years ago
- The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models" ☆35 · Mar 26, 2026 · Updated 3 months ago
- ☆19 · Mar 12, 2026 · Updated 3 months ago
- ☆11 · May 16, 2026 · Updated last month
- Metadata Editor user and practice guide ☆19 · May 8, 2026 · Updated last month
- Explore training for quantized models ☆26 · Jul 12, 2025 · Updated 11 months ago
- A curated reading list of research in Sparse Autoencoders, Feature Extraction and related topics in Mechanistic Interpretability ☆32 · Jan 30, 2025 · Updated last year
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA. ☆33 · May 26, 2026 · Updated last month
- GEMV implementation with CUTLASS ☆21 · Aug 21, 2025 · Updated 10 months ago
- ☆14 · Nov 3, 2025 · Updated 7 months ago
- Multi Face Recognition and Detection ☆68 · Nov 1, 2022 · Updated 3 years ago
- Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction ☆356 · Jun 1, 2026 · Updated 3 weeks ago
- Companion code for 《汇编语言一发入魂》 ☆15 · May 30, 2020 · Updated 6 years ago
- ☆44 · Jan 16, 2026 · Updated 5 months ago
- A lightweight, self-hosted infrastructure layer for deploying and managing LLM agents as resilient microservices. Features automatic r… ☆18 · Aug 4, 2025 · Updated 10 months ago
- This repository contains the official code for Energy Transformer---an efficient Energy-based Transformer variant for graph classificatio… ☆27 · Jan 28, 2024 · Updated 2 years ago
- DoubleAI’s hyperoptimised version of cuGraph ☆60 · Mar 3, 2026 · Updated 3 months ago
- ☆18 · Nov 22, 2025 · Updated 7 months ago
- 。 ☆13 · Jan 15, 2022 · Updated 4 years ago
- Interact with various LLMs in your browser (LangChain.js, Angular) ☆17 · May 7, 2026 · Updated last month
- [ICML 2024] Outlier-Efficient Hopfield Layers for Large Transformer-Based Models ☆23 · Mar 30, 2026 · Updated 2 months ago
- ☆32 · Jul 2, 2025 · Updated 11 months ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆105 · Apr 7, 2026 · Updated 2 months ago
- ☆54 · Mar 15, 2025 · Updated last year
- Unified framework for robot learning built on NVIDIA Isaac Sim ☆19 · Sep 22, 2024 · Updated last year
- [ICLR 2025] SDTT: a simple and effective distillation method for discrete diffusion models ☆51 · Feb 26, 2026 · Updated 4 months ago
- Fast GPU based tensor core reductions ☆12 · Jan 13, 2023 · Updated 3 years ago
- C/C++ Implementation of the HyperLogLog++ cardinality estimation algorithm. ☆29 · May 20, 2023 · Updated 3 years ago