SJTU-IPADS / SmallThinker
☆46 · Updated 4 months ago
Alternatives and similar repositories for SmallThinker
Users that are interested in SmallThinker are comparing it to the libraries listed below
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆205 · Updated last month
- siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems ☆266 · Updated this week
- dInfer: An Efficient Inference Framework for Diffusion Language Models ☆327 · Updated this week
- ☆29 · Updated 5 months ago
- Fairy±i (iFairy): Complex-valued Quantization Framework for Large Language Models ☆106 · Updated last week
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆90 · Updated 5 months ago
- ☆152 · Updated 5 months ago
- ☆111 · Updated 6 months ago
- ☆73 · Updated 6 months ago
- ☆85 · Updated 7 months ago
- ☆46 · Updated 7 months ago
- ☆60 · Updated 6 months ago
- ☆439 · Updated 3 months ago
- QeRL enables RL for 32B LLMs on a single H100 GPU. ☆455 · Updated this week
- An early research-stage MoE load balancer based on linear programming. ☆415 · Updated last week
- Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature se… ☆89 · Updated this week
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ☆53 · Updated last month
- Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking". ☆48 · Updated last year
- 青稞Talk ☆168 · Updated this week
- ☆76 · Updated last year
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models ☆363 · Updated 2 weeks ago
- High-speed and easy-to-use LLM serving framework for local deployment ☆137 · Updated 3 months ago
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm… ☆83 · Updated 3 months ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆82 · Updated this week
- DeepSeek Native Sparse Attention PyTorch implementation ☆108 · Updated 3 weeks ago
- ☆51 · Updated 6 months ago
- Efficient Compute-Communication Overlap for Distributed LLM Inference ☆63 · Updated last month
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆102 · Updated 2 weeks ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆64 · Updated last year
- [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆151 · Updated this week