PrimeIntellect-ai / pccl

PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP

☆138

Alternatives and similar repositories for pccl

Users who are interested in pccl are comparing it to the libraries listed below

PrimeIntellect-ai / pi-quant
SIMD quantization kernels
☆92 Updated 2 months ago
meta-pytorch / BackendBench
Ship correct and fast LLM kernels to PyTorch
☆124 Updated 2 weeks ago
huggingface / kernel-builder
👷 Build compute kernels
☆190 Updated this week
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆61 Updated this week
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)
☆66 Updated 8 months ago
PrimeIntellect-ai / prime-vllm
Modded vLLM to run pipeline parallelism over public networks
☆40 Updated 6 months ago
IST-DASLab / llmq
Quantized LLM training in pure CUDA/C++.
☆220 Updated this week
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆218 Updated last month
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆197 Updated last year
Noumena-Network / NSA-Test
NSA Triton Kernels written with GPT5 and Opus 4.1
☆65 Updated 3 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated last year
HazyResearch / train-tk
train with kittens!
☆63 Updated last year
apple / ml-recurrent-drafter
☆219 Updated 10 months ago
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆112 Updated last month
SzymonOzog / Penny
Hand-Rolled GPU communications library
☆72 Updated last week
abacusai / gh200-llm
Docker image for NVIDIA GH200 machines, optimized for vLLM serving and HF Trainer fine-tuning
☆52 Updated 9 months ago
pyember / ember
☆234 Updated 5 months ago
siboehm / ShallowSpeed
Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.
☆151 Updated 2 years ago
gpu-mode / ring-attention
ring-attention experiments
☆160 Updated last year
matttreed / diloco-sim
☆21 Updated 10 months ago
JoeLi12345 / nGPT
An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆108 Updated 8 months ago
leloykun / modded-nanogpt
NanoGPT (124M) quality in 2.67B tokens
☆28 Updated 2 months ago
PrimeIntellect-ai / smart-contracts
Solidity contracts for the decentralized Prime Network protocol
☆27 Updated 4 months ago
google-deepmind / asyncdiloco
☆47 Updated last year
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆73 Updated 7 months ago
amazon-science / mxfp4-llm
Official implementation for Training LLMs with MXFP4
☆110 Updated 7 months ago
srush / triton-autodiff
Experiment in using Tangent to autodiff Triton
☆80 Updated last year
magicproduct / hash-hop
Long context evaluation for large language models
☆224 Updated 9 months ago
UmerHA / triton_util
Make Triton easier
☆49 Updated last year
tilde-research / MoMoE-impl
Memory optimized Mixture of Experts
☆69 Updated 4 months ago