skypilot-org / skypilot-catalog
☆27 · Updated this week
Alternatives and similar repositories for skypilot-catalog
Users interested in skypilot-catalog are comparing it to the libraries listed below.
- Cray-LM unified training and inference stack. ☆22 · Updated last year
- ☆47 · Updated 2 years ago
- LM Engine is a library for pretraining and finetuning LLMs. ☆113 · Updated this week
- A collection of reproducible inference engine benchmarks. ☆38 · Updated 9 months ago
- AI-Driven Research Systems (ADRS). ☆117 · Updated last month
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆68 · Updated last week
- vLLM adapter for a TGIS-compatible gRPC server. ☆50 · Updated this week
- Tutorial to get started with SkyPilot! ☆58 · Updated last year
- Simple and efficient DeepSeek V3 SFT using pipeline parallelism and expert parallelism, with both FP8 and BF16 training. ☆114 · Updated 6 months ago
- Memory-optimized Mixture of Experts. ☆72 · Updated 6 months ago
- Benchmark suite for LLMs from Fireworks.ai. ☆86 · Updated 2 weeks ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …. ☆60 · Updated last year
- Storing long contexts in tiny caches with self-study. ☆231 · Updated last month
- Easy, fast, and scalable multimodal AI. ☆106 · Updated this week
- ☆237 · Updated 3 weeks ago
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP. ☆141 · Updated 4 months ago
- Manage ML configuration with pydantic. ☆16 · Updated last week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM. ☆205 · Updated last week
- Google TPU optimizations for transformers models. ☆133 · Updated last week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU Clusters. ☆131 · Updated last year
- [ICLR 2025] Breaking the Throughput-Latency Trade-off for Long Sequences with Speculative Decoding. ☆137 · Updated last year
- Repo hosting code and materials on speeding up LLM inference using token merging. ☆37 · Updated 3 months ago
- Make Triton easier. ☆50 · Updated last year
- Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient". ☆149 · Updated 2 years ago
- 👷 Build compute kernels. ☆214 · Updated last week
- Optimizing causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna. ☆59 · Updated 3 months ago
- LLM serving performance evaluation harness. ☆83 · Updated 11 months ago
- PyTorch-native distributed training library for LLMs/VLMs with out-of-the-box Hugging Face support. ☆259 · Updated this week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo). ☆474 · Updated 2 weeks ago
- Repository for CPU kernel generation for LLM inference. ☆27 · Updated 2 years ago