skypilot-org / skypilot-catalog
☆18 Updated this week
Related projects:
- Tutorial to get started with SkyPilot! ☆54 Updated 4 months ago
- ☆26 Updated last year
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference ☆58 Updated this week
- ☆35 Updated 2 months ago
- ☆30 Updated 2 years ago
- ☆130 Updated this week
- ☆32 Updated this week
- Code repository for the paper - "AdANNS: A Framework for Adaptive Semantic Search" ☆57 Updated 11 months ago
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆55 Updated this week
- A minimal implementation of vLLM. ☆29 Updated last month
- Benchmark suite for LLMs from Fireworks.ai ☆51 Updated this week
- How much energy do LLMs consume? ☆40 Updated last week
- Efficiently tune any LLM from HuggingFace using distributed training (multiple GPUs) and DeepSpeed. Uses Ray AIR to orchestrate the … ☆50 Updated last year
- Vector Database with support for late interaction and token level embeddings. ☆51 Updated last week
- Stateful LLM Serving ☆25 Updated last month
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆49 Updated 3 weeks ago
- Distributed ML Optimizer ☆31 Updated 3 years ago
- Self-host LLMs with vLLM and BentoML ☆62 Updated this week
- ☆24 Updated last year
- Modular and structured prompt caching for low-latency LLM inference ☆43 Updated 4 months ago
- ☆77 Updated this week
- ☆17 Updated last year
- Collection of kernels written in Triton language ☆48 Updated 2 weeks ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆57 Updated 5 months ago
- ☆61 Updated 3 weeks ago
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆18 Updated 2 years ago
- Make Triton easier ☆39 Updated 3 months ago
- Official repository of Sparse ISO-FLOP Transformations for Maximizing Training Efficiency ☆23 Updated last month
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆88 Updated 11 months ago
- Google TPU optimizations for transformers models ☆62 Updated this week