runpod-workers / worker-sglang
SGLang is a fast serving framework for large language models and vision language models.
☆11 · Updated 2 weeks ago
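Since worker-sglang serves models through SGLang, requests typically take the shape of OpenAI-style chat completions. A minimal sketch of building such a request body, assuming a locally running SGLang server on port 30000 (the URL, port, and model name here are illustrative assumptions, not values from this repo):

```python
import json

# Hypothetical endpoint of a local SGLang server exposing an
# OpenAI-compatible chat-completions route (an assumption for illustration).
SGLANG_URL = "http://localhost:30000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Assemble an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Build and inspect a payload; an HTTP client would POST this
# JSON body to SGLANG_URL to get a completion back.
payload = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
print(json.dumps(payload, indent=2))
```

The same payload shape works against any OpenAI-compatible serving backend, which is what makes swapping between engines like SGLang and vLLM straightforward.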
Related projects
Alternatives and complementary repositories for worker-sglang
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆79 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆89 · Updated this week
- Google TPU optimizations for transformers models ☆75 · Updated this week
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆113 · Updated 5 months ago
- Simple examples using Argilla tools to build AI ☆40 · Updated this week
- Experiments on speculative sampling with Llama models ☆118 · Updated last year
- KV cache compression for high-throughput LLM inference ☆87 · Updated this week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆104 · Updated last month
- Vector database with support for late interaction and token-level embeddings ☆54 · Updated last month
- Curriculum training of instruction-following LLMs with Unsloth ☆12 · Updated 3 weeks ago
- High-level library for batched embeddings generation, blazingly fast web-based RAG, and quantized index processing ⚡ ☆61 · Updated 2 weeks ago
- Full finetuning of large language models without large memory requirements ☆93 · Updated 10 months ago
- QuIP quantization ☆46 · Updated 8 months ago
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs ☆74 · Updated last month
- Cerule - A Tiny Mighty Vision Model ☆67 · Updated 2 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 6 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆58 · Updated 2 weeks ago
- An implementation of Self-Extend, expanding the context window via grouped attention ☆118 · Updated 10 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆20 · Updated 9 months ago
- A pipeline for LLM knowledge distillation ☆78 · Updated 3 months ago