runpod-workers / worker-sglang
SGLang is a fast serving framework for large language models and vision language models.
☆22 · Updated 2 months ago
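For context, worker-sglang packages SGLang as a RunPod serverless worker. The sketch below is a minimal, illustrative example of querying an SGLang server through its OpenAI-compatible REST endpoint; the model path, port, and prompt are placeholder assumptions rather than values taken from this repository.

```python
# Minimal sketch: query an SGLang server via its OpenAI-compatible API.
# Assumes a server was started separately, for example:
#   python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 30000
# The model path and port above are illustrative placeholders.
import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "default",  # SGLang serves the model loaded at launch time
        "messages": [{"role": "user", "content": "Summarize what SGLang does in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```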
Alternatives and similar repositories for worker-sglang:
Users interested in worker-sglang are comparing it to the libraries listed below.
- ☆53 · Updated 11 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 · Updated last week
- ☆73 · Updated last year
- ☆33 · Updated 10 months ago
- Data preparation code for Amber 7B LLM ☆89 · Updated 11 months ago
- Data preparation code for CrystalCoder 7B LLM ☆44 · Updated 11 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆36 · Updated last year
- Google TPU optimizations for transformers models ☆109 · Updated 3 months ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. ☆22 · Updated last month
- An implementation of Self-Extend, to expand the context window via grouped attention ☆119 · Updated last year
- Benchmark suite for LLMs from Fireworks.ai ☆70 · Updated 2 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆31 · Updated this week
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆83 · Updated last month
- Mixing Language Models with Self-Verification and Meta-Verification ☆104 · Updated 4 months ago
- ☆66 · Updated 11 months ago
- Using multiple LLMs for ensemble forecasting ☆16 · Updated last year
- QLoRA with Enhanced Multi-GPU Support ☆37 · Updated last year
- RWKV-7: Surpassing GPT ☆84 · Updated 5 months ago
- vLLM adapter for a TGIS-compatible gRPC server. ☆27 · Updated this week
- ☆75 · Updated last year
- GPT-4 Level Conversational QA Trained In a Few Hours ☆60 · Updated 8 months ago
- A pipeline for LLM knowledge distillation ☆101 · Updated last month
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 11 months ago
- Self-host LLMs with LMDeploy and BentoML ☆18 · Updated last month
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated 4 months ago
- ☆28 · Updated 5 months ago
- Collection of autoregressive model implementations ☆85 · Updated 2 weeks ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆116 · Updated 5 months ago
- ☆117 · Updated last month
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆34 · Updated 4 months ago