runpod-workers / worker-sglang

SGLang is a fast serving framework for large language models and vision language models.

☆24

Alternatives and similar repositories for worker-sglang

Users who are interested in worker-sglang are comparing it to the libraries listed below.


huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆117 · Updated 6 months ago
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58-bit language models
☆82 · Updated 2 months ago
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆266 · Updated 9 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆198 · Updated last year
nyunAI / PruneGPT
☆51 · Updated last year
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆76 · Updated this week
QuixiAI / spectrum
☆129 · Updated 3 months ago
QuixiAI / grokadamw
☆134 · Updated 11 months ago
LLM360 / amber-data-prep
Data preparation code for Amber 7B LLM
☆91 · Updated last year
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆55 · Updated 6 months ago
opendatahub-io / vllm-tgis-adapter
vLLM adapter for a TGIS-compatible gRPC server.
☆34 · Updated this week
reka-ai / rekaquant
☆57 · Updated 3 weeks ago
hamelsmu / llama-inference
Experiments with inference on Llama
☆104 · Updated last year
IBM / text-generation-inference
IBM development fork of https://github.com/huggingface/text-generation-inference
☆61 · Updated 2 months ago
sdan / selfextend
An implementation of Self-Extend to expand the context window via grouped attention
☆119 · Updated last year
premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
☆137 · Updated last year
GreenBitAI / low_bit_llama
Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
☆110 · Updated last year
Locutusque / TPU-Alignment
Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free
☆232 · Updated 9 months ago
euclaise / SlimTrainer
Full finetuning of large language models without large memory requirements
☆94 · Updated last year
aniketmaurya / fastserve-ai
Machine Learning Serving focused on GenAI with simplicity as the top priority.
☆59 · Updated 3 weeks ago
keeeeenw / MicroLlama
Micro Llama is a small Llama-based model with 300M parameters trained from scratch on a $500 budget
☆153 · Updated 2 weeks ago
facebookresearch / fastgen
Simple high-throughput inference library
☆125 · Updated 2 months ago
huggingface / kernel-builder
👷 Build compute kernels
☆87 · Updated this week
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆87 · Updated this week
joey00072 / ohara
Collection of autoregressive model implementations
☆86 · Updated 3 months ago
ServiceNow / Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
☆218 · Updated this week
thomasgauthier / LoRD
Low-Rank adapter extraction for fine-tuned transformers models
☆175 · Updated last year
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆128 · Updated 2 years ago
character-ai / pipelining-sft
Simple and efficient DeepSeek V3 SFT using pipeline parallelism and expert parallelism, with both FP8 and BF16 training
☆68 · Updated last week
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277 · Updated last year