modelize-ai / LLM-Inference-Deployment-Tutorial
A tutorial for LLM developers covering engine design, service deployment, evaluation and benchmarking, and more. Provides a client/server (C/S) style optimized LLM inference engine.
☆19 Updated last year
Alternatives and similar repositories for LLM-Inference-Deployment-Tutorial
Users that are interested in LLM-Inference-Deployment-Tutorial are comparing it to the libraries listed below
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆37 Updated last year
- FuseAI Project ☆86 Updated 3 months ago
- OpenLLMDE: An open source data engineering framework for LLMs ☆17 Updated last year
- ☆27 Updated 2 months ago
- ☆27 Updated 2 weeks ago
- Manages vllm-nccl dependency ☆17 Updated 11 months ago
- code for Scaling Laws of RoPE-based Extrapolation ☆73 Updated last year
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆32 Updated 11 months ago
- Data preparation code for CrystalCoder 7B LLM ☆44 Updated last year
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆40 Updated last year
- Leveraging passage embeddings for efficient listwise reranking with large language models. ☆40 Updated 5 months ago
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆56 Updated last year
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719 ☆22 Updated 11 months ago
- Retrieves parquet files from Hugging Face, identifies and quantifies junky data, duplication, contamination, and biased content in datase… ☆53 Updated last year
- Official Repository for Paper "BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Compet… ☆18 Updated 8 months ago
- ☆19 Updated last year
- Understanding the correlation between different LLM benchmarks ☆29 Updated last year
- ☆36 Updated 8 months ago
- Fast instruction tuning with Llama2 ☆11 Updated last year
- Survey of small language models ☆15 Updated 9 months ago
- This is a text generation method which returns a generator, streaming out each token in real-time during inference, based on Huggingface/… ☆95 Updated last year
- ☆20 Updated 6 months ago
- RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction. ☆61 Updated 2 months ago
- Open efforts to implement ChatGPT-like models and beyond. ☆107 Updated 9 months ago
- Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment ☆75 Updated 10 months ago
- A collection of reproducible inference engine benchmarks ☆30 Updated 3 weeks ago
- ☆29 Updated 8 months ago
- LMTuner: Make the LLM Better for Everyone ☆35 Updated last year
- Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement" ☆31 Updated last year
- [EMNLP 2023 Industry Track] A simple prompting approach that enables LLMs to run inference in batches. ☆72 Updated last year