modelize-ai / LLM-Inference-Deployment-Tutorial
Tutorial for LLM developers covering engine design, service deployment, evaluation/benchmarking, etc. Provides a C/S-style (client/server) optimized LLM inference engine.
☆19Updated last year
Related projects:
- Fast LLM training codebase with dynamic strategy selection [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]☆32Updated 8 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts☆29Updated 6 months ago
- ☆57Updated 3 weeks ago
- FuseAI Project☆75Updated last month
- OpenLLMDE: An open source data engineering framework for LLMs☆16Updated last year
- Official Repository for Paper "BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Compet…☆15Updated 3 weeks ago
- An Experiment on Dynamic NTK Scaling RoPE☆59Updated 9 months ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…☆48Updated last week
- [EMNLP 2023 Industry Track] A simple prompting approach that enables the LLMs to run inference in batches.☆65Updated 6 months ago
- code for Scaling Laws of RoPE-based Extrapolation☆68Updated 11 months ago
- ☆34Updated 2 weeks ago
- A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks.☆24Updated 2 months ago
- ☆60Updated 5 months ago
- A light proxy solution for HuggingFace hub.☆43Updated 10 months ago
- Token level visualization tools for large language models☆46Updated last month
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆121Updated 3 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆114Updated 2 months ago
- ☆45Updated 7 months ago
- Leveraging passage embeddings for efficient listwise reranking with large language models.☆27Updated 2 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting☆60Updated 6 months ago
- ☆22Updated 3 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry☆36Updated 8 months ago
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2B model. The project includes both model and train…☆51Updated 5 months ago
- Odysseus: Playground of LLM Sequence Parallelism☆50Updated 3 months ago
- ☆87Updated 4 months ago
- LMTuner: Make the LLM Better for Everyone☆33Updated last year
- Reformatted Alignment☆111Updated 4 months ago
- The Efficiency Spectrum of LLM☆50Updated 9 months ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation☆44Updated 2 months ago
- Manages vllm-nccl dependency☆17Updated 3 months ago