chu-tianxiang / vllm-gptq
A high-throughput and memory-efficient inference and serving engine for LLMs
☆130Updated 4 months ago
Related projects
Alternatives and complementary repositories for vllm-gptq
- A high-throughput and memory-efficient inference and serving engine for LLMs☆123Updated 11 months ago
- This is a text generation method which returns a generator, streaming out each token in real-time during inference, based on Huggingface/…☆96Updated 8 months ago
- Imitate OpenAI with Local Models☆85Updated 2 months ago
- The official code for "Aurora: Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning"☆257Updated 6 months ago
- Aims to provide an intuitive, concrete, and standardized evaluation of current mainstream LLMs☆92Updated last year
- Mixture-of-Experts (MoE) Language Model☆180Updated 2 months ago
- LLaMA inference for TencentPretrain☆96Updated last year
- Implement OpenAI APIs and plugin-enabled ChatGPT with open source LLM and other models.☆122Updated 5 months ago
- deep learning☆149Updated 4 months ago
- Fine-tune Chinese large language models with QLoRA, including ChatGLM, Chinese-LLaMA-Alpaca, and BELLE☆85Updated last year
- Generate multi-round conversation roleplay data based on self-instruct and evol-instruct.☆111Updated last week
- Official repository for LongChat and LongEval☆512Updated 5 months ago
- ☆120Updated 11 months ago
- Train LLaMA with LoRA on a single 4090 and merge the LoRA weights to work like Stanford Alpaca.☆50Updated last year
- Lightweight local website for displaying the performance of different chat models.☆85Updated last year
- LongQLoRA: Extend Context Length of LLMs Efficiently☆159Updated last year
- Open efforts to implement ChatGPT-like models and beyond.☆105Updated 3 months ago
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe…☆142Updated 9 months ago
- ☆81Updated 6 months ago
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper☆124Updated 4 months ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ and easy export to ONNX/ONNX Runtime.☆149Updated last month
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …☆137Updated 2 months ago
- zero: zero-training LLM parameter tuning☆30Updated last year
- A self-alignment method for role-play. Benchmark for role-play. Resources for "Large Language Models are Superpositions of All Characters…☆165Updated 5 months ago
- ☆173Updated last year
- XVERSE-65B: A multilingual large language model developed by XVERSE Technology Inc.☆132Updated 7 months ago
- Open Source Text Embedding Models with OpenAI Compatible API☆131Updated 4 months ago
- SUS-Chat: Instruction tuning done right☆47Updated 10 months ago
- Instruction-tuning tool for large language models (supports FlashAttention)☆166Updated 10 months ago
- ChatGLM2-6B fine-tuning, SFT/LoRA, instruction fine-tuning☆106Updated last year