FMInference / FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
☆9,187 · Updated 2 weeks ago
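FlexLLMGen's throughput-oriented, single-GPU design rests on offloading: weights and the KV cache are kept mainly in CPU RAM (or on disk) and streamed to the GPU as needed, while large batches amortize the transfer cost. The sketch below illustrates that idea only; it is not FlexLLMGen's actual API, and the class and variable names are invented for the example.

```python
# Minimal sketch (not FlexLLMGen's API) of weight offloading for inference:
# layer weights are parked in CPU RAM and moved to the GPU one layer at a
# time, so the resident GPU footprint stays roughly one layer.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

class OffloadedMLP(nn.Module):
    """A stack of linear layers whose weights live on the CPU between uses."""
    def __init__(self, dim: int, depth: int):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        self.layers.to("cpu")  # keep all weights in CPU RAM by default

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.to(device)
        for layer in self.layers:
            layer.to(device)          # stream this layer's weights in
            x = torch.relu(layer(x))
            layer.to("cpu")           # evict it to make room for the next layer
        return x

# Throughput-oriented usage: one large batch amortizes the weight transfers.
model = OffloadedMLP(dim=512, depth=8)
out = model(torch.randn(256, 512))
print(out.shape)  # torch.Size([256, 512])
```

In the real project the split between GPU, CPU, and disk is configurable and the batching far more aggressive; the sketch only shows why a model larger than GPU memory can still be served from a single device.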
Related projects
Alternatives and complementary repositories for FlexLLMGen
- Instruct-tune LLaMA on consumer hardware ☆18,637 · Updated 3 months ago
- OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset ☆7,383 · Updated last year
- Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Ad… ☆5,989 · Updated 2 months ago
- LLMs built upon Evol Instruct: WizardLM, WizardCoder, WizardMath ☆9,257 · Updated 3 months ago
- QLoRA: Efficient Finetuning of Quantized LLMs (see the LoRA sketch after this list) ☆10,036 · Updated 5 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best…
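Several of the entries above (Alpaca-style instruct-tuning, Lit-LLaMA's LoRA support, QLoRA) rely on low-rank adapters: the frozen pretrained weight is augmented with a small trainable update scaled by alpha/r. Below is a minimal PyTorch sketch of that idea; the class and parameter names are invented for the example and are not taken from any of those repositories.

```python
# Minimal LoRA-style linear layer: frozen base weight plus a trainable
# low-rank update (alpha/r) * B @ A applied to the input.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                     # frozen pretrained weight
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))   # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(768, 768)
x = torch.randn(4, 768)
print(layer(x).shape)  # torch.Size([4, 768])
# Only lora_A and lora_B receive gradients, which is what makes finetuning cheap;
# QLoRA additionally stores the frozen base weight in 4-bit precision.
```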