Dao-AILab / flash-attention
Fast and memory-efficient exact attention
★13,401 · Updated this week

Related projects:
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★15,839 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★7,687 · Updated this week
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models" ★10,327 · Updated last month
- Train transformer language models with reinforcement learning. ★9,288 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch. ★6,029 · Updated this week
- Ongoing research training transformer models at scale ★9,949 · Updated this week
- Transformer related optimization, including BERT, GPT ★5,773 · Updated 5 months ago
- Hackable and optimized Transformers building blocks, supporting a composable construction. ★8,351 · Updated this week
- QLoRA: Efficient Finetuning of Quantized LLMs ★9,906 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ★26,822 · Updated this week
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ★12,397 · Updated 2 weeks ago
- A framework for few-shot evaluation of language models. ★6,426 · Updated this week
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ★19,545 · Updated 3 weeks ago
- Mamba SSM architecture ★12,542 · Updated last month
- LAVIS - A One-stop Library for Language-Vision Intelligence ★9,663 · Updated 3 weeks ago
- Latest Advances on Multimodal Large Language Models ★11,722 · Updated this week
- An open source implementation of CLIP. ★9,782 · Updated last month
- Large Language Model Text Generation Inference ★8,762 · Updated this week
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ★19,294 · Updated last month
- Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Ad… ★5,958 · Updated last week
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain… ★8,186 · Updated last week
- Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datase… ★11,582 · Updated last week
- [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters ★5,691 · Updated 6 months ago
- Development repository for the Triton language and compiler ★12,698 · Updated this week
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ★6,537 · Updated 2 months ago
- ImageBind One Embedding Space to Bind Them All ★8,221 · Updated last month
- ★4,006 · Updated 3 months ago
- SGLang is a fast serving framework for large language models and vision language models. ★5,121 · Updated this week
- Example models using DeepSpeed ★5,987 · Updated this week
- A collection of libraries to optimise AI model performances ★8,373 · Updated last month