sunildkumar / lora_from_scratch
Implements Low-Rank Adaptation (LoRA) finetuning from scratch
☆78Updated 2 years ago
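The repository's topic, LoRA, can be summarized in a few lines: a frozen pretrained weight `W` is augmented with a trainable low-rank update `(alpha / r) * B @ A`. The sketch below is illustrative only (not taken from the repo); the class name, defaults, and zero-initialization of `B` follow common LoRA practice but are assumptions here.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA sketch: y = x @ W.T + scale * x @ A.T @ B.T,
    where only A (r x in) and B (out x r) would be trained."""

    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                             # frozen pretrained weight (out, in)
        self.A = rng.standard_normal((r, W.shape[1])) * 0.01   # trainable down-projection
        self.B = np.zeros((W.shape[0], r))                     # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # Base path plus low-rank path. Because B starts at zero,
        # the adapted layer initially matches the pretrained one.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def merge(self):
        # Fold the adapter into W for zero-overhead inference.
        return self.W + self.scale * self.B @ self.A
```

After training, `merge()` returns a single dense weight, so inference pays no extra cost over the original layer.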
Alternatives and similar repositories for lora_from_scratch
Users interested in lora_from_scratch are comparing it to the libraries listed below
- Implementation of the Llama architecture with RLHF + Q-learning☆166Updated 6 months ago
- A comprehensive deep dive into the world of tokens☆226Updated last year
- Collection of autoregressive model implementations☆86Updated 4 months ago
- LoRA and DoRA from Scratch Implementations☆209Updated last year
- Contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po…☆92Updated 2 years ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind☆177Updated 11 months ago
- Code repository for Black Mamba☆254Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.☆199Updated last year
- Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget☆158Updated 2 weeks ago
- Exploring finetuning public checkpoints on filtered 8K sequences from the Pile☆116Updated 2 years ago
- An introduction to LLM Sampling☆79Updated 8 months ago
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT☆220Updated last year
- ☆82Updated last year
- Code from our practical dive into using Mamba for information extraction☆54Updated last year
- Token Omission Via Attention☆128Updated 10 months ago
- ☆40Updated last year
- Implementation of DoRA☆301Updated last year
- ☆88Updated last year
- code for training & evaluating Contextual Document Embedding models☆197Updated 3 months ago
- ☆69Updated last year
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"☆101Updated 8 months ago
- A set of scripts and notebooks on LLM finetuning and dataset creation☆110Updated 11 months ago
- Multipack distributed sampler for fast padding-free training of LLMs☆199Updated last year
- Project 2 (Building Large Language Models) for Stanford CS324: Understanding and Developing Large Language Models (Winter 2022)☆105Updated 2 years ago
- This repository's goal is to precompile all past presentations of the Huggingface reading group☆48Updated 11 months ago
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*☆87Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs.☆153Updated 2 months ago
- Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines☆197Updated last year
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts☆120Updated 10 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources☆143Updated 3 months ago