lucidrains / llama-qrlhfLinks

Implementation of the Llama architecture with RLHF + Q-learning

☆166

Alternatives and similar repositories for llama-qrlhf

Users that are interested in llama-qrlhf are comparing it to the libraries listed below

Sorting:

cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆130Updated last year
lucidrains / simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive model within a GPT
☆215Updated 11 months ago
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆111Updated 7 months ago
epfml / DenseFormer
☆81Updated last year
lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆120Updated 9 months ago
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 3 months ago
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆100Updated 11 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆149Updated last month
lucidrains / mirasol-pytorch
Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch
☆89Updated last year
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆127Updated 11 months ago
llm-random / llm-random
☆192Updated last week
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆101Updated 7 months ago
idiap / sigma-gpt
σ-GPT: A New Approach to Autoregressive Models
☆67Updated 11 months ago
xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆86Updated last year
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆228Updated 11 months ago
lucidrains / CALM-pytorch
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind
☆177Updated 10 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆221Updated 3 weeks ago
lucidrains / pytorch-custom-utils
Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new…
☆124Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆82Updated 3 years ago
NousResearch / StripedHyenaTrainer
☆61Updated last year
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆123Updated 7 months ago
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆88Updated last year
cloneofsimo / min-fsdp
☆83Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆89Updated last year
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filter 8K sequences on Pile
☆116Updated 2 years ago
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆160Updated 3 months ago
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆190Updated last year
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆98Updated 10 months ago
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆178Updated last month
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆127Updated 8 months ago