BlinkDL / RWKV-v2-RNN-PileLinks

RWKV-v2-RNN trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.

☆66

Alternatives and similar repositories for RWKV-v2-RNN-Pile

Users that are interested in RWKV-v2-RNN-Pile are comparing it to the libraries listed below

Sorting:

ArEnSc / Production-RWKV
This project aims to make RWKV Accessible to everyone using a Hugging Face like interface, while keeping it close to the R and D RWKV bra…
☆64Updated 2 years ago
harrisonvanderbyl / rwkvstic
Framework agnostic python runtime for RWKV models
☆146Updated 2 years ago
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filter 8K sequences on Pile
☆115Updated 2 years ago
zphang / minimal-gpt-neox-20b
☆131Updated 3 years ago
AeroScripts / HiddenEngrams
Hidden Engrams: Long Term Memory for Transformer Model Inference
☆35Updated 4 years ago
BlinkDL / WorldModel
Let us make Psychohistory (as in Asimov) a reality, and accessible to everyone. Useful for LLM grounding and games / fiction / business /…
☆39Updated 2 years ago
kaiokendev / cutoff-len-is-context-len
Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit
☆62Updated 2 years ago
Rallio67 / language-model-agents
Experiments with generating opensource language model assistants
☆97Updated 2 years ago
gustavecortal / gpt-j-fine-tuning-example
Fine-tuning 6-Billion GPT-J (& other models) with LoRA and 8-bit compression
☆68Updated 3 years ago
patil-suraj / stable-diffusion-jax
☆89Updated 3 years ago
lucidrains / token-shift-gpt
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆50Updated 3 years ago
lucidrains / PaLM-jax
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework)
☆188Updated 3 years ago
mrsteyk / RWKV-LM-deepspeed
☆42Updated 2 years ago
zsc / llama_infer
Inference script for Meta's LLaMA models using Hugging Face wrapper
☆109Updated 2 years ago
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆34Updated 2 years ago
lucidrains / simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive model within a GPT
☆222Updated last year
LAION-AI / Open-GIA
O-GIA is an umbrella for research, infrastructure and projects ecosystem that should provide open source, reproducible datasets, models, …
☆89Updated 2 years ago
crowsonkb / cloob-training
CLOOB training (JAX) and inference (JAX and PyTorch)
☆74Updated 3 years ago
BlinkDL / minGPT-tuned
A *tuned* minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
☆118Updated 4 years ago
EleutherAI / magiCARP
One stop shop for all things carp
☆59Updated 3 years ago
da03 / markup2im
Diffusion-based markup-to-image generation
☆83Updated 2 years ago
patil-suraj / vqgan-jax
JAX implementation of VQGAN
☆91Updated 3 years ago
EleutherAI / openwebtext2
☆91Updated 3 years ago
crowsonkb / LDLM
Latent Diffusion Language Models
☆69Updated 2 years ago
afiaka87 / latent-diffusion-deepspeed
Finetune the 1.4B latent diffusion text2img-large checkpoint from CompVis using deepspeed. (work-in-progress)
☆36Updated 3 years ago
zphang / minimal-opt
☆67Updated 3 years ago
deep-spin / infinite-former
☆67Updated last year
robvanvolt / DALLE-datasets
This is a summary of easily available datasets for generalized DALLE-pytorch training.
☆128Updated 3 years ago
Dahoas / reward-modeling
☆98Updated 2 years ago
basusourya / mirostat
Code for the paper-"Mirostat: A Perplexity-Controlled Neural Text Decoding Algorithm" (https://arxiv.org/abs/2007.14966).
☆61Updated 3 years ago