mrsteyk / RWKV-LM-deepspeedLinks

☆43

Alternatives and similar repositories for RWKV-LM-deepspeed

Users that are interested in RWKV-LM-deepspeed are comparing it to the libraries listed below

Sorting:

ArEnSc / Production-RWKV
This project aims to make RWKV Accessible to everyone using a Hugging Face like interface, while keeping it close to the R and D RWKV bra…
☆65Updated 2 years ago
BlinkDL / RWKV-v2-RNN-Pile
RWKV-v2-RNN trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
☆67Updated 3 years ago
harrisonvanderbyl / rwkvstic
Framework agnostic python runtime for RWKV models
☆147Updated 2 years ago
kaiokendev / cutoff-len-is-context-len
Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit
☆63Updated 2 years ago
BlinkDL / WorldModel
Let us make Psychohistory (as in Asimov) a reality, and accessible to everyone. Useful for LLM grounding and games / fiction / business /…
☆40Updated 2 years ago
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filter 8K sequences on Pile
☆116Updated 2 years ago
Langboat / mengzi-retrieval-lm
An experimental implementation of the retrieval-enhanced language model
☆75Updated 2 years ago
kernelmachine / cbtm
Code repository for the c-BTM paper
☆108Updated 2 years ago
zsc / llama_infer
Inference script for Meta's LLaMA models using Hugging Face wrapper
☆110Updated 2 years ago
sunyt32 / torchscale
Transformers at any scale
☆42Updated last year
Doraemonzzz / nanoTransNormer
☆11Updated 2 years ago
Rallio67 / language-model-agents
Experiments with generating opensource language model assistants
☆97Updated 2 years ago
NolanoOrg / sparse_quant_llms
SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia
☆41Updated 2 years ago
EleutherAI / stackexchange-dataset
Python tools for processing the stackexchange data dumps into a text dataset for Language Models
☆85Updated 2 years ago
RyokoAI / BigKnow2022
BigKnow2022: Bringing Language Models Up to Speed
☆16Updated 2 years ago
kyegomez / LM-Infinite
Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆40Updated last year
McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆138Updated last year
LAION-AI / Anh
Anh - LAION's multilingual assistant datasets and models
☆27Updated 2 years ago
codekansas / rwkv
RWKV model implementation
☆38Updated 2 years ago
Narsil / bloomserver
☆39Updated 3 years ago
basusourya / mirostat
Code for the paper-"Mirostat: A Perplexity-Controlled Neural Text Decoding Algorithm" (https://arxiv.org/abs/2007.14966).
☆61Updated 3 years ago
NormXU / Consistent-DynamicNTKRoPE
An Experiment on Dynamic NTK Scaling RoPE
☆64Updated 2 years ago
euclaise / supertrainer2000
☆50Updated last year
SeanNaren / min-LLM
Minimal code to train a Large Language Model (LLM).
☆172Updated 3 years ago
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45Updated last year
CarperAI / InstructGPT
For experiments involving instruct gpt. Currently used for documenting open research questions.
☆71Updated 3 years ago
geov-ai / geov
The GeoV model is a large langauge model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER).…
☆121Updated 2 years ago
huggingface / transformers_bloom_parallel
Techniques used to run BLOOM at inference in parallel
☆37Updated 3 years ago
Zyphra / Zyda_processing
☆39Updated last year
RWKV / RWKV-infctx-trainer
RWKV infctx trainer, for training arbitary context sizes, to 10k and beyond!
☆147Updated last year