RyokoAI / BigKnow2022Links

BigKnow2022: Bringing Language Models Up to Speed

☆15

Alternatives and similar repositories for BigKnow2022

Users that are interested in BigKnow2022 are comparing it to the libraries listed below

Sorting:

yikangshen / megablocks
☆20Updated last year
acosharma / elita-transformer
Official Repository for Efficient Linear-Time Attention Transformers.
☆18Updated last year
kyegomez / LM-Infinite
Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆40Updated 8 months ago
kaiokendev / cutoff-len-is-context-len
Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit
☆63Updated 2 years ago
kyegomez / Reka-Torch
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆30Updated 2 weeks ago
juvi21 / CoPE-cuda
Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719
☆22Updated last year
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆38Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
xhan77 / in-context-alignment
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning
☆35Updated 2 years ago
yangjackie / Topics-on-diffusion-generative-models
☆26Updated last month
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆44Updated 10 months ago
codekansas / rwkv
RWKV model implementation
☆38Updated 2 years ago
imoneoi / bf16_fused_adam
BFloat16 Fused Adam Operator for PyTorch
☆15Updated 8 months ago
scottlogic-alex / prm800k-denorm
Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format
☆27Updated 2 years ago
basusourya / mirostat
Code for the paper-"Mirostat: A Perplexity-Controlled Neural Text Decoding Algorithm" (https://arxiv.org/abs/2007.14966).
☆60Updated 3 years ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
Doraemonzzz / nanoTransNormer
☆11Updated last year
BBuf / RWKV-World-HF-Tokenizer
☆34Updated last year
NonvolatileMemory / flash_attn_gqa
triton ver of gqa flash attn, based on the tutorial
☆12Updated last year
OpenMOSE / RWKV-Infer
A large-scale RWKV v6, v7(World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to de…
☆40Updated last week
johanwind / wind_rwkv
☆24Updated last week
jungokasai / T2R
☆14Updated 2 years ago
sunyt32 / torchscale
Transformers at any scale
☆41Updated last year
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32Updated last year
Zyphra / Zyda_processing
☆37Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62Updated last year
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24Updated last year
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆46Updated last year
srush / LLM-Talk
☆51Updated last year
HazyResearch / prefix-linear-attention
☆55Updated last year