zaydzuhri / flame

Fork of Flame repo for training of some new stuff in development

☆18

Alternatives and similar repositories for flame

Users who are interested in flame compare it to the libraries listed below.

TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78 Updated last year
RWKV / ZeroCoT
https://x.com/BlinkDL_AI/status/1884768989743882276
☆28 Updated 5 months ago
graphcore-research / jax-scalify
JAX Scalify: end-to-end scaled arithmetics
☆16 Updated 11 months ago
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45 Updated last year
RobertCsordas / moeut
☆86 Updated last year
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38 Updated 4 months ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆44 Updated last year
kiddyboots216 / lottery-ticket-adaptation
Lottery Ticket Adaptation
☆40 Updated 11 months ago
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23 Updated 4 months ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61 Updated last year
lucidrains / transformer-lm-gan
Explorations into adversarial losses on top of autoregressive loss for language modeling
☆38 Updated 8 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 Updated last year
cloneofsimo / zeroshampoo
☆34 Updated last year
minyoungg / LTE
☆69 Updated last year
The-Inscrutable-X / TACQ
Official Repository for Task-Circuit Quantization
☆24 Updated 4 months ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆54 Updated last year
NathanGodey / qfilters
Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)
☆35 Updated 7 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
wmn-231314 / diffusion-data-constraint
Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod…
☆101 Updated last month
SHI-Labs / CompactNet
☆32 Updated last year
JinjieNi / dlms-are-super-data-learners
The official github repo for "Diffusion Language Models are Super Data Learners".
☆135 Updated 3 weeks ago
g-luo / vlm_cross_modal_reps
Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025
☆31 Updated 5 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆98 Updated 11 months ago
kyleliang919 / Super_Muon
☆64 Updated 7 months ago
catid / spectral_ssm
Implementation of Spectral State Space Models
☆16 Updated last year
evanatyourservice / llm-jax
Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers.
☆18 Updated 3 months ago
ClashLuke / SOAP
☆21 Updated 11 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84 Updated 11 months ago
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47 Updated last year
eth-easl / fmengine
Utilities for Training Very Large Models
☆58 Updated last year