CyndxAI/QKNorm

Code for the paper "Query-Key Normalization for Transformers"

☆53

Alternatives and similar repositories for QKNorm

Users who are interested in QKNorm are comparing it to the libraries listed below.

jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆20 · Oct 9, 2022 · Updated 3 years ago
AranKomat / Metroplex
☆21 · Mar 15, 2023 · Updated 3 years ago
RobertCsordas / ndr
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".
☆34 · Jun 11, 2025 · Updated last year
AntNLP / nope_head_scale
☆29 · May 4, 2024 · Updated 2 years ago
belindal / TaskBench500
Suite of 500 procedurally-generated NLP tasks to study language model adaptability
☆21 · Jul 16, 2022 · Updated 4 years ago
NonvolatileMemory / flash_attn_gqa
Triton version of GQA flash attention, based on the tutorial
☆12 · Aug 4, 2024 · Updated last year
cisnlp / mPLM-Sim
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
☆11 · Jan 19, 2024 · Updated 2 years ago
emorynlp / seq2seq-corenlp
☆13 · Feb 7, 2023 · Updated 3 years ago
photogeniq / image-encoders
🖼️📊
☆11 · Jun 9, 2020 · Updated 6 years ago
microsoft / AMOS
[ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators
☆26 · Jul 26, 2023 · Updated 2 years ago
ltgoslo / factorizer
☆16 · May 14, 2024 · Updated 2 years ago
Edward-Sun / structured-nart
☆15 · Dec 5, 2019 · Updated 6 years ago
joeljang / FLM
All-in-one repository for Fine-tuning & Pretraining (Large) Language Models
☆15 · Mar 8, 2023 · Updated 3 years ago
LZhengisme / CODA
Implementation of Cascaded Head-colliding Attention (ACL'2021)
☆11 · Sep 16, 2021 · Updated 4 years ago
SchulzLab / SNEEP
SNp Exploration and Analysis using EPigenomics data
☆12 · Jan 14, 2025 · Updated last year
NingAnMe / Label-Smoothing-for-CrossEntropyLoss-PyTorch
Adds a label_smoothing argument to torch.nn.CrossEntropyLoss()
☆14 · Jan 13, 2021 · Updated 5 years ago
LIANGQINGYUAN / Lyra
Lyra: A Benchmark for Turducken-Style Code Generation
☆15 · Apr 22, 2022 · Updated 4 years ago
OpenNLPLab / ETSC-Exact-Toeplitz-to-SSM-Conversion
[EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion our EMNLP 2023 paper - Accelerating Toeplitz…
☆14 · Oct 17, 2023 · Updated 2 years ago
epfml / pam
☆16 · Dec 9, 2023 · Updated 2 years ago
joeljang / ELM
[ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning
☆99 · Apr 26, 2023 · Updated 3 years ago
lsj2408 / URPE
[NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)
☆35 · Aug 6, 2023 · Updated 2 years ago
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆87 · Jun 4, 2024 · Updated 2 years ago
teffland / ner-expected-entity-ratio
Implementation and experiments for Partially Supervised NER via Expected Entity Ratio in TACL 2022
☆14 · Nov 7, 2022 · Updated 3 years ago
yoshavit / fairml-farm
A collection of implementations of fair ML algorithms
☆12 · Jan 7, 2018 · Updated 8 years ago
deep-spin / infinite-former
☆68 · Aug 29, 2024 · Updated last year
ucker / why-low-precision-training-fails
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
☆72 · Apr 7, 2026 · Updated 3 months ago
RobertCsordas / moe_layer
sigma-MoE layer
☆21 · Jan 5, 2024 · Updated 2 years ago
j-towns / vdvae-jax
Very deep VAEs in JAX/Flax
☆47 · Jun 16, 2021 · Updated 5 years ago
haitian-sun / ConditionalQA
Release of the ConditionalQA dataset
☆21 · Nov 2, 2021 · Updated 4 years ago
Kajiyu / kanerva_machine
The implementation of "The Kanerva Machine" with PyTorch and Pyro
☆12 · Jun 14, 2018 · Updated 8 years ago
LouChao98 / nner_as_parsing
☆16 · Mar 22, 2023 · Updated 3 years ago
ICLR-DAP / Deep-Audio-Prior
Anonymous ICLR Submission
☆14 · Sep 25, 2019 · Updated 6 years ago
alec-tschantz / planet
PlaNet: Learning Latent Dynamics for Planning from Pixels
☆10 · Feb 13, 2020 · Updated 6 years ago
RSNA / AI-Deep-Learning-Lab-2024
Data, notebooks, and articles associated with the RSNA AI Deep Learning Lab at RSNA 2024
☆14 · Dec 4, 2024 · Updated last year
iPieter / llmq
A Scheduler for Batched LLM Inference
☆19 · Oct 5, 2025 · Updated 9 months ago
zorazrw / odex
[EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation
☆49 · Dec 22, 2023 · Updated 2 years ago
JSchlensok / VespaG
Expert-Guided Protein Language Models enable Accurate and Blazingly Fast Fitness Prediction
☆16 · Feb 6, 2026 · Updated 5 months ago
lucidrains / token-shift-gpt
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆49 · Jan 27, 2022 · Updated 4 years ago
nshepperd / jaxtorch
A JAX nn library
☆21 · Sep 9, 2025 · Updated 10 months ago