AntNLP / nope_head_scale

☆25

Alternatives and similar repositories for nope_head_scale

Users who are interested in nope_head_scale are comparing it to the repositories listed below.

HazyResearch / prefix-linear-attention
☆56 · Updated last year
chijames / KERPLE
☆19 · Updated 2 years ago
princeton-pli / MeCo
Code for ICML 25 paper "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)"
☆41 · Updated last month
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 · Updated last year
qiuzh20 / gated_attention
The official implementation for Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
☆46 · Updated 2 months ago
Leooyii / LCEG
Long Context Extension and Generalization in LLMs
☆58 · Updated 10 months ago
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44 · Updated 3 months ago
SynthLabsAI / big-math
A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
☆59 · Updated 5 months ago
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆52 · Updated 5 months ago
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆57 · Updated last year
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆92 · Updated last week
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆35 · Updated 10 months ago
yidingjiang / ado
This repository contains code for Adaptive Data Optimization
☆25 · Updated 7 months ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27 · Updated last year
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75 · Updated 2 months ago
gregorbachmann / Next-Token-Failures
☆89 · Updated last year
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆59 · Updated 10 months ago
wwxu21 / CUT
Source code of "Reasons to Reject? Aligning Language Models with Judgments"
☆58 · Updated last year
locuslab / scaling_laws_data_filtering
☆65 · Updated last year
yale-nlp / refdpo
☆16 · Updated last year
GAIR-NLP / AIME-Preview
☆71 · Updated 4 months ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆86 · Updated 10 months ago
kyegomez / LM-Infinite
Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆40 · Updated 8 months ago
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆56 · Updated last week
RUCAIBox / BAMBOO
☆35 · Updated last year
davidbrandfonbrener / color-filter-olmo
☆13 · Updated 3 months ago
swtheing / PF-PPO-RLHF
☆33 · Updated 10 months ago
GuanghaoYe / Emergence-of-Thinking
☆53 · Updated 5 months ago
kaistAI / Janus
[NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages
☆49 · Updated 8 months ago
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆53 · Updated 2 years ago