zhenyuhe00 / BiPE

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024

☆22

Alternatives and similar repositories for BiPE

Users who are interested in BiPE are comparing it to the libraries listed below.

yule-BUAA / MergeLLM
Codes for Merging Large Language Models
☆33 Updated last year
QizhiPei / MathFusion
MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)
☆32 Updated 4 months ago
chuanyang-Zheng / DAPE
This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"
☆39 Updated last year
qiuzh20 / gated_attention
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink…
☆105 Updated 2 months ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆89 Updated last year
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆69 Updated 4 months ago
yegcjs / DiffusionLLM
Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"
☆83 Updated last year
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆87 Updated last week
ShadeCloak / ADORA
☆46 Updated 7 months ago
MingyuJ666 / Rope_with_LLM
[ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…
☆80 Updated 5 months ago
uservan / ThinkPO
☆17 Updated 3 months ago
ECNU-ICALK / MELO
[AAAI 2024] MELO: Enhancing Model Editing with Neuron-indexed Dynamic LoRA
☆25 Updated last year
abdelfattah-lab / TokenButler
☆26 Updated last week
Kwai-Klear / RLEP
RL with Experience Replay
☆48 Updated 3 months ago
SihengLi99 / SEALONG
Large Language Models Can Self-Improve in Long-context Reasoning
☆73 Updated 11 months ago
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆105 Updated last month
RUCBM / DeepCritic
Official repository for paper "DeepCritic: Deliberate Critique with Large Language Models"
☆41 Updated 4 months ago
GraphPKU / Case_or_Rule
Exploring whether LLMs perform case-based or rule-based reasoning
☆30 Updated last year
Shwai-He / MEO
The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023)
☆40 Updated last year
OpenSparseLLMs / MoM
☆106 Updated 2 months ago
RUCAIBox / RLMEC
The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
☆38 Updated last year
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆48 Updated last year
OpenSparseLLMs / LLaMA-MoE-v2
🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
☆88 Updated 11 months ago
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆135 Updated 4 months ago
osehmathias / lisa
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
☆36 Updated last year
guyuntian / CoT_benchmark
Code for "Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective"
☆20 Updated 2 years ago
shawnricecake / Heima
Code for Heima
☆58 Updated 7 months ago
LiangrunFlora / Slow-Fast-Sampling
Official PyTorch implementation of the paper "Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Princ…
☆34 Updated 4 months ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆87 Updated 9 months ago
Chaos96 / fourierft
☆148 Updated last year