keezen/ntk_alibi

NTK-scaled version of the ALiBi position encoding for Transformers.

☆69

Alternatives and similar repositories for ntk_alibi

Users who are interested in ntk_alibi compare it to the repositories listed below.

seanzhang-zhichen / baichuan-Dynamic-NTK-ALiBi
Code implementation of Baichuan Dynamic NTK-ALiBi: inference on longer texts without fine-tuning
☆49 · Aug 27, 2023 · Updated 2 years ago
genggui001 / Megatron-DeepSpeed-Llama
☆84 · Sep 9, 2023 · Updated 2 years ago
NormXU / Consistent-DynamicNTKRoPE
An Experiment on Dynamic NTK Scaling RoPE
☆65 · Nov 26, 2023 · Updated 2 years ago
OpenLMLab / scaling-rope
Code for Scaling Laws of RoPE-based Extrapolation
☆73 · Oct 16, 2023 · Updated 2 years ago
ssbuild / aigc_evals
aigc evals
☆10 · Dec 2, 2023 · Updated 2 years ago
Prakhar-97 / Table-detection-and-Document-layout-analysis
☆10 · Jun 22, 2020 · Updated 6 years ago
ppfliu / emotion-recognition
Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition
☆15 · May 10, 2022 · Updated 4 years ago
yangjianxin1 / LLMPruner
☆309 · Apr 6, 2023 · Updated 3 years ago
LydiaXiaohongLi / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆19 · Jul 20, 2023 · Updated 3 years ago
bugensui / WenTianSearch
"Alibaba Lingjie" WenTian Engine E-commerce Search Algorithm Competition, 13/2771
☆10 · Jul 31, 2022 · Updated 3 years ago
jquesnelle / yarn
YaRN: Efficient Context Window Extension of Large Language Models
☆1,744 · Apr 17, 2024 · Updated 2 years ago
Zacchaeus00 / nbme
https://www.kaggle.com/c/nbme-score-clinical-patient-notes
☆10 · Sep 1, 2022 · Updated 3 years ago
nick7nlp / Jamba_Paper_Reading
Paper reading: Jamba — Hybrid Transformer-Mamba LM (SSM → S4 → S6 → Jamba)
☆15 · May 22, 2024 · Updated 2 years ago
nick7nlp / Counting-Stars
Counting-Stars (★)
☆83 · Nov 24, 2025 · Updated 8 months ago
thunlp / WebCPM
Official codes for ACL 2023 paper "WebCPM: Interactive Web Search for Chinese Long-form Question Answering"
☆911 · Nov 25, 2023 · Updated 2 years ago
OpenLMLab / MOSS-RLHF
Secrets of RLHF in Large Language Models Part I: PPO
☆1,426 · Mar 3, 2024 · Updated 2 years ago
CLUEbenchmark / pCLUE
pCLUE: 1,000,000+ multi-task prompt-learning dataset
☆509 · Oct 4, 2022 · Updated 3 years ago
OpenLMLab / LongWanjuan
Towards Systematic Measurement for Long Text Quality
☆39 · Sep 5, 2024 · Updated last year
wangguojim / LargeScale
☆19 · May 11, 2024 · Updated 2 years ago
limenlp / safer-instruct
This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17 · Feb 22, 2024 · Updated 2 years ago
yegcjs / mixinglaws
☆113 · Jul 15, 2025 · Updated last year
stack-heap-overflow / sohu2022-nlp-rank1
Open-source first-place solution for the NLP track of the 2022 Sohu Campus Algorithm Competition (experimental code)
☆89 · Jul 31, 2022 · Updated 3 years ago
OpenCoder-llm / opc_data_filtering
Heuristic filtering framework for RefineCode
☆87 · Mar 13, 2025 · Updated last year
fla-org / fla-zoo
Flash-Linear-Attention models beyond language
☆21 · Aug 28, 2025 · Updated 11 months ago
ictnlp / NMLA-NAT
Code for NeurIPS 2022 Spotlight paper "Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation"
☆20 · Nov 16, 2022 · Updated 3 years ago
LianjiaTech / BELLE
BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model)
☆8,279 · Oct 16, 2024 · Updated last year
Miraclemarvel55 / LLaMA-MOSS-RLHF-LoRA
Training LLaMA or MOSS with RLHF, with optional LoRA
☆21 · May 16, 2023 · Updated 3 years ago
CarnoZhao / GAIIC-Track1
Code for GAIIC-Track1
☆15 · Jun 14, 2022 · Updated 4 years ago
alibaba / Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
☆666 · Jan 2, 2024 · Updated 2 years ago
deepspeedai / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆2,257 · Aug 14, 2025 · Updated 11 months ago
DAMO-NLP-SG / CLEX
[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models
☆78 · Mar 12, 2024 · Updated 2 years ago
THUDM / LongAlign
[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs
☆262 · Dec 16, 2024 · Updated last year
LouChao98 / nner_as_parsing
☆16 · Mar 22, 2023 · Updated 3 years ago
nursery42 / WenTianSearch
A baseline for WenTianSearch
☆85 · Mar 24, 2022 · Updated 4 years ago
shshlzh / TextCNN-Adversarial-Training-in-NLP
Applications of adversarial training in NLP
☆14 · Nov 22, 2021 · Updated 4 years ago
DachengLi1 / LongChat
Official repository for LongChat and LongEval
☆536 · May 24, 2024 · Updated 2 years ago
OpenLMLab / ParallelTokenizer
Use the tokenizer in parallel to achieve superior acceleration
☆20 · Mar 21, 2024 · Updated 2 years ago
ProjectD-AI / LLaMA-Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆69 · Jul 20, 2023 · Updated 3 years ago
bigscience-workshop / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆1,447 · Mar 20, 2024 · Updated 2 years ago