kyegomez/LongNet

Implementation of plug-and-play attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"

☆724

Alternatives and similar repositories for LongNet

Users who are interested in LongNet are comparing it to the libraries listed below.

CStanKonrad / long_llama
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transform…
☆1,465 · Nov 7, 2023 · Updated 2 years ago
kyegomez / Andromeda
An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast
☆151 · Sep 3, 2024 · Updated last year
alexisrozhkov / dilated-self-attention
Implementation of the dilated self-attention as described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
☆13 · Jul 23, 2023 · Updated 3 years ago
microsoft / torchscale
Foundation Architecture for (M)LLMs
☆3,133 · Apr 11, 2024 · Updated 2 years ago
microsoft / unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☆22,171 · Jan 23, 2026 · Updated 6 months ago
Victorwz / LongMem
Official implementation of our NeurIPS 2023 paper "Augmenting Language Models with Long-Term Memory".
☆827 · Mar 30, 2024 · Updated 2 years ago
JIA-Lab-research / LongLoRA
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
☆2,689 · Aug 14, 2024 · Updated last year
mit-han-lab / streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
☆7,249 · Jul 11, 2024 · Updated 2 years ago
epfml / landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
☆426 · Dec 20, 2023 · Updated 2 years ago
artidoro / qlora
QLoRA: Efficient Finetuning of Quantized LLMs
☆10,968 · Jun 10, 2024 · Updated 2 years ago
turboderp / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆2,934 · Sep 30, 2023 · Updated 2 years ago
OpenLMLab / LOMO
LOMO: LOw-Memory Optimization
☆994 · Jul 2, 2024 · Updated 2 years ago
nlpxucan / WizardLM
LLMs built upon Evol Instruct: WizardLM, WizardCoder, WizardMath
☆9,480 · Jun 7, 2025 · Updated last year
GATECH-EIC / LaCache
[ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
☆17 · Nov 4, 2025 · Updated 8 months ago
invictus717 / MetaTransformer
Meta-Transformer for Unified Multimodal Learning
☆1,650 · Dec 5, 2023 · Updated 2 years ago
kyegomez / forest-of-thoughts
A forest of autonomous agents.
☆20 · Jan 27, 2025 · Updated last year
thooton / muse
Let's create synthetic textbooks together :)
☆74 · Jan 29, 2024 · Updated 2 years ago
tomaarsen / attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
☆735 · Apr 10, 2024 · Updated 2 years ago
Jamie-Stirling / RetNet
An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
☆1,210 · Oct 22, 2023 · Updated 2 years ago
OpenBMB / ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
☆5,709 · May 21, 2025 · Updated last year
ZackBradshaw / ikigAI
☆13 · Mar 28, 2024 · Updated 2 years ago
kyegomez / tree-of-thoughts
Plug in and Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that Elevates Model Reasoning …
☆4,591 · Jul 29, 2025 · Updated 11 months ago
The-Swarm-Corporation / Brainwave
Brainwave is a state-of-the-art neural decoder that transforms electroencephalogram (EEG) and brain signals into multimodal outputs inclu…
☆14 · Oct 6, 2025 · Updated 9 months ago
ShishirPatil / gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
☆12,962 · Apr 13, 2026 · Updated 3 months ago
kyegomez / Pegasus
PegasusX: The Future of Multimodal Embeddings 🦄 🦄
☆14 · Oct 16, 2024 · Updated last year
neulab / prompt2model
prompt2model - Generate Deployable Models from Natural Language Instructions
☆2,018 · Dec 29, 2024 · Updated last year
kyegomez / Sophia
Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.
☆382 · Jun 4, 2024 · Updated 2 years ago
Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆24,531 · Updated this week
kyegomez / Paper-Implementation-Template
A simple reproducible template to implement AI research papers
☆24 · Sep 9, 2024 · Updated last year
kyegomez / Falcon
A simple package for leveraging Falcon 180B and the HF ecosystem's tools, including training/inference scripts, safetensors, integrations…
☆12 · Mar 11, 2024 · Updated 2 years ago
openlm-research / open_llama
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
☆7,533 · Jul 16, 2023 · Updated 3 years ago
FasterDecoding / Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,758 · Jun 25, 2024 · Updated 2 years ago
mosaicml / llm-foundry
LLM training code for Databricks foundation models
☆4,431 · Mar 25, 2026 · Updated 4 months ago
kyegomez / Algorithm-Of-Thoughts
My implementation of "Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models"
☆100 · Oct 13, 2023 · Updated 2 years ago
kyegomez / zeta
Build high-performance AI models with modular building blocks
☆598 · Updated this week
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,882 · Mar 21, 2026 · Updated 4 months ago
Alpha-VLLM / LLaMA2-Accessory
An Open-source Toolkit for LLM Development
☆2,801 · Jan 13, 2025 · Updated last year
Liuhong99 / Sophia
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
☆1,003 · Jan 30, 2024 · Updated 2 years ago
BlinkDL / RWKV-LM
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable)…
☆14,639 · Updated this week