an implementation of Self-Extend, to expand the context window via grouped attention
☆119Jan 7, 2024Updated 2 years ago
Alternatives and similar repositories for selfextend
Users that are interested in selfextend are comparing it to the libraries listed below
Sorting:
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning☆665Jun 1, 2024Updated last year
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip.☆44Sep 6, 2023Updated 2 years ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples'☆81Jan 18, 2024Updated 2 years ago
- A bagel, with everything.☆326Apr 11, 2024Updated last year
- ☆50Mar 14, 2024Updated last year
- ☆38Mar 12, 2024Updated last year
- inference code for mixtral-8x7b-32kseqlen☆105Dec 12, 2023Updated 2 years ago
- EvaByte: Efficient Byte-level Language Models at Scale☆115Apr 22, 2025Updated 10 months ago
- implementation of https://arxiv.org/pdf/2312.09299☆21Jul 3, 2024Updated last year
- An extention to the GaLore paper, to perform Natural Gradient Descent in low rank subspace☆18Oct 21, 2024Updated last year
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction☆390Jul 9, 2024Updated last year
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)☆209May 20, 2024Updated last year
- Just a bunch of benchmark logs for different LLMs☆119Jul 28, 2024Updated last year
- Using multiple LLMs for ensemble Forecasting☆16Jan 17, 2024Updated 2 years ago
- This is our own implementation of 'Layer Selective Rank Reduction'☆240May 26, 2024Updated last year
- Demo of ConversationEntityMemory in Streamlit.☆52Jan 23, 2023Updated 3 years ago
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi…☆355Jul 29, 2024Updated last year
- An experimental and alternative approach to Finetuning and RAG.☆34Dec 9, 2023Updated 2 years ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆279Nov 3, 2023Updated 2 years ago
- Cerule - A Tiny Mighty Vision Model☆68Nov 9, 2025Updated 3 months ago
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️☆88Oct 20, 2023Updated 2 years ago
- Long Context Extension and Generalization in LLMs☆63Sep 21, 2024Updated last year
- ☆203Dec 5, 2024Updated last year
- Fast approximate inference on a single GPU with sparsity aware offloading☆39Jan 4, 2024Updated 2 years ago
- Run evaluation on LLMs using human-eval benchmark☆427Sep 12, 2023Updated 2 years ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"☆446Oct 16, 2024Updated last year
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto…☆270Jan 10, 2026Updated last month
- ☆23Jun 4, 2024Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models☆1,673Apr 17, 2024Updated last year
- ☆415Nov 2, 2023Updated 2 years ago
- Low-Rank adapter extraction for fine-tuned transformers models☆180May 2, 2024Updated last year
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)☆147Nov 9, 2024Updated last year
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI☆222Apr 29, 2024Updated last year
- ☆10Feb 12, 2024Updated 2 years ago
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta☆13Nov 11, 2024Updated last year
- Mixture of Expert (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations.☆12Feb 11, 2024Updated 2 years ago
- Long Context Research☆26Jan 26, 2026Updated last month
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"☆248Jun 6, 2025Updated 8 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format☆27Jul 12, 2023Updated 2 years ago