an implementation of Self-Extend, to expand the context window via grouped attention
☆119Jan 7, 2024Updated 2 years ago
Alternatives and similar repositories for selfextend
Users that are interested in selfextend are comparing it to the libraries listed below.
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning☆664Jun 1, 2024Updated last year
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples'☆81Jan 18, 2024Updated 2 years ago
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip.☆44Sep 6, 2023Updated 2 years ago
- A bagel, with everything.☆326Apr 11, 2024Updated 2 years ago
- An extension to the GaLore paper, to perform Natural Gradient Descent in low rank subspace☆18Oct 21, 2024Updated last year
- Using multiple LLMs for ensemble Forecasting☆16Jan 17, 2024Updated 2 years ago
- inference code for mixtral-8x7b-32kseqlen☆104Dec 12, 2023Updated 2 years ago
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction☆389Jul 9, 2024Updated last year
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)☆209May 20, 2024Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale☆117Apr 22, 2025Updated last year
- implementation of https://arxiv.org/pdf/2312.09299☆21Jul 3, 2024Updated last year
- ☆204Dec 5, 2024Updated last year
- ☆50Mar 14, 2024Updated 2 years ago
- [COLM'25] A Controlled Study on Long Context Extension and Generalization in LLMs☆65Mar 9, 2026Updated last month
- Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"☆152Mar 13, 2025Updated last year
- This is our own implementation of 'Layer Selective Rank Reduction'☆240May 26, 2024Updated last year
- ☆38Mar 12, 2024Updated 2 years ago
- ☆23Apr 15, 2026Updated 2 weeks ago
- Just a bunch of benchmark logs for different LLMs☆124Jul 28, 2024Updated last year
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at…☆103Jun 14, 2024Updated last year
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta☆13Nov 11, 2024Updated last year
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format☆27Jul 12, 2023Updated 2 years ago
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi…☆356Jul 29, 2024Updated last year
- Instruction tuning dataset generation inspired by LLaVA-Instruct-158k via any LLM, also for commercial use.☆13Mar 13, 2024Updated 2 years ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025]☆114Feb 20, 2025Updated last year
- ☆13Apr 1, 2026Updated last month
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆78Mar 12, 2024Updated 2 years ago
- ☆606Aug 23, 2024Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models☆1,708Apr 17, 2024Updated 2 years ago
- ☆56Nov 6, 2024Updated last year
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto…☆276Jan 10, 2026Updated 3 months ago
- ☆54May 20, 2024Updated last year
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"☆450Oct 16, 2024Updated last year
- Run evaluation on LLMs using human-eval benchmark☆430Sep 12, 2023Updated 2 years ago
- ☆10Feb 12, 2024Updated 2 years ago
- Comparing retrieval abilities from GPT4-Turbo and a RAG system on a toy example for various context lengths☆35Dec 1, 2023Updated 2 years ago
- Mixture of Expert (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations.☆12Feb 11, 2024Updated 2 years ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)☆149Nov 9, 2024Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆280Nov 3, 2023Updated 2 years ago