an implementation of Self-Extend, to expand the context window via grouped attention
☆119Jan 7, 2024Updated 2 years ago
Alternatives and similar repositories for selfextend
Users that are interested in selfextend are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning☆664Jun 1, 2024Updated last year
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples'☆81Jan 18, 2024Updated 2 years ago
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip.☆44Sep 6, 2023Updated 2 years ago
- Demo of ConversationEntityMemory in Streamlit.☆51Jan 23, 2023Updated 3 years ago
- A bagel, with everything.☆326Apr 11, 2024Updated last year
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- An extention to the GaLore paper, to perform Natural Gradient Descent in low rank subspace☆18Oct 21, 2024Updated last year
- Using multiple LLMs for ensemble Forecasting☆16Jan 17, 2024Updated 2 years ago
- inference code for mixtral-8x7b-32kseqlen☆104Dec 12, 2023Updated 2 years ago
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction☆390Jul 9, 2024Updated last year
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)☆209May 20, 2024Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale☆117Apr 22, 2025Updated 11 months ago
- implementation of https://arxiv.org/pdf/2312.09299☆21Jul 3, 2024Updated last year
- ☆50Mar 14, 2024Updated 2 years ago
- ☆202Dec 5, 2024Updated last year
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- [COLM'25] A Controlled Study on Long Context Extension and Generalization in LLMs☆64Mar 9, 2026Updated last month
- Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"☆152Mar 13, 2025Updated last year
- This is our own implementation of 'Layer Selective Rank Reduction'☆241May 26, 2024Updated last year
- ☆38Mar 12, 2024Updated 2 years ago
- ☆23Jun 4, 2024Updated last year
- Just a bunch of benchmark logs for different LLMs☆121Jul 28, 2024Updated last year
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at…☆103Jun 14, 2024Updated last year
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta☆13Nov 11, 2024Updated last year
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format☆27Jul 12, 2023Updated 2 years ago
- Open source password manager - Proton Pass • AdSecurely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi…☆355Jul 29, 2024Updated last year
- Instruction tuning dataset generation inspired by LLaVA-Instruct-158k via any LLM, also for commercial use.☆13Mar 13, 2024Updated 2 years ago
- Train 774M, 1.5B models with the Google's S3 optimizer☆13Sep 7, 2019Updated 6 years ago
- ☆13Apr 1, 2026Updated last week
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆78Mar 12, 2024Updated 2 years ago
- ☆602Aug 23, 2024Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models☆1,690Apr 17, 2024Updated last year
- ☆56Nov 6, 2024Updated last year
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto…☆275Jan 10, 2026Updated 3 months ago
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- ☆54May 20, 2024Updated last year
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"☆450Oct 16, 2024Updated last year
- Run evaluation on LLMs using human-eval benchmark☆430Sep 12, 2023Updated 2 years ago
- ☆10Feb 12, 2024Updated 2 years ago
- Comparing retrieval abilities from GPT4-Turbo and a RAG system on a toy example for various context lengths☆35Dec 1, 2023Updated 2 years ago
- Mixture of Expert (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations.☆12Feb 11, 2024Updated 2 years ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)☆149Nov 9, 2024Updated last year