SalesforceAIResearch / GemFilter
☆75 Updated last month
Alternatives and similar repositories for GemFilter:
Users who are interested in GemFilter are comparing it to the repositories listed below.
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆140 Updated 5 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆72 Updated 8 months ago
- ☆82 Updated 4 months ago
- Codebase for Instruction Following without Instruction Tuning ☆33 Updated 4 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆153 Updated 2 months ago
- This is the official repository for Inheritune. ☆109 Updated last week
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ☆100 Updated 7 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆77 Updated 4 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆114 Updated 8 months ago
- A repository for research on medium sized language models. ☆76 Updated 8 months ago
- ☆125 Updated last year
- Long Context Extension and Generalization in LLMs ☆48 Updated 4 months ago
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ☆69 Updated 2 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆145 Updated 8 months ago
- Pytorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24) ☆53 Updated 10 months ago
- [ICLR 2025] SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights ☆53 Updated last week
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆72 Updated 8 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆107 Updated 9 months ago
- ☆71 Updated 6 months ago
- ☆64 Updated 2 weeks ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆59 Updated 4 months ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆56 Updated 4 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆24 Updated 5 months ago
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions" ☆35 Updated 8 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆96 Updated 4 months ago
- ☆53 Updated 4 months ago