giangdip2410 / HyperRouter
Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork"
☆31 Updated 9 months ago
Related projects:
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆89 Updated 4 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆35 Updated 8 months ago
- Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations… ☆27 Updated 9 months ago
- ☆29 Updated 7 months ago
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning. ☆56 Updated 6 months ago
- ☆23 Updated 3 weeks ago
- The official implementation of Self-Exploring Language Models (SELM) ☆55 Updated 3 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆44 Updated 8 months ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆59 Updated 3 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆76 Updated 6 months ago
- Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262) ☆56 Updated 7 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆39 Updated 2 weeks ago
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆55 Updated last week
- ☆35 Updated last week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆87 Updated 8 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆40 Updated 8 months ago
- ☆60 Updated 5 months ago
- ☆36 Updated last month
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks" ☆49 Updated 4 months ago
- Cascade Speculative Drafting ☆23 Updated 5 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore, paper coming soon ☆18 Updated this week
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆48 Updated last week
- "Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?" ☆24 Updated 2 weeks ago
- The official implementation for Collaborative Word-based Pre-trained Item Representation for Transferable Recommendation. ☆23 Updated 7 months ago
- DSBench: How Far are Data Science Agents Becoming Data Science Experts? ☆20 Updated this week
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning ☆64 Updated 9 months ago
- ☆25 Updated 9 months ago
- [NeurIPS 2023] PyTorch code for Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind ☆67 Updated 8 months ago
- ☆18 Updated this week
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM ☆53 Updated 3 months ago