chijames / KERPLE
☆16Updated last year
Related projects:
- ☆28Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆38Updated 2 months ago
- ☆32Updated 5 months ago
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective☆29Updated 11 months ago
- This repository is the official implementation of our EMNLP 2022 paper ELMER: A Non-Autoregressive Pre-trained Language Model for Efficie…☆25Updated last year
- Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)☆31Updated 9 months ago
- Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".☆22Updated 5 months ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training☆16Updated last month
- Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts"☆41Updated 2 years ago
- ☆24Updated 6 months ago
- ☆42Updated 7 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆42Updated last year
- Retrieval as Attention☆77Updated last year
- DEMix Layers for Modular Language Modeling☆51Updated 3 years ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"☆56Updated last year
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…☆60Updated 4 months ago
- ☆20Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆33Updated 6 months ago
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models☆22Updated last month
- Code for ACL 2023 paper titled "Lifting the Curse of Capacity Gap in Distilling Language Models"☆28Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆40Updated 8 months ago
- Influence Experiments☆36Updated last year
- The original Backpack Language Model implementation, a fork of FlashAttention☆63Updated last year
- [ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks☆51Updated last year
- PyTorch reimplementation of REALM and ORQA☆22Updated 2 years ago
- Pile Deduplication Code☆15Updated last year
- A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643☆68Updated last year
- [ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning☆21Updated last year
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)"☆31Updated 5 months ago
- ☆21Updated last year