VectorInstitute / flex_model
☆12 Updated 10 months ago
Alternatives and similar repositories for flex_model:
Users who are interested in flex_model are comparing it to the libraries listed below.
- LLM finetuning in resource-constrained environments. ☆42 Updated 6 months ago
- ☆63 Updated 2 years ago
- ☆72 Updated 8 months ago
- ☆38 Updated 9 months ago
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆21 Updated 9 months ago
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ☆36 Updated 2 years ago
- ☆50 Updated 2 months ago
- [ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization ☆29 Updated 4 months ago
- Efficient LLM inference on Slurm clusters using vLLM. ☆43 Updated this week
- PyTorch building blocks for OLMo ☆47 Updated this week
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆56 Updated 3 months ago
- Monet: Mixture of Monosemantic Experts for Transformers ☆43 Updated this week
- ☆51 Updated 7 months ago
- ☆34 Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆79 Updated last year
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆21 Updated 4 months ago
- ☆44 Updated last year
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning ☆97 Updated last year
- ☆20 Updated 2 years ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?" ☆56 Updated 2 years ago
- ☆23 Updated 2 months ago
- Language models scale reliably with over-training and on downstream tasks ☆96 Updated 9 months ago
- This is the repository for "Model Merging by Uncertainty-Based Gradient Matching", ICLR 2024. ☆24 Updated 8 months ago
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca… ☆58 Updated last year
- ☆52 Updated last year
- Official repo to On the Generalization Ability of Retrieval-Enhanced Transformers ☆37 Updated 7 months ago
- Code for the ACL 2023 paper: "Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Sc… ☆29 Updated last year
- DEMix Layers for Modular Language Modeling ☆53 Updated 3 years ago
- ☆45 Updated last year