duykhuongnguyen / LASeR-MAB

Code for paper: "LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits"

☆13

Alternatives and similar repositories for LASeR-MAB

Users who are interested in LASeR-MAB are comparing it to the repositories listed below.

yale-nlp / refdpo
☆16Updated 11 months ago
kyegomez / Reka-Torch
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆30Updated 2 weeks ago
limenlp / safer-instruct
This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17Updated last year
formll / resolving-scaling-law-discrepancies
☆20Updated last year
alon-albalak / FLAD
Few-shot Learning with Auxiliary Data
☆28Updated last year
IBM / ColPret
Efficient Scaling laws and collaborative pretraining.
☆16Updated 5 months ago
general-preference / general-preference-model
Official implementation of ICML 2025 paper "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment" (https:…
☆25Updated 2 months ago
allenai / sso
Repository for Skill Set Optimization
☆14Updated 11 months ago
HazyResearch / aioli
Aioli: A unified optimization framework for language model data mixing
☆27Updated 5 months ago
maszhongming / ParaKnowTransfer
Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"
☆32Updated last year
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆35Updated 9 months ago
UCSB-NLP-Chang / Prereq_tune
Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning"
☆10Updated 6 months ago
tml-epfl / icl-alignment
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
☆30Updated 5 months ago
xhan77 / in-context-alignment
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning
☆35Updated last year
arnab-api / romba
Applies ROME and MEMIT on Mamba-S4 models
☆14Updated last year
nverma1 / merging-text-transformers
Code for "Merging Text Transformers from Different Initializations"
☆20Updated 5 months ago
Leezekun / MacRAG
☆16Updated last week
wangskyGit / passage-sieve
Official repo of the AAAI 2024 paper "Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization"
☆13Updated last year
googleinterns / localizing-paragraph-memorization
☆14Updated last year
linkedin / ControlLLM
Control LLM
☆17Updated 3 months ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆33Updated 3 months ago
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
ThomasScialom / T0_continual_learning
Adding new tasks to T0 without catastrophic forgetting
☆33Updated 2 years ago
yidingjiang / ado
The repository contains code for Adaptive Data Optimization
☆25Updated 7 months ago
yikangshen / megablocks
☆20Updated last year
ctlllll / reward_collapse
☆27Updated 2 years ago
yumeng5 / FewGen
[ICML 2023] Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning
☆42Updated 2 years ago
HazyResearch / embroid
Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
☆11Updated last year
RobertCsordas / linear_layer_as_attention
The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …
☆16Updated last month