Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
☆24 · Oct 8, 2024 · Updated last year
Alternatives and similar repositories for REFUEL
Users that are interested in REFUEL are comparing it to the libraries listed below
- ☆26 · Jan 4, 2026 · Updated 2 months ago
- Explore, Establish, Exploit: Red Teaming Language Models from Scratch ☆13 · Jun 21, 2023 · Updated 2 years ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆202 · Apr 17, 2025 · Updated 10 months ago
- A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models ☆20 · May 24, 2025 · Updated 9 months ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following" ☆31 · Jun 5, 2025 · Updated 8 months ago
- PyTorch reimplementation of the paper "HyperMixer: An MLP-based Green AI Alternative to Transformers" [arXiv 2022]. ☆18 · Mar 28, 2022 · Updated 3 years ago
- Official code implementation of the paper: QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmente… ☆38 · Jan 10, 2026 · Updated last month
- Momentum Decoding: Open-ended Text Generation as Graph Exploration ☆19 · Jan 27, 2023 · Updated 3 years ago
- An Empirical Study On Contrastive Search And Contrastive Decoding For Open-ended Text Generation ☆27 · Jun 7, 2024 · Updated last year
- [ACL'25 Main] Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs ☆39 · May 26, 2025 · Updated 9 months ago
- Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner ☆30 · Jun 27, 2024 · Updated last year
- An environment for benchmarking commonsense agents ☆29 · Aug 19, 2020 · Updated 5 years ago
- Sotopia-RL: Reward Design for Social Intelligence ☆46 · Jan 29, 2026 · Updated last month
- ☆32 · Jul 11, 2024 · Updated last year
- ☆33 · Oct 31, 2024 · Updated last year
- ☆71 · Oct 23, 2025 · Updated 4 months ago
- Reinforcement learning environment for the UR5e robot in an OpenAI Gym-like format. Includes both simulation and real parts. ☆14 · Nov 2, 2021 · Updated 4 years ago
- Gazetteer of the Ancient Near East Data ☆10 · Aug 1, 2013 · Updated 12 years ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc… ☆32 · Jun 13, 2024 · Updated last year
- ☆11 · Jun 16, 2024 · Updated last year
- Source codes of Learning Causal Representations for Robust Domain Adaptation (IEEE TKDE) ☆12 · Feb 14, 2022 · Updated 4 years ago
- A LaTeX document class for notes 📝 and textbooks 📚 ☆13 · Jul 14, 2021 · Updated 4 years ago
- A generic tensorflow library for robotics: a bridge between robotics problem and modern machine learning architecture. Provides forward k… ☆13 · Apr 12, 2024 · Updated last year
- Heatmap-based Out-of-Distribution Detection (WACV 2023) ☆13 · Mar 27, 2024 · Updated last year
- An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 23) ☆52 · Aug 13, 2023 · Updated 2 years ago
- This project involves recognising handwritten digits from MNIST Dataset from UCI ML repository by implementing perceptron learning algori… ☆11 · Mar 12, 2022 · Updated 3 years ago
- Accelerating RL for LLM Reasoning with Optimal Advantage Regression ☆37 · May 30, 2025 · Updated 9 months ago
- Official implementation of Panacea: A foundation model for clinical trial design, recruitment, search, and summarization. ☆18 · Dec 24, 2024 · Updated last year
- ☆14 · Jan 10, 2025 · Updated last year
- Code for CoRL 2022 paper: https://arxiv.org/abs/2211.09006 (simulation environments) ☆11 · Feb 9, 2023 · Updated 3 years ago
- RTSP streaming for ROS image topics ☆14 · Jul 30, 2025 · Updated 7 months ago
- Get and update GitHub repository topics. ☆12 · Aug 21, 2017 · Updated 8 years ago
- [ICML2023] Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models. Ajay Jaiswal, Shiwei Liu, Ti… ☆11 · Nov 28, 2023 · Updated 2 years ago
- ☆11 · Feb 9, 2024 · Updated 2 years ago
- General purpose application server for the radar platform currently with capability to schedule push notifications ☆11 · Jan 21, 2026 · Updated last month
- Gym wrapper for pysc2 ☆10 · Sep 16, 2022 · Updated 3 years ago
- A TF2.0 implementation of RL baselines. ☆10 · Sep 24, 2021 · Updated 4 years ago
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning ☆14 · Jun 28, 2025 · Updated 8 months ago
- ☆11 · Jun 19, 2024 · Updated last year