OpenMOSS / VehicleWorld
VehicleWorld is the first comprehensive multi-device environment for intelligent vehicle interaction that accurately models the complex, interconnected systems in modern cockpits.
☆21 · Updated 4 months ago
Alternatives and similar repositories for VehicleWorld
Users interested in VehicleWorld are comparing it to the libraries listed below.
- Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024) ☆58 · Updated last year
- My attempt to create a Self-Correcting-LLM based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by g… ☆38 · Updated 6 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆151 · Updated 11 months ago
- Embodied-Planner-R1: Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning ☆23 · Updated last month
- [ACL 2025, Main Conference, Oral] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process ☆30 · Updated last year
- ☆30 · Updated last year
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision" ☆55 · Updated last year
- Repo for the paper "Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents" ☆61 · Updated last year
- Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity" ☆47 · Updated 2 years ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆81 · Updated last year
- Directional Preference Alignment ☆58 · Updated last year
- An index of algorithms for reinforcement learning from human feedback (RLHF) ☆92 · Updated last year
- Domain-specific preference (DSP) data and customized RM fine-tuning ☆25 · Updated last year
- Code for the ACL 2024 paper "Adversarial Preference Optimization (APO)" ☆56 · Updated last year
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing" ☆83 · Updated last year
- Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" ☆199 · Updated 2 years ago
- [NeurIPS 2024] Can Language Models Learn to Skip Steps? ☆22 · Updated last year
- Research code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆202 · Updated 9 months ago
- ToMBench: Benchmarking Theory of Mind in Large Language Models (ACL 2024) ☆62 · Updated last year
- Official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP… ☆32 · Updated last year
- Code for the NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs" ☆46 · Updated 11 months ago
- Code and data for "Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?" (ACL 2024) ☆32 · Updated last year
- Official code for the paper "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆153 · Updated 3 months ago
- [ICML 2025] Official implementation of GLIDER ☆72 · Updated 3 months ago
- ☆22 · Updated last year
- Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping… ☆91 · Updated last week
- [ICML 2024] Can AI Assistants Know What They Don't Know? ☆85 · Updated 2 years ago
- Official implementation of the Reward rAnked Fine-Tuning algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆39 · Updated last year
- Instruction-following benchmark for large reasoning models ☆44 · Updated 5 months ago
- ☆72 · Updated last year