Guangxuan-Xiao / GSM8K-eval
☆11Updated 11 months ago
Related projects:
- ☆139Updated 2 months ago
- ☆24Updated 6 months ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆141Updated 9 months ago
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models☆56Updated 7 months ago
- A RLHF Infrastructure for Vision-Language Models☆86Updated 3 months ago
- ☆29Updated 2 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO)☆89Updated 2 months ago
- [ACL'2024] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization☆44Updated last month
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆225Updated 2 months ago
- ☆110Updated last month
- An index of algorithms for reinforcement learning from human feedback (rlhf)☆81Updated 5 months ago
- ☆73Updated 8 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization☆58Updated 7 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…☆16Updated 2 months ago
- MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities.☆53Updated 3 weeks ago
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation☆85Updated this week
- [SIGIR'24] The official implementation code of MOELoRA.☆113Updated last month
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment☆191Updated 4 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆98Updated 6 months ago
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning☆176Updated last week
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…☆61Updated 2 months ago
- ☆31Updated 8 months ago
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts☆16Updated 6 months ago
- [ICML2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely☆13Updated 2 months ago
- This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat…☆66Updated 5 months ago
- MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379)☆28Updated 5 months ago
- ☆76Updated last month
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark".☆62Updated 2 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆78Updated last week
- ☆54Updated 2 months ago