yydsok / OPARL
OPARL (Optimistic and Pessimistic Actor in RL)
☆19 Updated 7 months ago
Related projects:
- Synth-Empathy: Towards High-Quality Synthetic Empathy Data ☆10 Updated 3 weeks ago
- ☆10 Updated 3 weeks ago
- This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF). ☆15 Updated 2 months ago
- [ACL'2024] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆44 Updated last month
- ☆24 Updated 3 months ago
- Collection of ICLR 2024 papers and open-source projects ☆78 Updated 4 months ago
- Zero-Shot Chain-of-Thought Reasoning Guided by Evolutionary Algorithms in Large Language Models ☆10 Updated 6 months ago
- Using LLM to evaluate MMLU dataset. ☆15 Updated 6 months ago
- Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models ☆11 Updated 10 months ago
- ☆67 Updated 3 weeks ago
- Generating figures from research papers, using textual captions from the paper. ☆14 Updated last year
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts ☆16 Updated 6 months ago
- [CVPR2024] This is the official implementation of MP5 ☆72 Updated 2 months ago
- code for ACL24 "MELoRA: Mini-Ensemble Low-Rank Adapter for Parameter-Efficient Fine-Tuning" ☆12 Updated 4 months ago
- SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enh… ☆24 Updated last month
- Direct preference optimization with f-divergences. ☆11 Updated last week
- This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Co… ☆66 Updated 2 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆21 Updated 3 months ago
- PyTorch implementation of StableMask (ICML'24) ☆11 Updated 2 months ago
- The code for paper Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models. ☆13 Updated 5 months ago
- HAZARD challenge ☆25 Updated 4 months ago
- Repository of "Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning" (NeurIPS 2023 Spotlight) ☆37 Updated 10 months ago
- ☆11 Updated 2 months ago
- Interpretable Diffusion Via Information Decomposition ☆20 Updated 2 months ago
- Awesome-Low-Rank-Adaptation ☆13 Updated 2 months ago
- Code of ACM MM 2023 Paper: A Symbolic Characters Aware Model for Solving Geometry Problems ☆13 Updated 8 months ago
- ☆22 Updated 3 months ago
- Survey on Data-centric Large Language Models ☆58 Updated 2 months ago
- ☆23 Updated 2 months ago
- Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24). ☆26 Updated 4 months ago