Unofficial implementation of Chain of Hindsight (https://arxiv.org/abs/2302.02676) using PyTorch and Hugging Face Trainers.
☆11 · Apr 5, 2023 · Updated 2 years ago
Alternatives and similar repositories for Chain-of-Hindsight-PyTorch
Users who are interested in Chain-of-Hindsight-PyTorch are comparing it to the repositories listed below.
- Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data) ☆28 · Dec 19, 2023 · Updated 2 years ago
- Self-Supervised Alignment with Mutual Information ☆20 · May 24, 2024 · Updated last year
- ☆61 · Aug 2, 2023 · Updated 2 years ago
- ☆16 · Jun 25, 2025 · Updated 8 months ago
- ☆10 · Oct 11, 2022 · Updated 3 years ago
- Embodied-Planner-R1: Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning ☆25 · Jan 5, 2026 · Updated 2 months ago
- PyTorch Implementation for the paper "Let Me Help You! Neuro-Symbolic Short-Context Action Anticipation" accepted to RA-L'24. ☆12 · Nov 27, 2024 · Updated last year
- GRPO for training long-form QA and instructions with a long-form reward model ☆17 · Jul 17, 2025 · Updated 7 months ago
- Event comment analysis for planners and marketers - feat. the Inflearn New Year's resolution event ☆11 · Apr 22, 2020 · Updated 5 years ago
- Visualizing 230 years of US Census data ☆12 · Feb 23, 2020 · Updated 6 years ago
- ☆14 · Feb 2, 2025 · Updated last year
- KuaiSearch PERKS ☆12 · Nov 16, 2021 · Updated 4 years ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆102 · Feb 20, 2025 · Updated last year
- ☆12 · Jun 16, 2023 · Updated 2 years ago
- ☆13 · Mar 3, 2024 · Updated 2 years ago
- Korean Abstract Meaning Representation (AMR) Corpus ☆10 · Feb 27, 2022 · Updated 4 years ago
- Scrapes domestic (Korean) and international data on confirmed, recovered, and deceased COVID-19 cases. ☆10 · Jun 6, 2021 · Updated 4 years ago
- ☆12 · Feb 9, 2022 · Updated 4 years ago
- Rendering code for ShapeNet models ☆11 · Apr 20, 2017 · Updated 8 years ago
- A project designed to build and render a full Minecraft crafting tree. ☆10 · Aug 10, 2021 · Updated 4 years ago
- Blog of the LibreCV.org ☆11 · May 17, 2021 · Updated 4 years ago
- Lipschitz Lifelong RL ☆11 · Nov 6, 2020 · Updated 5 years ago
- ☆15 · May 11, 2025 · Updated 9 months ago
- ☆14 · Jul 18, 2025 · Updated 7 months ago
- ☆10 · Apr 20, 2016 · Updated 9 years ago
- [2022.05.16 ~ 2022.06.10] 🌤️ Clear photos without fine dust 📷 - boostcamp AI Tech 3rd cohort final project ☆14 · Jun 11, 2022 · Updated 3 years ago
- Flutter Application starter using get ecosystem ☆10 · Jan 19, 2021 · Updated 5 years ago
- EasyRLHF aims to provide an easy and minimal interface to train aligned language models, using off-the-shelf solutions and datasets ☆10 · Dec 12, 2023 · Updated 2 years ago
- The official repository for the CodeGym project: "Generalizable End-to-End Tool-Use RL with Synthetic CodeGym" ☆23 · Oct 14, 2025 · Updated 4 months ago
- [DACON] 3rd place in the gas supply demand forecasting model development competition ☆11 · Apr 12, 2022 · Updated 3 years ago
- Templates and examples for ACL and EMNLP conference posters. ☆14 · Oct 5, 2024 · Updated last year
- Source code of our paper "Focus on the Target’s Vocabulary: Masked Label Smoothing for Machine Translation" @ ACL 2022 ☆13 · Apr 13, 2022 · Updated 3 years ago
- Code for COLING 2022 paper "FactMix: Using a Few Labeled In-domain Examples to Generalize to Cross-domain Named Entity Recognition" ☆15 · Jan 15, 2023 · Updated 3 years ago
- [11th ToBig's Conference] AM I OK? - psychological assessment AI based on medical specialists' answers ☆12 · Jan 19, 2021 · Updated 5 years ago
- Code for a multi-agent particle environment used in the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments" ☆11 · Jan 15, 2020 · Updated 6 years ago
- Reference list of email processing resources; focus on preservation and PII handling ☆14 · Apr 20, 2022 · Updated 3 years ago
- Nested Named Entity Recognition for Chinese Biomedical Text ☆11 · Jan 25, 2024 · Updated 2 years ago
- Flash cards Chrome extensions ☆13 · Jul 12, 2022 · Updated 3 years ago
- ☆10 · Mar 10, 2023 · Updated 2 years ago