Long CoT Fine-Tuning and Reinforcement Learning for LLMs in the Context of the 24-Point Game: A Toy Project
☆25Feb 22, 2025Updated last year
Alternatives and similar repositories for LLM4Game24
Users who are interested in LLM4Game24 are comparing it to the libraries listed below.
- sigma-MoE layer☆21Jan 5, 2024Updated 2 years ago
- Anti exploration in offline reinforcement learning☆11May 17, 2021Updated 4 years ago
- ☆17Jun 11, 2025Updated 9 months ago
- ICML 2024 - Self-Driven Entropy Aggregation for Byzantine-Robust Heterogeneous Federated Learning☆10Jul 16, 2024Updated last year
- ☆12Jul 30, 2025Updated 7 months ago
- THU Mathematics for Engineering Master Candidates (Tsinghua University mathematics course for engineering master's students)☆11Nov 21, 2021Updated 4 years ago
- Collect information about 2018 CS courses in CSE of SYSU.☆11Jun 29, 2022Updated 3 years ago
- Official PyTorch implementation for the ICML 2023 paper "Out-of-Distribution Generalization of Federated Learning via Implicit Invariant …☆13Oct 31, 2023Updated 2 years ago
- Structure From Motion☆10Nov 22, 2022Updated 3 years ago
- Code repository for the NeurIPS 2021 accepted paper "Towards Gradient-based Bilevel Optimization with non-convex Followers and Beyond…☆11Mar 28, 2022Updated 3 years ago
- ☆19Jul 18, 2021Updated 4 years ago
- coded with and corrected by Google Anti-Gravity☆13Nov 23, 2025Updated 4 months ago
- Author's implementation of ReBRAC, a minimalist improvement upon TD3+BC☆19Oct 22, 2023Updated 2 years ago
- A lightweight reimplementation of Adversarially Trained Actor Critic☆19Mar 19, 2026Updated last week
- A collection of SYSU final exam papers and study materials☆10Jul 9, 2019Updated 6 years ago
- Code release for "Supported Policy Optimization for Offline Reinforcement Learning" (NeurIPS 2022), https://arxiv.org/abs/2202.06239☆22Jun 24, 2023Updated 2 years ago
- Author's PyTorch implementation of ICML'23 paper "Policy Regularization with Dataset Constraint for Offline Reinforcement Learning" for D…☆18Nov 8, 2024Updated last year
- Example Code for paper "Provably Faster Algorithms for Bilevel Optimization"☆15Dec 28, 2021Updated 4 years ago
- Code for NeurIPS 2022 paper "Robust offline Reinforcement Learning via Conservative Smoothing"☆24Feb 15, 2023Updated 3 years ago
- Paper: “MEMRL: SELF-EVOLVING AGENTS VIA RUNTIME REINFORCEMENT LEARNING ON EPISODIC MEMORY” Open-Source Code☆65Feb 27, 2026Updated last month
- Adapted source code of Niklaus Wirth's "Compiler Construction" book☆20May 11, 2023Updated 2 years ago
- Meal ordering system☆14Mar 5, 2016Updated 10 years ago
- ☆19Feb 20, 2024Updated 2 years ago
- ☆18Feb 2, 2022Updated 4 years ago
- ☆19Jun 25, 2023Updated 2 years ago
- Code for paper "Byzantine-Resilient Decentralized Stochastic Optimization with Robust Aggregation Rules"☆20Apr 19, 2024Updated last year
- Code for reproducing the experiments of the ICML 2019 paper "Robust Learning from Untrusted Sources"☆18Jul 5, 2019Updated 6 years ago
- ☆22Dec 2, 2024Updated last year
- Github repo for NeurIPS 2024 paper "Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models"☆27Dec 21, 2025Updated 3 months ago
- Federated Bilevel Optimization☆16Jun 23, 2022Updated 3 years ago
- ☆24Dec 8, 2024Updated last year
- Sun Yat-sen University Compiler Principles, 2022☆22Jul 27, 2022Updated 3 years ago
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning☆29Feb 21, 2022Updated 4 years ago
- Course materials for the Spring 2022 Compiler Principles course at Sun Yat-sen University; the slides include notes.☆26Sep 24, 2023Updated 2 years ago
- [ICML 2024] The official implementation of A2PR, a simple way to achieve SOTA in offline reinforcement learning with an adaptive advantage…☆34May 31, 2024Updated last year
- This repository contains the code and simulation files for the submitted paper titled "Autonomous Wind Turbine Inspection Framework Enabl…☆40Oct 24, 2024Updated last year
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models☆38Feb 27, 2024Updated 2 years ago
- Assignments of Computer Science courses in SYSU☆32May 22, 2023Updated 2 years ago
- Adapted from THU-Beamer-Theme (https://github.com/Trinkle23897/THU-Beamer-Theme) with various deletions and modifications☆36May 27, 2022Updated 3 years ago