wantbook-book/SeRL

SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data

☆24

Alternatives and similar repositories for SeRL

Users who are interested in SeRL are comparing it to the repositories listed below.

MaybeLizzy / PERMU
☆34 · Oct 4, 2025 · Updated 9 months ago
Cra2yDavid / MAM
[IEEE Transactions on Power Systems] Transmission Interface Power Flow Adjustment: A Deep Reinforcement Learning Approach based on Multi-…
☆26 · Jun 2, 2024 · Updated 2 years ago
liushunyu / awesome-direct-preference-optimization
A Survey of Direct Preference Optimization (DPO)
☆95 · Jul 4, 2025 · Updated last year
jiaconghu / Model-LEGO
Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks
☆17 · Jan 15, 2025 · Updated last year
Raiden-Zhu / ICML-2023-DSGD-and-SAM
[ICML 2023] Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
☆20 · Dec 4, 2023 · Updated 2 years ago
Goekdeniz-Guelmez / mlx-embeddings-lora
Train Embedding Models on MLX.
☆17 · Jun 2, 2026 · Updated last month
sastpg / RFTT
RFTT: Reasoning with Reinforced Functional Token Tuning
☆29 · Feb 12, 2026 · Updated 5 months ago
GraphPKU / LIFT
The official implementation of LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning
☆15 · Mar 14, 2025 · Updated last year
Raiden-Zhu / Generalization-of-DSGD
The official implementation of the paper "Topology-aware Generalization of Decentralized SGD"
☆37 · Mar 29, 2023 · Updated 3 years ago
alex-damian / EOS
☆15 · Sep 29, 2022 · Updated 3 years ago
liushunyu / CIA
[AAAI 2023 Oral] Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition
☆39 · Jun 3, 2024 · Updated 2 years ago
liushunyu / OPT
[TPAMI] Interaction Pattern Disentangling for Multi-Agent Reinforcement Learning
☆34 · May 17, 2024 · Updated 2 years ago
bicici / FDA
Feature Decay Algorithms
☆11 · Mar 5, 2014 · Updated 12 years ago
Raphaaal / fieldy
Fine-grained attention in hierarchical transformers for tabular time-series.
☆12 · Dec 24, 2024 · Updated last year
shuizhonghaitong / classification_GAT
Tang poetry genre classification with a 2-layer GAT + attention, taking a Tang poetry knowledge graph and labeled poems as input (TensorFlow)
☆14 · Oct 29, 2021 · Updated 4 years ago
HHW-zhou / TSMMG
Code of "Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model"
☆13 · Jul 8, 2025 · Updated last year
cvgmi / manifold-net-dmri
ManifoldNet Paper Implementation for SPD(n)
☆11 · Nov 10, 2021 · Updated 4 years ago
Goekdeniz-Guelmez / mlx-kan
KAN (Kolmogorov–Arnold Networks) in the MLX framework for Apple Silicon
☆32 · Jun 18, 2025 · Updated last year
pan2013e / ZJU-beamer-template
Zhejiang University Beamer template
☆16 · May 19, 2022 · Updated 4 years ago
zyh1999 / CADP
Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?
☆36 · May 22, 2024 · Updated 2 years ago
LukeLIN-web / vote
Vision-Language-Action Optimization with Trajectory Ensemble Voting (ICANN2026)
☆26 · Feb 18, 2026 · Updated 5 months ago
bo-yang / stip_fisher
Action recognition with STIP features and my own Fisher vector implementation
☆14 · Mar 29, 2017 · Updated 9 years ago
longyuewangdcu / Cross-Sentence-NMT
Cross Sentence Neural Machine Translation
☆10 · Mar 26, 2018 · Updated 8 years ago
uuujf / SGDNoise
[ICML 2019] The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
☆15 · Apr 12, 2020 · Updated 6 years ago
MingLiiii / ThinkARM
Schoenfeld’s Anatomy of Mathematical Reasoning by Language Models
☆27 · Dec 21, 2025 · Updated 7 months ago
aaronserianni / attention-iou
[CVPR'25] Attention IoU: Examining Biases in CelebA using Attention Maps
☆13 · Mar 26, 2025 · Updated last year
huangwb / LDS-toolbox
LDS-toolbox: a matlab toolbox for linear dynamical systems (LDSs) modeling
☆13 · Mar 23, 2018 · Updated 8 years ago
rgrishman / ice
Ice is a rapid information extraction customizer
☆15 · Apr 26, 2021 · Updated 5 years ago
mespadoto / proj-quant-eval
☆16 · Sep 12, 2019 · Updated 6 years ago
violet-zct / fairseq-dro-mnmt
☆14 · Sep 10, 2021 · Updated 4 years ago
01yzzyu / wikiautogen
[ICCV2025] WikiAutoGen official page
☆25 · Feb 6, 2026 · Updated 5 months ago
cbruyndoncx / crewAI-xls
Gradio UI to load crewAI configuration from Excel (xls) and generate the Python code. The source of the crews is in the xls. It allows for …
☆10 · Oct 17, 2025 · Updated 9 months ago
luli-git / MAP
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
☆18 · Sep 2, 2024 · Updated last year
mohitkumarahuja / Visual-Tracking-Using-MeanShift
Mean-Shift (MS) is widely known as one of the most basic yet powerful tracking algorithms. Mean-Shift considers feature …
☆11 · Dec 21, 2017 · Updated 8 years ago
HITsz-TMG / VisionGraph
The benchmark and datasets of the ICML 2024 paper "VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual C…
☆17 · May 27, 2024 · Updated 2 years ago
YujunZhou / EVOL-RL
Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL).
☆51 · Mar 31, 2026 · Updated 3 months ago
GitWR / SymNet
This is a matlab implementation of our article, named "SymNet: A Simple Symmetric Positive Definite Manifold Deep Learning Method for Ima…
☆16 · Dec 10, 2020 · Updated 5 years ago
ying-hui-he / Hi-ToM_dataset
☆21 · Oct 11, 2025 · Updated 9 months ago
toltoxgh / CoreNLP-jMWE
Stanford CoreNLP annotator implementing jMWE for detecting Multi-Word Expressions / collocations
☆15 · Jan 6, 2017 · Updated 9 years ago