Intelligent-Internet / ii-thought

II-Thought-RL is our initial attempt at developing a large-scale, multi-domain Reinforcement Learning (RL) dataset

☆28

Alternatives and similar repositories for ii-thought

Users who are interested in ii-thought are comparing it to the repositories listed below.

yueqis / API-Based-Agent
☆56 Updated 3 months ago
StigLidu / DualDistill
[EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"
☆100 Updated last month
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆172 Updated 9 months ago
letta-ai / sleep-time-compute
accompanying material for sleep-time compute paper
☆117 Updated 5 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆77 Updated 6 months ago
THU-KEG / Agentic-Reward-Modeling
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆106 Updated 4 months ago
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91 Updated 8 months ago
google-deepmind / llms_can_learn_rules
☆60 Updated 10 months ago
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆56 Updated 10 months ago
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆74 Updated 6 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆108 Updated 7 months ago
InternLM / SWE-Fixer
☆120 Updated 5 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆55 Updated 8 months ago
GAIR-NLP / LIMI
LIMI: Less is More for Agency
☆134 Updated last week
axolotl-ai-cloud / grpo_code
A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.
☆39 Updated 6 months ago
PrimeIntellect-ai / genesys
☆135 Updated 6 months ago
du-nlp-lab / MLR-Copilot
☆67 Updated 6 months ago
THUDM / DeepDive
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
☆173 Updated 2 weeks ago
colonylabs / ScribeAgent
Code for ScribeAgent paper
☆62 Updated 7 months ago
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.
☆54 Updated 5 months ago
LLM360 / crystalcoder-data-prep
Data preparation code for CrystalCoder 7B LLM
☆45 Updated last year
DeepSoftwareAnalytics / Awesome-Agent4SE
☆101 Updated last year
YerbaPage / MGDebugger
Multi-Granularity LLM Debugger
☆91 Updated 3 months ago
agokrani / distillKitPlus
Easy to use, High Performant Knowledge Distillation for LLMs
☆93 Updated 5 months ago
miralab-ai / autoreason
☆40 Updated 10 months ago
VectorSpaceLab / Infomatica
Data Synthesis for Deep Research Based on Semi-Structured Data
☆169 Updated this week
dinobby / MAgICoRE
☆23 Updated last year
brendanhogan / picoDeepResearch
☆68 Updated 4 months ago
gkamradt / SnakeBench
☆93 Updated 4 months ago
aymeric-roucher / GAIA
Beating the GAIA benchmark with Transformers Agents. 🚀
☆138 Updated 7 months ago