☆27 · Jul 18, 2025 · Updated 7 months ago
Alternatives and similar repositories for iw_sft
Users who are interested in iw_sft are comparing it to the libraries listed below.
- [ICLR 2026] PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, co… ☆36 · Sep 9, 2025 · Updated 6 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆62 · Oct 24, 2025 · Updated 4 months ago
- Model merging is a highly efficient approach for long-to-short reasoning. ☆100 · Oct 15, 2025 · Updated 4 months ago
- The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization" ☆22 · Nov 9, 2025 · Updated 4 months ago
- ☆58 · Dec 11, 2025 · Updated 2 months ago
- ☆55 · Jul 7, 2025 · Updated 8 months ago
- Faikin Remote (code and PCB) ☆18 · Feb 22, 2026 · Updated 2 weeks ago
- Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution" ☆70 · Sep 13, 2025 · Updated 5 months ago
- asyncio-friendly python API for Sensibo (https://sensibo.com). Requires Python 3.4+ ☆11 · Updated this week
- ☆10 · Dec 17, 2020 · Updated 5 years ago
- ☆21 · Aug 8, 2025 · Updated 7 months ago
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity ☆22 · Aug 28, 2025 · Updated 6 months ago
- A smart web crawler built in Rust that uses Claude AI to select the most relevant URLs from website sitemaps based on crawling objectives… ☆19 · Jul 9, 2025 · Updated 8 months ago
- A semantic code search tool for intelligent, cross-repo context retrieval. ☆27 · Sep 14, 2025 · Updated 5 months ago
- Official implementation for Text Generation Beyond Discrete Token Sampling ☆22 · Aug 11, 2025 · Updated 6 months ago
- Live graph data platform, build for hyper-scale & progressive security ☆15 · Mar 2, 2026 · Updated last week
- [NeurIPS 2024 poster] Cross-model Control: Improving Multiple Large Language Models in One-time Training ☆14 · Oct 25, 2024 · Updated last year
- A collection of papers and libraries for performing multi-agent optimization ☆17 · Feb 7, 2026 · Updated last month
- [ACL 2025 Main] (🏆 Outstanding Paper Award) Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Proba… ☆16 · Aug 15, 2025 · Updated 6 months ago
- Sonos server for my 5 year old to control his speaker using an esp32s3 M5Stack CardPuter ☆25 · Sep 1, 2025 · Updated 6 months ago
- ☆12 · Apr 25, 2025 · Updated 10 months ago
- ☆18 · May 3, 2025 · Updated 10 months ago
- ☆10 · Oct 25, 2024 · Updated last year
- The official implementation of the paper "Self-Updatable Large Language Models by Integrating Context into Model Parameters" ☆15 · May 18, 2025 · Updated 9 months ago
- Towards a Unified View of Large Language Model Post-Training ☆204 · Sep 8, 2025 · Updated 6 months ago
- A fully-complete (semi-stable) Rust SDK for Firecracker microVM-utilizing applications. ☆17 · Dec 28, 2025 · Updated 2 months ago
- My mind, mapped onto markdown notes ☆34 · Feb 20, 2026 · Updated 2 weeks ago
- Training Vision Transformers for Semi-Supervised Semantic Segmentation ☆14 · Nov 3, 2025 · Updated 4 months ago
- SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data ☆21 · Jan 24, 2026 · Updated last month
- FUNctional magnetic resonance imaging Phase SYnchronization ☆12 · Dec 15, 2015 · Updated 10 years ago
- Demo of interning in Rust applied to RATP's disruptions API ☆14 · Feb 3, 2026 · Updated last month
- A PyTorch implementation of "Towards Self-Explainable Graph Neural Network" (CIKM 2021). ☆13 · Nov 4, 2021 · Updated 4 years ago
- Ordered Turtle Serializer for rdflib ☆11 · Mar 28, 2018 · Updated 7 years ago
- Install fonts on your system ☆32 · Feb 4, 2026 · Updated last month
- An Ultra-Long Output Reinforcement Learning Approach ☆23 · Jul 31, 2025 · Updated 7 months ago
- Reproduced the DFT method without using Verl. https://arxiv.org/abs/2508.05629 ☆21 · Oct 14, 2025 · Updated 4 months ago
- ☆11 · Apr 30, 2025 · Updated 10 months ago
- ☆10 · Feb 18, 2020 · Updated 6 years ago
- A CLI tool to convert OpenBSD Packet Filter configuration files (`pf.conf`) to JSON and vice versa. ☆32 · Jan 8, 2026 · Updated 2 months ago