mingyin0312/RL4LLM

mingyin0312 / RL4LLM

RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct

☆31

Alternatives and similar repositories for RL4LLM

Users that are interested in RL4LLM are comparing it to the libraries listed below.

alexander-moore / vlm
Composition of Multimodal Language Models From Scratch
☆15 · Aug 16, 2024 · Updated last year
cavaunpeu / mcts-llm-codegen
A Python reimplementation + extension of "Planning with Large Language Models for Code Generation" (https://arxiv.org/abs/2303.05510)
☆17 · Dec 1, 2023 · Updated 2 years ago
stephenkyang / mean-reversion-pairs-trading
manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices
☆11 · Jan 12, 2021 · Updated 5 years ago
swiss-ai / posttraining
☆18 · Jul 17, 2026 · Updated last week
kyegomez / GPT4o
Community Open Source Implementation of GPT4o in PyTorch
☆32 · Updated this week
CERT-Lab / abba
(ICLR '26) ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models
☆22 · Sep 25, 2025 · Updated 10 months ago
swairshah / Intensify
coloring terminal text with intensities (used for plotting probability, entropy with tokens)
☆12 · Oct 11, 2024 · Updated last year
ekinakyurek / gpt3-arithmetic
Scratchpad/Chain-of-Thought Prompts
☆12 · Jun 6, 2022 · Updated 4 years ago
TarikKaanKoc / prompt-engineering
Few-Shot Prompting - Chain-of-Thought (CoT) Prompting - Hallucinations - Self-Consistency - Generated Knowledge Prompting - Tree of …
☆30 · Nov 15, 2023 · Updated 2 years ago
coderaashir / Crypto-Pairs-Trading
A Statistical Arbitrage Strategy to trade Cryptocurrency Pairs
☆14 · Nov 6, 2020 · Updated 5 years ago
GoogleEngineerExplains / LeetCode-Notes
☆10 · May 19, 2022 · Updated 4 years ago
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 · Jul 18, 2024 · Updated 2 years ago
git-disl / Lisa
This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024)
☆29 · Sep 10, 2024 · Updated last year
HarleyCoops / smolThinker-.5B
A Qwen .5B reasoning model trained on OpenR1-Math-220k
☆14 · Jul 22, 2026 · Updated last week
seanmacavaney / plaidrepro
☆11 · Feb 9, 2024 · Updated 2 years ago
dayal-kalra / low-memory-adam
☆14 · Mar 2, 2025 · Updated last year
nghiahhnguyen / CS224W-Stanford
This is the repository containing the solution of the homework for the CS224W course at Stanford: Machine Learning with Graphs
☆11 · Jul 19, 2020 · Updated 6 years ago
LukeWood / ez-timer
☆10 · Mar 28, 2022 · Updated 4 years ago
oseledets / nla2025
Skoltech NLA 2025 course.
☆17 · Nov 30, 2025 · Updated 7 months ago
m-ali-awan / yolov5-seg-labels-conversion
Converting Instance segmentation labels in COCO format to YOLOv5-seg
☆13 · Feb 10, 2023 · Updated 3 years ago
watcl-lab / positional_attention
Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation"
☆14 · May 26, 2025 · Updated last year
triple-mu / Stable-Diffusion-TensorRT
Stable Diffusion in TensorRT 8.5+
☆15 · Mar 19, 2023 · Updated 3 years ago
antimatter15 / alpaca-lora
Code for reproducing the Stanford Alpaca InstructLLaMA result on consumer hardware
☆19 · Mar 16, 2023 · Updated 3 years ago
kyleliang919 / Super_Muon
☆68 · Mar 21, 2025 · Updated last year
GrowlyX / instantgrep
CLI implementation of Cursor's Instant Grep in Elixir.
☆21 · Mar 26, 2026 · Updated 4 months ago
CyberAgentAILab / regularized-bon
Code of "Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment" (2025).
☆14 · Apr 4, 2025 · Updated last year
edent / RGB_Colours
This is the code which powers the Twitter Bot https://twitter.com/RGB_Colours
☆15 · Apr 14, 2017 · Updated 9 years ago
onuralpszr / kopikatAPI
KopikatAPI is a Python library for interacting with the Kopikat API.
☆17 · Updated this week
goombalab / Gather-and-Aggregate
Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"
☆16 · Apr 30, 2025 · Updated last year
CERT-Lab / fed-sb
(TMLR J2C Certification) Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tu…
☆27 · Oct 4, 2025 · Updated 9 months ago
alasdairforsythe / capcode
Lossless normalization of uppercase characters: Go, C++ & JavaScript
☆11 · Jul 7, 2026 · Updated 3 weeks ago
tatHi / optok
☆10 · Aug 26, 2021 · Updated 4 years ago
PacktPublishing / Implementing-Deep-Learning-Algorithms-with-TensorFlow-2.0
☆11 · Jan 30, 2023 · Updated 3 years ago
IBM / SafeLoRA
Github repo for NeurIPS 2024 paper "Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models"
☆29 · Dec 21, 2025 · Updated 7 months ago
kroggen / tokenformer-minimal
Minimal implementation of TokenFormer for inference and learning
☆13 · Nov 6, 2024 · Updated last year
dfm / ketu
I can haz planetz?
☆12 · Jun 12, 2020 · Updated 6 years ago
ethansmith2000 / TransformerExperiments
☆19 · Dec 4, 2025 · Updated 7 months ago
packquickly / schedule_free_optx
Schedule free optimiser implemented in JAX using Optimistix
☆15 · May 29, 2024 · Updated 2 years ago
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18 · Jan 8, 2024 · Updated 2 years ago