RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆31Feb 23, 2025Updated last year
Alternatives and similar repositories for RL4LLM
Users that are interested in RL4LLM are comparing it to the libraries listed below
- Composition of Multimodal Language Models From Scratch☆15Aug 16, 2024Updated last year
- built a 124M param GPT☆23Jan 28, 2025Updated last year
- Custom triton kernels for training Karpathy's nanoGPT.☆19Oct 21, 2024Updated last year
- A Python reimplementation/extension of "Planning with Large Language Models for Code Generation" (https://arxiv.org/abs/2303.05510)☆18Dec 1, 2023Updated 2 years ago
- PyTorch implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆25Feb 13, 2026Updated 3 weeks ago
- Official repo for our AAAI'21 paper, https://arxiv.org/abs/2007.12354☆27Jul 14, 2021Updated 4 years ago
- Few-Shot Prompting - Chain-of-Thought (CoT) Prompting - Hallucinations - Self-Consistency - Generated Knowledge Prompting - Tree of …☆29Nov 15, 2023Updated 2 years ago
- Financial Analysis and Algorithmic Trading Strategies in Python☆11Feb 16, 2023Updated 3 years ago
- manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices☆12Jan 12, 2021Updated 5 years ago
- This work, carried out for the Teknofest 2023 Turkish Natural Language Processing competition, applies the SHAP analysis method to the model's predictions…☆28Mar 31, 2023Updated 2 years ago
- RL algorithm for stock trading with multiple reward functions☆11Apr 21, 2024Updated last year
- ☆15Mar 18, 2025Updated 11 months ago
- Implementation of the model from "Faster sorting algorithms discovered using deep reinforcement learning" that discovered an all-new ult…☆11Aug 29, 2023Updated 2 years ago
- A distilled DeepSeek-R1 variant built on Qwen2.5-32B, fine-tuned with curated data for enhanced performance and efficiency. <metadata> gp…☆16Mar 11, 2025Updated 11 months ago
- ☆12May 26, 2022Updated 3 years ago
- ☆10Jul 21, 2019Updated 6 years ago
- Open Source Tsetlin Machine framework☆17Oct 15, 2018Updated 7 years ago
- a recommendation list of math courses for people with no math background.☆11Mar 2, 2021Updated 5 years ago
- ☆10May 19, 2022Updated 3 years ago
- Sample repository for my awesome Youtube viewers.☆10Jun 3, 2020Updated 5 years ago
- This project focuses on stock prediction; our goal is to implement a trading framework using DRL with LSTM.☆11Jun 1, 2018Updated 7 years ago
- FinanceGPT-B☆10Mar 26, 2024Updated last year
- Mintlemon, a Turkish natural language processing library, was developed as part of the Teknofest Turkish Natural Language Processing Competition. Nane&Limon team…☆44Jun 1, 2024Updated last year
- ☆15Apr 11, 2023Updated 2 years ago
- Monlan is a collection of Data Science experiments (DRL and other approaches) in the FOREX algotrading field. Warning! It's my research pro…☆12Aug 1, 2022Updated 3 years ago
- ROS 2 New Features [Video], published by Packt☆10Oct 28, 2022Updated 3 years ago
- Sample demonstrating deployment of PyTorch models through ONNX within Azure Functions☆12Apr 11, 2024Updated last year
- Rucio K8s tutorial☆11Sep 26, 2025Updated 5 months ago
- PyTorch implementation of GRPO.☆14Apr 21, 2025Updated 10 months ago
- Lossless normalization of uppercase characters☆11Jul 3, 2023Updated 2 years ago
- Source code for MA4270: Data Modelling and Computation on Transformers and Nadaraya-Watson Kernel Regression☆19May 29, 2024Updated last year
- An AI tool designed to generate explanations for every file in a project☆14Mar 7, 2025Updated 11 months ago
- ☆10Mar 31, 2022Updated 3 years ago
- Reimplementation of simple policy gradient algorithms such as REINFORCE and Actor-Critic methods.☆16Aug 26, 2023Updated 2 years ago
- Experiments related to LLMs for Vietnamese (inspired by the Physics of LLMs series)☆11Oct 21, 2024Updated last year
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"☆14Apr 30, 2025Updated 10 months ago
- ☆13May 25, 2023Updated 2 years ago
- In the high-frequency era of trading, stock orders can be executed in under a millisecond. The information about the thousands of orders …☆10Mar 30, 2016Updated 9 years ago
- Q&A dataset for many-shot jailbreaking☆14Jul 19, 2024Updated last year