Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.
☆70May 5, 2025Updated 11 months ago
Alternatives and similar repositories for calculator_agent_rl
Users that are interested in calculator_agent_rl are comparing it to the libraries listed below.
- QLoRA: Efficient Finetuning of Quantized LLMs☆11Jul 22, 2023Updated 2 years ago
- Our library for RL environments + evals☆3,986Apr 9, 2026Updated last week
- Exploring Applications of GRPO☆252Aug 25, 2025Updated 7 months ago
- Wafer-thin FlaskGPT on Vercel.☆12Jun 1, 2023Updated 2 years ago
- SwiftLet is a lightweight Python framework for running open-source Large Language Models (LLMs) locally using safetensors☆28Aug 6, 2025Updated 8 months ago
- This project showcases engaging interactions between two AI chatbots.☆10Jan 10, 2024Updated 2 years ago
- ☆67May 23, 2025Updated 10 months ago
- A research repo for experiments about Reinforcement Finetuning☆54Apr 7, 2025Updated last year
- Python package for rematerialization-aware gradient checkpointing☆27Oct 31, 2023Updated 2 years ago
- A Multi-Agentic AI Assistant/Builder☆26Jan 23, 2026Updated 2 months ago
- ☆20Mar 25, 2025Updated last year
- REBUS: A Robust Evaluation Benchmark of Understanding Symbols☆13Aug 13, 2024Updated last year
- ☆20Oct 25, 2025Updated 5 months ago
- The State Of The Art, intelligence☆158Aug 12, 2025Updated 8 months ago
- Project code for training LLMs to write better unit tests + code☆21May 19, 2025Updated 10 months ago
- Quick Notebook Tutorials☆36Jul 17, 2025Updated 9 months ago
- Low-Rank adapter extraction for fine-tuned transformers models☆181May 2, 2024Updated last year
- CLIR version of ColBERT☆73Jun 23, 2025Updated 9 months ago
- Let's create synthetic textbooks together :)☆76Jan 29, 2024Updated 2 years ago
- ☆12Jul 10, 2024Updated last year
- An AI character interaction system with emotional modeling and advanced memory management☆17Oct 26, 2024Updated last year
- ☆23Jun 4, 2024Updated last year
- ☆121Jun 11, 2025Updated 10 months ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.☆19Feb 9, 2026Updated 2 months ago
- ☆31Dec 20, 2025Updated 3 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond)☆479Sep 27, 2024Updated last year
- Some microbenchmarks and design docs before commencement☆11Feb 1, 2021Updated 5 years ago
- Locally hosted AI Agent Python Tool To Generate Novel Research Hypothesis + Titles + Abstracts☆30Apr 30, 2025Updated 11 months ago
- Simple GRPO scripts and configurations.☆58Feb 6, 2025Updated last year
- Mixtral finetuning☆19Feb 2, 2024Updated 2 years ago
- ☆18Nov 20, 2024Updated last year
- Efficient computer use agent powered by Meta Llama 4 Maverick☆46Apr 17, 2025Updated last year
- Examples for KubeEdge☆13Sep 29, 2020Updated 5 years ago
- [WIP] Transformer to embed Danbooru labelsets☆13Mar 31, 2024Updated 2 years ago
- coded with and corrected by Google Anti-Gravity☆13Nov 23, 2025Updated 4 months ago
- [KDD 2025] The source code for UQABench☆13Aug 18, 2025Updated 7 months ago
- ☆20Feb 11, 2024Updated 2 years ago
- JacQues is a Dash-based interactive web application that facilitates real-time chat and document management.☆22Jan 5, 2026Updated 3 months ago
- ☆37Feb 5, 2025Updated last year