Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.
☆65 · May 5, 2025 · Updated 9 months ago
Alternatives and similar repositories for calculator_agent_rl
Users who are interested in calculator_agent_rl are comparing it to the libraries listed below.
- A Multi-Agentic AI Assistant/Builder ☆25 · Jan 23, 2026 · Updated 3 weeks ago
- coded with and corrected by Google Anti-Gravity ☆13 · Nov 23, 2025 · Updated 2 months ago
- This project showcases engaging interactions between two AI chatbots. ☆10 · Jan 10, 2024 · Updated 2 years ago
- QLoRA: Efficient Finetuning of Quantized LLMs ☆11 · Jul 22, 2023 · Updated 2 years ago
- REBUS: A Robust Evaluation Benchmark of Understanding Symbols ☆13 · Aug 13, 2024 · Updated last year
- Modified Beam Search with periodical restart ☆12 · Sep 12, 2024 · Updated last year
- A toy text-to-image model trained from scratch. ☆19 · Jun 9, 2025 · Updated 8 months ago
- ☆39 · Jan 27, 2026 · Updated 2 weeks ago
- ☆118 · Feb 4, 2026 · Updated last week
- Waffer-thin FlaskGPT on Vercel. ☆12 · Jun 1, 2023 · Updated 2 years ago
- Quick Notebook Tutorials ☆36 · Jul 17, 2025 · Updated 6 months ago
- My learnings (publicly) on RAG systems ☆14 · Jan 2, 2024 · Updated 2 years ago
- A semantic caching layer for LLM apps. It’s meant to cut down on repeated API calls even when the user phrases things differently ☆14 · Jul 3, 2025 · Updated 7 months ago
- Exploring Applications of GRPO ☆251 · Aug 25, 2025 · Updated 5 months ago
- Our library for RL environments + evals ☆3,809 · Feb 8, 2026 · Updated last week
- SwiftLet is a lightweight Python framework for running open-source Large Language Models (LLMs) locally using safetensors ☆28 · Aug 6, 2025 · Updated 6 months ago
- [SDM24] Official code for "Time-Transformer" ☆17 · Sep 30, 2025 · Updated 4 months ago
- Efficient computer use agent powered by Meta Llama 4 Maverick ☆46 · Apr 17, 2025 · Updated 9 months ago
- Mixtral finetuning ☆19 · Feb 2, 2024 · Updated 2 years ago
- Project code for training LLMs to write better unit tests + code ☆21 · May 19, 2025 · Updated 8 months ago
- Let's create synthetic textbooks together :) ☆76 · Jan 29, 2024 · Updated 2 years ago
- Local Ollama with Qdrant RAG: Embed, index, and enhance models for retrieval-augmented generation. Get started with easy setup for powerf… ☆25 · Mar 27, 2024 · Updated last year
- ☆20 · Mar 25, 2025 · Updated 10 months ago
- Exploring limitations of LLM-as-a-judge ☆20 · Aug 17, 2024 · Updated last year
- Senna is an advanced AI-powered search engine designed to provide users with immediate answers to their queries by leveraging natural lan… ☆19 · Sep 5, 2024 · Updated last year
- Low-Rank adapter extraction for fine-tuned transformers models ☆180 · May 2, 2024 · Updated last year
- ☆67 · May 23, 2025 · Updated 8 months ago
- qwen3 experiments ☆34 · Jul 1, 2025 · Updated 7 months ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ☆47 · Sep 26, 2024 · Updated last year
- Miscellaneous Tutorials ☆26 · Sep 20, 2023 · Updated 2 years ago
- ☆160 · Apr 17, 2025 · Updated 9 months ago
- Simple GRPO scripts and configurations. ☆59 · Feb 6, 2025 · Updated last year
- Framework-Agnostic RL Environments for LLM Fine-Tuning ☆42 · Updated this week