MrYxJ / InfiniRetri
☆44 Updated 2 months ago
Alternatives and similar repositories for InfiniRetri:
Users who are interested in InfiniRetri are comparing it to the libraries listed below.
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆63 Updated last month
- Lego for GRPO ☆27 Updated 3 weeks ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆86 Updated last month
- Train your own SOTA deductive reasoning model ☆88 Updated last month
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆39 Updated 2 months ago
- Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluati… ☆41 Updated last month
- look how they massacred my boy ☆63 Updated 6 months ago
- Official repo of paper LM2 ☆37 Updated 2 months ago
- ☆38 Updated 9 months ago
- Simple GRPO scripts and configurations. ☆58 Updated 2 months ago
- accompanying material for sleep-time compute paper ☆56 Updated this week
- Entropy Based Sampling and Parallel CoT Decoding ☆17 Updated 6 months ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. ☆22 Updated 3 weeks ago
- Tina: Tiny Reasoning Models via LoRA ☆55 Updated this week
- ☆48 Updated 5 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆30 Updated last month
- ☆84 Updated last week
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models". ☆64 Updated last month
- RWKV-7: Surpassing GPT ☆83 Updated 5 months ago
- ☆53 Updated last month
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆52 Updated 3 weeks ago
- ☆37 Updated 2 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆88 Updated this week
- A tree-based prefix cache library that allows rapid creation of looms: hierarchical branching pathways of LLM generations. ☆68 Updated 2 months ago
- ☆121 Updated last week
- One Line To Build Zero-Data Classifiers in Minutes ☆53 Updated 7 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆90 Updated 3 months ago
- A simplified implementation for experimenting with Reinforcement Learning (RL) on GSM8K, inspired by RLVR and Deepseek R1. This repositor… ☆78 Updated 2 months ago
- How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆30 Updated last week
- A repository for research on medium sized language models. ☆76 Updated 11 months ago