j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.
☆102 · Jul 19, 2025 · Updated 7 months ago
Alternatives and similar repositories for j1-micro
Users that are interested in j1-micro are comparing it to the libraries listed below
- ☆37 · Aug 4, 2025 · Updated 7 months ago
- coded with and corrected by Google Anti-Gravity · ☆13 · Nov 23, 2025 · Updated 3 months ago
- ☆19 · Mar 3, 2025 · Updated last year
- InstructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc… · ☆32 · Jun 13, 2024 · Updated last year
- ☆14 · Dec 12, 2024 · Updated last year
- Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM · ☆14 · Dec 27, 2023 · Updated 2 years ago
- Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks" · ☆13 · Jul 5, 2017 · Updated 8 years ago
- Consistent dialogue generation · ☆16 · Oct 26, 2022 · Updated 3 years ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following" · ☆31 · Jun 5, 2025 · Updated 9 months ago
- a single interface around speech-to-speech foundation models · ☆27 · Jun 27, 2025 · Updated 8 months ago
- [ICLR 2026] Learning to Reason without External Rewards · ☆394 · Jan 26, 2026 · Updated last month
- Detect and redact PII locally with SOTA performance · ☆91 · Mar 25, 2025 · Updated 11 months ago
- Minimal agent runtime built with DSPy modules and a thin Python loop. Includes CLI, FastAPI server, and eval harness with OpenAI/Ollama s… · ☆70 · Dec 22, 2025 · Updated 2 months ago
- ☆67 · May 23, 2025 · Updated 9 months ago
- Simple and efficient DeepSeek V3 SFT using pipeline parallel and expert parallel, with both FP8 and BF16 trainings · ☆115 · Jul 27, 2025 · Updated 7 months ago
- A framework for optimizing DSPy programs with RL · ☆323 · Jan 12, 2026 · Updated last month
- CompChomper is a framework for measuring how LLMs perform at code completion. · ☆21 · Apr 29, 2025 · Updated 10 months ago
- ☆34 · Jun 10, 2025 · Updated 8 months ago
- Vibe. Prove. Verify. · ☆38 · Feb 27, 2026 · Updated last week
- Inference-time scaling for LLMs-as-a-judge. · ☆330 · Nov 5, 2025 · Updated 4 months ago
- ☆137 · Mar 20, 2025 · Updated 11 months ago
- rl from zero pretrain, can it be done? yes. · ☆288 · Sep 28, 2025 · Updated 5 months ago
- Self-Supervised Alignment with Mutual Information · ☆20 · May 24, 2024 · Updated last year
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF · ☆24 · Oct 8, 2024 · Updated last year
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 · ☆20 · May 31, 2023 · Updated 2 years ago
- Example code using the DSPy framework. · ☆20 · May 30, 2024 · Updated last year
- Memory-efficient Count-Min Sketch Counter (based on Madoka C++ library) · ☆27 · Nov 30, 2025 · Updated 3 months ago
- ☆36 · Feb 11, 2025 · Updated last year
- ☆85 · Sep 5, 2025 · Updated 6 months ago
- Easiest way to give context to LLMs; Attachments has the ambition to be the general funnel for any files to be transformed into images+te… · ☆349 · Sep 12, 2025 · Updated 5 months ago
- ReconPro is a specialized Google dorking tool designed for cybersecurity professionals and bug bounty hunters. · ☆44 · Feb 23, 2026 · Updated 2 weeks ago
- Exploring Applications of GRPO · ☆252 · Aug 25, 2025 · Updated 6 months ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. · ☆41 · Apr 4, 2025 · Updated 11 months ago
- ☆67 · Mar 30, 2025 · Updated 11 months ago
- Software Engineering Back End Microservices Project · ☆15 · Nov 20, 2024 · Updated last year
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" · ☆186 · May 25, 2025 · Updated 9 months ago
- ☆27 · Oct 22, 2024 · Updated last year
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" [EMNLP25] · ☆38 · Feb 1, 2026 · Updated last month
- Make DSPy agentic using a protocol-first approach that supports agent protocols like MCP and A2A · ☆70 · May 27, 2025 · Updated 9 months ago