☆93May 16, 2025Updated 9 months ago
Alternatives and similar repositories for WorldPM
Users that are interested in WorldPM are comparing it to the libraries listed below
- [ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large …☆24May 29, 2024Updated last year
- ☆30Dec 27, 2024Updated last year
- KuaiSearch PERKS☆12Nov 16, 2021Updated 4 years ago
- AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation (EMNLP 2024 Findings)☆15Dec 30, 2024Updated last year
- ☆18Apr 5, 2025Updated 10 months ago
- ☆13Nov 26, 2021Updated 4 years ago
- AI for Mathematics Paper List☆17Jan 14, 2025Updated last year
- 🤡 An up-to-date & curated list of awesome KBQA papers, methods & resources.☆10Jul 14, 2022Updated 3 years ago
- ☆29May 8, 2024Updated last year
- ☆47Aug 5, 2025Updated 6 months ago
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models☆270Sep 12, 2024Updated last year
- "Guide to Deploying and Fine-Tuning Multimodal Large Models": quickly deploy and fine-tune multimodal large models☆12Dec 4, 2024Updated last year
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.☆18Apr 22, 2025Updated 10 months ago
- ☆12Mar 27, 2024Updated last year
- ImProver: Agent-Based Automated Proof Optimization☆40Jan 18, 2026Updated last month
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Dec 19, 2024Updated last year
- 💻 Terminal-Agent with Human-in-the-Loop Learning☆34Jan 16, 2026Updated last month
- Collections of RLxLM experiments using minimal codes☆14Feb 17, 2025Updated last year
- ☆17May 31, 2023Updated 2 years ago
- A tiny FP8 multiplication unit written in Verilog. TinyTapeout 2 submission.☆14Nov 23, 2022Updated 3 years ago
- The code of CIKM 2023 (Oral Presentation) : A Multi-Task Semantic Decomposition Framework with Task-specific Pre-training for Few-Shot NE…☆14Jul 19, 2024Updated last year
- ☆17Jul 12, 2025Updated 7 months ago
- Reproducing R1 for Code with Reliable Rewards☆12Apr 9, 2025Updated 10 months ago
- ☆71Oct 23, 2025Updated 4 months ago
- ☆813Jun 9, 2025Updated 8 months ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)☆102Feb 20, 2025Updated last year
- All-in-one benchmarking platform for evaluating LLM.☆15Nov 12, 2025Updated 3 months ago
- [ICLR'25 Spotlight] Rethinking and improving autoformalization: towards a faithful metric and a Dependency Retrieval-based approach☆27May 20, 2025Updated 9 months ago
- A curated list of cutting-edge research papers and resources on Long Chain-of-Thought (CoT) Reasoning with Tools.☆46Dec 17, 2025Updated 2 months ago
- UQ: Assessing Language Models on Unsolved Questions☆30Aug 26, 2025Updated 6 months ago
- OmniGAIA: Towards Native Omni-Modal AI Agents☆46Updated this week
- Llemma formal2formal (tactic prediction) theorem proving experiments☆20Oct 17, 2023Updated 2 years ago
- Code of EMNLP 2025 paper 'UltraIF: Advancing Instruction Following from the Wild'.☆21Apr 3, 2025Updated 11 months ago
- [ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist☆35Oct 23, 2024Updated last year
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning☆689Aug 5, 2025Updated 6 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆120Dec 10, 2024Updated last year
- A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024)☆421Oct 25, 2025Updated 4 months ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization☆81Dec 25, 2025Updated 2 months ago
- transformer tokenizers (e.g. BERT tokenizer) in C++ (WIP)☆18Apr 7, 2022Updated 3 years ago