xianminx / mooc-cs294-llm-agents
CS294/194-196 Large Language Model Agents
☆20Updated 4 months ago
Alternatives and similar repositories for mooc-cs294-llm-agents
Users interested in mooc-cs294-llm-agents are comparing it to the repositories listed below
- CS294/194-196 Large Language Model Agents☆11Updated 2 months ago
- Curation of resources for LLM research, screened by @tongyx361 to ensure high quality and accompanied with elaborately-written concise de…☆53Updated 10 months ago
- ☆58Updated 9 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆72Updated 3 weeks ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆92Updated 2 months ago
- ☆85Updated 7 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆177Updated last month
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems☆76Updated last month
- ☆131Updated this week
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆105Updated 5 months ago
- ☆153Updated last month
- llm & rl☆115Updated this week
- AI Alignment: A Comprehensive Survey☆133Updated last year
- Notes and commented code for RLHF (PPO)☆90Updated last year
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis"☆82Updated last month
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper☆32Updated 11 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…☆125Updated 10 months ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".☆254Updated 2 months ago
- A brief and partial summary of RLHF algorithms.☆128Updated 2 months ago
- A Comprehensive Survey on Long Context Language Modeling☆139Updated last month
- ☆34Updated this week
- Pretrain, decay, SFT a CodeLLM from scratch 🧙‍♂️☆35Updated 11 months ago
- ☆19Updated 5 months ago
- ☆41Updated 6 months ago
- [ICLR 2025 Oral] PyTorch code for the paper "Open-World Reinforcement Learning over Long Short-Term Imagination"☆111Updated last week
- ☆47Updated last week
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆69Updated last month
- Reproducing R1 for Code with Reliable Rewards☆188Updated last week
- A comprehensive collection of process reward models.☆76Updated last week
- ☆63Updated 5 months ago