xianminx / mooc-cs294-llm-agents
CS294/194-196 Large Language Model Agents
☆21 Updated 6 months ago
Alternatives and similar repositories for mooc-cs294-llm-agents
Users that are interested in mooc-cs294-llm-agents are comparing it to the libraries listed below
- Curation of resources for LLM research, screened by @tongyx361 to ensure high quality and accompanied with elaborately-written concise de… ☆55 Updated 11 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆84 Updated 3 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆87 Updated 2 months ago
- ☆62 Updated last week
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆93 Updated 2 weeks ago
- ☆157 Updated last week
- A brief and partial summary of RLHF algorithms. ☆129 Updated 3 months ago
- A collection of papers on discrete diffusion models ☆145 Updated 2 weeks ago
- An Awesome List of Reinforcement Learning-based Large Language Agent Works. Collect directly from official code base. ☆154 Updated this week
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆100 Updated 4 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆131 Updated 11 months ago
- CS294/194-196 Large Language Model Agents ☆12 Updated 4 months ago
- AI Alignment: A Comprehensive Survey ☆135 Updated last year
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆61 Updated 2 months ago
- Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing ☆47 Updated 5 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆155 Updated 3 weeks ago
- PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing ☆19 Updated 3 months ago
- [ACL'24] Chain of Thought (CoT) is significant in improving the reasoning abilities of large language models (LLMs). However, the correla… ☆45 Updated last month
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆75 Updated 3 weeks ago
- ☆107 Updated 2 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆191 Updated this week
- A research repo for experiments about Reinforcement Finetuning ☆49 Updated 2 months ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling". ☆265 Updated 4 months ago
- ☆228 Updated last month
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆188 Updated 3 months ago
- [ICML 2025] A platform for developers to simulate collaborative research activities ☆161 Updated this week
- Reproducing R1 for Code with Reliable Rewards ☆222 Updated last month
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training" ☆148 Updated 2 weeks ago
- Efficient Agent Training for Computer Use ☆106 Updated 3 weeks ago
- The official repository for the Scientific Paper Idea Proposer (SciPIP) ☆62 Updated 4 months ago