rsshyam / GRPO-bandits
☆13Updated 7 months ago
Alternatives and similar repositories for GRPO-bandits:
Users who are interested in GRPO-bandits are comparing it to the repositories listed below.
- ☆16Updated 9 months ago
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".☆26Updated last month
- ☆57Updated 9 months ago
- ☆14Updated last month
- RuleRAG: Rule-guided Retrieval-Augmented Generation with Language Models for Question Answering☆22Updated 5 months ago
- ☆16Updated this week
- ☆36Updated 2 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆34Updated last year
- ☆15Updated 7 months ago
- ☆23Updated 10 months ago
- ☆11Updated 3 months ago
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability☆38Updated this week
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM"☆13Updated 3 weeks ago
- PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing☆16Updated last month
- ☆18Updated 5 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated last year
- LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models☆17Updated last month
- ☆20Updated 2 months ago
- Synthesizing realistic and diverse text-datasets from augmented LLMs☆12Updated last month
- ☆62Updated 3 weeks ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆96Updated 6 months ago
- ☆25Updated 6 months ago
- LLM Dynamic Planner - Combining LLM with PDDL Planners to solve an embodied task☆42Updated 3 months ago
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs☆24Updated 7 months ago
- Code for paper: Long cOntext aliGnment via efficient preference Optimization☆13Updated 2 months ago
- [ICLR 2025] Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization☆11Updated 3 months ago
- ☆18Updated 7 months ago
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper☆32Updated last month
- The official source code for "Boosting LLM Agents with Recursive Contemplation for Effective Deception Handling" (ACL 2024, Findings)☆11Updated 8 months ago
- [NeurIPS 2024] | An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding☆16Updated 6 months ago