KhoomeiK / LlamaGym
Fine-tune LLM agents with online reinforcement learning
☆995 Updated 8 months ago
Related projects
Alternatives and complementary repositories for LlamaGym
- LLM Analytics ☆615 Updated last month
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,336 Updated 7 months ago
- [ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models" ☆682 Updated 3 months ago
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ☆448 Updated 8 months ago
- Visualize the intermediate output of Mistral 7B ☆313 Updated 9 months ago
- Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents" ☆753 Updated last month
- Agents Capable of Self-Editing Their Prompts / Python Code ☆745 Updated 8 months ago
- The Open Source Memory Layer For Autonomous Agents ☆1,483 Updated 3 weeks ago
- High-performance retrieval engine for unstructured data ☆982 Updated last week
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤 ☆845 Updated 3 months ago
- Official codebase for the paper "Beyond A* Better Planning with Transformers via Search Dynamics Bootstrapping". ☆322 Updated 5 months ago
- Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs)… ☆992 Updated this week
- ☆935 Updated 2 weeks ago
- Finetune llama2-70b and codellama on MacBook Air without quantization ☆447 Updated 7 months ago
- A library for advanced large language model reasoning ☆1,442 Updated last week
- Agentless🐱: an agentless approach to automatically solve software development problems ☆723 Updated last week
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale. ☆836 Updated 10 months ago
- ☆718 Updated 2 months ago
- ☆727 Updated 7 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆1,634 Updated this week
- Optimizing inference proxy for LLMs ☆1,563 Updated this week
- Implementing the 4 agentic patterns from scratch ☆751 Updated 3 weeks ago
- Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 times faster with just a few l… ☆271 Updated this week
- Code for Quiet-STaR ☆651 Updated 3 months ago
- From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :) ☆594 Updated 3 weeks ago
- Implementation of Q-Transformer, Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, out of Google Deepmind ☆345 Updated this week
- An LLM-based autonomous agent controlling real-world applications via RESTful APIs ☆1,321 Updated 5 months ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆774 Updated 3 months ago
- Automated Design of Agentic Systems ☆1,038 Updated this week
- Stateful load balancer custom-tailored for llama.cpp ☆563 Updated this week