shachardon / naturally_occurring_feedback
☆13Updated last week
Alternatives and similar repositories for naturally_occurring_feedback
Users that are interested in naturally_occurring_feedback are comparing it to the libraries listed below
- Implementation for MomentumSMoE☆18Updated 6 months ago
- My personal web page☆11Updated 2 weeks ago
- ☆78Updated 8 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆216Updated 2 months ago
- ☆30Updated 11 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters☆273Updated last year
- Interpreting Learned Search and Planning: Reverse-engineering recurrent convolutional networks (DRC) that play Sokoban☆15Updated 3 months ago
- ☆122Updated 8 months ago
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data …☆211Updated this week
- [COLM 2025] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees☆24Updated 3 months ago
- ☆56Updated 3 months ago
- ☆35Updated 2 months ago
- ☆80Updated this week
- ☆155Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆60Updated last year
- Benchmarking LLMs with Challenging Tasks from Real Users☆241Updated 11 months ago
- The official evaluation suite and dynamic data release for MixEval.☆250Updated 11 months ago
- Let's build better datasets, together!☆262Updated 10 months ago
- ☆128Updated last year
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"☆109Updated last year
- BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.☆215Updated last month
- code for training & evaluating Contextual Document Embedding models☆198Updated 5 months ago
- Replicating O1 inference-time scaling laws☆90Updated 10 months ago
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"☆55Updated last year
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering☆190Updated 8 months ago
- LOFT: A 1 Million+ Token Long-Context Benchmark☆218Updated 4 months ago
- Complex Function Calling Benchmark.☆136Updated 9 months ago
- Code accompanying "How I learned to start worrying about prompt formatting".☆112Updated 4 months ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp…☆224Updated last month
- Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools"☆159Updated 2 months ago