SALT-NLP / demonstrated-feedbackLinks

☆125

Alternatives and similar repositories for demonstrated-feedback

Users that are interested in demonstrated-feedback are comparing it to the libraries listed below

Sorting:

Anni-Zou / Meta-CoT
Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models
☆97Updated last year
dwzhu-pku / LongEmbed
LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)
☆139Updated 8 months ago
da03 / implicit_chain_of_thought
☆135Updated 8 months ago
ytyz1307zzh / RefAug
Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"
☆55Updated 10 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60Updated 11 months ago
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆233Updated 9 months ago
r-three / phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
☆86Updated last year
google / sycophancy-intervention
Scripts for generating synthetic finetuning data for reducing sycophancy.
☆113Updated last year
jakespringer / echo-embeddings
☆152Updated last year
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆81Updated 11 months ago
SalesforceAIResearch / LaTRO
☆118Updated 5 months ago
QingruZhang / PASTA
PASTA: Post-hoc Attention Steering for LLMs
☆122Updated 8 months ago
hamishivi / EasyLM
Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆75Updated 11 months ago
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆57Updated last year
princeton-nlp / LitSearch
[EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search
☆92Updated 8 months ago
architsharma97 / dpo-rlaif
☆99Updated last year
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54Updated last year
sail-sg / sailcraft
🚢 Data Toolkit for Sailor Language Models
☆94Updated 5 months ago
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆146Updated 9 months ago
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆79Updated 10 months ago
RulinShao / retrieval-scaling
Official repository for "Scaling Retrieval-Based Langauge Models with a Trillion-Token Datastore".
☆207Updated 2 months ago
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆89Updated 8 months ago
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆112Updated 5 months ago
OSU-NLP-Group / Middleware
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)
☆37Updated 7 months ago
kyegomez / Lets-Verify-Step-by-Step
"Improving Mathematical Reasoning with Process Supervision" by OPENAI
☆112Updated 2 weeks ago
ScalerLab / JudgeBench
☆91Updated 8 months ago
para-lost / ReBase
ReBase: Training Task Experts through Retrieval Based Distillation
☆29Updated 6 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88Updated 10 months ago
thomasgauthier / LLM-self-play
Minimal implementation of the Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models paper (ArXiv 20232401.01335)
☆28Updated last year
kaistAI / Janus
[NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages
☆49Updated 8 months ago