tencent-ailab / CogKernel

☆23

Related projects:

SALT-NLP / demonstrated-feedback
☆105 Updated this week
StonyBrookNLP / appworld
🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…
☆81 Updated last month
siyuyuan / evoagent
Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"
☆73 Updated 2 months ago
LiqiangJing / DSBench
DSBench: How Far are Data Science Agents Becoming Data Science Experts?
☆20 Updated this week
ytyz1307zzh / RefAug
Code for the paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"
☆30 Updated 3 months ago
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆89 Updated 4 months ago
clinicalml / co-llm
Co-LLM: Learning to Decode Collaboratively with Multiple Language Models
☆89 Updated 4 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆39 Updated 3 weeks ago
r-three / phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
☆76 Updated 6 months ago
Anni-Zou / Meta-CoT
Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models
☆84 Updated 11 months ago
cambridgeltl / PairS
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv preprint arXiv:2403.…
☆34 Updated 2 months ago
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆45 Updated 6 months ago
Re-Align / just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
☆73 Updated 7 months ago
dwzhu-pku / LongEmbed
Official implementation for the paper "LongEmbed: Extending Embedding Models for Long Context Retrieval"
☆108 Updated 4 months ago
jiangjiechen / auction-arena
Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A…
☆39 Updated 7 months ago
lfsszd / CS-Drafting
Cascade Speculative Drafting
☆23 Updated 5 months ago
GAIR-NLP / OlympicArena
This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"
☆79 Updated last month
jakespringer / echo-embeddings
☆118 Updated 5 months ago
lunyiliu / CoachLM
Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.
☆56 Updated 6 months ago
OSU-NLP-Group / Fuxi
Repository for the paper "Tools Are Instrumental for Language Agents in Complex Environments"
☆32 Updated 8 months ago
kyegomez / Lets-Verify-Step-by-Step
"Improving Mathematical Reasoning with Process Supervision" by OpenAI
☆55 Updated last week
zjunlp / WKM
Agent Planning with World Knowledge Model
☆27 Updated 2 months ago
yuxiaw / OpenFactCheck
☆31 Updated 3 months ago
sail-sg / sailcraft
🚢 Data Toolkit for Sailor Language Models
☆74 Updated 2 months ago
QingruZhang / PASTA
PASTA: Post-hoc Attention Steering for LLMs
☆96 Updated last week
SalesforceAIResearch / FoFo
☆16 Updated 6 months ago
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆182 Updated last month
qhjqhj00 / WebBrain
☆66 Updated last year
da03 / implicit_chain_of_thought
☆87 Updated 3 months ago
ConiferLM / Conifer
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
☆73 Updated 5 months ago