nexusflowai / NexusBench

Nexusflow function call, tool use, and agent benchmarks.

☆29

Alternatives and similar repositories for NexusBench

Users who are interested in NexusBench are comparing it to the repositories listed below.

Cerebras / DocChat
GPT-4 Level Conversational QA Trained In a Few Hours
☆65 Updated last year
arcee-ai / DAM
☆55 Updated 11 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆58 Updated this week
EduardTalianu / EntropixLab
entropix style sampling + GUI
☆27 Updated 11 months ago
jina-ai / jina-vdr
Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval
☆30 Updated 2 months ago
QuixiAI / kraken
☆67 Updated last year
SqueezeBits / GraLoRA
☆23 Updated 2 weeks ago
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91 Updated 9 months ago
matthewrenze / jhu-concise-cot
The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models
☆22 Updated 10 months ago
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆76 Updated 6 months ago
Zoeyyao27 / SirLLM
This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM
☆60 Updated last year
TIGER-AI-Lab / One-Shot-CFT
The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25]
☆32 Updated last month
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆97 Updated this week
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆56 Updated 11 months ago
ZihanWang314 / coeCheck
☆19 Updated 7 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆79 Updated 7 months ago
sunblaze-ucb / AgentSynth
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
☆32 Updated 2 weeks ago
SebastianBodza / EnsembleForecasting
Using multiple LLMs for ensemble Forecasting
☆16 Updated last year
weaviate / structured-rag
Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models
☆111 Updated 6 months ago
brendanhogan / completion_tree_view
☆14 Updated 5 months ago
samchaineau / llm_slerp_generation
Repo hosting codes and materials related to speeding LLMs' inference using token merging.
☆36 Updated 2 weeks ago
brendanhogan / picoDeepResearch
☆68 Updated 5 months ago
kubernetes-bad / reward-composer
Lego for GRPO
☆30 Updated 4 months ago
allenai / olmo-cookbook
OLMost every training recipe you need to perform data interventions with the OLMo family of models.
☆50 Updated this week
agokrani / distillKitPlus
Easy to use, High Performant Knowledge Distillation for LLMs
☆93 Updated 5 months ago
XiaoduoAILab / XmodelLM
XmodelLM
☆39 Updated 11 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 Updated last year
padas-lab-de / ir-rag-sigir24-persona-rag
☆50 Updated last year
SLIT-AI / FuseChat-3.0
☆18 Updated 6 months ago
THU-KEG / Agentic-Reward-Modeling
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆108 Updated 4 months ago