ComposioHQ / Composio-Function-Calling-BenchmarkLinks

Function Calling Benchmark & Testing

☆88

Alternatives and similar repositories for Composio-Function-Calling-Benchmark

Users that are interested in Composio-Function-Calling-Benchmark are comparing it to the libraries listed below

Sorting:

migtissera / Sensei
Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI
☆222Updated last year
redotvideo / pluto
Synthetic Data for LLM Fine-Tuning
☆120Updated last year
teknium1 / ShareGPT-Builder
☆115Updated 7 months ago
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆53Updated 8 months ago
writer / writing-in-the-margins
☆118Updated 11 months ago
louisbrulenaudet / ragoon
High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡
☆66Updated 8 months ago
weaviate / structured-rag
Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models
☆111Updated 3 months ago
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91Updated 6 months ago
stephenleo / llm-structured-output-benchmarks
Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task…
☆173Updated 10 months ago
QuixiAI / OpenChatML
☆157Updated last year
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119Updated last year
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆49Updated last year
h2oai / enterprise-h2ogpte
Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform
☆87Updated last month
michaelfeil / embed
A stable, fast and easy-to-use inference library with a focus on a sync-to-async API
☆45Updated 10 months ago
flowaicom / flow-judge
Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…
☆76Updated 9 months ago
zbambergerNLP / strategic-debate-tot
A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments
☆87Updated 10 months ago
kerekovskik / autologic
autologic is a Python package that implements the SELF-DISCOVER framework proposed in the paper SELF-DISCOVER: Large Language Models Self…
☆60Updated last year
apple / ml-superposition-prompting
☆145Updated last year
thomasgauthier / LoRD
Low-Rank adapter extraction for fine-tuned transformers models
☆175Updated last year
vaughanlove / PromptBreeder
Google Deepmind's PromptBreeder for automated prompt engineering implemented in langchain expression language.
☆129Updated 11 months ago
lamini-ai / Lamini-Memory-Tuning
Banishing LLM Hallucinations Requires Rethinking Generalization
☆276Updated last year
argilla-io / notus
Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app…
☆168Updated last year
marcusschiesser / open-swarm
Framework for building, orchestrating and deploying multi-agent systems. Managed by OpenAI Solutions team. Experimental framework.
☆92Updated 9 months ago
QuixiAI / kraken
☆66Updated last year
aymeric-roucher / agent_reasoning_benchmark
🔧 Compare how Agent systems perform on several benchmarks. 📊🚀
☆99Updated 9 months ago
interstellarninja / function-calling-eval
A framework for evaluating function calls made by LLMs
☆37Updated last year
Cerebras / DocChat
GPT-4 Level Conversational QA Trained In a Few Hours
☆63Updated 11 months ago
angelina-yang / Claude_API_Contest
Claude API Test Project
☆87Updated last year
khaimt / qa_expert
This repo is for handling Question Answering, especially for Multi-hop Question Answering
☆67Updated last year
thooton / muse
Let's create synthetic textbooks together :)
☆75Updated last year