Compare how agent systems perform on several benchmarks.
★ 103 · Aug 4, 2025 · Updated 7 months ago
Alternatives and similar repositories for agent_reasoning_benchmark
Users interested in agent_reasoning_benchmark are comparing it to the libraries listed below.
- ★ 18 · Feb 28, 2026 · Updated last week
- RAG-Fusion implementation using Langchain, Weaviate and OpenAI ★ 13 · Oct 31, 2023 · Updated 2 years ago
- ★ 15 · Jan 19, 2023 · Updated 3 years ago
- SpatialTypes functions for extending PyPika with GIS ★ 10 · May 16, 2022 · Updated 3 years ago
- ★ 18 · Jun 26, 2024 · Updated last year
- A sample pattern for running CI tests on Modal ★ 19 · Apr 12, 2025 · Updated 10 months ago
- Set of PyTorch modules for developing and evaluating different algorithms for embedding trees. ★ 22 · Dec 22, 2021 · Updated 4 years ago
- Sakura-SOLAR-DPO: Merge, SFT, and DPO ★ 116 · Dec 30, 2023 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ★ 19 · Jul 20, 2023 · Updated 2 years ago
- Agent computer interface for AI software engineer. ★ 118 · Feb 27, 2026 · Updated last week
- Summary of system papers/frameworks/codes/tools on training or serving large model ★ 57 · Dec 17, 2023 · Updated 2 years ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ★ 25 · Feb 21, 2025 · Updated last year
- ★ 97 · Dec 16, 2024 · Updated last year
- My implementation of "Structure and Content-Guided Video Synthesis with Diffusion Models" by RunwayML ★ 26 · Jan 16, 2024 · Updated 2 years ago
- Code for SaGe subword tokenizer (EACL 2023) ★ 27 · Nov 30, 2024 · Updated last year
- Materials for ConceptARC paper ★ 115 · Feb 10, 2026 · Updated 3 weeks ago
- Ingest PDFs into Weaviate ★ 33 · Jun 14, 2024 · Updated last year
- Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs ★ 71 · Mar 21, 2025 · Updated 11 months ago
- Automatically convert unstructured data into a high-quality 'textbook' format, optimized for fine-tuning Large Language Models (LLMs) ★ 25 · Oct 15, 2023 · Updated 2 years ago
- Instant voice cloning by MyShell. ★ 26 · Apr 28, 2024 · Updated last year
- Text style transfer between Jin Yong and Gu Long ★ 26 · Aug 2, 2022 · Updated 3 years ago
- Sets up ComfyUI on MacOS/Linux/Windows and runs a workflow json. ★ 32 · May 7, 2025 · Updated 10 months ago
- [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View ★ 120 · Jun 6, 2025 · Updated 9 months ago
- Code and data for "Retrieval Enhanced Model for Commonsense Generation" (ACL-IJCNLP 2021). ★ 29 · Dec 31, 2021 · Updated 4 years ago
- ★ 125 · Aug 13, 2024 · Updated last year
- Code associated with the WANLI dataset (Liu et al., 2022) ★ 31 · May 24, 2023 · Updated 2 years ago
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench. ★ 217 · Apr 15, 2025 · Updated 10 months ago
- OPSTL: Self-supervised Skeleton-based Action Recognition in Occluded Environments ★ 14 · Oct 25, 2023 · Updated 2 years ago
- ★ 39 · Jul 25, 2024 · Updated last year
- ★ 12 · Nov 3, 2024 · Updated last year
- Leveraging Base Language Models for Few-Shot Synthetic Data Generation ★ 40 · Oct 18, 2025 · Updated 4 months ago
- A PyTorch-based model pruning toolkit for pre-trained language models ★ 388 · Aug 31, 2023 · Updated 2 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ★ 34 · Mar 21, 2024 · Updated last year
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning ★ 35 · Aug 9, 2023 · Updated 2 years ago
- ★ 78 · Dec 26, 2023 · Updated 2 years ago
- Code for the paper "Autonomous Evaluation and Refinement of Digital Agents" [COLM 2024] ★ 148 · Nov 26, 2024 · Updated last year
- EMNLP 2022: ClidSum: A Benchmark Dataset for Cross-Lingual Dialogue Summarization ★ 36 · Jan 13, 2024 · Updated 2 years ago
- ★ 37 · Dec 6, 2024 · Updated last year
- A modern audio editor with multitrack capabilities, enhanced waveform visualization, and an intuitive, sleek interface. ★ 17 · Aug 12, 2025 · Updated 6 months ago