☆288 · Dec 3, 2024 · Updated last year
Alternatives and similar repositories for financebench
Users that are interested in financebench are comparing it to the libraries listed below.
- KITE (Knowledge-Intensive Task Evaluation) is an end-to-end benchmark for RAG pipelines ☆23 · Aug 14, 2024 · Updated last year
- Data and code for EMNLP 2021 paper "FinQA: A Dataset of Numerical Reasoning over Financial Data" ☆365 · Jun 6, 2022 · Updated 3 years ago
- ☆25 · Oct 23, 2025 · Updated 5 months ago
- StAtutory Reasoning Assessment ☆16 · Dec 8, 2022 · Updated 3 years ago
- Research Artifact For Our Submission To VLDB ☆11 · Oct 27, 2021 · Updated 4 years ago
- Build LLM agents and multi-agent systems from scratch, with MCP, Skills, and A2A ☆74 · Updated this week
- ☆22 · Mar 6, 2024 · Updated 2 years ago
- Data and code for EMNLP 2022 paper "ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering" ☆120 · Nov 9, 2022 · Updated 3 years ago
- Github repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆237 · Dec 2, 2024 · Updated last year
- ☆15 · Oct 30, 2021 · Updated 4 years ago
- Measuring RAG solutions throughput and latency ☆20 · Jul 23, 2024 · Updated last year
- Code for 'Contrastive Multi-Document Question Generation' ☆11 · Oct 16, 2022 · Updated 3 years ago
- ☆43 · Jul 10, 2024 · Updated last year
- ☆16 · May 14, 2025 · Updated 10 months ago
- Prompt-Guided Retrieval For Non-Knowledge-Intensive Tasks ☆12 · Sep 1, 2023 · Updated 2 years ago
- The PIZZA dataset continues the exploration of task-oriented parsing by introducing a new dataset for parsing pizza and drink orders, who… ☆20 · Dec 7, 2022 · Updated 3 years ago
- This is a work in progress package that enables users to conduct fundamental financial research, utilising the SEC's EDGAR API. ☆71 · Mar 2, 2026 · Updated last month
- An unofficial implementation of SOLAR-10.7B model and the newly proposed interlocked-DUS(iDUS) implementation and experiment details. ☆14 · Mar 20, 2024 · Updated 2 years ago
- ☆21 · Oct 22, 2021 · Updated 4 years ago
- Comprehensive benchmark for RAG ☆277 · Jun 14, 2025 · Updated 9 months ago
- Data and Code for ACL 2024 paper "DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Docu… ☆23 · Dec 21, 2024 · Updated last year
- LUNA: a Framework for Language Understanding and Naturalness Assessment. ☆12 · Sep 9, 2023 · Updated 2 years ago
- The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice… ☆504 · Jul 18, 2025 · Updated 8 months ago
- This repository contains related work, benchmarks and datasets for the paper "Large Language Models in Finance (FinLLMs)". ☆363 · Apr 10, 2025 · Updated last year
- This repository introduces PIXIU, an open-source resource featuring the first financial large language models (LLMs), instruction tuning … ☆847 · Mar 4, 2025 · Updated last year
- ☆14 · Oct 17, 2024 · Updated last year
- Python library containing BART query generation and BERT-based Siamese models for neural retrieval. ☆40 · Oct 30, 2020 · Updated 5 years ago
- DEREK (Domain Entities and Relations Extraction Kit) ☆10 · May 22, 2023 · Updated 2 years ago
- Outline to Story: Fine-grained Controllable Story Generation from Cascaded Events ☆18 · Jun 16, 2022 · Updated 3 years ago
- BERT score for text generation ☆12 · Jan 15, 2025 · Updated last year
- The FinEval financial domain evaluation benchmark, based on quantitative fundamental methods and developed through long-term objective re… ☆266 · Jun 23, 2025 · Updated 9 months ago
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆269 · Mar 25, 2026 · Updated 2 weeks ago
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ☆2,135 · Oct 16, 2025 · Updated 5 months ago
- Data for paper "Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness" ☆34 · May 3, 2023 · Updated 2 years ago
- code associated with WANLI dataset in Liu et al., 2022 ☆30 · May 24, 2023 · Updated 2 years ago
- Code Repository for "A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models". ☆15 · Oct 14, 2022 · Updated 3 years ago
- ☆18 · Mar 25, 2024 · Updated 2 years ago
- This is the repo of developing reasoning models in the specific domain of financial, aim to enhance models capabilities in handling finan… ☆73 · Jun 23, 2025 · Updated 9 months ago
- First-place (top-1) solution for the Tianchi algorithm competition "BetterMixture - Large Model Data Mixing Challenge" ☆34 · Jul 7, 2024 · Updated last year