docugami / DFM-benchmarks
Benchmarks for Business Document Foundation Models
☆10 · Updated 9 months ago
Alternatives and similar repositories for DFM-benchmarks:
Users who are interested in DFM-benchmarks are comparing it to the libraries listed below:
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents ☆23 · Updated 2 years ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆14 · Updated 10 months ago
- Explore the use of DSPy for extracting features from PDFs ☆37 · Updated 10 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆45 · Updated last year
- ☆43 · Updated 3 months ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ☆11 · Updated last year
- ☆16 · Updated this week
- Self Organizing Maps (SOM) ML model can be used to conduct semantic search to populate context required for Retrieval Augmented Generatio… ☆16 · Updated 10 months ago
- This repository serves as a collection of scrapers procuring and structuring various legal datasets ☆16 · Updated last year
- This is the official PyTorch repo for "UNIREX: A Unified Learning Framework for Language Model Rationale Extraction" (ICML 2022). ☆24 · Updated last year
- Based on the tree of thoughts paper ☆46 · Updated last year
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 · Updated last year
- ☆19 · Updated 2 months ago
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 · Updated this week
- Tools for merging pretrained large language models. ☆19 · Updated 7 months ago
- Code for NeurIPS LLM Efficiency Challenge ☆54 · Updated 9 months ago
- ☆13 · Updated last year
- Entailment self-training ☆25 · Updated last year
- Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval ☆14 · Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆74 · Updated 2 months ago
- ☆24 · Updated last year
- Text clustering: HDBSCAN is probably all you need. ☆18 · Updated last year
- Resources accompanying the "Zero-Shot Recommendation as Language Modeling" paper (ECIR2022) ☆13 · Updated last year
- ☆14 · Updated 3 months ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆31 · Updated 7 months ago
- Measuring RAG solutions throughput and latency ☆15 · Updated 5 months ago
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆26 · Updated 3 weeks ago
- ☆8 · Updated 6 months ago
- Using short models to classify long texts ☆21 · Updated last year