allenai / wimbd

What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets

☆223

Alternatives and similar repositories for wimbd

Users who are interested in wimbd are comparing it to the repositories listed below.

AlexTMallen / adaptive-retrieval
☆189 Updated 3 months ago
chaitanyamalaviya / ExpertQA
[Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers
☆133 Updated last year
p-lambda / dsir
DSIR large-scale data selection framework for language model training
☆261 Updated last year
allenai / catwalk
This project studies the performance and robustness of language models and task-adaptation methods.
☆154 Updated last year
nyu-mll / quality
☆141 Updated 9 months ago
nlp-uoregon / mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
☆132 Updated last year
nelson-liu / lost-in-the-middle
Code and data for "Lost in the Middle: How Language Models Use Long Contexts"
☆360 Updated last year
realtimeqa / realtimeqa_public
☆78 Updated last year
facebookresearch / dpr-scale
Scalable training for dense retrieval models.
☆297 Updated 4 months ago
bigscience-workshop / lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
☆104 Updated 2 years ago
lilakk / BooookScore
A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper…
☆127 Updated last year
evandez / REMEDI
Inspecting and Editing Knowledge Representations in Language Models
☆117 Updated 2 years ago
google-research / true
Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation".
☆81 Updated 3 months ago
kamalkraj / e5-mistral-7b-instruct
Finetune mistral-7b-instruct for sentence embeddings
☆86 Updated last year
xlang-ai / BRIGHT
[ICLR 2025] BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
☆168 Updated last month
ParticleMedia / RAGTruth
GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models"
☆204 Updated 10 months ago
yizhongw / Tk-Instruct
Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions.
☆181 Updated 2 years ago
google-deepmind / loft
LOFT: A 1 Million+ Token Long-Context Benchmark
☆218 Updated 4 months ago
allenai / olmes
Reproducible, flexible LLM evaluations
☆256 Updated last week
princeton-nlp / AutoCompressors
[EMNLP 2023] Adapting Language Models to Compress Long Contexts
☆314 Updated last year
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆148 Updated 11 months ago
princeton-nlp / HELMET
The HELMET Benchmark
☆177 Updated 2 months ago
sileod / tasksource
Datasets collection and preprocessings framework for NLP extreme multitask learning
☆188 Updated 3 months ago
orionw / FollowIR
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
☆48 Updated last year
facebookresearch / tart
Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al.
☆165 Updated 2 years ago
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆242 Updated 11 months ago
yuxiaw / Factcheck-GPT
Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation.
☆105 Updated last year
DaoD / INTERS
This is the repository for our paper "INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning"
☆204 Updated 10 months ago
asaparov / prontoqa
Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.
☆150 Updated last month
nkandpa2 / long_tail_knowledge
Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge"
☆78 Updated 2 years ago