allenai/fluid-benchmarking

Fluid Language Model Benchmarking

☆29

Alternatives and similar repositories for fluid-benchmarking

Users interested in fluid-benchmarking are comparing it to the repositories listed below.

allenai / signal-and-noise
Measuring the Signal to Noise Ratio in Language Model Evaluation
☆31 · Aug 19, 2025 · Updated 11 months ago
Zhiyuan-Zeng / EvalTree
[COLM 2025] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
☆31 · Jul 11, 2025 · Updated last year
interview-eval / interview-eval
Interview-based evaluation of LLMs
☆30 · May 21, 2026 · Updated 2 months ago
SALT-NLP / multi-value
Complete set of English dialect transformation rules and evaluation code
☆16 · Jun 7, 2024 · Updated 2 years ago
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18 · Jan 8, 2024 · Updated 2 years ago
cisnlp / bias-in-nlp
Literature overview: gender bias in natural language processing
☆12 · Jan 26, 2021 · Updated 5 years ago
FLAIR-IISc / NLP-Reading-Group
Everything related to the reading group.
☆10 · Oct 29, 2025 · Updated 8 months ago
schwartz-lab-NLP / Tokens2Words
☆16 · Apr 2, 2025 · Updated last year
goombalab / Gather-and-Aggregate
Experiment notebook for "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"
☆16 · Apr 30, 2025 · Updated last year
yaof20 / verl
verl: Volcano Engine Reinforcement Learning for LLMs
☆22 · Nov 6, 2025 · Updated 8 months ago
myracheng / lm_caricature
Code and data associated with CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations
☆11 · Oct 13, 2023 · Updated 2 years ago
valentinhofmann / politosphere
☆19 · Jun 7, 2022 · Updated 4 years ago
allenai / agent-baselines
☆150 · Updated this week
ursidsn / momoshami
Esolang inspired by The Demon Girl Next Door (まちカドまぞく)
☆12 · Apr 17, 2025 · Updated last year
writing-assistant / writing-assistant.github.io
☆18 · Sep 3, 2024 · Updated last year
arjundevraj / stragglar
☆15 · Oct 2, 2025 · Updated 9 months ago
paul-rottger / issuebench
Röttger et al. (2024): "IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance"
☆17 · Mar 6, 2026 · Updated 4 months ago
cofe-ai / Mu-scaling
Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales
☆32 · Jul 17, 2023 · Updated 3 years ago
aradha / deep_neural_feature_ansatz
Code for verifying the deep neural feature ansatz
☆22 · May 3, 2023 · Updated 3 years ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆43 · Dec 29, 2025 · Updated 6 months ago
Infini-AI-Lab / STEM
☆66 · May 7, 2026 · Updated 2 months ago
robjsliwa / pyprolog
Prolog implemented in Python
☆12 · Sep 6, 2024 · Updated last year
jennhu / lm-pragmatics
Code and data for "A fine-grained comparison of pragmatic language understanding in humans and language models"
☆11 · Dec 14, 2022 · Updated 3 years ago
lighttransport / jagger-python
Python binding for Jagger (C++ implementation of Pattern-based Japanese Morphological Analyzer)
☆13 · Dec 16, 2025 · Updated 7 months ago
brendanhogan / completion_tree_view
☆15 · Apr 26, 2025 · Updated last year
kyle8581 / DialogueCoT
[EMNLP 2023] Official repository for Dialogue Chain-of-Thought Distillation (DONUT & DOCTOR)
☆11 · Nov 15, 2023 · Updated 2 years ago
varshakishore / IncDSI
☆11 · Sep 10, 2023 · Updated 2 years ago
stephenkyang / mean-reversion-pairs-trading
Manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices
☆11 · Jan 12, 2021 · Updated 5 years ago
wise-east / spolin
Repo for SPOLIN corpus and paper "Grounding Conversations with Improvised Dialogues" (ACL2020)
☆14 · Feb 20, 2026 · Updated 5 months ago
georgehc / mnar_mc
☆12 · Nov 2, 2021 · Updated 4 years ago
ant-research / M2-Miner
[ICLR 2026] M2-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining
☆55 · Apr 22, 2026 · Updated 3 months ago
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆46 · Jul 20, 2024 · Updated 2 years ago
IBM / benchbench
A package dedicated to running benchmark agreement testing
☆19 · Sep 18, 2025 · Updated 10 months ago
weilicao / SPScanner
[COLM '25] Single-Pass Document Scanning for Question Answering
☆14 · Aug 20, 2025 · Updated 11 months ago
junya-takayama / DIRECT
DIRECT: Direct and Indirect REsponses in Conversational Text Corpus
☆17 · Jul 1, 2021 · Updated 5 years ago
goombalab / raven
☆78 · May 29, 2026 · Updated last month
allenai / asta-paper-finder
Frozen-in-time version of our Paper Finder agent for reproducing evaluation results
☆245 · Mar 17, 2026 · Updated 4 months ago
LauraRuis / do-pigs-fly
☆22 · Oct 22, 2023 · Updated 2 years ago
allenai / AskOlmo
☆15 · Nov 19, 2025 · Updated 8 months ago