InflectionAI / Inflection-Benchmarks

Public Inflection Benchmarks

☆68

Alternatives and similar repositories for Inflection-Benchmarks

Users who are interested in Inflection-Benchmarks are comparing it to the repositories listed below.

kernelmachine / cbtm
Code repository for the c-BTM paper
☆107 Updated last year
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88 Updated 10 months ago
EleutherAI / improved-t5
Experiments for efforts to train a new and improved t5
☆76 Updated last year
allenai / CommonGen-Eval
Evaluating LLMs with CommonGen-Lite
☆90 Updated last year
IBM / ModuleFormer
ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp…
☆223 Updated last year
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119 Updated last year
allenai / bff
☆39 Updated last year
huu4ontocord / MDEL
Multi-Domain Expert Learning
☆67 Updated last year
scottlogic-alex / prm800k-denorm
Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format
☆27 Updated 2 years ago
ruiqi-zhong / D5
The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions
☆70 Updated 2 years ago
imoneoi / multipack_sampler
Multipack distributed sampler for fast padding-free training of LLMs
☆199 Updated 11 months ago
SalesforceAIResearch / LaTRO
☆118 Updated 5 months ago
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆97 Updated last year
kaiokendev / cutoff-len-is-context-len
Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit
☆63 Updated 2 years ago
CarperAI / autocrit
A repository for transformer critique learning and generation
☆90 Updated last year
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)
☆205 Updated last year
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 Updated 11 months ago
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆68 Updated 3 months ago
google / sycophancy-intervention
Scripts for generating synthetic finetuning data for reducing sycophancy.
☆113 Updated last year
neulab / gemini-benchmark
☆149 Updated last year
JacobPfau / fillerTokens
☆67 Updated last year
seonghyeonye / Flipped-Learning
[ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
☆116 Updated last month
SALT-NLP / demonstrated-feedback
☆125 Updated 10 months ago
akjindal53244 / Arithmo
Small and Efficient Mathematical Reasoning LLMs
☆71 Updated last year
felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆160 Updated last year
JoshuaPurtell / SmallBench
Small, simple agent task environments for training and evaluation
☆18 Updated 9 months ago
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆128 Updated 2 years ago
EleutherAI / semantic-memorization
☆44 Updated 8 months ago
TristanThrush / i-am-a-strange-dataset
Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models"
☆44 Updated last year
Aleph-Alpha-Research / scaling
Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr…
☆64 Updated 9 months ago