stanford-crfm / fmti
The Foundation Model Transparency Index
★65 Updated 3 months ago
Related projects:
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ★57 Updated 7 months ago
- ★184 Updated last week
- Functional Benchmarks and the Reasoning Gap ★74 Updated last month
- Let's build better datasets, together! ★195 Updated last month
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ★38 Updated 3 weeks ago
- ★91 Updated last month
- Reference-Free automatic summarization evaluation with potential hallucination detection ★99 Updated 8 months ago
- ★75 Updated 3 weeks ago
- ★57 Updated 5 months ago
- ★68 Updated last month
- Codebase accompanying the Summary of a Haystack paper. ★65 Updated 2 months ago
- Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act ★92 Updated 11 months ago
- ReLM is a Regular Expression engine for Language Models ★100 Updated last year
- Automating enterprise workflows with multimodal agents ★83 Updated last month
- Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes. ★81 Updated last year
- ★37 Updated this week
- Evaluating LLMs with CommonGen-Lite ★83 Updated 6 months ago
- ★256 Updated this week
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ★89 Updated last week
- ★85 Updated 7 months ago
- A curated list of papers & technical articles on AI Quality & Safety ★155 Updated 11 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ★50 Updated this week
- Just a bunch of benchmark logs for different LLMs ★112 Updated last month
- Scrape and export data from the Open LLM Leaderboard. ★38 Updated 2 weeks ago
- Small and Efficient Mathematical Reasoning LLMs ★69 Updated 7 months ago
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ★73 Updated 6 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ★48 Updated 2 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ★217 Updated 2 months ago
- ★71 Updated 3 months ago
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ★40 Updated 9 months ago