recursiveai / flow_benchmark_tools
☆2, updated 3 weeks ago
Alternatives and similar repositories for flow_benchmark_tools
Users interested in flow_benchmark_tools are comparing it to the libraries listed below.
- Stanford NLP Python library for understanding and improving PyTorch models via interventions (☆770, updated this week)
- Stanford NLP Python library for Representation Finetuning (ReFT) (☆1,495, updated 5 months ago)
- A benchmark to evaluate language models on questions I've previously asked them to solve. (☆1,021, updated 2 months ago)
- ☆617, updated this week
- ☆1,027, updated 7 months ago
- Decoder-only transformer, built from scratch with PyTorch (☆30, updated last year)
- Kickstart your MLOps initiative with a flexible, robust, and productive Python package. (☆1,314, updated this week)
- Doing simple retrieval from LLM models at various context lengths to measure accuracy (☆1,934, updated 11 months ago)
- Software design principles for machine learning applications (☆361, updated 3 months ago)
- List of papers on hallucination detection in LLMs. (☆916, updated last month)
- Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽💻 (☆463, updated 4 months ago)
- A Python toolbox for conformal prediction research on deep learning models, using PyTorch. (☆403, updated this week)
- Best practices for distilling large language models. (☆563, updated last year)
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering (☆800, updated last month)
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. (☆806, updated this week)
- ☆30, updated 2 months ago
- A curated list of Large Language Model (LLM) Interpretability resources. (☆1,382, updated 3 weeks ago)
- Automatically evaluate your LLMs in Google Colab (☆649, updated last year)
- Modified to support crosscoder training. (☆20, updated 2 weeks ago)
- Training Sparse Autoencoders on Language Models (☆876, updated this week)
- Automated Evaluation of RAG Systems (☆631, updated 3 months ago)
- Inspect: A framework for large language model evaluations (☆1,145, updated this week)
- Evaluate your LLM's response with Prometheus and GPT4 💯 (☆963, updated 2 months ago)
- Best practices & guides on how to write distributed PyTorch training code (☆450, updated 4 months ago)
- Implementation of the Aurora model for Earth system forecasting (☆654, updated 3 weeks ago)
- Representation Engineering: A Top-Down Approach to AI Transparency (☆851, updated 11 months ago)
- Official repository of Evolutionary Optimization of Model Merging Recipes (☆1,349, updated 7 months ago)
- Lightweight, useful implementation of conformal prediction on real data (☆912, updated last month; see the minimal split-conformal sketch after this list)
- Resources relating to the DLAI event: https://www.youtube.com/watch?v=eTieetk2dSw (☆186, updated 2 years ago)
- Starting kit for the NeurIPS 2023 unlearning challenge (☆378, updated last year)
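Two of the repositories above concern conformal prediction (the PyTorch research toolbox and the lightweight implementation). As a rough point of reference for what those libraries automate, below is a minimal sketch of split conformal prediction for regression using only NumPy. It is not code from either repository; the function name, variable names, and synthetic data are illustrative assumptions.

```python
import numpy as np

def split_conformal_interval(cal_preds, cal_labels, test_preds, alpha=0.1):
    """Split conformal prediction for regression (illustrative sketch).

    cal_preds / cal_labels: model predictions and true targets on a held-out
    calibration set; test_preds: predictions on new points.
    Returns lower/upper bounds intended to cover the true target with
    probability ~(1 - alpha), assuming exchangeable data.
    """
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = np.abs(cal_labels - cal_preds)
    n = len(scores)
    # Finite-sample-corrected quantile level for the scores.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    q_hat = np.quantile(scores, min(q_level, 1.0), method="higher")
    return test_preds - q_hat, test_preds + q_hat

# Synthetic example with a deliberately imperfect "model".
rng = np.random.default_rng(0)
x_cal, x_test = rng.uniform(0, 10, 500), rng.uniform(0, 10, 100)
y_cal = 2 * x_cal + rng.normal(0, 1, 500)
y_test = 2 * x_test + rng.normal(0, 1, 100)
pred_cal, pred_test = 2.1 * x_cal, 2.1 * x_test  # slightly biased predictions
lo, hi = split_conformal_interval(pred_cal, y_cal, pred_test, alpha=0.1)
print("empirical coverage:", np.mean((y_test >= lo) & (y_test <= hi)))
```

The listed libraries add the pieces this sketch omits, such as other nonconformity scores, classification-style prediction sets, and integration with PyTorch models.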