recursiveai / flow_benchmark_tools
☆2, updated 3 weeks ago
Alternatives and similar repositories for flow_benchmark_tools
Users interested in flow_benchmark_tools are comparing it to the libraries listed below.
- Stanford NLP Python library for understanding and improving PyTorch models via interventions (☆770, updated this week)
- Stanford NLP Python library for Representation Finetuning (ReFT) (☆1,495, updated 5 months ago)
- A benchmark to evaluate language models on questions I've previously asked them to solve. (☆1,021, updated 2 months ago)
- ☆617, updated this week
- ☆1,027, updated 7 months ago
- Decoder-only transformer, built from scratch with PyTorch (☆30, updated last year)
- Kickstart your MLOps initiative with a flexible, robust, and productive Python package. (☆1,314, updated this week)
- Doing simple retrieval from LLM models at various context lengths to measure accuracy (☆1,934, updated 11 months ago)
- Software design principles for machine learning applications (☆361, updated 3 months ago)
- List of papers on hallucination detection in LLMs. (☆916, updated last month)
- Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽💻 (☆463, updated 4 months ago)
- A Python toolbox for conformal prediction research on deep learning models, using PyTorch. (☆403, updated this week)
- Best practices for distilling large language models. (☆563, updated last year)
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering (☆800, updated last month)
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. (☆806, updated this week)
- ☆30, updated 2 months ago
- A curated list of Large Language Model (LLM) Interpretability resources. (☆1,382, updated 3 weeks ago)
- Automatically evaluate your LLMs in Google Colab (☆649, updated last year)
- Modified to support crosscoder training. (☆20, updated 2 weeks ago)
- Training Sparse Autoencoders on Language Models (☆876, updated this week)
- Automated Evaluation of RAG Systems (☆631, updated 3 months ago)
- Inspect: A framework for large language model evaluations (☆1,145, updated this week)
- Evaluate your LLM's response with Prometheus and GPT4 💯 (☆963, updated 2 months ago)
- Best practices & guides on how to write distributed PyTorch training code (☆450, updated 4 months ago)
- Implementation of the Aurora model for Earth system forecasting (☆654, updated 3 weeks ago)
- Representation Engineering: A Top-Down Approach to AI Transparency (☆851, updated 11 months ago)
- Official repository of Evolutionary Optimization of Model Merging Recipes (☆1,349, updated 7 months ago)
- Lightweight, useful implementation of conformal prediction on real data (☆912, updated last month; see the minimal split-conformal sketch after this list)
- Resources relating to the DLAI event: https://www.youtube.com/watch?v=eTieetk2dSw (☆186, updated 2 years ago)
- Starting kit for the NeurIPS 2023 unlearning challenge (☆378, updated last year)
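Two of the repositories above concern conformal prediction (the PyTorch research toolbox and the lightweight implementation). As a rough point of reference for what those libraries automate, below is a minimal sketch of split conformal prediction for regression using only NumPy. It is not code from either repository; the function name, variable names, and synthetic data are illustrative assumptions.

```python
import numpy as np

def split_conformal_interval(cal_preds, cal_labels, test_preds, alpha=0.1):
    """Split conformal prediction for regression (illustrative sketch).

    cal_preds / cal_labels: model predictions and true targets on a held-out
    calibration set; test_preds: predictions on new points.
    Returns lower/upper bounds intended to cover the true target with
    probability ~(1 - alpha), assuming exchangeable data.
    """
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = np.abs(cal_labels - cal_preds)
    n = len(scores)
    # Finite-sample-corrected quantile level for the scores.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    q_hat = np.quantile(scores, min(q_level, 1.0), method="higher")
    return test_preds - q_hat, test_preds + q_hat

# Synthetic example with a deliberately imperfect "model".
rng = np.random.default_rng(0)
x_cal, x_test = rng.uniform(0, 10, 500), rng.uniform(0, 10, 100)
y_cal = 2 * x_cal + rng.normal(0, 1, 500)
y_test = 2 * x_test + rng.normal(0, 1, 100)
pred_cal, pred_test = 2.1 * x_cal, 2.1 * x_test  # slightly biased predictions
lo, hi = split_conformal_interval(pred_cal, y_cal, pred_test, alpha=0.1)
print("empirical coverage:", np.mean((y_test >= lo) & (y_test <= hi)))
```

The listed libraries add the pieces this sketch omits, such as other nonconformity scores, classification-style prediction sets, and integration with PyTorch models.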