institutional / institutional-books-1-pipeline
The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection.
☆45 Updated this week
Alternatives and similar repositories for institutional-books-1-pipeline
Users who are interested in institutional-books-1-pipeline are comparing it to the repositories listed below.
- Transformer GPU VRAM estimator ☆66 Updated last year
- Aana SDK is a powerful framework for building AI enabled multimodal applications. ☆52 Updated last month
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 Updated 10 months ago
- First token cutoff sampling inference example ☆31 Updated last year
- Framework-Agnostic RL Environments for LLM Fine-Tuning ☆35 Updated this week
- ☆28 Updated 5 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆11 Updated last year
- lossily compress representation vectors using product quantization ☆59 Updated 5 months ago
- Pivotal Token Search ☆125 Updated 2 months ago
- Python library to use Pleias-RAG models ☆62 Updated 4 months ago
- ☆29 Updated 2 years ago
- Public repository containing METR's DVC pipeline for eval data analysis ☆110 Updated 5 months ago
- Chrome Extension for exploring Hugging Face datasets 🔎 ☆49 Updated last year
- Efficiently computing & storing token n-grams from large corpora ☆26 Updated 11 months ago
- ☆20 Updated 11 months ago
- Your buddy in the (L)LM space. ☆64 Updated last year
- Train, tune, and infer Bamba model ☆132 Updated 3 months ago
- LLM plugin for clustering embeddings ☆82 Updated last year
- An introduction to DSPy ☆32 Updated 3 weeks ago
- A clone of OpenAI's Tokenizer page for HuggingFace Models ☆45 Updated last year
- Small python package to measure OCR quality and other related metrics. ☆25 Updated last year
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆87 Updated this week
- Vector Database with support for late interaction and token level embeddings. ☆55 Updated 3 months ago
- Code for pre-training BabyLM baseline models. ☆16 Updated 2 years ago
- Training code for Sparse Autoencoders on Embedding models ☆38 Updated 6 months ago
- Embedding models from Jina AI ☆65 Updated last year
- Code for collecting, processing, and preparing datasets for the Common Pile ☆227 Updated last week
- Pre-train Static Word Embeddings ☆85 Updated 2 weeks ago
- OLMost every training recipe you need to perform data interventions with the OLMo family of models. ☆48 Updated this week
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆55 Updated 6 months ago