institutional / institutional-books-1-pipeline
The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection.
☆47 · Updated last month
Alternatives and similar repositories for institutional-books-1-pipeline
Users who are interested in institutional-books-1-pipeline are comparing it to the libraries listed below.
- Code for collecting, processing, and preparing datasets for the Common Pile ☆248 · Updated 3 months ago
- Efficiently computing & storing token n-grams from large corpora ☆26 · Updated last year
- Python library to use Pleias-RAG models ☆67 · Updated 7 months ago
- lossily compress representation vectors using product quantization ☆59 · Updated last month
- First token cutoff sampling inference example ☆31 · Updated last year
- Aana SDK is a powerful framework for building AI enabled multimodal applications. ☆55 · Updated 4 months ago
- Transformer GPU VRAM estimator ☆67 · Updated last year
- Public repository containing METR's DVC pipeline for eval data analysis ☆164 · Updated 8 months ago
- Pivotal Token Search ☆141 · Updated last week
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Updated last year
- Train, tune, and infer Bamba model ☆137 · Updated 6 months ago
- LLM plugin for clustering embeddings ☆82 · Updated last year
- Embedding models from Jina AI ☆65 · Updated last year
- Small python package to measure OCR quality and other related metrics. ☆25 · Updated last year
- Pre-training code for CrystalCoder 7B LLM ☆55 · Updated last year
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆26 · Updated 9 months ago
- An introduction to LLM Sampling ☆79 · Updated last year
- Chrome Extension for exploring Hugging Face datasets 🔎 ☆49 · Updated last year
- Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including… ☆67 · Updated 3 weeks ago
- Framework-Agnostic RL Environments for LLM Fine-Tuning ☆40 · Updated 3 weeks ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆62 · Updated 3 months ago
- Tooling for exact and MinHash deduplication of large-scale text datasets ☆46 · Updated last week
- Granite 3.1 Language Models ☆135 · Updated 6 months ago
- Training code for Sparse Autoencoders on Embedding models ☆39 · Updated 10 months ago
- Create embeddings for LLM using the Nomic API ☆23 · Updated last year
- Public reports detailing responses to sets of prompts by Large Language Models. ☆32 · Updated 11 months ago
- Code for pre-training BabyLM baseline models. ☆16 · Updated 2 years ago
- A massively multilingual modern encoder language model ☆117 · Updated 2 months ago
- ☆62 · Updated 5 months ago
- Contains the model patches and the eval logs from the passing swe-bench-lite run. ☆10 · Updated last year