institutional / institutional-books-1-pipeline
The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection.
☆48 Updated last month
Alternatives and similar repositories for institutional-books-1-pipeline
Users who are interested in institutional-books-1-pipeline are comparing it to the libraries listed below.
- First token cutoff sampling inference example ☆30 Updated 2 years ago
- Aana SDK is a powerful framework for building AI enabled multimodal applications. ☆55 Updated 4 months ago
- Efficiently computing & storing token n-grams from large corpora ☆26 Updated last year
- Python library to use Pleias-RAG models ☆67 Updated 8 months ago
- Tooling for exact and MinHash deduplication of large-scale text datasets ☆52 Updated this week
- Transformer GPU VRAM estimator ☆67 Updated last year
- Pivotal Token Search ☆142 Updated last month
- decontamination ☆21 Updated last month
- Train, tune, and infer Bamba model ☆138 Updated 7 months ago
- lossily compress representation vectors using product quantization ☆59 Updated 2 months ago
- ☆21 Updated last year
- Code for collecting, processing, and preparing datasets for the Common Pile ☆247 Updated 4 months ago
- Chrome Extension for exploring Hugging Face datasets 🔎 ☆48 Updated last year
- The official evaluation suite and dynamic data release for MixEval. ☆11 Updated last year
- ☆33 Updated 9 months ago
- ☆59 Updated last year
- Pre-train Static Word Embeddings ☆94 Updated 4 months ago
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆43 Updated 2 years ago
- Training code for Sparse Autoencoders on Embedding models ☆39 Updated 10 months ago
- The AILuminate v1.1 benchmark suite is an AI risk assessment benchmark developed with broad involvement from leading AI companies, academ… ☆65 Updated 7 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 Updated last year
- ☆82 Updated 2 months ago
- Small python package to measure OCR quality and other related metrics. ☆25 Updated last year
- Supercharge huggingface transformers with model parallelism. ☆77 Updated 5 months ago
- ☆92 Updated last month
- ☆23 Updated 11 months ago
- ☆53 Updated 11 months ago
- XTR: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval ☆61 Updated last year
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ☆82 Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 Updated 2 years ago