institutional / institutional-books-1-pipeline
The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection.
☆45 Updated 2 weeks ago
Alternatives and similar repositories for institutional-books-1-pipeline
Users who are interested in institutional-books-1-pipeline are comparing it to the repositories listed below.
- First token cutoff sampling inference example ☆31 Updated last year
- Aana SDK is a powerful framework for building AI enabled multimodal applications. ☆53 Updated 2 months ago
- Code for collecting, processing, and preparing datasets for the Common Pile ☆235 Updated last month
- Transformer GPU VRAM estimator ☆67 Updated last year
- Train, tune, and infer Bamba model ☆135 Updated 5 months ago
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? ☆209 Updated last week
- Efficiently computing & storing token n-grams from large corpora ☆26 Updated last year
- Granite 3.1 Language Models ☆129 Updated 4 months ago
- ☆29 Updated 2 years ago
- Pivotal Token Search ☆131 Updated 3 months ago
- Chrome Extension for exploring Hugging Face datasets 🔎 ☆49 Updated last year
- ☆20 Updated last year
- Python library to use Pleias-RAG models ☆64 Updated 6 months ago
- Code for pre-training BabyLM baseline models. ☆16 Updated 2 years ago
- ☆31 Updated 6 months ago
- ☆73 Updated 3 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆57 Updated 7 months ago
- Framework-Agnostic RL Environments for LLM Fine-Tuning ☆38 Updated this week
- Create embeddings for LLM using the Nomic API ☆23 Updated 11 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆25 Updated 11 months ago
- Contains the model patches and the eval logs from the passing swe-bench-lite run. ☆10 Updated last year
- Concatenated documentation for use with LLMs ☆47 Updated this week
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 Updated last year
- Embedding models from Jina AI ☆65 Updated last year
- Chunk Dedupe Estimation ☆20 Updated last year
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile. ☆18 Updated 2 years ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆24 Updated 8 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 Updated last month
- Lossily compress representation vectors using product quantization ☆59 Updated last week
- Code and data for the Walert large language model-based chatbot ☆12 Updated 2 months ago