instdin / institutional-books-1-pipeline
The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection.
☆44 · Updated 2 months ago
Alternatives and similar repositories for institutional-books-1-pipeline
Users who are interested in institutional-books-1-pipeline are comparing it to the repositories listed below.
- Transformer GPU VRAM estimator ☆66 · Updated last year
- First token cutoff sampling inference example ☆30 · Updated last year
- Real-time visualisation ☆18 · Updated 2 weeks ago
- Aana SDK is a powerful framework for building AI enabled multimodal applications. ☆51 · Updated this week
- Pivotal Token Search ☆119 · Updated 3 weeks ago
- Python library to use Pleias-RAG models ☆61 · Updated 3 months ago
- LLM plugin for clustering embeddings ☆80 · Updated last year
- Train, tune, and infer Bamba model ☆131 · Updated 2 months ago
- Chrome Extension for exploring Hugging Face datasets 🔎 ☆50 · Updated 10 months ago
- Efficiently computing & storing token n-grams from large corpora ☆26 · Updated 10 months ago
- Small python package to measure OCR quality and other related metrics. ☆25 · Updated last year
- Your buddy in the (L)LM space. ☆64 · Updated 10 months ago
- Granite 3.1 Language Models ☆117 · Updated last month
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Updated 9 months ago
- Framework-Agnostic RL Environments for LLM Fine-Tuning ☆31 · Updated this week
- Blueprint by Mozilla.ai for answering questions about structured documents ☆38 · Updated 4 months ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆23 · Updated 5 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆54 · Updated 5 months ago
- Embedding models from Jina AI ☆64 · Updated last year
- Vector Database with support for late interaction and token level embeddings. ☆55 · Updated last month
- Concatenated documentation for use with LLMs ☆41 · Updated this week
- ☆21 · Updated last year
- Flow Chart Image-to-Code Generation ☆33 · Updated 2 years ago
- Geniusrise: Framework for building geniuses ☆60 · Updated last year
- Code for collecting, processing, and preparing datasets for the Common Pile ☆216 · Updated 2 weeks ago
- PyLate efficient inference engine ☆62 · Updated 3 weeks ago
- Git scrapers for scraping the fediverse ☆17 · Updated this week
- A simple github actions script to build a llamafile and uploads to huggingface ☆15 · Updated last year
- Pre-train Static Word Embeddings ☆85 · Updated 2 months ago
- WikiSP, a semantic parser for Wikidata. WikiWebQuestions, a SPARQL-annotated dataset on Wikidata ☆97 · Updated 9 months ago