instdin / institutional-books-1-pipeline
The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection.
☆30 Updated 2 weeks ago
Alternatives and similar repositories for institutional-books-1-pipeline
Users who are interested in institutional-books-1-pipeline are comparing it to the libraries listed below.
- Python library to use Pleias-RAG models ☆57 Updated last month
- Code for SaGe subword tokenizer (EACL 2023) ☆25 Updated 6 months ago
- A text-to-SQL prototype on the northwind sqlite dataset ☆12 Updated 9 months ago
- OLMost every training recipe you need to perform data interventions with the OLMo family of models. ☆32 Updated this week
- Efficiently computing & storing token n-grams from large corpora ☆24 Updated 8 months ago
- ☆41 Updated last week
- spaCy entry points for Curated Transformers ☆31 Updated 3 weeks ago
- ☆19 Updated last year
- BPE modification that removes intermediate tokens during tokenizer training. ☆25 Updated 7 months ago
- A pipeline that uses API calls to agnostically convert unstructured data into structured training data ☆30 Updated 9 months ago
- ☆51 Updated 3 weeks ago
- Neural Solr = Solr 9 + Mighty Inference + Node ☆17 Updated 3 years ago
- ☆22 Updated 4 months ago
- ☆14 Updated 2 years ago
- Efficient BM25 with DuckDB 🦆 ☆49 Updated 6 months ago
- Small Python package to measure OCR quality and other related metrics. ☆23 Updated last year
- A Python micro framework for creating LLM-driven agents ☆23 Updated last month
- Training hybrid models for dummies. ☆23 Updated 5 months ago
- ☆8 Updated 11 months ago
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile. ☆17 Updated last year
- ☆15 Updated 2 years ago
- NLP with Rust for Python 🦀🐍 ☆62 Updated last month
- ☆20 Updated last year
- Use Hermes-2-Pro-Mistral-7B function calling with your OpenAI API-compatible code. ☆18 Updated last year
- History of Open-Source IR Systems ☆11 Updated 4 months ago
- A graph definition and execution library for Python ☆16 Updated 2 years ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆17 Updated 2 months ago
- Tree-based indexes for neural-search ☆32 Updated last year
- Documentation effort for the BookCorpus dataset ☆34 Updated 4 years ago
- 👩🤝🤖 A curated list of datasets for large language models (LLMs), RLHF and related resources (continually updated) ☆23 Updated 2 years ago