instdin / institutional-books-1-pipeline
The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection.
☆41 · Updated last month
Alternatives and similar repositories for institutional-books-1-pipeline
Users who are interested in institutional-books-1-pipeline are comparing it to the libraries listed below.
- Training hybrid models for dummies. ☆25 · Updated 6 months ago
- A pipeline for using API calls to agnostically convert unstructured data into structured training data. ☆30 · Updated 9 months ago
- ☆25 · Updated 3 months ago
- Python library to use Pleias-RAG models. ☆58 · Updated 2 months ago
- ☆22 · Updated 5 months ago
- ☆20 · Updated 9 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Updated 8 months ago
- BPE modification that implements removal of the intermediate tokens during tokenizer training. ☆24 · Updated 7 months ago
- LLM plugin for clustering embeddings. ☆77 · Updated last year
- LLMs sitting on a council together to decide, by consensus, who among them is the best. ☆15 · Updated this week
- ☆54 · Updated last month
- PyTorch implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training". ☆24 · Updated 2 weeks ago
- OLMost every training recipe you need to perform data interventions with the OLMo family of models. ☆36 · Updated this week
- ☆67 · Updated last year
- LLM plugin for models hosted by Anyscale Endpoints. ☆33 · Updated last year
- Code for "Training-free Graph Neural Networks and the Power of Labels as Features" (TMLR 2024). ☆58 · Updated 11 months ago
- The repository contains generative AI analytics platform application code. ☆26 · Updated 2 months ago
- Flow Chart Image-to-Code Generation. ☆33 · Updated last year
- Hugging Face and Pyserini interoperability. ☆20 · Updated 2 years ago
- ☆21 · Updated last year
- Train, tune, and infer Bamba model. ☆130 · Updated last month
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Updated last year
- Aana SDK is a powerful framework for building AI-enabled multimodal applications. ☆49 · Updated this week
- Efficiently computing & storing token n-grams from large corpora. ☆24 · Updated 9 months ago
- Small Python package to measure OCR quality and other related metrics. ☆24 · Updated last year
- Scripts to load the GDELT data set into MongoDB. ☆12 · Updated 2 years ago
- A CLI tool for managing OpenAI batch processing jobs with ease. ☆37 · Updated 2 months ago
- ☆47 · Updated last month
- Tree-based indexes for neural search. ☆32 · Updated last year
- Implementation. ☆25 · Updated 3 months ago