psmedia / Books3Info

Data and information related to the Books3 dataset, included as part of The Pile and used to train Meta's LLaMA, among others

☆32

Alternatives and similar repositories for Books3Info

Users who are interested in Books3Info are comparing it to the repositories listed below.

simonw / llm-cluster
LLM plugin for clustering embeddings
☆77 Updated last year
simonw / llm-mistral
LLM plugin providing access to Mistral models using the Mistral API
☆191 Updated this week
neuml / txtchat
💭 Build autonomous agents, retrieval augmented generation (RAG) processes and language model powered chat applications
☆293 Updated 2 months ago
Pleias / marginalia
☆67 Updated last year
JohnNay / llm-lobbyist
Code for the paper: "Large Language Models as Corporate Lobbyists" (2023).
☆171 Updated 2 years ago
simonw / llm-llama-cpp
LLM plugin for running models using llama.cpp
☆143 Updated last year
simonw / llm-replicate
LLM plugin for models hosted on Replicate
☆63 Updated last year
PublicDataWorks / verdad
https://verdad.app
☆82 Updated 6 months ago
datasette / datasette-extract
Import unstructured data (text and images) into structured tables
☆153 Updated 3 months ago
explosion / spacy-vscode
spaCy extension for Visual Studio Code
☆32 Updated 4 months ago
deepset-ai / haystack-search-pipeline-streamlit
🚀 Template Haystack Search Application with Streamlit
☆27 Updated 6 months ago
alea-institute / nupunkt
Next-generation Punkt sentence boundary detection with zero dependencies
☆17 Updated 3 months ago
nomic-ai / semantic-search-app-template
Tutorial and template for a semantic search app powered by the Atlas Embedding Database, Langchain, OpenAI and FastAPI
☆115 Updated last year
simonw / llm-sentence-transformers
LLM plugin for embeddings using sentence-transformers
☆70 Updated 3 months ago
hamelsmu / claudesave
A Chrome extension that saves conversations with Claude to GitHub Gists or your clipboard.
☆86 Updated 8 months ago
deployradiant / pychatml
Chat Markup Language conversation library
☆55 Updated last year
simonw / llm-gpt4all
Plugin for LLM adding support for the GPT4All collection of models
☆253 Updated last year
charlesdedampierre / BunkaTopics
🗺️ Data Cleaning and Textual Data Visualization 🗺️
☆181 Updated 2 months ago
i-dot-ai / redbox
Bringing Generative AI to the way the Civil Service works
☆122 Updated 3 weeks ago
opening-up-chatgpt / opening-up-chatgpt.github.io
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Track…
☆118 Updated 4 months ago
AnswerDotAI / web2md-ext
Get a markdown version of any webpage with a keyboard shortcut.
☆65 Updated 5 months ago
deepset-ai / prompthub
☆172 Updated last year
brandonrobertz / chatgpt-document-extraction
A proof of concept tool for using ChatGPT to transform messy text documents into structured JSON
☆122 Updated last year
Pleias / OCRoscope
Small Python package to measure OCR quality and other related metrics.
☆25 Updated last year
firattamur / llmdantic
Structured Output Is All You Need!
☆58 Updated last year
simonw / datasette-chatgpt-plugin
A Datasette plugin that turns a Datasette instance into a ChatGPT plugin
☆68 Updated last year
cldellow / datasette-scraper
Add website scraping abilities to Datasette
☆64 Updated 2 years ago
parlance-labs / langfree
Leverage your LangChain trace data for fine tuning
☆42 Updated 11 months ago
menloparklab / cohere-weaviate-wikipedia-retrieval
A backend API to perform search over Wikipedia using LangChain, Cohere and Weaviate
☆105 Updated 2 years ago
cfahlgren1 / hf-data-explorer
Chrome Extension for exploring Hugging Face datasets 🔎
☆50 Updated 10 months ago