Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
☆29 Apr 7, 2023 Updated 3 years ago
Alternatives and similar repositories for community
Users that are interested in community are comparing it to the libraries listed below.
- ☆19 May 23, 2023 Updated 2 years ago
- Preprocessing pipeline notebooks and API supporting text extraction from SEC documents ☆151 Jan 1, 2024 Updated 2 years ago
- 💙 Unstructured Data Connectors for Haystack 2.0 ☆17 Sep 21, 2023 Updated 2 years ago
- Code created for blog series on unsupervised feature/topic extraction from corporate email content. An implementation for cleaning raw e… ☆10 Oct 21, 2021 Updated 4 years ago
- This project is a Python script that scrapes a LinkedIn PDF, generates a customized portfolio site using OpenAI's GPT-4 model with the he… ☆29 May 16, 2023 Updated 2 years ago
- Using LLMs to manage files and generate metadata such as tags and summaries. ☆17 Apr 11, 2025 Updated last year
- ☆915 Apr 22, 2026 Updated last week
- A Firefox and Google Chrome extension to clip websites and download them into a readable markdown file. ☆44 Jan 13, 2019 Updated 7 years ago
- An assortment of Obsidian Web Clipper Templates ☆29 Mar 14, 2025 Updated last year
- Using ChatGPT to build a Kedro ML pipeline and Streamlit frontend ☆30 Feb 11, 2023 Updated 3 years ago
- Chrome and Firefox extensions for Slurp ☆28 Apr 9, 2024 Updated 2 years ago
- ☆29 May 27, 2025 Updated 11 months ago
- Model Context Protocol Server for Accessing Twitter ☆23 Jun 3, 2025 Updated 10 months ago
- This is an MCP (Model Context Protocol) server that you can use with Cline through Visual Studio Code to ask for songs to be played using You… ☆21 Feb 2, 2025 Updated last year
- A Tiptap extension for adding embedded content with Iframely. ☆16 Nov 18, 2025 Updated 5 months ago
- This guide is made to help you deploy your own document RAG pipeline with Open-WebUI and a local LLM. ☆38 Mar 20, 2025 Updated last year
- Prompt templating tools designed for interacting with language interfaces like OpenAI's ChatGPT in Obsidian. ☆25 Apr 3, 2024 Updated 2 years ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆12 Aug 15, 2024 Updated last year
- A Model Context Protocol (MCP) server that integrates with X using the @elizaOS `agent-twitter-client` package, allowing AI models to int… ☆28 Mar 30, 2026 Updated last month
- A Python tool that uses AI to generate well-structured technical and educational articles from any topic. Features transparent reasoning,… ☆18 Apr 19, 2025 Updated last year
- Batch processing using joblib including tqdm progress bars ☆20 Dec 29, 2021 Updated 4 years ago
- ☆15 Jan 10, 2025 Updated last year
- The official Python library for Formulaic ☆18 Apr 25, 2024 Updated 2 years ago
- The AI-first datastore & retrieval engine. ☆36 Oct 29, 2024 Updated last year
- A template for Python projects that need to use a relational database, including tooling for managing schema migrations and testing again… ☆13 Dec 13, 2024 Updated last year
- Copy the web as markdown ☆41 Aug 17, 2025 Updated 8 months ago
- This repository contains C++ implementation of A* search algorithm for finding path to goal state for 8 puzzle problem in AI. ☆11 Dec 2, 2023 Updated 2 years ago
- A project designed to extract relevant metadata from databases and transform it into context for Retrieval-Augmented Generation (RAG) in … ☆14 Aug 6, 2025 Updated 8 months ago
- This is a template retrieval repo to create a Flask api server using LangChain with Cohere embeddings and Qdrant Vector Database ☆78 Apr 30, 2023 Updated 3 years ago
- Build Contact Form 7 forms from PDF forms. Get PDFs auto-filled and attached to email messages and/or website responses on form submissio… ☆11 Apr 2, 2026 Updated last month
- ☆35 Jun 22, 2024 Updated last year
- Prompting Techniques for Attorneys ☆15 Apr 1, 2026 Updated last month
- Injection of MSIL using Cecil ☆12 Jul 28, 2015 Updated 10 years ago
- API client for fetching and comparing passages from legislation ☆14 Jan 26, 2025 Updated last year
- I will be adding code for different kinds of open-source data extraction tools using Python ☆10 Nov 15, 2024 Updated last year
- ☆15 Jun 9, 2023 Updated 2 years ago
- A lightweight React hook that automatically manages fade overlays for scrollable containers. Provides smooth gradient transitions at the … ☆12 Aug 11, 2025 Updated 8 months ago
- ☆14 Jul 25, 2024 Updated last year
- ☆23 Apr 6, 2026 Updated 3 weeks ago