moj-analytical-services / airflow-pdf2embeddings
NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text and finding semantically similar sentences to a given search query.
☆37 · Jun 22, 2022 · Updated 3 years ago
Alternatives and similar repositories for airflow-pdf2embeddings
Users who are interested in airflow-pdf2embeddings compare it to the libraries listed below.
- Fully unit tested utility functions for data engineering. Python 3 only. ☆18 · Updated this week
- NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text and finding semantically similar sentences to … ☆13 · May 4, 2021 · Updated 4 years ago
- Digital Humanities course site ☆21 · Nov 22, 2021 · Updated 4 years ago
- Parent repository for the MOJ Analytics Platform ☆14 · Nov 16, 2021 · Updated 4 years ago
- The ONS Big Data Team Github pages ☆10 · May 19, 2021 · Updated 4 years ago
- Python version of dbtools ☆12 · Jul 30, 2025 · Updated 6 months ago
- A toolkit of functions and classes to help build isometric games with Lua ☆16 · Apr 21, 2025 · Updated 9 months ago
- The privacy-preserving record linkage toolkit: a proof-of-concept public demo of next-gen data linkage techniques. ☆15 · May 22, 2024 · Updated last year
- ☆11 · Jan 28, 2019 · Updated 7 years ago
- a tool to control SFR TV POWER edition (STB7) ☆12 · Apr 11, 2022 · Updated 3 years ago
- A web application that provides a LLM powered chat experience based on GOV.UK content. ☆12 · Updated this week
- Interactive notebooks containing demonstration code of the splink library ☆40 · Jan 19, 2024 · Updated 2 years ago
- A thin wrapper around the AJV JSON Validator for Python ☆12 · May 5, 2024 · Updated last year
- This is a Special Repository that contains all the list of Microweb framework Flask using this Repo you will master it in the Flask Frame… ☆10 · Jan 27, 2021 · Updated 5 years ago
- A simple python library to spot holiday "bridges" and long weekends. ☆10 · Aug 19, 2021 · Updated 4 years ago
- A digital edition of the 24 Probstücke of the Oberclasse by Johann Mattheson. ☆11 · Jul 31, 2025 · Updated 6 months ago
- ESP32_CAM meets Etch-A-Sketch => Etch-a-Selfie ☆12 · Jul 30, 2019 · Updated 6 years ago
- The standalone version of the collation editor. This version uses locally stored data files and does not require embedding in another pla… ☆13 · Sep 15, 2025 · Updated 5 months ago
- Automates the tedious task of extracting crucial information from invoices with the Invoice Data Extraction Bot. ☆12 · Feb 7, 2024 · Updated 2 years ago
- This project was created as an experiment to see how accurately I can generate valid regex when simply given a string(s) to match. ☆11 · Oct 21, 2021 · Updated 4 years ago
- Demo of an In-database processing tool for scikit-learn ☆13 · Oct 18, 2022 · Updated 3 years ago
- NGINX Dockerfiles bundled with nginx-auth-ldap ☆11 · Oct 10, 2019 · Updated 6 years ago
- Automatic text comparison with an extendable variance classifier ☆13 · Sep 11, 2023 · Updated 2 years ago
- ☆12 · Feb 21, 2022 · Updated 3 years ago
- Manager for remote ~/.ssh/authorized_keys ☆13 · Mar 20, 2013 · Updated 12 years ago
- Surface 2channel RC transmitter. Hardware includes nRF24L01+ transceiver and ATmega328P processor with an OLED screen. Telemetry monitors… ☆12 · Updated this week
- Pipeline for the production of digital scholarly editions of archival collections ☆14 · Feb 22, 2024 · Updated last year
- Modern Data Engineering Project ☆12 · Jun 3, 2022 · Updated 3 years ago
- ☆14 · May 30, 2023 · Updated 2 years ago
- ☆13 · Oct 18, 2023 · Updated 2 years ago
- textual tactics game ☆10 · Sep 3, 2022 · Updated 3 years ago
- Inspect pacman log file ☆17 · Sep 29, 2024 · Updated last year
- AWS Glue Configurable Test Data Generator for S3 Data Lakes and DynamoDB ☆18 · Jan 19, 2026 · Updated 3 weeks ago
- FPGA implementation of SKLearn Random Forest ☆10 · Dec 12, 2016 · Updated 9 years ago
- wondertrader project source code ☆12 · Mar 22, 2022 · Updated 3 years ago
- ☆13 · Aug 5, 2025 · Updated 6 months ago
- Simple sample to develop dash on gitpod ☆15 · Jun 14, 2019 · Updated 6 years ago
- Data pipeline to extract and preprocess BigQuery user journey data. ☆13 · Jun 16, 2022 · Updated 3 years ago
- A simple and easy-to-use two-phase flow library. ☆16 · Jun 30, 2021 · Updated 4 years ago