An NLP tool that extracts text from a corpus of PDF files, embeds the sentences in that text, and finds sentences semantically similar to a given search query.
☆37 · Jun 22, 2022 · Updated 3 years ago
Alternatives and similar repositories for airflow-pdf2embeddings
Users who are interested in airflow-pdf2embeddings are comparing it to the libraries listed below.
- Fully unit-tested utility functions for data engineering. Python 3 only. ☆18 · Feb 11, 2026 · Updated 3 weeks ago
- Strip output from iPython notebooks ☆22 · Sep 6, 2015 · Updated 10 years ago
- Parent repository for the MOJ Analytics Platform ☆14 · Nov 16, 2021 · Updated 4 years ago
- How to write functions in R ☆12 · Updated this week
- ☆10 · Aug 23, 2023 · Updated 2 years ago
- Python version of dbtools ☆12 · Jul 30, 2025 · Updated 7 months ago
- The ONS Big Data Team Github pages ☆10 · May 19, 2021 · Updated 4 years ago
- Reproducible Analytical Pipeline of the Hospital Standardised Mortality Ratio (HSMR) quarterly publication ☆11 · Jun 21, 2024 · Updated last year
- Interactive notebooks containing demonstration code of the splink library ☆40 · Updated this week
- ☆11 · Jan 7, 2023 · Updated 3 years ago
- This project was created as an experiment to see how accurately I can generate valid regex when simply given a string(s) to match. ☆11 · Oct 21, 2021 · Updated 4 years ago
- Surface 2-channel RC transmitter. Hardware includes nRF24L01+ transceiver and ATmega328P processor with an OLED screen. Telemetry monitors… ☆12 · Updated this week
- Code to implement the network histogram (Olhede and Wolfe, arXiv:1312.5306) ☆11 · Sep 23, 2014 · Updated 11 years ago
- A simple python library to spot holiday "bridges" and long weekends. ☆10 · Aug 19, 2021 · Updated 4 years ago
- Automates the tedious task of extracting crucial information from invoices with the Invoice Data Extraction Bot. ☆12 · Feb 7, 2024 · Updated 2 years ago
- Cloudflare Challenge 2024 ☆11 · Apr 14, 2024 · Updated last year
- Manager for remote ~/.ssh/authorized_keys ☆13 · Mar 20, 2013 · Updated 12 years ago
- Small algorithm for getting Antoine's coefficient to calculate vapor pressure from NIST web book. ☆12 · May 30, 2021 · Updated 4 years ago
- Introductory R training ☆26 · Feb 13, 2026 · Updated 3 weeks ago
- Extract structured data from free text using large language models ☆17 · Feb 13, 2026 · Updated 3 weeks ago
- Demo of an In-database processing tool for scikit-learn ☆13 · Oct 18, 2022 · Updated 3 years ago
- ESP32_CAM meets Etch-A-Sketch => Etch-a-Selfie ☆12 · Jul 30, 2019 · Updated 6 years ago
- The privacy-preserving record linkage toolkit: a proof-of-concept public demo of next-gen data linkage techniques. ☆16 · May 22, 2024 · Updated last year
- A thin wrapper around the AJV JSON Validator for Python ☆12 · May 5, 2024 · Updated last year
- Ruby on rails app using aws-sdk-ruby ☆12 · Aug 30, 2018 · Updated 7 years ago
- Data pipeline to extract and preprocess BigQuery user journey data. ☆13 · Jun 16, 2022 · Updated 3 years ago
- FPGA implementation of SKLearn Random Forest ☆10 · Dec 12, 2016 · Updated 9 years ago
- ☆13 · Oct 18, 2023 · Updated 2 years ago
- ☆14 · May 30, 2023 · Updated 2 years ago
- Inspect pacman log file ☆17 · Sep 29, 2024 · Updated last year
- AWS Glue Configurable Test Data Generator for S3 Data Lakes and DynamoDB ☆18 · Jan 19, 2026 · Updated last month
- Source code for predictive techniques provided in the UManSysProp facility. ☆12 · May 15, 2024 · Updated last year
- Reproduce an Economist graph found in the article [Safe Skies] ☆11 · Sep 11, 2018 · Updated 7 years ago
- This converts most PDF files into a text-only PDF file. This script strips a PDF document of all images, designs, etc. and only keeps the … ☆11 · Feb 20, 2024 · Updated 2 years ago
- textual tactics game ☆10 · Sep 3, 2022 · Updated 3 years ago
- Text anonymization in many languages using Faker ☆10 · Mar 31, 2020 · Updated 5 years ago
- Converting ArchLinux ARM OS for Berryboot ☆15 · Sep 26, 2023 · Updated 2 years ago
- Working paper and notebook for unsupervised document clustering ☆13 · Mar 6, 2018 · Updated 8 years ago
- How to set up a Raspberry Pi as a portable oldies gaming console ☆16 · Oct 25, 2019 · Updated 6 years ago