NLP tool for scraping text from a corpus of PDF files, embedding the sentences in that text, and finding the sentences most semantically similar to a given search query.
☆37 · Jun 22, 2022 · Updated 3 years ago
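As a rough illustration of the last step in the description above (ranking sentences by semantic similarity to a query), the sketch below splits text into sentences, embeds each one, and ranks them by cosine similarity to the query. It is not the repository's implementation: `embed` is a hypothetical bag-of-words stand-in for a real sentence-embedding model, and PDF extraction and Airflow orchestration are left out.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(sentence: str) -> Counter:
    # Hypothetical stand-in for a real sentence-embedding model:
    # a sparse bag-of-words count vector over lowercase word tokens.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of two sparse vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar(query: str, sentences: list[str], k: int = 3) -> list[tuple[float, str]]:
    # Score every sentence against the query and keep the k best matches.
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = "The cat sat on the mat. Stocks fell sharply today. A cat was sitting on a mat."
    for score, sentence in most_similar("cat on a mat", split_sentences(corpus), k=2):
        print(f"{score:.3f}  {sentence}")
```

Replacing `embed` with a dense embedding model changes only how the vectors are produced; the cosine ranking step stays the same.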
Alternatives and similar repositories for airflow-pdf2embeddings
Users who are interested in airflow-pdf2embeddings are comparing it to the libraries listed below.
- Fully unit-tested utility functions for data engineering. Python 3 only. ☆18 · Mar 12, 2026 · Updated last week
- Digital Humanities course site ☆21 · Nov 22, 2021 · Updated 4 years ago
- ☆10 · Dec 17, 2020 · Updated 5 years ago
- A digital edition of the 24 Probstücke of the Oberclasse by Johann Mattheson. ☆11 · Mar 13, 2026 · Updated last week
- Extract structured data from free text using large language models ☆18 · Mar 12, 2026 · Updated last week
- The standalone version of the collation editor. This version uses locally stored data files and does not require embedding in another pla… ☆13 · Sep 15, 2025 · Updated 6 months ago
- ☆16 · Jan 1, 2020 · Updated 6 years ago
- The homework assignments completed for the Coursera specialization "Probabilistic Graphical Models" ☆13 · Jun 16, 2017 · Updated 8 years ago
- Sara - the Rasa Demo Bot: An example of a contextual AI assistant built with the open source Rasa Stack ☆11 · Jan 14, 2021 · Updated 5 years ago
- Framework for Oxygen XML Author for Digital Scholarly Editions ☆14 · May 23, 2025 · Updated 10 months ago
- Web application that powers weber-gesamtausgabe.de ☆24 · Updated this week
- Auto-tag govuk content to the collated legacy taxonomies ☆21 · Sep 16, 2021 · Updated 4 years ago
- MoJ coffee and coding sessions that can be made publicly available ☆26 · May 24, 2021 · Updated 4 years ago
- A plugin that provides support for working with Digital Facsimiles in Text Encoding Initiative (TEI) vocabulary. The plugin contribute… ☆25 · Jun 16, 2025 · Updated 9 months ago
- Project that details the creation of a Spark Cluster using Raspberry Pi 4 and Ubuntu Server 20.04 LTS ☆31 · Oct 9, 2020 · Updated 5 years ago
- ☆10 · Aug 23, 2023 · Updated 2 years ago
- Word Factor Vectors ☆32 · Dec 13, 2019 · Updated 6 years ago
- A collection of notebooks for Natural Language Processing ☆25 · Jan 13, 2025 · Updated last year
- The GitHub repository containing all the material related to the Computational Thinking and Programming course of the Digital Humanities … ☆30 · May 23, 2020 · Updated 5 years ago
- ☆12 · Dec 23, 2022 · Updated 3 years ago
- Manager for remote ~/.ssh/authorized_keys ☆13 · Mar 20, 2013 · Updated 13 years ago
- ☆18 · Aug 29, 2021 · Updated 4 years ago
- An extensive game simulator and animated visualizer for 2D battles, inspired by Totally Accurate Battle Simulator (TABS). … ☆31 · Aug 18, 2024 · Updated last year
- ARCHIVED Generate Code from BNF Grammars ☆12 · May 10, 2022 · Updated 3 years ago
- Simple sample for developing Dash apps on Gitpod ☆15 · Jun 14, 2019 · Updated 6 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 · May 12, 2021 · Updated 4 years ago
- Code for Single-step Retrosynthesis model Retroprime ☆40 · Apr 27, 2021 · Updated 4 years ago
- EasyQuery.JS samples for various server-side platforms: NodeJs, PHP, Java ☆10 · Dec 12, 2022 · Updated 3 years ago
- A repository to demonstrate how ChatGPT writes an entire AI application on AWS. ☆35 · Dec 2, 2022 · Updated 3 years ago
- ☆13 · Aug 5, 2025 · Updated 7 months ago
- ☆11 · Jan 28, 2019 · Updated 7 years ago
- Applying deep neural networks for retrosynthesis tasks ☆37 · Mar 2, 2020 · Updated 6 years ago
- Python package for creating Sankey flow diagrams in Matplotlib ☆31 · Oct 23, 2024 · Updated last year
- The ONS Big Data Team GitHub pages ☆10 · May 19, 2021 · Updated 4 years ago
- ☆16 · Sep 27, 2024 · Updated last year
- A digital humanities operating system that runs on a USB disk. ☆32 · Jul 5, 2017 · Updated 8 years ago
- Repo for MCMC-based Dynamic Topic Model ☆16 · Sep 2, 2017 · Updated 8 years ago
- NGINX Dockerfiles bundled with nginx-auth-ldap ☆11 · Oct 10, 2019 · Updated 6 years ago
- Reproducible Analytical Pipeline of the Hospital Standardised Mortality Ratio (HSMR) quarterly publication ☆11 · Jun 21, 2024 · Updated last year