moj-analytical-services/airflow-pdf2embeddings

NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text and finding semantically similar sentences to a given search query.

☆37

Alternatives and similar repositories for airflow-pdf2embeddings

Users interested in airflow-pdf2embeddings are comparing it to the libraries listed below.

moj-analytical-services / splink_demos
Interactive notebooks containing demonstration code of the splink library
☆41 · Updated this week
alphagov / govuk-chat
A web application that provides an LLM-powered chat experience based on GOV.UK content.
☆13 · Updated this week
moj-analytical-services / pydbtools
Python version of dbtools
☆12 · Jul 30, 2025 · Updated 9 months ago
ejklike / tied-twoway-transformer
☆10 · Dec 17, 2020 · Updated 5 years ago
Tentex1 / ADBFastbootGUI
It is a project designed to make ADB (Android Debug Bridge) and its Fastboot element easier to use with a graphical interface.
☆30 · Mar 13, 2026 · Updated last month
zengtsysu / BioNavi
☆12 · Jun 11, 2025 · Updated 10 months ago
dfe-analytical-services / dfeR
R package for common Department for Education analysis tasks
☆14 · Updated this week
brandonko / HTML-Data-Cleaning-Python-NLP
Jupyter notebook that contains the workflow for cleaning scraped HTML sites for NLP in Python
☆10 · Sep 3, 2020 · Updated 5 years ago
moj-analytical-services / laurium
Extract structured data from free text using large language models
☆19 · Apr 28, 2026 · Updated last week
C2DH / open-tei-transviewer
TEI Transviewer is an interface intended for the exploration of primary and secondary sources, at the document level, in historical or oth…
☆14 · Jul 17, 2021 · Updated 4 years ago
papercodekl / MolecularGET
☆16 · Jan 1, 2020 · Updated 6 years ago
zhanghailiangcsu / MSBERT
Improve the accuracy of database search by using BERT to embed MS/MS reasonably
☆20 · Oct 15, 2024 · Updated last year
davenquinn / compare-documents
Run the Microsoft Word "Compare" tool from a CLI
☆11 · Sep 6, 2018 · Updated 7 years ago
humsha / USCorpus
Urdu Summary Corpus and Software Tools Version 1.0
☆13 · Oct 16, 2022 · Updated 3 years ago
storybookjs / testing-angular
☆16 · Nov 26, 2024 · Updated last year
moj-analytical-services / our-coding-standards
DASD's coding principles for analytical projects
☆16 · Oct 9, 2023 · Updated 2 years ago
dotAadarsh / YouTXT
App that converts any YouTube video to text. Created for Learn Build Teach Hackathon 2022
☆13 · Feb 6, 2026 · Updated 2 months ago
voanna / slices-to-3d-brain-vae
Code accompanying "Modelling the Distribution of 3D Brain MRI using a 2D Slice VAE"
☆18 · Nov 26, 2020 · Updated 5 years ago
jddunn / dementia-progression-analysis
Alzheimer's / dementia progression classifier for MRIs using CNNs and transfer learning
☆18 · Jan 22, 2018 · Updated 8 years ago
chen-bowen / Research_Documents_Curation_with_NLP
Applied Finance Project from UCLA Anderson, using natural language processing techniques to classify and summarize quantitative finance r…
☆18 · Dec 24, 2018 · Updated 7 years ago
Edirom / WeGA-WebApp
Web application that powers weber-gesamtausgabe.de
☆24 · Updated this week
ediarum / ediarum.DB
eXistdb App for ediarum.BASE.edit and ediarum.REGISTER.edit
☆14 · Mar 1, 2024 · Updated 2 years ago
pranath / predict_alzheimers
In this project I develop a deep learning CNN model to predict Alzheimer's disease using 3D MRI medical images of the Hippocampus region …
☆17 · Aug 31, 2020 · Updated 5 years ago
Xray-App / playwright-junit-reporter
Playwright JUnit Enhanced XML reporter
☆15 · Apr 1, 2026 · Updated last month
oxygenxml / TEI-Facsimile-Plugin
A plugin that provides support for working with Digital Facsimiles in Text Encoding Initiative (TEI) vocabulary. The plugin contribute…
☆25 · Jun 16, 2025 · Updated 10 months ago
josiahakinloye / store-crawler-google-places
Scrape information about places from Google Maps. Gives you extra information that you can't get using the Google Places API.
☆16 · Nov 11, 2022 · Updated 3 years ago
coleygroup / desp
Double-Ended Synthesis Planning with Goal-Constrained Bidirectional Search (NeurIPS 2024)
☆30 · Jan 23, 2025 · Updated last year
storybookjs / web
Storybook documentation site
☆21 · Updated this week
NewsEye / NLP-Notebooks-Newspaper-Collections
A collection of notebooks for Natural Language Processing
☆25 · Jan 13, 2025 · Updated last year
LifeSG / react-design-system
The repository for our design system in React
☆19 · Updated this week
dchannah / fraudhacker
Anomaly detection system for medical insurance claims data
☆18 · Nov 7, 2017 · Updated 8 years ago
ml4ai / automates
AutoMATES: Automated Model Assembly from Text, Equations, and Software
☆25 · Sep 18, 2023 · Updated 2 years ago
umer7 / Data-Science-in-Python
Resources to help you get started with Data Science
☆19 · Oct 1, 2018 · Updated 7 years ago
mgorkove / pdfToTxt
Command line interface to convert multiple PDFs to text files. Uses pdfminer.
☆13 · Nov 22, 2018 · Updated 7 years ago
tydoc / tydoc
The TypeScript documenter that meets you where you are
☆28 · May 11, 2021 · Updated 4 years ago
jrr / localstack-example
☆10 · Feb 3, 2020 · Updated 6 years ago
ropensci-archive / bnf
ARCHIVED Generate Code from BNF Grammars
☆12 · May 10, 2022 · Updated 3 years ago
tkomde / dash-and-jupyter-notebook-with-gitpod
Simple sample to develop dash on gitpod
☆15 · Jun 14, 2019 · Updated 6 years ago
wangxr0526 / RetroPrime
Code for Single-step Retrosynthesis model Retroprime
☆41 · Apr 27, 2021 · Updated 5 years ago