ian-nai/PDF-Scraper


Python scripts to extract text from PDFs, save it as a text file, export a list of words and their frequencies to a CSV file for further analysis, extract dates from the text, and graph the text's parts of speech.
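As a rough illustration of the word-frequency step described above, here is a minimal sketch using only the Python standard library. It is not the repository's actual code: the function names `word_frequencies` and `write_frequencies_csv` are hypothetical, and the sketch assumes the text has already been extracted from the PDF, since that step depends on a third-party library.

```python
import csv
import io
import re
from collections import Counter


def word_frequencies(text):
    """Count case-insensitive word occurrences in already-extracted text."""
    # Hypothetical helper: treats runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def write_frequencies_csv(counts, out):
    """Write word/count rows, most frequent first, to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["word", "count"])
    for word, count in counts.most_common():
        writer.writerow([word, count])


# Stand-in for text that a PDF extraction step would have produced.
sample = "The scraper reads the PDF and the scraper counts words."
counts = word_frequencies(sample)

buffer = io.StringIO()
write_frequencies_csv(counts, buffer)
print(buffer.getvalue())
```

Passing an open file (for example `open("frequencies.csv", "w", newline="")`) instead of the `StringIO` buffer would write the same rows to disk.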

☆35

Alternatives and similar repositories for PDF-Scraper

Users that are interested in PDF-Scraper are comparing it to the libraries listed below.

ian-nai / PyGallica
A Python wrapper for the National Library of France's Gallica API.
☆22 · Apr 10, 2023 · Updated 3 years ago
cowhite / django-members-roles
A Django app with an invitation system for members of a group (like Organization), roles for the members and permissions for pages based on r…
☆18 · Dec 8, 2022 · Updated 3 years ago
mayankpruthii / SocialMediaDataAutomation
Automates data fetching from various social media platforms like Facebook, Twitter, and Instagram and puts the data in an Excel sheet and…
☆16 · Dec 8, 2022 · Updated 3 years ago
prabhakar2020 / aws_lambda_function
AWS Lambda function for deleting S3 data and copying data from a source S3 bucket to a target S3 bucket
☆16 · Oct 16, 2019 · Updated 6 years ago
krishnaik06 / IRIS
☆13 · Nov 24, 2019 · Updated 6 years ago
Sunil02324 / MIT-OpenCourseWare-Scraper
Python script to scrape the MIT OpenCourseWare website and download course materials.
☆13 · Apr 6, 2017 · Updated 9 years ago
danielsan / Spark-Streaming-Examples
Spark Streaming examples using Python
☆15 · Dec 17, 2015 · Updated 10 years ago
enexqnt / RBAA
Regime Based Asset Allocation with MPT, Random Forest and Bayesian Inference
☆25 · Oct 16, 2022 · Updated 3 years ago
paveyry / rust-music
Rust library for music composition with MIDI export
☆15 · Mar 23, 2025 · Updated last year
quandl / quandl-google-spreadsheet-add-on
Google spreadsheet add-on for Quandl
☆44 · Jun 5, 2015 · Updated 11 years ago
treyhunner / undataclass
Turn dataclasses into not-dataclasses
☆23 · Jun 18, 2022 · Updated 4 years ago
michaelmarty / AnalChem
Python notebooks for teaching analytical chemistry
☆15 · May 29, 2026 · Updated last month
xxao / msread
Read popular mass spectrometry formats
☆10 · Sep 9, 2025 · Updated 10 months ago
rpeckner-broad / Specter
Specter: linear deconvolution for targeted analysis of data-independent acquisition mass spectrometry proteomics
☆18 · Oct 11, 2018 · Updated 7 years ago
Gressling / examples
Examples for the book 'Data Science in Chemistry', ISBN: 978-3-11-062939-2, Published: 23 Nov 2020
☆18 · Dec 26, 2021 · Updated 4 years ago
xin0501 / xin_bot
A WeChat bot built on wechaty
☆11 · May 1, 2022 · Updated 4 years ago
Saurabh1999 / Tic-Tac-Toe
Unbeatable Tic Tac Toe game in Java with a GUI
☆18 · Jun 27, 2017 · Updated 9 years ago
SandyUndefined / Django-Search-Engine
Django search engine that scrapes links and text from the search engines listed in the README and displays them.
☆12 · Nov 28, 2025 · Updated 7 months ago
Chris7 / pyquant
Platform-independent command-line tool for analysis of mass spectrometry data.
☆15 · Dec 26, 2022 · Updated 3 years ago
Alexis97 / GPT_Reading_Assistant
GPT-Book Reader is a powerful app that enables reading, summarizing, and translating long texts using the cutting-edge GPT technology. It…
☆10 · Aug 5, 2023 · Updated 2 years ago
ian-nai / In-Browser-OCR
A web app for performing OCR on images within your browser.
☆57 · Dec 24, 2023 · Updated 2 years ago
anoopkunchukuttan / multinmt_tutorial_coling2020
Material for the COLING 2020 Tutorial on Multilingual NMT
☆16 · Dec 10, 2020 · Updated 5 years ago
domoncassiu / web-scraping
An integrated automated web scraping tool to collect data and predict stock market trends
☆11 · Aug 31, 2023 · Updated 2 years ago
omaciel / Web-UI-Automation-with-Selenium-for-Beginners
Quick introduction to using Selenium with Python for Web UI automation
☆17 · Nov 6, 2017 · Updated 8 years ago
rformassspectrometry / RforMassSpectrometry
The R for Mass Spectrometry meta-package
☆21 · May 11, 2026 · Updated 2 months ago
europeana / REPOX
Data Aggregation and Interoperability Manager
☆20 · Mar 26, 2022 · Updated 4 years ago
edsu / bagweb
Mirror a website, put it in a bag
☆24 · Dec 18, 2022 · Updated 3 years ago
ALIADA / aliada-tool
Aliada tool implementation
☆37 · Mar 31, 2017 · Updated 9 years ago
Quinten / phaser3-maze-demo
Procedurally generated maze in Phaser 3
☆17 · Jan 7, 2023 · Updated 3 years ago
mjhelf / Metaboseek
Interactive software to analyze and browse mass spectrometry data
☆21 · Jul 3, 2025 · Updated last year
bc-abe / Spacemacs
Where I keep my config files for others to look at and use
☆12 · May 27, 2021 · Updated 5 years ago
bovee / Aston
View and interpret UV-visible and mass spectrometry chromatographic data.
☆21 · Mar 12, 2022 · Updated 4 years ago
krishnaik06 / Transformers-Materials
☆54 · Oct 19, 2024 · Updated last year
bhattbhavesh91 / language-translation-using-mBart-50
Streamlit app to translate text to or between 50 languages with mBART-50 from Huggingface and Facebook
☆25 · May 29, 2021 · Updated 5 years ago
imatix-legacy / libero
Libero: Template driven Finite State Machine (FSM) code generator (developed 1991 to approx 2000)
☆31 · May 4, 2016 · Updated 10 years ago
zejn / pypdf2xml
Convert text from PDF to XML.
☆45 · Oct 5, 2018 · Updated 7 years ago
jezcope / pyrefine
Execute OpenRefine JSON scripts without OpenRefine (or Java)
☆32 · Dec 27, 2022 · Updated 3 years ago
muneebalam / Hockey
Scraping and analysis of data from NHL and other leagues
☆24 · Oct 28, 2018 · Updated 7 years ago
mobiusklein / ms_peak_picker
A small library to provide peak picking for software processing mass spectrometry data
☆24 · Apr 17, 2026 · Updated 3 months ago