leonardr / cce-python
Python tools for processing data from the Catalog of Copyright Entries
☆37 Updated 5 years ago
Alternatives and similar repositories for cce-python
Users that are interested in cce-python are comparing it to the libraries listed below
- NYPL Project to transcribe and parse pages from the US Catalog of Copyright Entries ☆58 Updated 2 years ago
- Tab-delimited versions of Catalog of Copyright Entries renewals ☆28 Updated 6 years ago
- Check out https://github.com/webrecorder/webrecorder for newer version matching https://webrecorder.io ☆38 Updated 9 years ago
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ☆85 Updated 3 weeks ago
- recursively deduplicate a directory and write its contents to a new directory while remembering the old paths ☆48 Updated 4 years ago
- track changes to the news, where news is anything with an RSS feed ☆178 Updated 4 years ago
- Trough: Big data, small databases. ☆41 Updated 9 months ago
- Pages repo ☆88 Updated 3 years ago
- Library of Congress coding standards ☆30 Updated 10 months ago
- A command-line tool for interacting with books in git ☆110 Updated 8 months ago
- Documentation for the GITenberg books project ☆29 Updated 6 years ago
- Automatic alignment of books between HathiTrust, Internet Archive, Google Books, etc. ☆35 Updated 3 weeks ago
- Insert matching punctuation for mismatched quotation marks, parentheses, etc. Good postprocessing for N-gram text synthesis. ☆15 Updated 9 years ago
- export data from twitter archive and visualize it ☆25 Updated 2 years ago
- A list of things related to software, literature, and other content for 🕣 Memento ☆97 Updated 11 months ago
- National Poetry Generation Month 2017 ☆13 Updated 8 years ago
- Grabbing all news. ☆62 Updated 5 years ago
- Friendly Slack bot for looking up cases ☆21 Updated 7 years ago
- Test cases for validating BagIt implementations ☆11 Updated 2 years ago
- "Old SFM" -- manage rules and streams from social data sources, starting with twitter. ☆86 Updated last year
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ☆54 Updated 3 months ago
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆58 Updated 10 months ago
- Source code repository for Digital History Hacks ☆23 Updated 11 years ago
- Sort-friendly URI Reordering Transform (SURT) python module ☆42 Updated 9 months ago
- Scripts to create git repositories for ALTO XML texts, like those from the British Library's scanned documents. ☆31 Updated 7 years ago
- My name vs Oxymnndms, kmq of km / Look on my works, ye uny, and despaw ☆27 Updated 7 years ago
- command line resource for working with digital primary sources ☆27 Updated 6 years ago
- WARC and ARC indexing and discovery tools. ☆123 Updated 2 months ago
- A JavaScript tool to visualize the diffs in Wikipedia ☆35 Updated 2 years ago
- Simple utility to convert links in any file to permanent links via the https://archive.org/web/ or http://perma.cc ☆17 Updated 2 years ago