sbarman / webscript
A record and replay system for the browser (renamed Ringer)
☆30Updated 8 years ago
Alternatives and similar repositories for webscript
Users who are interested in webscript are comparing it to the repositories listed below
- Deployment of pywb as a CommonCrawl Index Server☆21Updated 8 years ago
- A Chrome extension for writing custom web scraping programs and web automation programs. Just demonstrate how to collect the first row o…☆251Updated last year
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.☆46Updated 7 years ago
- Tools to construct and process Common Crawl webgraphs☆98Updated last week
- Search for similar short strings☆53Updated 5 years ago
- Script to calculate the normalized compression distance of sets of files. It also tries to parallelize the work over the available processo…☆18Updated 10 years ago
- A Python library for learning from dimensionality reduction, supporting sparse and dense matrices.☆78Updated 8 years ago
- A machine learning software for extracting information from scholarly documents☆23Updated 4 years ago
- Mad (╯°□°)╯'ing☆10Updated 2 years ago
- Implementation of Microsoft's VIPS algorithm in Python☆18Updated 6 years ago
- Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings☆77Updated 3 years ago
- Advanced similarity and duplicate source code at scale.☆56Updated 6 years ago
- Index Common Crawl archives in tabular format☆122Updated 2 months ago
- Implementation of the Cypher language for searching NetworkX graphs☆120Updated 2 weeks ago
- ☆11Updated 6 years ago
- A tool for collectively summarizing large discussions☆145Updated 2 years ago
- Assessing Source Code Semantic Similarity with Unsupervised Learning☆41Updated 7 years ago
- Credible Web CG Admin/General☆25Updated 3 years ago
- ALMa (Active Learning Manager) Keeps track of labeled and unlabeled data for active learning☆42Updated 5 years ago
- a contextual search engine for software packages built on import2vec embeddings (https://www.code-compass.com)☆38Updated 6 years ago
- A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/☆201Updated 7 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders…☆47Updated 3 years ago
- A collection of simple tutorials for using Fonduer☆100Updated 4 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts☆59Updated 13 years ago
- Search COVID-19 Open Research Dataset (CORD-19) using Vespa - the open source big data serving engine.☆38Updated 2 weeks ago
- CoreNLG is an easy to use and productivity oriented Python library for Natural Language Generation. It aims to provide the essential tool…☆27Updated 4 years ago
- Run information flow experiments on the Web☆39Updated 4 years ago
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine☆184Updated last week
- Read natural language interactive queries. Great for bots.☆18Updated 8 years ago
- An extensible conversational agent for data science tasks☆123Updated 7 years ago