fhoffa / analyzing_github
Analyzing GitHub with BigQuery and other tools
☆198 Updated 5 years ago
Alternatives and similar repositories for analyzing_github
Users who are interested in analyzing_github are comparing it to the libraries listed below.
- The GHtorrent project website ☆157 Updated last year
- Advanced similarity and duplicate source code at scale. ☆56 Updated 6 years ago
- Scripts to mirror Github in a cloudy fashion ☆566 Updated last year
- Advanced similarity and duplicate source code proof of concept for our research efforts. ☆52 Updated 3 years ago
- The code processes URLs in an attempt to consolidate different web addresses that point to the same URL and to remove potentially private… ☆23 Updated 4 years ago
- source{d} datasets ("big code") for source code analysis and machine learning on source code ☆337 Updated 5 years ago
- Train a model, and detect gibberish strings with it. ☆67 Updated 3 years ago
- A scraper focused on organizational Github accounts and their members. ☆42 Updated 3 years ago
- ARCHIVED, replaced by https://github.com/pypa/linehaul-cloud-function/ ☆71 Updated 3 years ago
- Application and python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆46 Updated last month
- Automatically check mismatch between code and comments using AI and ML ☆54 Updated 4 years ago
- ☆73 Updated last week
- Assessment of the pull based development model, as implemented by Github ☆74 Updated 6 years ago
- Predict code bug risk with git metadata ☆42 Updated 5 years ago
- Code and data belonging to our CSCW 2019 paper: "Dark Patterns at Scale: Findings from a Crawl of 11K Shopping Websites". ☆133 Updated 6 years ago
- AboutCode Toolkit provides a simple way to document provenance metadata (origin and license) about third-party code that you use in your… ☆99 Updated 4 months ago
- Various Jupyter notebooks about Common Crawl data ☆59 Updated 6 months ago
- tools for fast reading of docs ☆49 Updated 3 years ago
- Common Crawl Index Server ☆70 Updated 7 months ago
- Clean personally identifiable information from dirty dirty text. ☆415 Updated 2 years ago
- BroadbandNow is the most comprehensive resource for internet service provider plan, pricing and coverage data. ☆29 Updated 4 years ago
- A tutorial on how to do GitHub research with GHTorrent http://ghtorrent.github.io/tutorial ☆21 Updated last year
- A maximum-strength name parser for record linkage. ☆38 Updated last month
- A simple command line interface to the datamade/dedupe library. ☆42 Updated 2 years ago
- Ontology dataset for open_numbers namespace ☆10 Updated 11 months ago
- The Data Linter identifies potential issues (lints) in your ML training data. ☆88 Updated 7 years ago
- A Singer tap for extracting data from the GitHub API ☆74 Updated last week
- Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Here, we wrap Blackstone with a performant API … ☆60 Updated 4 years ago
- Send Sir Perceval on a quest to retrieve and gather data from software repositories. ☆307 Updated 2 weeks ago
- Scraping Assisted by Learning ☆35 Updated last month