fhoffa / analyzing_github
Analyzing GitHub with BigQuery and other tools
☆194 Updated 5 years ago
Alternatives and similar repositories for analyzing_github
Users who are interested in analyzing_github are comparing it to the libraries listed below.
- The GHtorrent project website ☆156 Updated last year
- Advanced similarity and duplicate source code at scale. ☆55 Updated 6 years ago
- Scripts to mirror Github in a cloudy fashion ☆567 Updated last year
- A scraper focused on organizational Github accounts and their members. ☆42 Updated 3 years ago
- Train a model, and detect gibberish strings with it. ☆64 Updated 3 years ago
- A Singer tap for extracting data from the GitHub API ☆74 Updated 2 weeks ago
- source{d} datasets ("big code") for source code analysis and machine learning on source code ☆332 Updated 5 years ago
- Assessment of the pull based development model, as implemented by Github ☆72 Updated 6 years ago
- Perspectives on Data Science for Software Engineering ☆61 Updated 2 years ago
- Application and python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆46 Updated 3 years ago
- Crawl GitHub APIs and store the discovered orgs, repos, commits, ... ☆387 Updated 4 years ago
- Send Sir Perceval on a quest to retrieve and gather data from software repositories. ☆303 Updated last month
- Code and data belonging to our CSCW 2019 paper: "Dark Patterns at Scale: Findings from a Crawl of 11K Shopping Websites". ☆131 Updated 6 years ago
- BigQuery import and processing pipelines ☆68 Updated last week
- The code processes URLs in an attempt to consolidate different web addresses that point to the same URL and to remove potentially private… ☆23 Updated 3 years ago
- A Github API client to extract events and actions, and load into a database ☆28 Updated 3 years ago
- Predict code bug risk with git metadata ☆42 Updated 5 years ago
- Quickly compare changes made to Jupyter notebooks in GitHub repositories with jupydiff! ☆13 Updated 2 years ago
- An analysis of all 1.3 million public Jupyter Notebooks on Github in July 2017 ☆73 Updated 7 years ago
- Project OCEAN is an open science collaboration focused on understanding the open source ecosystems creating datasets that enable research… ☆54 Updated 4 months ago
- Clean personally identifiable information from dirty dirty text. ☆413 Updated last year
- ARCHIVED, replaced by https://github.com/pypa/linehaul-cloud-function/ ☆70 Updated 3 years ago
- plait.py - a fake data modeler ☆436 Updated 6 years ago
- ☆13 Updated 2 years ago
- Tracking the history of the FARA data from https://www.justice.gov/nsd-fara ☆15 Updated 2 years ago
- Common Crawl Index Server ☆70 Updated 5 months ago
- Neural bag of words code search implementation using PyTorch and data from the CodeSearchNet project. ☆71 Updated 2 years ago
- Calculate the score of a repository based on best engineering practices. ☆111 Updated 4 years ago
- Advanced similarity and duplicate source code proof of concept for our research efforts. ☆52 Updated 2 years ago
- Utilities used by the Deep Program Understanding team ☆102 Updated 2 years ago