suminb / winnowing
A Python implementation of Winnowing (local algorithms for document fingerprinting)
☆53 Updated 5 years ago
Alternatives and similar repositories for winnowing:
Users who are interested in winnowing are comparing it to the libraries listed below.
- statistical similarity of binaries (Esh) ☆73 Updated 8 years ago
- ☆20 Updated 7 years ago
- Website for Learning from "Big Code" ☆29 Updated 3 years ago
- ANTLR 3 fuzzy parser ☆48 Updated 12 years ago
- Artifacts and other data for "Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces" ☆22 Updated 4 years ago
- MLonCode community effort to implement Learning Distributed Representations of Code (https://arxiv.org/pdf/1803.09473.pdf) ☆39 Updated 6 years ago
- bitshred research project code. ☆14 Updated 8 years ago
- ☆9 Updated 10 years ago
- the code for three models introduced in DYNAMIC NEURAL PROGRAM EMBEDDINGS FOR PROGRAM REPAIR (ICLR 18) ☆32 Updated 6 years ago
- Programmer De-anonymization via Code Stylometry ☆74 Updated 7 years ago
- Evaluation of source authorship attribution tool ☆23 Updated 3 years ago
- Software vulnerabilities data set ☆24 Updated 4 years ago
- Babelfish Python client ☆16 Updated 5 years ago
- A toolkit for pre-processing large source code corpora ☆46 Updated 2 years ago
- DataTracker: A Pin tool for collecting high-fidelity data provenance from unmodified programs. ☆91 Updated 6 years ago
- A Tool for Embedding Strings in Vector Spaces ☆58 Updated 5 years ago
- Tool for analyzing git log messages and diffs. ☆22 Updated 4 years ago
- Using Machine Learning to predict the outcome of a zzuf fuzzing campaign ☆24 Updated 9 years ago
- ☆143 Updated last year
- Neural Variable Renaming for Decompiled Binaries ☆44 Updated 4 years ago
- The Z3 Theorem Prover - repository for staging python distributions ☆56 Updated 5 years ago
- ☆58 Updated 9 years ago
- Locality-sensitive hashing algorithm for text similarity comparisons ☆59 Updated 3 years ago
- UniSan: Proactive Kernel Memory Initialization to Eliminate Data Leakages ☆42 Updated 3 years ago
- Detect common mistakes in academic papers ☆58 Updated 5 years ago
- Public release items for the DARPA Space/Time Analysis for Cybersecurity (STAC) program ☆26 Updated 6 years ago
- Contains the code for our ICSE 2020 paper: Big Code != Big Vocabulary: Open-Vocabulary Language Models for Source Code and for its earlie… ☆83 Updated last year
- Tool for algorithmic complexity analysis based on symbolic execution ☆10 Updated 6 years ago
- Identifying Open-Source License Violation and 1-day Security Risk at Large Scale ☆66 Updated 7 years ago
- An inter-procedural data-flow analysis framework using value-based context sensitivity ☆89 Updated 8 months ago