EleutherAI / github-downloader
Script for downloading GitHub.
☆90 Updated 6 months ago
Alternatives and similar repositories for github-downloader:
Users who are interested in github-downloader are comparing it to the libraries listed below
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models ☆80 Updated last year
- Repository for analysis and experiments in the BigCode project. ☆117 Updated 9 months ago
- This repository contains all the code for collecting large scale amounts of code from GitHub. ☆105 Updated last year
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆45 Updated last year
- ☆77 Updated last year
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆63 Updated last year
- ☆29 Updated last year
- A set of utilities for running few-shot prompting experiments on large-language models ☆116 Updated last year
- Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022) ☆86 Updated last year
- [EACL 2024] ICE-Score: Instructing Large Language Models to Evaluate Code ☆70 Updated 7 months ago
- ☆75 Updated last year
- Data and code for "DocPrompting: Generating Code by Retrieving the Docs" @ICLR 2023 ☆236 Updated last year
- ☆121 Updated last year
- Code for "StructCoder: Structure-Aware Transformer for Code Generation" ☆70 Updated 11 months ago
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval ☆76 Updated 4 months ago
- ☆50 Updated 3 weeks ago
- Two Automatic code completion IDE extensions for @JetBrains and @microsoft/vscode based on Transformer-based large language models for so… ☆55 Updated 9 months ago
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024 ☆140 Updated 5 months ago
- Code for the paper "Efficient Training of Language Models to Fill in the Middle" ☆170 Updated last year
- ☆148 Updated 3 years ago
- ☆177 Updated last year
- Accepted by Transactions on Machine Learning Research (TMLR) ☆122 Updated 3 months ago
- A unified benchmark for math reasoning ☆87 Updated last year
- Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions. ☆179 Updated 2 years ago
- Dataset and code for Findings of EMNLP'21 paper "CodeQA: A Question Answering Dataset for Source Code Comprehension". ☆39 Updated last year
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆40 Updated last year
- PROSE Public Benchmark Suite ☆24 Updated 3 months ago
- evol augment any dataset online ☆56 Updated last year
- Developing tools to automatically analyze datasets ☆74 Updated 2 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆77 Updated 5 months ago