EleutherAI / github-downloader
Script for downloading GitHub.
☆95 · Updated 11 months ago
Alternatives and similar repositories for github-downloader
Users who are interested in github-downloader are comparing it to the libraries listed below.
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models ☆80 · Updated last year
- Repository for analysis and experiments in the BigCode project. ☆118 · Updated last year
- This repository contains all the code for collecting large scale amounts of code from GitHub. ☆108 · Updated 2 years ago
- ☆78 · Updated last year
- ☆30 · Updated last year
- ☆75 · Updated 2 months ago
- Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks ☆209 · Updated last year
- Code for "StructCoder: Structure-Aware Transformer for Code Generation" ☆74 · Updated last year
- Code for the paper "A Structural Model for Contextual Code Changes" ☆32 · Updated last year
- ☆52 · Updated 3 months ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆48 · Updated last year
- ☆97 · Updated 2 years ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆66 · Updated 2 years ago
- Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions. ☆180 · Updated 2 years ago
- Two Automatic code completion IDE extensions for @JetBrains and @microsoft/vscode based on Transformer-based large language models for so… ☆55 · Updated last year
- ☆123 · Updated 2 years ago
- Techniques used to run BLOOM at inference in parallel ☆37 · Updated 2 years ago
- [TMLR'23] Contrastive Search Is What You Need For Neural Text Generation ☆119 · Updated 2 years ago
- A framework for few-shot evaluation of autoregressive language models. ☆103 · Updated 2 years ago
- Experiments with generating opensource language model assistants ☆97 · Updated 2 years ago
- Developing tools to automatically analyze datasets ☆73 · Updated 7 months ago
- A set of utilities for running few-shot prompting experiments on large-language models ☆121 · Updated last year
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval ☆82 · Updated 8 months ago
- Semantic Code Search ☆35 · Updated 2 years ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 · Updated 4 months ago
- Data and code for "DocPrompting: Generating Code by Retrieving the Docs" @ICLR 2023 ☆246 · Updated last year
- Evaluation suite for large-scale language models. ☆125 · Updated 3 years ago
- ☆28 · Updated 3 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆93 · Updated 2 years ago
- ☆33 · Updated 3 months ago