☆33 · May 23, 2023 · Updated 3 years ago
Alternatives and similar repositories for commoncrawl_downloader
Users that are interested in commoncrawl_downloader are comparing it to the libraries listed below.
- ☆17 · Dec 11, 2024 · Updated last year
- Downloads 2020 English Wikipedia articles as plaintext ☆27 · Mar 25, 2023 · Updated 3 years ago
- ☆16 · Mar 25, 2022 · Updated 4 years ago
- my configuration files ☆14 · Nov 16, 2025 · Updated 7 months ago
- Script for downloading GitHub. ☆13 · Sep 24, 2020 · Updated 5 years ago
- ☆11 · Dec 3, 2020 · Updated 5 years ago
- OpenAI Codex for Sublime Text ☆11 · Sep 25, 2021 · Updated 4 years ago
- Python Research Framework ☆107 · Nov 3, 2022 · Updated 3 years ago
- ☆78 · Dec 7, 2023 · Updated 2 years ago
- StyleGAN2 - Official TensorFlow Implementation ☆12 · Jul 15, 2020 · Updated 5 years ago
- ☆27 · Mar 13, 2021 · Updated 5 years ago
- website for MS Marco ☆36 · Mar 26, 2025 · Updated last year
- Here are all of the PowerPoint presentations that I have ever created and presented. ☆12 · Dec 28, 2020 · Updated 5 years ago
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training ☆46 · Sep 22, 2020 · Updated 5 years ago
- [CIKM 2023 Oral] This is the code repo for our CIKM'23 paper "Text Matching Improves Sequential Recommendation by Reducing Popularity Bia… ☆40 · Mar 17, 2024 · Updated 2 years ago
- A GPT-powered AI auto scraper for websites. AI Web Scraping made easy. ☆14 · Jun 26, 2023 · Updated 2 years ago
- ☆1,662 · Apr 27, 2023 · Updated 3 years ago
- [EMNLP 2022] This is the code repo for our EMNLP'22 paper "Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder"… ☆13 · Oct 20, 2022 · Updated 3 years ago
- "FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning" (ACL 2023) ☆15 · Jul 24, 2023 · Updated 2 years ago
- Remove generated stories with stray unicode characters ☆12 · Jan 3, 2024 · Updated 2 years ago
- downloads and parses subtitle dataset from opensubtitles.org ☆15 · Apr 19, 2024 · Updated 2 years ago
- Scripts to parse arxiv documents for NLP tasks ☆19 · Jun 12, 2023 · Updated 3 years ago
- ☆13 · Feb 5, 2022 · Updated 4 years ago
- subdomain list based on Common Crawl data, sorted by popularity ☆18 · Nov 19, 2019 · Updated 6 years ago
- Implementation of Marge, Pre-training via Paraphrasing, in Pytorch ☆76 · Jan 14, 2021 · Updated 5 years ago
- ☆12 · May 17, 2022 · Updated 4 years ago
- Convert powerpoint (pptx) files into raw text org or LaTeX files ☆15 · Aug 28, 2018 · Updated 7 years ago
- ☆12 · Jul 13, 2023 · Updated 2 years ago
- ZYN: Zero-Shot Reward Models with Yes-No Questions ☆35 · Aug 15, 2023 · Updated 2 years ago
- Evaluation tools shared across anserini, pyserini, and pygaggle ☆36 · May 16, 2026 · Updated last month
- A pytorch implementation of spiral++ ☆11 · Mar 8, 2022 · Updated 4 years ago
- ☆19 · Mar 23, 2025 · Updated last year
- Code for COLING22 paper, DPTDR: Deep Prompt Tuning for Dense Passage Retrieval ☆26 · Aug 7, 2023 · Updated 2 years ago
- Tools for compiling corpora from Common Crawl ☆14 · Nov 24, 2024 · Updated last year
- ☆13 · Jan 20, 2023 · Updated 3 years ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆19 · Jun 27, 2024 · Updated last year
- A robust web archive analytics toolkit ☆141 · Updated this week
- code for Preprint paper at Arxiv: MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts ☆24 · Nov 29, 2023 · Updated 2 years ago
- ☆23 · Aug 7, 2023 · Updated 2 years ago