☆33 · May 23, 2023 · Updated 3 years ago
Alternatives and similar repositories for commoncrawl_downloader
Users that are interested in commoncrawl_downloader are comparing it to the libraries listed below.
- ☆17 · Dec 11, 2024 · Updated last year
- Downloads 2020 English Wikipedia articles as plaintext ☆27 · Mar 25, 2023 · Updated 3 years ago
- ☆16 · Mar 25, 2022 · Updated 4 years ago
- Source code for "SliTraNet: Automatic Detection of Slide Transitions in Lecture Videos using Convolutional Neural Networks" ☆10 · Dec 17, 2023 · Updated 2 years ago
- Script for downloading GitHub. ☆13 · Sep 24, 2020 · Updated 5 years ago
- ☆78 · Dec 7, 2023 · Updated 2 years ago
- ☆27 · Mar 13, 2021 · Updated 5 years ago
- Useful prompts for interacting with an AI. ☆14 · Jul 14, 2020 · Updated 5 years ago
- Website for MS MARCO ☆35 · Mar 26, 2025 · Updated last year
- Experiments with generating GPT-2 fanfiction on specified topics. ☆11 · Jun 2, 2019 · Updated 6 years ago
- A simple, minimalist writing theme for Typora ☆15 · Jan 20, 2026 · Updated 4 months ago
- A TinyStories LM with SAEs and transcoders ☆14 · Apr 3, 2025 · Updated last year
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training ☆46 · Sep 22, 2020 · Updated 5 years ago
- This repo contains the code for Late Prompt Tuning. ☆12 · Dec 22, 2025 · Updated 5 months ago
- ☆1,658 · Apr 27, 2023 · Updated 3 years ago
- Use Python to Automate the PowerPoint Update ☆15 · May 28, 2023 · Updated 3 years ago
- [EMNLP 2022] This is the code repo for our EMNLP'22 paper "Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder"… ☆13 · Oct 20, 2022 · Updated 3 years ago
- Remove generated stories with stray Unicode characters ☆12 · Jan 3, 2024 · Updated 2 years ago
- Downloads and parses a subtitle dataset from opensubtitles.org ☆15 · Apr 19, 2024 · Updated 2 years ago
- Scripts to parse arXiv documents for NLP tasks ☆19 · Jun 12, 2023 · Updated 2 years ago
- A PyTorch toolbox for domain adaptation, domain generalization, federated learning DA/DG, active learning DA/DG, ALDG and semi-supervised… ☆11 · Jan 10, 2022 · Updated 4 years ago
- A WGAN-GP that utilizes a compositional pattern producing network as the generator ☆11 · Sep 9, 2021 · Updated 4 years ago
- ☆22 · Dec 4, 2025 · Updated 5 months ago
- ☆12 · May 17, 2022 · Updated 4 years ago
- Convert PowerPoint (pptx) files into raw text, Org, or LaTeX files ☆15 · Aug 28, 2018 · Updated 7 years ago
- ☆12 · Jul 13, 2023 · Updated 2 years ago
- ZYN: Zero-Shot Reward Models with Yes-No Questions ☆35 · Aug 15, 2023 · Updated 2 years ago
- Evaluation tools shared across anserini, pyserini, and pygaggle ☆36 · May 16, 2026 · Updated last week
- Extract images from PowerPoint files ☆17 · Dec 1, 2011 · Updated 14 years ago
- This is a fun browser-based game based on Google's Emoji Kitchen API. Emoji Kitchen is a feature of Gboard which allows you to combine tw… ☆13 · Dec 9, 2024 · Updated last year
- A PyTorch implementation of spiral++ ☆11 · Mar 8, 2022 · Updated 4 years ago
- Scripts for building a geo-located web corpus using Common Crawl data ☆11 · Jan 18, 2026 · Updated 4 months ago
- Automated generation of PowerPoint slides for fun and profit ☆13 · Oct 18, 2017 · Updated 8 years ago
- ☆19 · Mar 23, 2025 · Updated last year
- Script for downloading GitHub. ☆99 · Jul 1, 2024 · Updated last year
- ☆13 · Jan 20, 2023 · Updated 3 years ago
- Terraform Cloud Dynamic Credentials module as an IAM OIDC identity provider in AWS ☆12 · May 11, 2026 · Updated 2 weeks ago
- Step 1 - Scraping: Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter. * Create a Ju… ☆11 · Dec 22, 2021 · Updated 4 years ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆19 · Jun 27, 2024 · Updated last year