☆32May 23, 2023Updated 2 years ago
Alternatives and similar repositories for commoncrawl_downloader
Users that are interested in commoncrawl_downloader are comparing it to the libraries listed below.
- ☆17Dec 11, 2024Updated last year
- Downloads 2020 English Wikipedia articles as plaintext☆27Mar 25, 2023Updated 3 years ago
- ☆16Mar 25, 2022Updated 4 years ago
- Source code to "SliTraNet: Automatic Detection of Slide Transitions in Lecture Videos using Convolutional Neural Networks"☆10Dec 17, 2023Updated 2 years ago
- my configuration files☆14Nov 16, 2025Updated 4 months ago
- Script for downloading GitHub.☆13Sep 24, 2020Updated 5 years ago
- ☆11Dec 3, 2020Updated 5 years ago
- OpenAI Codex for Sublime Text☆11Sep 25, 2021Updated 4 years ago
- Python Research Framework☆107Nov 3, 2022Updated 3 years ago
- ☆78Dec 7, 2023Updated 2 years ago
- StyleGAN2 - Official TensorFlow Implementation☆12Jul 15, 2020Updated 5 years ago
- KB data lab☆10Dec 8, 2020Updated 5 years ago
- The OpenLH is a Liquid handling system based on an available robotic arm platform (uARM swift Pro) which allows for creative exploration …☆22Jun 20, 2024Updated last year
- Here are all of the PowerPoint presentations that I have ever created and presented.☆12Dec 28, 2020Updated 5 years ago
- A simple, minimalist writing theme for Typora☆15Jan 20, 2026Updated 2 months ago
- Sentiment Analysis using BERT model and Tensorflowjs☆13Jun 2, 2020Updated 5 years ago
- A TinyStories LM with SAEs and transcoders☆14Apr 3, 2025Updated 11 months ago
- [CIKM 2023 Oral] This is the code repo for our CIKM‘23 paper "Text Matching Improves Sequential Recommendation by Reducing Popularity Bia…☆40Mar 17, 2024Updated 2 years ago
- This repo contains the code for Late Prompt Tuning.☆12Dec 22, 2025Updated 3 months ago
- Use Python to Automate the PowerPoint Update☆15May 28, 2023Updated 2 years ago
- [EMNLP 2022] This is the code repo for our EMNLP‘22 paper "Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder"…☆13Oct 20, 2022Updated 3 years ago
- Remove generated stories with stray unicode characters☆12Jan 3, 2024Updated 2 years ago
- this is based on the paper Chain-of-Retrieval Augmented Generation☆14Mar 29, 2025Updated last year
- ☆13Feb 5, 2022Updated 4 years ago
- subdomain list based on Common Crawl data, sorted by popularity☆17Nov 19, 2019Updated 6 years ago
- Implementation of Marge, Pre-training via Paraphrasing, in Pytorch☆76Jan 14, 2021Updated 5 years ago
- ZYN: Zero-Shot Reward Models with Yes-No Questions☆35Aug 15, 2023Updated 2 years ago
- [EMNLP 2025 Findings] Familiarity-aware Evidence Compression for Retrieval Augmented Generation☆14Aug 20, 2025Updated 7 months ago
- Extract images from PowerPoint files☆17Dec 1, 2011Updated 14 years ago
- Automated generation of powerpoint slides for fun and profit☆13Oct 18, 2017Updated 8 years ago
- ☆19Mar 23, 2025Updated last year
- Script for downloading GitHub.☆98Jul 1, 2024Updated last year
- Code for COLING22 paper, DPTDR: Deep Prompt Tuning for Dense Passage Retrieval☆26Aug 7, 2023Updated 2 years ago
- Tools for compiling corpora from Common Crawl☆14Nov 24, 2024Updated last year
- Web archiving utility library☆11Mar 11, 2026Updated 2 weeks ago
- A robust web archive analytics toolkit☆135Oct 15, 2025Updated 5 months ago
- ## Step 1 - Scraping Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter. * Create a Ju…☆11Dec 22, 2021Updated 4 years ago
- Step-by-step guide on deploying a natural language processing machine learning model to the Azure platform and consuming it using Power A…☆15Jan 30, 2024Updated 2 years ago
- Eden Flux LoRA trainer and full-finetuning☆23Mar 21, 2025Updated last year