☆33 · May 23, 2023 · Updated 2 years ago
Alternatives and similar repositories for commoncrawl_downloader
Users that are interested in commoncrawl_downloader are comparing it to the libraries listed below.
- Downloads 2020 English Wikipedia articles as plaintext ☆27 · Mar 25, 2023 · Updated 3 years ago
- ☆16 · Mar 25, 2022 · Updated 4 years ago
- Source code to "SliTraNet: Automatic Detection of Slide Transitions in Lecture Videos using Convolutional Neural Networks" ☆10 · Dec 17, 2023 · Updated 2 years ago
- Script for downloading GitHub. ☆13 · Sep 24, 2020 · Updated 5 years ago
- ☆11 · Dec 3, 2020 · Updated 5 years ago
- OpenAI Codex for Sublime Text ☆11 · Sep 25, 2021 · Updated 4 years ago
- ☆78 · Dec 7, 2023 · Updated 2 years ago
- ☆27 · Mar 13, 2021 · Updated 5 years ago
- KB data lab ☆10 · Dec 8, 2020 · Updated 5 years ago
- website for MS Marco ☆34 · Mar 26, 2025 · Updated last year
- Experiments with generating GPT-2 fanfiction on specified topics. ☆11 · Jun 2, 2019 · Updated 6 years ago
- A simple, minimalist writing theme for Typora ☆15 · Jan 20, 2026 · Updated 2 months ago
- Sentiment Analysis using BERT model and Tensorflowjs ☆13 · Jun 2, 2020 · Updated 5 years ago
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training ☆46 · Sep 22, 2020 · Updated 5 years ago
- [CIKM 2023 Oral] This is the code repo for our CIKM‘23 paper "Text Matching Improves Sequential Recommendation by Reducing Popularity Bia… ☆40 · Mar 17, 2024 · Updated 2 years ago
- This repo contains the code for Late Prompt Tuning. ☆12 · Dec 22, 2025 · Updated 3 months ago
- [EMNLP 2022] This is the code repo for our EMNLP‘22 paper "Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder"… ☆13 · Oct 20, 2022 · Updated 3 years ago
- A GPT-powered AI auto scraper for websites. AI Web Scraping made easy. ☆14 · Jun 26, 2023 · Updated 2 years ago
- "FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning" (ACL 2023) ☆15 · Jul 24, 2023 · Updated 2 years ago
- this is based on the paper Chain-of-Retrieval Augmented Generation ☆14 · Mar 29, 2025 · Updated last year
- ☆28 · Nov 28, 2024 · Updated last year
- Scripts to parse arxiv documents for NLP tasks ☆19 · Jun 12, 2023 · Updated 2 years ago
- A PyTorch toolbox for domain adaptation, domain generalization, federated learning DA/DG, active learning DA/DG, ALDG and semi-supervised… ☆11 · Jan 10, 2022 · Updated 4 years ago
- Implementation of Marge, Pre-training via Paraphrasing, in Pytorch ☆76 · Jan 14, 2021 · Updated 5 years ago
- ☆12 · May 17, 2022 · Updated 3 years ago
- Convert powerpoint (pptx) files into raw text org or LaTeX files ☆15 · Aug 28, 2018 · Updated 7 years ago
- ZYN: Zero-Shot Reward Models with Yes-No Questions ☆35 · Aug 15, 2023 · Updated 2 years ago
- Evaluation tools shared across anserini, pyserini, and pygaggle ☆35 · Mar 19, 2026 · Updated 3 weeks ago
- ☆19 · Mar 23, 2025 · Updated last year
- Code for COLING22 paper, DPTDR: Deep Prompt Tuning for Dense Passage Retrieval ☆26 · Aug 7, 2023 · Updated 2 years ago
- Script for downloading GitHub. ☆99 · Jul 1, 2024 · Updated last year
- ☆13 · Jan 20, 2023 · Updated 3 years ago
- Tools for training pytorch language models ☆27 · Nov 14, 2020 · Updated 5 years ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆19 · Jun 27, 2024 · Updated last year
- ::A tool to abbreviate scientific paper contents using ChatGPT:: ☆13 · Nov 20, 2023 · Updated 2 years ago
- The scripts for Orange Pi Linux SDK ☆11 · Apr 14, 2020 · Updated 6 years ago
- Python package for converting xml and epubs to text files ☆33 · Jun 9, 2020 · Updated 5 years ago
- Search engine of my Curius data ☆16 · Apr 10, 2022 · Updated 4 years ago
- This repository contains generic information about open-source ventilator applications. ☆21 · Jun 11, 2020 · Updated 5 years ago