YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training
☆46Sep 22, 2020Updated 5 years ago
Alternatives and similar repositories for youtube_subtitle_dataset
Users that are interested in youtube_subtitle_dataset are comparing it to the libraries listed below.
- Download, parse, and filter data from Court Listener, part of the FreeLaw projects. Data-ready for The-Pile.☆15Jun 3, 2023Updated 2 years ago
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile.☆19Aug 28, 2023Updated 2 years ago
- Building the laion5B paper☆36May 6, 2022Updated 3 years ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023☆18May 24, 2023Updated 2 years ago
- ☆78Dec 7, 2023Updated 2 years ago
- ☆11Dec 3, 2020Updated 5 years ago
- Code for "Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media: A Unified Model"☆18Feb 14, 2022Updated 4 years ago
- KB data lab☆10Dec 8, 2020Updated 5 years ago
- downloads and parses subtitle dataset from opensubtitles.org☆15Apr 19, 2024Updated 2 years ago
- A simple module consistently outperforms self-attention and Transformer model on main NMT datasets with SoTA performance.☆86Jul 24, 2023Updated 2 years ago
- Code and Swedish pre-trained models for BERT☆12Feb 5, 2020Updated 6 years ago
- ☆11Mar 15, 2017Updated 9 years ago
- ☆164Mar 5, 2021Updated 5 years ago
- Homemade LightGBM and VGG-net experiment setup for DCASE2017 task 1☆11Aug 8, 2017Updated 8 years ago
- Finetuning InstructLLaMA on consumer hardware (copy from https://github.com/tloen/alpaca-lora)☆11Mar 17, 2023Updated 3 years ago
- A JSON dataset of information about language museums around the world☆13Feb 26, 2020Updated 6 years ago
- ☆11Feb 23, 2024Updated 2 years ago
- Server wrapper for Stanford CoreNLP☆14Nov 4, 2014Updated 11 years ago
- Harness CI migration utility☆11Mar 19, 2026Updated last month
- ☆1,642Apr 27, 2023Updated 2 years ago
- Web archiving utility library☆11Mar 11, 2026Updated last month
- ☆13Jan 20, 2023Updated 3 years ago
- Tools for training pytorch language models☆27Nov 14, 2020Updated 5 years ago
- 🤖ConvRe🤯: An Investigation of LLMs’ Inefficacy in Understanding Converse Relations (EMNLP 2023)☆24Oct 10, 2023Updated 2 years ago
- Korean phoneme dictionary generator for training Montreal Forced Aligner (MFA)☆13Feb 27, 2021Updated 5 years ago
- Sound classification using neural networks☆12Jun 6, 2018Updated 7 years ago
- CycleGAN-based Emotion Style Transfer as Data Augmentation for Speech Emotion Recognition☆12Oct 7, 2019Updated 6 years ago
- The Codebase UI that ships with UCM☆20Feb 24, 2026Updated last month
- This repository contains all the code for collecting large scale amounts of code from GitHub.☆110Feb 17, 2023Updated 3 years ago
- Data and preprocessing scripts for SemEval 2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding☆15Feb 3, 2022Updated 4 years ago
- Stanford CoreNLP Extensions: Fork to provide the ability to capture Multi-Word Expressions☆10Jun 14, 2022Updated 3 years ago
- Mechanics functions with end-to-end support for deep learning developers, written in Ivy.☆14Aug 28, 2023Updated 2 years ago
- Python package for converting xml and epubs to text files☆33Jun 9, 2020Updated 5 years ago
- Toolkit for building prompt templates for language models☆12Sep 30, 2022Updated 3 years ago
- Loads OpenSubtitles v2018 dataset without having to load everything into memory at once. Works well with pytorch.☆13Aug 26, 2020Updated 5 years ago
- Drift detection module for machine learning pipelines.☆24Jun 21, 2023Updated 2 years ago
- Examples to demonstrate use of the Selection API.☆12Mar 1, 2017Updated 9 years ago
- 🚀 Python asyncio actor library☆14Aug 16, 2017Updated 8 years ago
- This repository contains generic information about open-source ventilator applications.☆21Jun 11, 2020Updated 5 years ago