YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training
☆46 · Sep 22, 2020 · Updated 5 years ago
Alternatives and similar repositories for youtube_subtitle_dataset
Users that are interested in youtube_subtitle_dataset are comparing it to the libraries listed below.
- Download, parse, and filter data from Court Listener, part of the FreeLaw projects. Data-ready for The-Pile. ☆16 · Jun 3, 2023 · Updated 3 years ago
- ☆17 · Dec 11, 2024 · Updated last year
- Building the laion5B paper ☆36 · May 6, 2022 · Updated 4 years ago
- ☆78 · Dec 7, 2023 · Updated 2 years ago
- Minimal, clean code for video/image "patchnization" - a process commonly used in tokenizing visual data for use in a Transformer encoder.… ☆11 · May 16, 2024 · Updated 2 years ago
- Code for "Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media: A Unified Model" ☆18 · Feb 14, 2022 · Updated 4 years ago
- KB data lab ☆10 · Dec 8, 2020 · Updated 5 years ago
- A simple module consistently outperforms self-attention and Transformer model on main NMT datasets with SoTA performance. ☆86 · Jul 24, 2023 · Updated 2 years ago
- ☆165 · Mar 5, 2021 · Updated 5 years ago
- [ACL 2024] An easily extensible framework for simultaneous, text-to-text neural machine translation (SimulMT) for LLMs. ☆18 · Apr 21, 2025 · Updated last year
- Xayn AI ☆18 · May 9, 2022 · Updated 4 years ago
- Downloads 2020 English Wikipedia articles as plaintext ☆27 · Mar 25, 2023 · Updated 3 years ago
- Finetuning InstructLLaMA on consumer hardware (copy from https://github.com/tloen/alpaca-lora) ☆11 · Mar 17, 2023 · Updated 3 years ago
- A JSON dataset of information about language museums around the world ☆13 · Feb 26, 2020 · Updated 6 years ago
- ☆27 · May 11, 2023 · Updated 3 years ago
- ☆1,662 · Apr 27, 2023 · Updated 3 years ago
- Web archiving utility library ☆11 · May 5, 2026 · Updated last month
- Tools for training pytorch language models ☆27 · Nov 14, 2020 · Updated 5 years ago
- ☆13 · Jan 20, 2023 · Updated 3 years ago
- 🤖ConvRe🤯: An Investigation of LLMs’ Inefficacy in Understanding Converse Relations (EMNLP 2023) ☆24 · Oct 10, 2023 · Updated 2 years ago
- Standalone Wireless keystroke injection attack platform for ESP32 s2/s3 ☆15 · Jun 14, 2024 · Updated 2 years ago
- ☆20 · Apr 23, 2025 · Updated last year
- Lightweight tool to identify Data Contamination in LLMs evaluation ☆52 · Mar 8, 2024 · Updated 2 years ago
- This repository contains all the code for collecting large scale amounts of code from GitHub. ☆110 · Feb 17, 2023 · Updated 3 years ago
- Stanford CoreNLP Extensions: Fork to provide the ability to capture Multi-Word Expressions ☆10 · Jun 14, 2022 · Updated 4 years ago
- Ollama chat using Google Mesop library ☆20 · Jun 25, 2024 · Updated last year
- ☆11 · Nov 28, 2015 · Updated 10 years ago
- 🚀 Python asyncio actor library ☆14 · Aug 16, 2017 · Updated 8 years ago
- ☆19 · Jun 21, 2024 · Updated last year
- Unofficial implementation of Adaptive Input in PyTorch ☆12 · Feb 22, 2019 · Updated 7 years ago
- Multimodal data loader compatible with pytorch and tensorflow ☆12 · Aug 14, 2024 · Updated last year
- ChatGPT Participates in a Computer Science Exam (2023) ☆31 · Mar 21, 2023 · Updated 3 years ago
- ☆49 · Oct 28, 2025 · Updated 7 months ago
- A small utility for converting Stanford GloVe vectors to HDF5 / NumPy ☆12 · Apr 4, 2017 · Updated 9 years ago
- Browser-based annotation tool for Framenet ☆16 · Jan 27, 2015 · Updated 11 years ago
- ☆26 · Jul 11, 2022 · Updated 3 years ago
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering" ☆19 · Oct 4, 2022 · Updated 3 years ago
- ☆11 · Aug 21, 2019 · Updated 6 years ago
- Adaptation of gxemul to support the CHERI MIPS unit test suite and certain CHERI features ☆16 · Dec 8, 2015 · Updated 10 years ago