chris-lovejoy / youtube-titles-and-transcripts
Dataset to train an NLP model that predicts a YouTube video's title from its content
☆19 · Updated 7 months ago
Alternatives and similar repositories for youtube-titles-and-transcripts
Users who are interested in youtube-titles-and-transcripts are comparing it to the libraries listed below.
- Open source repro of "Towards Monosemanticity" ☆31 · Updated last year
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆102 · Updated 8 months ago
- ☆102 · Updated 2 months ago
- Streamlit Annotation Tools is a Streamlit component that gives you access to various annotation tools (labeling, highlighting, etc.) for … ☆99 · Updated last year
- ☆58 · Updated 2 years ago
- The official Python Library for the Groq API ☆565 · Updated last week
- TweetNLP for all the NLP enthusiasts working on Twitter! The Python library tweetnlp provides a collection of useful tools to analyze/und… ☆367 · Updated 8 months ago
- Semantic search engine indexing 110 million academic publications ☆93 · Updated last week
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆49 · Updated last year
- Converting PDF files to text, mainly with a focus on arXiv papers. ☆23 · Updated last year
- Dataset for Natural Language Processing using a corpus of medical transcriptions and custom-generated clinical stop words and vocabulary. ☆95 · Updated 5 years ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆116 · Updated 4 months ago
- 📝 Automatically annotate papers using LLMs ☆391 · Updated 2 weeks ago
- Fact checking baseline combining dense retrieval and textual entailment ☆30 · Updated 4 months ago
- A code repository that contains all the code for finetuning some of the popular LLMs on medical data ☆68 · Updated last year
- Topic modeling helpers using managed language models from Cohere. Name text clusters using large GPT models. ☆222 · Updated 3 years ago
- Python Library for Accessing the Cohere API ☆375 · Updated 2 weeks ago
- Anthropic Claude2 Hackathon: Building MCTS with Claude for optimal action prediction during patient/doctor interactions. ☆106 · Updated 2 years ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆334 · Updated last year
- A dataset for pretraining language models targeted for legal tasks. ☆140 · Updated 3 years ago
- A comprehensive repository of reasoning tasks for Medical LLMs (and beyond) ☆131 · Updated last year
- ☆27 · Updated last year
- ☆100 · Updated last year
- Get answers to research questions from 200M+ papers. Link to demo - ☆207 · Updated last month
- Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions. ☆252 · Updated 10 months ago
- ☆64 · Updated last year
- Approximation of the Claude 3 tokenizer by inspecting generation stream ☆148 · Updated last year
- 🗺️ Data Cleaning and Textual Data Visualization 🗺️ ☆194 · Updated 6 months ago
- 📚 Datasets and models for instruction-tuning ☆238 · Updated 2 years ago
- LLM prompt language based on Jinja. Banks provides tools and functions to build prompt text and chat messages from generic blueprints. I… ☆119 · Updated this week