chris-lovejoy / youtube-titles-and-transcripts
Dataset for training an NLP model that predicts a YouTube video's title from its content
★17 · Updated last year
Alternatives and similar repositories for youtube-titles-and-transcripts:
Users who are interested in youtube-titles-and-transcripts are comparing it to the libraries listed below
- Explore the use of DSPy for extracting features from PDFs · ★39 · Updated last year
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy · ★63 · Updated last year
- Semantic search engine indexing 110 million academic publications · ★80 · Updated 3 weeks ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis · ★65 · Updated 2 years ago
- Prompt Engineering for Large Language Models - Notebooks, Demos, Exercises, and Projects · ★22 · Updated last year
- Fine-tune OpenAI models for text classification, question answering, and more · ★16 · Updated last year
- ★91 · Updated 10 months ago
- Work with static vector models · ★23 · Updated 2 months ago
- LLM prompt language based on Jinja. Banks provides tools and functions to build prompts text and chat messages from generic blueprints. I… · ★88 · Updated this week
- Sentence tokenizer for clinical/medical text · ★27 · Updated 9 months ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… · ★49 · Updated 2 weeks ago
- A Python package that provides a custom Streamlit connection to query data from Weaviate, the AI-native vector database · ★54 · Updated 8 months ago
- Template Haystack Search Application with Streamlit · ★27 · Updated 2 months ago
- A personal knowledge base that I can dump information to and help me learn · ★24 · Updated 9 months ago
- ★31 · Updated last year
- Use sync mode Playwright interactively, inside a Jupyter notebook · ★14 · Updated 3 months ago
- Reference-Free automatic summarization evaluation with potential hallucination detection · ★100 · Updated last year
- Let's finetune BLOOM-3B on Pile of Law - r/legal_advice · ★36 · Updated 2 years ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy · ★78 · Updated last year
- GitHub repo for storing LlamaDatasets · ★33 · Updated last year
- Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! · ★51 · Updated 2 weeks ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ · ★67 · Updated 4 months ago
- ★59 · Updated last year
- Web application that allows you to interact with biomedical knowledge graphs and query biomedical questions · ★30 · Updated last year
- NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, … · ★80 · Updated 4 months ago
- Pinecone text client library · ★59 · Updated 3 weeks ago
- ★11 · Updated last year
- 🗺️ Data Cleaning and Textual Data Visualization 🗺️ · ★167 · Updated 9 months ago
- GPTNERMED is a language model-generated, synthetic dataset and an open neural NER model for medical entities designed for German data · ★16 · Updated last year
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free · ★230 · Updated 5 months ago