Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.
☆31 · Jun 12, 2023 · Updated 2 years ago
Alternatives and similar repositories for metadata
Users who are interested in metadata are comparing it to the repositories listed below.
- Tutorial on Transformers 🤖, HuggingFace 🤗 and Social Science Applications 👥 @ IC2S2 ☆17 · Aug 8, 2021 · Updated 4 years ago
- ☆65 · Aug 7, 2023 · Updated 2 years ago
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models ☆74 · May 15, 2024 · Updated last year
- Basic Memory library for Haystack NLP agents ☆22 · Dec 28, 2024 · Updated last year
- [EMNLP 2022] Adapting a Language Model While Preserving its General Knowledge ☆21 · Feb 12, 2023 · Updated 3 years ago
- 2018 Computational Text Analysis Notebooks, University of Mannheim ☆13 · Nov 22, 2018 · Updated 7 years ago
- A feature-rich concurrency kit, yet another DAG framework ☆10 · Jan 18, 2026 · Updated last month
- Some microbenchmarks and design docs before commencement ☆12 · Feb 1, 2021 · Updated 5 years ago
- Paster core module using KiteX ☆10 · Aug 30, 2023 · Updated 2 years ago
- ☆11 · Feb 28, 2022 · Updated 3 years ago
- ☆18 · Jun 25, 2025 · Updated 8 months ago
- A job management system for Python ☆10 · Jan 16, 2026 · Updated last month
- Conversion of audio files to text using Whisper from OpenAI with a simple tkinter GUI ☆10 · Apr 13, 2023 · Updated 2 years ago
- LLM Building Blocks for Python Course ☆15 · Nov 17, 2025 · Updated 3 months ago
- ☆11 · Feb 26, 2024 · Updated 2 years ago
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning ☆98 · Apr 26, 2023 · Updated 2 years ago
- Repository of GitHub Actions used by the Oh My Zsh project ☆22 · Sep 4, 2020 · Updated 5 years ago
- Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods" ☆13 · Nov 26, 2024 · Updated last year
- Docker base images for C++ development using vcpkg ☆10 · Jan 27, 2026 · Updated last month
- Code for "Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding" (EMNLP 2020). ☆11 · May 1, 2025 · Updated 9 months ago
- SSL Video Representation Learning project ☆14 · Jul 8, 2025 · Updated 7 months ago
- Radix Primitives Cheatsheet ☆12 · Mar 11, 2022 · Updated 3 years ago
- 🎹 Instruct.KR 2025 Summer Meetup: Open-Source LLMs, All the Way to Production with vLLM 🎹 ☆23 · Aug 2, 2025 · Updated 6 months ago
- Keyphrase Extraction Package ☆11 · Aug 24, 2020 · Updated 5 years ago
- ☆12 · Mar 3, 2023 · Updated 2 years ago
- Detecting Concreteness in Natural Language ☆15 · Jan 25, 2024 · Updated 2 years ago
- I'm trying to learn calculus before taking a calculus course ☆24 · Dec 11, 2024 · Updated last year
- ☆10 · Jun 12, 2023 · Updated 2 years ago
- An accessibility suite giving you control over what you read. ☆14 · Dec 10, 2022 · Updated 3 years ago
- An automated phishing tool with 30+ templates. This tool is made for educational purposes only! Author will not be responsible for any mi… ☆10 · Oct 1, 2022 · Updated 3 years ago
- An alternative to the Elasticsearch engine, written in Go for small sets of documents, that uses an inverted index to build the index and utilizes … ☆15 · Jun 14, 2020 · Updated 5 years ago
- Stuff related to scraping the Code Review StackExchange ☆12 · Jan 19, 2023 · Updated 3 years ago
- ☆11 · Nov 27, 2022 · Updated 3 years ago
- ☆11 · Nov 16, 2022 · Updated 3 years ago
- [TACL 2024] Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis ☆11 · Nov 14, 2024 · Updated last year
- Accurate counters with Kafka & RocksDB. ☆15 · Jan 22, 2021 · Updated 5 years ago
- Brave is a simple visualisation library for NLP information extraction, built on top of embedded BRAT. ☆15 · Dec 25, 2019 · Updated 6 years ago
- Natural Perturbation for Robust Question Answering ☆12 · Apr 7, 2020 · Updated 5 years ago
- A/B test knowledge system. ☆12 · Sep 24, 2020 · Updated 5 years ago