nreimers / se-benchmark
☆9 Updated 3 years ago
Alternatives and similar repositories for se-benchmark
Users who are interested in se-benchmark are comparing it to the libraries listed below
- A library to synthesize text datasets using Large Language Models (LLM) ☆152 Updated 2 years ago
- Shared code for training sentence embeddings with Flax / JAX ☆27 Updated 3 years ago
- This repository contains the code for "Generating Datasets with Pretrained Language Models". ☆188 Updated 3 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆93 Updated 2 years ago
- Master thesis with code investigating methods for incorporating long-context reasoning in low-resource languages, without the need to pre… ☆33 Updated 3 years ago
- Dataset from the paper "Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering" (COLING 2022) ☆113 Updated 2 years ago
- TimeLMs: Diachronic Language Models from Twitter ☆107 Updated last year
- ☆16 Updated 2 years ago
- ☆65 Updated last year
- A Python library aimed at dissecting and augmenting NER training data. ☆58 Updated 2 years ago
- ☆97 Updated 2 years ago
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. ☆75 Updated 3 years ago
- Language Modeling Example with Transformers and PyTorch Lightning ☆65 Updated 4 years ago
- Multi-task modelling extensions for huggingface transformers ☆20 Updated 2 years ago
- Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/ ☆87 Updated 2 weeks ago
- A framework for few-shot evaluation of autoregressive language models. ☆103 Updated 2 years ago
- [DEPRECATED] Adapt Transformer-based language models to new text domains ☆87 Updated last year
- ☆76 Updated 3 years ago
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining. ☆31 Updated last year
- Faithfulness and factuality annotations of XSum summaries from our paper "On Faithfulness and Factuality in Abstractive Summarization" (h … ☆82 Updated 4 years ago
- Multilingual abstractive summarization dataset extracted from WikiHow. ☆91 Updated 2 months ago
- Pipeline for pulling and processing online language model pretraining data from the web ☆177 Updated last year
- This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?” ☆85 Updated 3 years ago
- How Contextual are Contextualized Word Representations? ☆41 Updated 5 years ago
- Research framework for low resource text classification that allows the user to experiment with classification models and active learning… ☆102 Updated 3 years ago
- XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale ☆154 Updated last year
- Google's BigBird (Jax/Flax & PyTorch) @ 🤗Transformers ☆49 Updated 2 years ago
- Codebase, data and models for the Keep it Simple paper at ACL2021 ☆39 Updated last year
- ☆182 Updated last year
- Using business-level retrieval system (BM25) with Python in just a few lines. ☆31 Updated 2 years ago