nreimers / se-benchmark

☆9

Alternatives and similar repositories for se-benchmark

Users who are interested in se-benchmark are comparing it to the repositories listed below

infinitylogesh / mutate
A library to synthesize text datasets using Large Language Models (LLMs)
☆152 Updated 2 years ago
nreimers / flax-sentence-embeddings
Shared code for training sentence embeddings with Flax / JAX
☆27 Updated 3 years ago
timoschick / dino
This repository contains the code for "Generating Datasets with Pretrained Language Models".
☆188 Updated 3 years ago
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should also work with any Hugging Face text dataset.
☆93 Updated 2 years ago
MarkusSagen / Master-Thesis-Multilingual-Longformer
Master thesis with code investigating methods for incorporating long-context reasoning in low-resource languages, without the need to pre…
☆33 Updated 3 years ago
amazon-science / mintaka
Dataset from the paper "Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering" (COLING 2022)
☆113 Updated 2 years ago
cardiffnlp / timelms
TimeLMs: Diachronic Language Models from Twitter
☆107 Updated last year
smallbenchnlp / ELECTRA-DeBERTa
☆16 Updated 2 years ago
huggingface / that_is_good_data
☆65 Updated last year
argilla-io / adept-augmentations
A Python library aimed at dissecting and augmenting NER training data.
☆58 Updated 2 years ago
google-research / t5x_retrieval
☆97 Updated 2 years ago
castorini / mr.tydi
Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages.
☆75 Updated 3 years ago
yang-zhang / lightning-language-modeling
Language Modeling Example with Transformers and PyTorch Lightning
☆65 Updated 4 years ago
chatdesk / grouphug
Multi-task modelling extensions for huggingface transformers
☆20 Updated 2 years ago
machamp-nlp / machamp
Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/
☆87 Updated 2 weeks ago
bigscience-workshop / lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
☆103 Updated 2 years ago
georgian-io / Transformers-Domain-Adaptation
[DEPRECATED] Adapt Transformer-based language models to new text domains
☆87 Updated last year
jwieting / paraphrastic-representations-at-scale
☆76 Updated 3 years ago
bigscience-workshop / metadata
Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.
☆31 Updated last year
google-research-datasets / xsum_hallucination_annotations
Faithfulness and factuality annotations of XSum summaries from our paper "On Faithfulness and Factuality in Abstractive Summarization" (h…
☆82 Updated 4 years ago
esdurmus / Wikilingua
Multilingual abstractive summarization dataset extracted from WikiHow.
☆91 Updated 2 months ago
huggingface / olm-datasets
Pipeline for pulling and processing online language model pretraining data from the web
☆177 Updated last year
awebson / prompt_semantics
This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?”
☆85 Updated 3 years ago
kawine / contextual
How Contextual are Contextualized Word Representations?
☆41 Updated 5 years ago
IBM / low-resource-text-classification-framework
Research framework for low resource text classification that allows the user to experiment with classification models and active learning…
☆102 Updated 3 years ago
microsoft / xtreme-distil-transformers
XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale
☆154 Updated last year
thevasudevgupta / bigbird
Google's BigBird (Jax/Flax & PyTorch) @ 🤗Transformers
☆49 Updated 2 years ago
tingofurro / keep_it_simple
Codebase, data and models for the Keep it Simple paper at ACL2021
☆39 Updated last year
google-research / longt5
☆182 Updated last year
kwang2049 / easy-elasticsearch
Use a business-level retrieval system (BM25) from Python in just a few lines.
☆31 Updated 2 years ago