saran9991 / llm-data-annotation
Use Large Language Models like OpenAI's GPT-3.5 for data annotation and model enhancement. This framework combines human expertise with LLMs, employs Iterative Active Learning for continuous improvement, and integrates CleanLab (Confident Learning) to ensure high-quality datasets and better model performance.
☆38 Updated last year
Alternatives and similar repositories for llm-data-annotation
Users who are interested in llm-data-annotation are comparing it to the libraries listed below.
- ☆367 Updated last year
- A Simple but Powerful SOTA NER Model | Official Code For Label Supervised LLaMA Finetuning ☆155 Updated last year
- ☆78 Updated 11 months ago
- Calculate perplexity on a text with pre-trained language models. Support MLM (e.g. DeBERTa), recurrent LM (e.g. GPT3), and encoder-decoder … ☆162 Updated 2 months ago
- Benchmarking Large Language Models ☆99 Updated 2 months ago
- A Survey of Attributions for Large Language Models ☆211 Updated last year
- Text classification with Foundation Language Model LLaMA ☆114 Updated 2 years ago
- ☆91 Updated 6 months ago
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ☆216 Updated last month
- A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You can find two approa… ☆96 Updated 3 years ago
- ☆41 Updated 2 years ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆223 Updated 9 months ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆137 Updated last year
- A multi-purpose toolkit for table-to-text generation: web interface, Python bindings, CLI commands. ☆56 Updated last year
- Long Document Summarization Papers ☆149 Updated 2 years ago
- Multilingual/multidomain question generation datasets, models, and Python library for question generation. ☆360 Updated last year
- Collection of NLP model explanations and accompanying analysis tools ☆144 Updated 2 years ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆132 Updated last year
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆196 Updated 8 months ago
- Efficient Attention for Long Sequence Processing ☆98 Updated last year
- Repository for research in the field of Responsible NLP at Meta. ☆202 Updated 3 months ago
- Notebooks for training universal 0-shot classifiers on many different tasks ☆136 Updated 8 months ago
- NEREL-BIO: A Dataset of Biomedical Abstracts Annotated with Nested Named Entities ☆25 Updated 3 months ago
- ☆37 Updated 10 months ago
- Repository for Zheng and Guha et al., 2021, "When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Data… ☆91 Updated 2 years ago
- BERT classification model for processing texts longer than 512 tokens. Text is first divided into smaller chunks and after feeding them t… ☆144 Updated last year
- This repo contains the code for generating the ToxiGen dataset, published at ACL 2022. ☆330 Updated last year
- Guideline following Large Language Model for Information Extraction ☆392 Updated 10 months ago
- BARTScore: Evaluating Generated Text as Text Generation ☆357 Updated 3 years ago
- Instruct LLMs for flat and nested NER. Fine-tuning Llama and Mistral models for instruction named entity recognition. (Instruction NER) ☆85 Updated last year