AhmedSSoliman / MarianCG-NL-to-Code
This repository implements MarianCG, a Transformer model developed for the code generation problem (natural language to code).
☆21 · Updated 2 years ago
Alternatives and similar repositories for MarianCG-NL-to-Code:
Users who are interested in MarianCG-NL-to-Code are comparing it to the repositories listed below.
- A large dataset of 4.2M Java source code samples and their parallel descriptions, from code search and code summarization studies. ☆53 · Updated 3 years ago
- [EACL 2024] ICE-Score: Instructing Large Language Models to Evaluate Code ☆72 · Updated 8 months ago
- A repo for code-based language models ☆18 · Updated 4 years ago
- ☆42 · Updated 3 weeks ago
- This repository contains all the code for collecting large amounts of code from GitHub. ☆107 · Updated 2 years ago
- Prompt tuning toolkit for GPT-2 and GPT-Neo ☆88 · Updated 3 years ago
- Training language models to make programs faster ☆87 · Updated 10 months ago
- Code for the Multilingual Eval of Generative AI paper, published at EMNLP 2023 ☆68 · Updated last year
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆46 · Updated last year
- An extension of the Transformers library that adds a T5ForSequenceClassification class. ☆38 · Updated last year
- Code for ProtAugment: Unsupervised diverse short-texts paraphrasing for intent detection meta-learning ☆21 · Updated 2 years ago
- Aranizer: A Custom Tokenizer based on SentencePiece and BPE tailored for Arabic Language Modeling ☆17 · Updated 7 months ago
- Code for generating the JuICe dataset. ☆36 · Updated 3 years ago
- ☆120 · Updated last year
- Code for Relevance-guided Supervision for OpenQA with ColBERT (TACL'21) ☆41 · Updated 3 years ago
- ☆74 · Updated last year
- UBC ARBERT and MARBERT Deep Bidirectional Transformers for Arabic ☆104 · Updated 3 years ago
- CodeBERTScore: an automatic metric for code generation, based on BERTScore ☆185 · Updated last year
- CIDAR: an instruction dataset for Arabic with 10,000 instruction-output pairs. CIDAR can be used to fine-tune LLMs to follow instructions. ☆34 · Updated last year
- Lightweight demos for fine-tuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆73 · Updated 4 months ago
- ☆158 · Updated 5 years ago
- Script for downloading GitHub. ☆91 · Updated 8 months ago
- [EACL'23] MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages ☆23 · Updated 2 years ago
- A collection of recent papers, benchmarks and datasets of AI4Code domain. ☆57 · Updated 10 months ago
- The official code for PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization ☆157 · Updated 2 years ago
- Repository for analysis and experiments in the BigCode project. ☆117 · Updated 11 months ago
- LogiTorch is a PyTorch-based library for logical reasoning on natural language ☆70 · Updated 6 months ago
- Seq2Seq-based open domain empathetic conversational model for Arabic: Dataset & Model ☆57 · Updated 2 weeks ago
- A BERT model pre-trained on a StackOverflow corpus ☆47 · Updated 4 years ago
- A Python package to augment text data using NLP. ☆40 · Updated last month