xlang-ai / arks

☆50

Related projects

Alternatives and complementary repositories for arks

qishenghu / InstructCoder
InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw
☆52 Updated last month
bigcode-project / astraios
Astraios: Parameter-Efficient Instruction Tuning Code Language Models
☆57 Updated 7 months ago
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆49 Updated 9 months ago
code-rag-bench / code-rag-bench
CodeRAG-Bench: Can Retrieval Augment Code Generation?
☆84 Updated last week
facebookresearch / cruxeval
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
☆115 Updated last month
Ablustrund / APPS_Plus
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
☆56 Updated 2 months ago
niansong1996 / lever
Code for paper "LEVER: Learning to Verify Language-to-Code Generation with Execution" (ICML'23)
☆79 Updated last year
CodeEditorBench / CodeEditorBench
☆39 Updated 5 months ago
THUDM / NaturalCodeBench
NaturalCodeBench (Findings of ACL 2024)
☆56 Updated last month
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆73 Updated 3 months ago
LuLuLuyi / LongHeads
LongHeads: Multi-Head Attention is Secretly a Long Context Processor
☆28 Updated 7 months ago
NL2Code / NL2Code.github.io
Large Language Models Meet NL2Code: A Survey
☆34 Updated this week
amazon-science / llm-code-preference
Training and Benchmarking LLMs for Code Preference.
☆25 Updated last week
SalesforceAIResearch / CodeChain
Official code for the paper "CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules"
☆36 Updated last year
Leolty / repobench
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
☆134 Updated 3 months ago
evalplus / repoqa
RepoQA: Evaluating Long-Context Code Understanding
☆100 Updated 3 weeks ago
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences
☆65 Updated 5 months ago
ntunlp / ExecEval
A distributed, extensible, secure solution for evaluating machine generated code with unit tests in multiple programming languages.
☆42 Updated last month
amazon-science / cceval
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)
☆122 Updated 3 months ago
YuxiXie / SelfEval-Guided-Decoding
☆89 Updated 11 months ago
ozyyshr / RepoGraph
Enhancing AI Software Engineering with Repository-level Code Graph
☆96 Updated 3 months ago
FlagOpen / TACO
☆146 Updated 3 months ago
nyu-mll / ILF-for-code-generation
☆75 Updated last year
ntunlp / xCodeEval
xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
☆74 Updated 2 months ago
crux-eval / eval-arena
☆21 Updated 3 weeks ago
amazon-science / Repoformer
Repoformer: Selective Retrieval for Repository-Level Code Completion (ICML 2024)
☆38 Updated 4 months ago
JohnnyPeng18 / APIBench
APIBench is a benchmark for evaluating the performance of API recommendation approaches released in the paper "Revisiting, Benchmarking a…
☆53 Updated last year
NingMiao / SelfCheck
Code for the paper <SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning>
☆45 Updated last year
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆104 Updated 5 months ago
SparksofAGI / MHPP
☆25 Updated last week