huybery / Awesome-Code-LLM
👨‍💻 An awesome and curated list of the best code LLMs for research.
⭐ 1,183 · Updated 4 months ago
Alternatives and similar repositories for Awesome-Code-LLM:
Users interested in Awesome-Code-LLM are comparing it to the libraries listed below.
- [TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets. ⭐ 2,422 · Updated this week
- A framework for the evaluation of autoregressive code generation language models. ⭐ 932 · Updated 5 months ago
- Rigorous evaluation of LLM-synthesized code (NeurIPS 2023 & COLM 2024) ⭐ 1,445 · Updated 3 weeks ago
- The papers are organized according to our survey: "Evaluating Large Language Models: A Comprehensive Survey". ⭐ 758 · Updated 11 months ago
- ⭐ 649 · Updated 5 months ago
- Awesome things about LLM-powered agents. Papers / Repos / Blogs / ... ⭐ 1,979 · Updated last month
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ⭐ 1,724 · Updated 3 months ago
- A library for advanced large language model reasoning ⭐ 2,099 · Updated 2 weeks ago
- Awesome-LLM-RAG: a curated list of advanced retrieval augmented generation (RAG) in Large Language Models ⭐ 1,188 · Updated 2 months ago
- 🐙 OctoPack: Instruction Tuning Code Large Language Models ⭐ 462 · Updated 2 months ago
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting ⭐ 2,718 · Updated 8 months ago
- 📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥 ⭐ 1,447 · Updated last week
- List of language agents based on the paper "Cognitive Architectures for Language Agents" ⭐ 928 · Updated 3 months ago
- Evaluate your LLM's response with Prometheus and GPT4 💯 ⭐ 911 · Updated last month
- [ACL 2023] Reasoning with Language Model Prompting: A Survey ⭐ 952 · Updated 2 weeks ago
- Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 ⭐ 3,000 · Updated last month
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ⭐ 2,512 · Updated 2 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ⭐ 1,462 · Updated this week
- LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step ⭐ 520 · Updated 7 months ago
- Must-read Papers on LLM Agents. ⭐ 2,325 · Updated 2 months ago
- [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI ⭐ 342 · Updated 2 weeks ago
- This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, … ⭐ 2,051 · Updated 11 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ⭐ 2,659 · Updated this week
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. ⭐ 679 · Updated 6 months ago
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". ⭐ 1,518 · Updated 3 weeks ago
- Run evaluation on LLMs using the HumanEval benchmark ⭐ 407 · Updated last year
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ⭐ 1,514 · Updated last year
- This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback. ⭐ 518 · Updated 5 months ago
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ⭐ 651 · Updated 10 months ago
- Reading list of hallucination in LLMs. Check out our new survey paper: "Siren's Song in the AI Ocean: A Survey on Hallucination in Large … ⭐ 1,010 · Updated 5 months ago