Is it possible to effectively find bugs using tests generated by LLMs? Or does one rather (unintentionally) validate faulty code? Given that the number of LLM-supported coding and testing tools is currently exploding, we will replay an experiment laid out in a paper by Mathews & Nagappan (2024).
Given the disruptive role of AI in software development, this topic explores the question of whether LLM-based test generation really makes sense.
The basis of this topic is the paper by Mathews & Nagappan (2024), see below. Essentially, the team should replay the experiments (as far as possible) and back them up with its own research in order to reach its own conclusion.
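To make the core risk concrete, here is a minimal hypothetical sketch (not taken from the paper; the function `apply_discount` and its values are invented for illustration): if a test is derived from the observed behavior of already-faulty code, it may simply assert the buggy output and pass, thereby validating the bug instead of finding it.

```python
# Hypothetical illustration of "validating faulty code":
# the test below was (supposedly) generated from the implementation's
# actual output, so it encodes the bug and passes anyway.

def apply_discount(price: float, percent: float) -> float:
    """Intended behavior: reduce `price` by `percent` percent."""
    return price - percent  # bug: should be price * (1 - percent / 100)


def test_apply_discount():
    # Generated from the faulty implementation's output (200 - 10 = 190),
    # whereas the intended result is 180. The test passes and the bug
    # goes unnoticed.
    assert apply_discount(200.0, 10.0) == 190.0
```

Run with `pytest`; the test passes even though the implementation is wrong, which is exactly the failure mode the topic asks the team to investigate.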