Is it possible to effectively find bugs using tests generated by LLMs? Or does one rather (unintentionally) validate faulty code? Given that the number of LLM-supported coding and testing tools is currently exploding, we will replay an experiment laid out in a paper by Mathews & Nagappan (2024).
Given the disruptive role of AI in software development, this topic explores the question of whether LLM-based test generation really makes sense.
The basis of this topic is the paper by Mathews & Nagappan (2024), see below. Essentially, the team should replay the experiments (as far as possible) and back them up with its own research in order to reach its own conclusion.
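To make the core risk concrete, here is a minimal hypothetical sketch (not taken from the paper; the function `apply_discount` and its values are invented for illustration): if a test is derived from the observed behavior of already-faulty code, it may simply assert the buggy output and pass, thereby validating the bug instead of finding it.

```python
# Hypothetical illustration of "validating faulty code":
# the test below was (supposedly) generated from the implementation's
# actual output, so it encodes the bug and passes anyway.

def apply_discount(price: float, percent: float) -> float:
    """Intended behavior: reduce `price` by `percent` percent."""
    return price - percent  # bug: should be price * (1 - percent / 100)


def test_apply_discount():
    # Generated from the faulty implementation's output (200 - 10 = 190),
    # whereas the intended result is 180. The test passes and the bug
    # goes unnoticed.
    assert apply_discount(200.0, 10.0) == 190.0
```

Run with `pytest`; the test passes even though the implementation is wrong, which is exactly the failure mode the topic asks the team to investigate.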