This page is still in draft state. Please note that the content may be subject to change.
AI-assisted coding has evolved from simple code completion to autonomous, agentic workflows — but with new capabilities come new security risks. This topic examines whether modern AI coding workflows are compatible with secure software engineering, and which safeguards are needed to use them responsibly.
AI-assisted coding has moved rapidly from simple code completion to much more autonomous workflows that include chat-based coding, repository-wide refactoring, agentic task execution, terminal access, tool integration, and external context retrieval. This raises an important question for professional software engineering: do these workflows improve productivity without undermining core security principles?
(Just to clarify: this topic is NOT about securing software systems that contain an LLM, e.g. a chatbot that processes user input. It is about the security risks that arise when developers use AI tools — especially agentic AI like Claude Code, Cursor, or GitHub Copilot — as part of their development workflow. The questions are: does the code that comes out of an AI-assisted process meet the same security standards as code written without AI assistance? And does the development process itself introduce new attack surfaces?)
Recent research paints a mixed picture. Studies have found that AI-generated code can contain significantly more vulnerabilities than human-written code — including classic issues like XSS, SQL injection, and insecure defaults. Agentic AI with terminal and file system access adds further risks: secret leakage, unvetted dependency installation, and execution of arbitrary commands. At the same time, developers using AI assistants tend to overestimate the security of the code they produce.
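To make one of the classic issues mentioned above concrete, here is a minimal, self-contained sketch (not taken from any of the cited studies) of the SQL injection pattern often reported in AI-generated code, next to the parameterized variant that avoids it:

```python
import sqlite3

def find_user_unsafe(conn, name):
    # String interpolation lets crafted input alter the query structure.
    return conn.execute(
        f"SELECT id, name FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(conn, name):
    # Parameterized query: the driver treats the input strictly as data.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "' OR '1'='1"
leaked = find_user_unsafe(conn, payload)  # the injected clause matches every row
safe = find_user_safe(conn, payload)      # no user is literally named "' OR '1'='1"
```

This is exactly the kind of difference a static analyzer or an OWASP-based review checklist should surface in the experiment.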
You research what security risks the use of agentic AI introduces into the software development process, and you design a coding experiment that makes these risks — or their absence — observable.
A proof of concept could look like this, for example: You define a set of small but realistic development tasks that each contain a security-sensitive aspect — for instance, implementing user authentication, handling file uploads, building a REST API with input validation, or managing database access. Each team member implements the same set of tasks twice: once using an agentic AI coding companion, and once manually. You then analyze the resulting code for security issues, using tools like static analysis (e.g. Semgrep, SonarQube, or Snyk) and manual code review against a checklist based on the OWASP Top 10 for web applications. You compare the results along dimensions like the number and severity of findings, the vulnerability classes involved, and the time needed per task.
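For the comparison step, the Semgrep reports of both variants can be aggregated and diffed per severity. The sketch below assumes reports produced with `semgrep --json`; the report structure (`results`, `check_id`, `extra.severity`) follows Semgrep's JSON output, while the rule ids and findings in the embedded sample are invented for illustration:

```python
import json
from collections import Counter

def summarize(report: dict) -> Counter:
    """Count Semgrep findings per severity in one JSON report."""
    return Counter(r["extra"]["severity"] for r in report.get("results", []))

# Made-up sample reports standing in for real Semgrep output.
sample_ai = json.loads("""{
  "results": [
    {"check_id": "example.sql-injection",
     "path": "auth.py", "extra": {"severity": "ERROR"}},
    {"check_id": "example.reflected-xss",
     "path": "upload.py", "extra": {"severity": "WARNING"}}
  ]
}""")
sample_manual = json.loads('{"results": []}')

# Counter subtraction keeps only the surplus findings of the AI variant.
gap = summarize(sample_ai) - summarize(sample_manual)
```

In the real experiment, the two reports would come from running the same Semgrep ruleset over the AI-assisted and the manual implementation of each task.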
You could also explore whether specific safeguards (e.g. running a SAST tool in the CI pipeline, using CLAUDE.md rules to enforce security practices, or requiring manual review for security-critical files) reduce the risk gap.
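One of these safeguards, requiring manual review for security-critical files, could be sketched as a small CI gate that flags matching files in a changeset. The pattern list and file names below are hypothetical examples, not recommendations:

```python
import fnmatch

# Hypothetical patterns marking security-critical areas of the repository.
CRITICAL_PATTERNS = ["*auth*", "*upload*", "*db*", "*.env*"]

def needs_review(changed_files: list[str]) -> list[str]:
    """Return the changed files that match a security-critical pattern."""
    return sorted(
        f for f in changed_files
        if any(fnmatch.fnmatch(f, p) for p in CRITICAL_PATTERNS)
    )

flagged = needs_review(["src/auth/login.py", "docs/README.md", "upload_handler.py"])
```

A CI pipeline would fail (or request an extra approval) whenever `needs_review` returns a non-empty list, so that AI-generated changes to these files never merge unreviewed.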