Offered Subjects
Offered Theses
- Putting the Doctor in the Loop: Improving AI Recommendations for Cancer Treatment Decisions with Explainable AI
Bachelor Thesis Business Information Systems, Tutor: M.Sc. Luca Gemballa
Treatment Effect Prediction (TEP) emphasizes that predicting only the outcome of a treatment decision, instead of the outcomes of both scenarios (under treatment and under control), is a critical flaw in many machine learning (ML) systems. Although true individual treatment effects (ITE) cannot be directly evaluated, as only one decision outcome can be observed for each patient, TEP has sought to develop techniques that allow for such predictions and their evaluation, e.g., through aggregated benefits of medical outcomes when following model recommendations (Pan et al., 2024). While explainable artificial intelligence (XAI) methods are being used to interpret these models, XAI sees little use when it comes to improving them. For an image classification model, for example, Ribeiro et al. (2016) showed that Local Interpretable Model-Agnostic Explanations (LIME) were suitable for identifying spurious correlations in the training data. The explanation showed that background information, snow in this case, had been the classifier’s focus instead of the animals shown in the images. To the best of our knowledge, no comparable efforts related to TEP have been made so far. Another possible approach to model improvement could be to remove less important features from the training data (Nauta & Seifert, 2023). On top of attempting to improve older and more recent TEP-based ML models, applying XAI may also help us to understand differences between the models’ internal reasoning.
In this thesis project, the student will develop three TEP-based ML models (Pan et al., 2024) for cancer patients using data from the Surveillance, Epidemiology, and End Results (SEER) database. The models should include the established causal forest (Athey & Wager, 2019) and the more recent SNB model (Pan et al., 2024). First, a scoping review on ML models for TEP and their evaluation will be conducted. After the implementation and training of the models, the student will apply three different XAI methods to generate explanations for each model and analyze these to find potential for adjustments to the training pipeline. These XAI methods should include Shapley Additive Explanations (SHAP) (Lundberg & Lee, 2017) and Diverse Counterfactual Explanations (DiCE) (Mothilal et al., 2020). Interviews with oncologists will be conducted to identify flaws in the models through their explanations and to improve the models based on the oncologists’ input. The interviews will be recorded, transcribed, and analyzed (e.g., via tools like MAXQDA).
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135-1144).
Pan, H., Wang, J., Shi, W., Xu, Z., & Zhu, E. (2024). Quantified treatment effect at the individual level is more indicative for personalized radical prostatectomy recommendation: implications for prostate cancer treatment using deep learning. Journal of Cancer Research and Clinical Oncology, 150(2), 67.
Athey, S., & Wager, S. (2019). Estimating treatment effects with causal forests: An application. Observational studies, 5(2), 37-51.
Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in neural information processing systems, 30.
Mothilal, R. K., Sharma, A., & Tan, C. (2020). Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 conference on fairness, accountability, and transparency (pp. 607-617).
Nauta, M., & Seifert, C. (2023). The Co-12 recipe for evaluating interpretable part-prototype image classifiers. In World conference on explainable artificial intelligence (pp. 397-420). Cham: Springer Nature Switzerland.
- Acting on Anomalies or Not? Integrating Sentiment Analysis into AI-based Financial Anomaly Detection for Trustworthy Insights
Master Thesis Business Information Systems, Tutor: M.Sc. Luca Gemballa
In a fast-paced context like stock trading, decisions are made with distinct considerations. Anomaly detection systems based on machine learning (ML) may provide indications of unusual market behavior, but working out how to translate a detected anomaly into action is a different story. To achieve this, practitioners place high value on real-world context. News headlines, company statements, and political discussions can be essential and insightful when trying to interpret a detected anomaly. Sentiment analysis, among other text analysis techniques, shows promise in this regard (Sufi & Alsulami, 2021; Cruz et al., 2023). A trustworthy sentiment analysis component in an anomaly detection system could help stock traders filter information, focus on the most relevant news pieces, and decide on a reasonable course of action. Trustworthiness in this context means that human oversight is ensured through increased transparency, and that users understand the system well enough to identify when it is mistaken or when its outputs should not be acted on.
In a previous project, a set of design principles for a stock market anomaly detection system was derived, including a demand for news integration. Based on this design knowledge, a systematic literature review (SLR) on sentiment analysis, and a preliminary round of interviews for further requirements elicitation, the student will develop an anomaly detection system for stock market data that integrates news sentiment analysis in real time. This system will be evaluated with additional experts in think-aloud sessions. The interviews and think-aloud sessions will be recorded, transcribed, and analyzed (e.g., via tools like MAXQDA).
Sufi, F. K., & Alsulami, M. (2021). Automated multidimensional analysis of global events with entity detection, sentiment analysis and anomaly detection. IEEE Access, 9, 152449-152460.
Cruz, R., Kinyua, J., & Mutigwe, C. (2023). Analysis of Social Media Impact on Stock Price Movements Using Machine Learning Anomaly Detection. Intelligent Automation & Soft Computing, 36(3).
- Your Patient, Your Question, Your Answer: How RAG Can Keep Clinicians Ahead of the Evidence Curve
Master Thesis Business Information Systems, Tutor: M.Sc. Luca Gemballa
One promise of artificial intelligence (AI) in medicine is to enable learning from the vast amounts of observational data collected at diverse medical institutions. With all the benefits for knowledge generation and more refined decision making this may bring, it leaves a problem of scientific rigor. To a large degree, medical decision making is bounded by guidelines based on evidence from randomized clinical trials (RCT). However, this trial data might not cover each and every combination of patient characteristics, treatment options, and individual circumstances. Moreover, updating guidelines takes time, and the wealth of new literature being published is prone to overwhelming practitioners. A solution to these problems could be found in retrieval-augmented generation (RAG) for medical studies. Combining such systems with a clinical decision support system (CDSS) would lead to enhanced explainability and an improved capacity to assess the quality of AI advice.
In this thesis project, the student will develop and implement a RAG system for medical treatment decisions in the domain of gastroenterology in three phases. First, a scoping review of RAG in medicine will be conducted alongside a series of interviews with RAG experts and gastroenterologists for requirements elicitation. Then, the system will be implemented for studies on chronic inflammatory bowel disease. Finally, the student will conduct an interview study with gastroenterologists to evaluate the system. The interviews will be recorded, transcribed, and analyzed (e.g., via tools like MAXQDA).
- Quantifying Flow – A Systematic Review and Evaluation of Flow Questionnaires
Master Thesis Business Information Systems, Tutor: M.Sc. Cosima von Uechtritz
The theory of flow, which describes the state of being completely absorbed in an activity, has gained considerable attention in recent years across a wide range of domains. As a result, numerous questionnaires have been developed to assess the experience of flow and to account for these different contexts, such as “the flow state scale for occupational tasks” or the “reading flow short scale”. However, the use of a wide variety of questionnaires across studies makes it difficult to compare and synthesize findings and therefore hinders the development of flow research.
Therefore, the aim of this thesis is to systematically identify existing flow questionnaires and assess their methodological validity. Based on this analysis, a framework will be developed to guide the selection and application of flow questionnaires depending on the research context and purpose.
Rosas, D. A., Padilla-Zea, N., & Burgos, D. (2023). Validated questionnaires in flow theory: a systematic review. Electronics, 12(13), 2769.
- Seeing the Heartbeat: AI-based contactless Heart Rate Variability Estimation
Master Thesis Business Information Systems, Tutor: M.Sc. Cosima von Uechtritz
The digital health market is increasingly moving from a niche market to a mainstream market, and is expected to grow at an annual growth rate of 5.42% to reach a projected market volume of USD 219.60 billion by 2030 (Statista Market Insights, 2025). Health monitoring is an important sub-segment within the digital health market.
Recent advances in artificial intelligence have significantly improved the accuracy of remote photoplethysmography (rPPG) algorithms. Using these algorithms, heart rate and other vital signs can be measured using a standard RGB camera, enabling completely contactless health monitoring. While heart rate can already be assessed with relatively high accuracy, the reliable extraction of heart rate variability, an important indicator of mental states, remains an ongoing challenge.
Therefore, the aim of this thesis is to develop and validate an AI-based rPPG algorithm for heart rate variability extraction. An open access dataset (e.g., DEAP, MAHNOB-HCI) will be used to train and develop the algorithm. In addition, a small data sample will be collected using a reference measurement device (e.g., ECG chest strap) for validation purposes. The resulting data will then be compared and evaluated using selected performance indicators (e.g., mean absolute error, Pearson correlation coefficient).
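The two performance indicators named above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the evaluation protocol of the thesis: the function names and the sample values are hypothetical, and the data is made up to show how paired reference (ECG) and rPPG estimates might be compared.

```python
import math


def mean_absolute_error(reference, estimate):
    """Average absolute difference between paired measurements."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def pearson_r(x, y):
    """Pearson correlation coefficient of two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical HRV values in milliseconds, one pair per recording:
# the reference device (ECG) versus the camera-based (rPPG) estimate.
ecg_hrv = [42.0, 35.5, 51.2, 28.9, 47.3]
rppg_hrv = [45.1, 33.0, 55.8, 31.2, 44.0]

print("MAE (ms):", mean_absolute_error(ecg_hrv, rppg_hrv))
print("Pearson r:", pearson_r(ecg_hrv, rppg_hrv))
```

A low mean absolute error indicates that the rPPG estimates are close to the reference in absolute terms, while a high Pearson correlation indicates that they follow the same trend across recordings; the two indicators can disagree, which is one reason to report both.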
Statista Market Insights (2025). Digital Health. Statista. www.statista.com/outlook/hmo/digital-health/worldwide (retrieved 08.03.2025).
- The Academic Version of Apple Health: Build your own Wearable Research App
Master Thesis Business Information Systems, Tutor: M.Sc. Cosima von Uechtritz
Over the past years, consumer wearables have evolved from lifestyle gadgets into advanced sensing platforms capable of continuous physiological monitoring. Ecosystems like Garmin and Apple Health provide access to health metrics (e.g., sleep score) and raw data (e.g., heart rate) for research institutions. This is of particular importance for researchers interested in predicting diseases (e.g., hypertension) or classifying mental states (e.g., stress or flow), based on physiological data. Among these, flow presents a state of deep task engagement and optimal experience, and can be assessed with physiological indicators, such as heart rate variability (HRV) or respiration rate.
Therefore, the aim of this thesis is to develop a Garmin application that records physiological data and correlates it with user-reported flow experiences. Students will receive access to the Garmin Developer Portal and will implement a mobile application capable of collecting HRV-related metrics and questionnaire-based self-reports. The application will then be evaluated in a small pilot study to explore relationships between health metrics and flow data.
Henriksen, A., Haugen Mikalsen, M., Woldaregay, A. Z., Muzny, M., Hartvigsen, G., Hopstock, L. A., & Grimsgaard, S. (2018). Using fitness trackers and smartwatches to measure physical activity in research: analysis of consumer wrist-worn wearables. Journal of medical Internet research, 20(3), e110.
- What makes a Bird a Bird? Evaluating Prototypes against Feature Attribution Methods in a Bird Classification Task
Bachelor Thesis Business Information Systems, Tutor: M.Sc. Luca Gemballa
While feature attribution methods like SHAP and GradCAM see widespread application for image data, they do not constitute the only class of explanation methods for images. Another category of methods relies on learned prototypes that capture relevant patterns in the images. These prototypes are supposed to be more interpretable than traditional methods for explainable artificial intelligence (XAI) and offer better capacity to detect shortcomings in classification decisions. A typical application of prototype models is the CUB-200 dataset that contains images of 200 different bird species (Nauta et al., 2021).
In this established setting, the student will implement and train two models for bird classification and extract prototype and feature attribution explanations. These will be evaluated through a series of expert interviews with bird enthusiasts to investigate their alignment with human explanations and their understandability for experts in that field. The interviews will be recorded, transcribed, and analyzed (e.g., via tools like MAXQDA).
Nauta, M., Van Bree, R., & Seifert, C. (2021). Neural prototype trees for interpretable fine-grained image recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 14933-14943).
- Explaining What’s Relevant: How Doctors Extract Information from Neural Network Explanations
Bachelor Thesis Business Information Systems, Tutor: M.Sc. Luca Gemballa
Although the field of explainable artificial intelligence (XAI) has developed a plethora of explanation methods since its inception, many of these cannot easily be transferred into practical use by non-experts. Having been developed by AI researchers with little attention to how human explanations work, common methods like LIME and SHAP offer insights in a form that does not align with the general expectations and needs of practitioners (Ehsan et al., 2024). Other methods like counterfactuals or narratives, however, are closer to human explanations, at least in theory. Still, it is not properly understood how practitioners interpret the various XAI methods and what information they can extract from them. To investigate this problem, this study will look at explanation methods applied to neural networks for treatment outcome prediction in oncology and rheumatology.
The student will implement and train neural networks and three different explanation methods. To evaluate how they are perceived by medical professionals, the student will develop interview guidelines and conduct a series of expert interviews with medical professionals from the fields of oncology and rheumatology. The interviews must be recorded, transcribed, and analyzed (e.g., via tools like MAXQDA).
Ehsan, U., Passi, S., Liao, Q. V., Chan, L., Lee, I. H., Muller, M., & Riedl, M. O. (2024). The who in XAI: how AI background shapes perceptions of AI explanations. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (pp. 1-32).
- The Art of Feature Engineering: Comparing Hand-Crafted and Learned Features for Flow State Classification
Master Thesis Business Information Systems, Tutor: M.Sc. Cosima von Uechtritz
Flow, the state of optimal experience and complete absorption in an activity, is of growing interest in information systems research. Recent studies have shown that flow states can be classified using machine learning models trained on physiological data, such as heart rate and heart rate variability (HRV). For instance, Rissler et al. (2020) trained a flow classifier using a random forest model and achieved an accuracy of 70%. Traditional machine learning approaches often rely on hand-crafted features (HCFs), such as standard HRV metrics like SDNN or RMSSD. However, these features require expert knowledge and are labor-intensive to compute. Feature learning methods, such as deep neural networks, present a promising approach to overcome these limitations due to their capability to automatically extract relevant features. Therefore, feature learning approaches may outperform HCFs, in particular when dealing with large-scale, noisy, or unstructured data.
The aim of this thesis is to investigate the differences between HCFs and feature learning approaches for classifying flow states from physiological signals. Students working on this project will have access to a publicly available flow dataset.
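To make the notion of hand-crafted features concrete, the following minimal Python sketch computes the two HRV metrics mentioned above using their standard definitions: SDNN as the sample standard deviation of the beat-to-beat (RR) intervals, and RMSSD as the root mean square of successive differences. The function names and the example intervals are hypothetical and not taken from the dataset used in the thesis; real pipelines would typically also handle artifacts and ectopic beats first.

```python
import math


def sdnn(rr_ms):
    """Sample standard deviation of RR intervals (in ms)."""
    n = len(rr_ms)
    if n < 2:
        raise ValueError("at least two RR intervals are required")
    mean_rr = sum(rr_ms) / n
    return math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))


def rmssd(rr_ms):
    """Root mean square of successive RR interval differences (in ms)."""
    if len(rr_ms) < 2:
        raise ValueError("at least two RR intervals are required")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals in milliseconds for one short window.
rr_window = [800, 810, 790, 800]

features = {"sdnn": sdnn(rr_window), "rmssd": rmssd(rr_window)}
print(features)
```

A feature vector built this way for each time window could serve as input to a classical model such as a random forest, whereas a feature learning approach would instead receive the raw or minimally processed signal.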
Rissler, R., Nadj, M., Li, M. X., Loewe, N., Knierim, M. T., & Maedche, A. (2020). To be or not to be in flow at work: physiological classification of flow using machine learning. IEEE transactions on affective computing, 14(1), 463-474.