Explainability and Interpretability in Generative AI Agents

Authors

  • Anantharaman Janakiraman

Abstract

The rapid advancement of generative artificial intelligence (AI) agents, including large language models, multimodal systems, and autonomous decision-making agents, has significantly expanded their adoption across critical domains such as healthcare, finance, education, and governance. While these systems demonstrate remarkable capabilities in natural language generation, reasoning, and task automation, their increasing autonomy and complexity raise substantial concerns regarding transparency, accountability, and trustworthiness. This study examines the critical role of explainability and interpretability in generative AI agents, emphasizing their importance for human understanding, regulatory compliance, and ethical deployment. It highlights how explainable AI (XAI) techniques make the internal decision processes and output-generation mechanisms of generative agents more transparent, thereby supporting responsible human-AI collaboration and informed oversight. The study further considers emerging methods designed specifically for large language models and generative agents, including chain-of-thought prompting, rationale generation, uncertainty estimation, and self-explanation mechanisms, which aim to provide more faithful and context-aware explanations.
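To make the uncertainty-estimation idea concrete, the following is a minimal, hypothetical sketch (not a method from this study) of self-consistency-based confidence: sample a generative agent's answer to the same prompt several times, then use the agreement fraction and the entropy of the empirical answer distribution as a coarse uncertainty signal. The function name and the toy samples are illustrative assumptions.

```python
import math
from collections import Counter

def answer_uncertainty(samples):
    """Estimate confidence from repeated samples of an agent's answer.

    Returns (majority_answer, agreement, entropy):
      - agreement: fraction of samples matching the majority answer
      - entropy:   Shannon entropy (bits) of the empirical answer
                   distribution; 0 means the samples fully agree.
    """
    counts = Counter(samples)
    total = len(samples)
    majority, top = counts.most_common(1)[0]
    agreement = top / total
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in counts.values())
    return majority, agreement, entropy

# Example: five sampled answers from a hypothetical agent
samples = ["42", "42", "42", "41", "42"]
answer, agreement, entropy = answer_uncertainty(samples)
```

Low agreement (or high entropy) flags outputs that warrant human review, which is one simple way such signals can support the informed oversight discussed above.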


Published

2025-05-03

Section

Articles