Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I.D. and Gebru, T. (2019) ‘Model Cards for Model Reporting’, Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 220–229.

Their central argument is that trained models should not circulate as opaque technical artefacts: systems deployed in medicine, employment, education, law enforcement, content moderation, and facial analysis can produce uneven harms across populations. A model card is therefore conceived as a concise document that accompanies a released model and records its intended uses, unsuitable uses, training and evaluation data, performance metrics, ethical considerations, caveats, and recommendations. Crucially, the authors insist on disaggregated evaluation: performance should be reported across relevant demographic, cultural, phenotypic, environmental, and intersectional groups rather than hidden inside a single aggregate score.

Their two case studies make the need concrete. A smiling-detection model reveals different error patterns across age and gender, while a toxicity classifier shows how systems may unfairly associate identity terms such as “gay”, “lesbian”, or “homosexual” with toxicity unless they are explicitly evaluated and corrected. In this sense, model cards function as documentary infrastructure for AI governance: they do not solve bias on their own, but they make model limitations, risks, and responsibilities visible to developers, organisations, policymakers, users, and affected communities. In conclusion, the article reframes technical documentation as an ethical practice; without structured reporting, machine-learning deployment remains a form of institutional opacity, whereas model cards create conditions for scrutiny, comparison, contestation, and more responsible use.
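As a concrete illustration of the document structure summarised above, the following minimal Python sketch encodes a model card as a plain data record. The field names paraphrase the sections the paper proposes; every value is a hypothetical placeholder, not content drawn from the paper’s case studies.

```python
# Minimal sketch of a model card as a structured record. Field names
# paraphrase the sections Mitchell et al. propose; all values are
# hypothetical placeholders, not material from the paper.
smiling_model_card = {
    "model_details": {"name": "smiling-detector", "version": "1.0"},
    "intended_uses": ["research on facial-expression analysis"],
    "unsuitable_uses": ["inferring emotional state for hiring decisions"],
    "factors": ["age", "gender"],  # groups for disaggregated evaluation
    "metrics": ["accuracy", "false positive rate", "false negative rate"],
    "training_data": "description of the training set and its provenance",
    "evaluation_data": "description of the benchmark and its demographics",
    "ethical_considerations": ["risk of uneven error rates across groups"],
    "caveats_and_recommendations": ["re-evaluate before any new deployment"],
}
```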
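Likewise, the idea of disaggregated evaluation can be made concrete in a few lines. The sketch below (again illustrative: the subgroups, labels, and predictions are invented) reports a metric per group alongside the aggregate score that would otherwise hide the variation.

```python
# Minimal sketch of disaggregated evaluation: reporting a metric per
# subgroup instead of a single aggregate. The groups, labels, and
# predictions below are invented for illustration, not the paper's data.
from collections import defaultdict

def disaggregated_accuracy(records):
    """records: iterable of (group, y_true, y_pred) triples."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    # The aggregate score hides the per-group variation a model card surfaces.
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group

# Hypothetical evaluation records for a smiling classifier.
records = [
    ("age<30,female", 1, 1), ("age<30,female", 0, 0),
    ("age<30,male", 1, 1), ("age<30,male", 1, 0),
    ("age>=60,female", 0, 1), ("age>=60,female", 0, 0),
]
overall, per_group = disaggregated_accuracy(records)
print(f"aggregate accuracy: {overall:.2f}")
for group, accuracy in sorted(per_group.items()):
    print(f"  {group}: {accuracy:.2f}")
```

A model card would report the per-group rows themselves, making any gap between subgroups visible rather than averaging it away.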