AN ALGORITHM FOR CREATING A SEMI-SYNTHETIC DATASET FOR DIABETES

Authors

DOI:

https://doi.org/10.26577/JMMCS1291202610

Keywords:

diabetes prediction, semi-synthetic dataset, data augmentation, machine learning algorithms, synthetic medical data, generative model, object similarity

Abstract

Recent advances in artificial intelligence and machine learning have opened new avenues for improving medical diagnosis. However, because real clinical data on diabetes mellitus are sensitive, researchers have difficulty accessing high-quality datasets. The main objective of this research is to introduce an algorithm that generates a semi-synthetic training dataset to improve classification accuracy for diabetes mellitus, particularly type 1 and type 2 diabetes. The algorithm generates semi-synthetic diabetes data by statistically analyzing clinical attributes from real patient records. To improve the generation of synthetic samples without altering the properties of the original data, a similarity-based approach focusing on class-object relations was used. The approach generated synthetic instances that preserved the inherent structure and distribution typical of real patient data. A similarity-based mechanism ensured the relevance of the generated instances, and the study outlines a sequence of steps intended to improve the quality of synthetic datasets. The proposed algorithm creates artificial datasets for diabetes classification while protecting patient data. The methodology raised intra-class similarity from 76.18% to 82.93%, which in turn enhanced the diagnostic accuracy of artificial intelligence-based models.
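The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it describes: estimate per-class statistics from real records, sample candidate records from them, and keep a candidate only if it is more similar to its own class than to any other. The similarity function, the Gaussian sampling, the feature names and the toy numbers are all assumptions made for illustration; they are not the authors' method or data.

```python
import math
import random
import statistics


def class_stats(rows):
    """Per-feature mean and population standard deviation for one class."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Guard against zero spread so later divisions are safe.
    stds = [statistics.pstdev(c) or 1e-9 for c in cols]
    return means, stds


def similarity(x, means, stds):
    """Hypothetical similarity in (0, 1]: 1 / (1 + scaled Euclidean distance)."""
    d = math.sqrt(sum(((a - m) / s) ** 2 for a, m, s in zip(x, means, stds)))
    return 1.0 / (1.0 + d)


def generate(real, n_per_class, seed=0, max_tries=10_000):
    """real maps a class label to a list of feature rows.

    Returns a dict mapping each label to accepted synthetic rows.
    """
    rng = random.Random(seed)
    stats = {label: class_stats(rows) for label, rows in real.items()}
    synthetic = {label: [] for label in real}
    for label, (means, stds) in stats.items():
        tries = 0
        while len(synthetic[label]) < n_per_class and tries < max_tries:
            tries += 1
            # Assumption: independent Gaussian sampling per feature.
            cand = [rng.gauss(m, s) for m, s in zip(means, stds)]
            own = similarity(cand, means, stds)
            others = [similarity(cand, *stats[o]) for o in stats if o != label]
            # Similarity filter: keep only candidates closest to their own class.
            if all(own > s for s in others):
                synthetic[label].append(cand)
    return synthetic


# Toy values for two hypothetical features (glucose, BMI); not patient data.
real = {
    "type1": [[9.1, 21.0], [10.4, 22.5], [8.7, 20.1], [9.8, 23.0]],
    "type2": [[7.2, 31.0], [7.9, 33.4], [6.8, 29.5], [7.5, 32.2]],
}
synthetic = generate(real, n_per_class=5)
```

Rejecting candidates that fall nearer to another class is one simple way to keep generated records class-consistent; the intra-class similarity figures reported in the abstract come from the paper's own measure, which may be defined differently.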

Author Biographies

  • Akhram Nishanov, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent, Uzbekistan

    Nishanov Akhram Khasanovich (corresponding author) – DSc, professor of the Faculty of Software Engineering of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi (Tashkent, Uzbekistan, email: nishanov_akram@mail.ru)

  • Farkhod Mengturayev, Denau Institute of Entrepreneurship and Pedagogy, Denau, Uzbekistan

    Mengturayev Farhod Ziyatovich – Senior lecturer of the Department of Information Technology, Denau Institute of Entrepreneurship and Pedagogy (Denau, Uzbekistan, email: f.mengtoraev@dtpi.uz)

  • Fayzulla Ollamberganov, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent, Uzbekistan

    Ollamberganov Fayzulla Farxod o’g’li – PhD student, Department of System and Applied Programming of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi (Tashkent, Uzbekistan, email: fayzulla0804@gmail.com)

  • Uktamjon Allayarov, Termez Branch of the Tashkent Medical Academy, Termez, Uzbekistan

    Allayarov Uktamjon Bektashovich – Department of Propedeutics of Internal Diseases, Rehabilitation, Ethnoscience and Endocrinology, Termez Branch of the Tashkent Medical Academy (Termez, Uzbekistan, email: criptolione7777@gmail.com)

  • Malika Khasanova, Tashkent Medical Academy, Tashkent, Uzbekistan

    Khasanova Malika Akhramovna – Teaching Assistant of the Department of Hospital Therapy of Faculty No. 2 of Tashkent Medical Academy (Tashkent, Uzbekistan, email: malikabonuxasanova@gmail.com)

  • Gulshan Doniyorova, Denau Institute of Entrepreneurship and Pedagogy, Denau, Uzbekistan

    Doniyorova Gulshan Toshmirzayevna – Teacher of the Department of Information Technology, Denau Institute of Entrepreneurship and Pedagogy (Denau, Uzbekistan, email: gulshandoniyorova68@gmail.com)

Published

2026-03-19

How to Cite

AN ALGORITHM FOR CREATING A SEMI-SYNTHETIC DATASET FOR DIABETES. (2026). Journal of Mathematics, Mechanics and Computer Science, 129(1), 114-128. https://doi.org/10.26577/JMMCS1291202610