AN ALGORITHM FOR CREATING A SEMI-SYNTHETIC DATASET FOR DIABETES

Akhram Nishanov; Farkhod Mengturayev; Fayzulla Ollamberganov; Uktamjon Allayarov; Malika Khasanova; Gulshan Doniyorova

doi:10.26577/JMMCS1291202610

Authors

Akhram Nishanov Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent, Uzbekistan https://orcid.org/0000-0002-5652-8977
Farkhod Mengturayev Denau Institute of Entrepreneurship and Pedagogy, Denau, Uzbekistan https://orcid.org/0000-0003-0562-8377
Fayzulla Ollamberganov Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent, Uzbekistan https://orcid.org/0000-0001-6450-0467
Uktamjon Allayarov Termez Branch of the Tashkent Medical Academy, Termez, Uzbekistan https://orcid.org/0009-0003-4748-6853
Malika Khasanova Tashkent Medical Academy, Tashkent, Uzbekistan https://orcid.org/0000-0002-3128-9367
Gulshan Doniyorova Denau Institute of Entrepreneurship and Pedagogy, Denau, Uzbekistan https://orcid.org/0009-0004-1826-524X

DOI:

https://doi.org/10.26577/JMMCS1291202610

Keywords:

diabetes prediction, semi-synthetic dataset, data augmentation, machine learning algorithms, synthetic medical data, generative model, object similarity

Abstract

Recent advances in artificial intelligence and machine learning have opened new avenues for improving medical diagnosis. However, because real clinical data on diabetes mellitus are sensitive, researchers have difficulty obtaining high-quality datasets. The main objective of this research is to introduce an algorithm that generates a semi-synthetic training dataset to improve classification accuracy for diabetes mellitus, particularly type 1 and type 2 diabetes. The algorithm generates semi-synthetic diabetes data by statistically analyzing clinical attributes from real patient records. To improve the generation of synthetic samples without altering the properties of the original data, a similarity-based approach focusing on class-object relations was used. The approach generated synthetic data instances that preserved the inherent structure and distribution typical of real patient data. A similarity-based mechanism ensured the relevance of the created instances, and the study outlines a sequence of steps intended to improve the quality of synthetic datasets. The proposed algorithm creates artificial datasets for diabetes classification while protecting patient data. The methodology raised intra-class similarity from 76.18% to 82.93%, which in turn enhanced the diagnostic accuracy of artificial intelligence-based models.
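
The abstract describes the method only in general terms, so the following Python sketch is a loose illustration of similarity-filtered generation, not the authors' algorithm. The similarity measure (1 / (1 + Euclidean distance)), the Gaussian perturbation of real records, the acceptance rule, the function names, and the sample values are all assumptions made for this example. Features are assumed to be standardized in advance.

```python
import math
import random
import statistics


def similarity(a, b):
    """Assumed similarity measure: maps Euclidean distance into (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def mean_similarity_to_class(x, members):
    """Average similarity between one object and all real objects of a class."""
    return statistics.fmean(similarity(x, m) for m in members)


def intra_class_similarity(members):
    """Average similarity over all distinct pairs of objects in a class."""
    pairs = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
    return statistics.fmean(similarity(a, b) for a, b in pairs)


def generate_semi_synthetic(members, n_new, noise_scale=0.3, seed=0,
                            max_tries=10_000):
    """Perturb real records and keep only candidates that resemble the class.

    A candidate is accepted when its mean similarity to the real class
    members is at least the current intra-class similarity (assumed rule).
    """
    rng = random.Random(seed)
    # Per-feature spread of the real records, used to scale the noise.
    stdevs = [statistics.pstdev(col) for col in zip(*members)]
    threshold = intra_class_similarity(members)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_new:
            break
        base = rng.choice(members)
        candidate = tuple(v + rng.gauss(0.0, noise_scale * s)
                          for v, s in zip(base, stdevs))
        if mean_similarity_to_class(candidate, members) >= threshold:
            accepted.append(candidate)
    return accepted


# Hypothetical standardized records for one class (two features each).
real = [(0.1, 0.2), (0.3, -0.1), (-0.2, 0.0), (0.0, 0.4), (0.2, 0.1)]
synthetic = generate_semi_synthetic(real, n_new=5)
```

Because every accepted record is at least as similar to the real class as the class is to itself on average, the filter tends to keep intra-class similarity from dropping as records are added. The reported increase from 76.18% to 82.93% comes from the authors' own procedure and is not reproduced by this sketch.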

Author Biographies

Akhram Nishanov, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent, Uzbekistan

Nishanov Akhram Khasanovich (corresponding author) – DSc, Professor of the Faculty of Software Engineering of Tashkent University of Information Technologies named after Muhammad Al-Khwarizmi (Tashkent, Uzbekistan, email: nishanov_akram@mail.ru)
Farkhod Mengturayev, Denau Institute of Entrepreneurship and Pedagogy, Denau, Uzbekistan

Mengturayev Farhod Ziyatovich – Senior Lecturer of the Department of Information Technology, Denau Institute of Entrepreneurship and Pedagogy (Denau, Uzbekistan, email: f.mengtoraev@dtpi.uz)
Fayzulla Ollamberganov, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent, Uzbekistan

Ollamberganov Fayzulla Farxod o’g’li – PhD student of the Department of System and Applied Programming of Tashkent University of Information Technologies named after Muhammad Al-Khwarizmi (Tashkent, Uzbekistan, email: fayzulla0804@gmail.com)
Uktamjon Allayarov, Termez Branch of the Tashkent Medical Academy, Termez, Uzbekistan

Allayarov Uktamjon Bektashovich – Department of Propedeutics of Internal Diseases, Rehabilitation, Ethnoscience and Endocrinology, Termez Branch of the Tashkent Medical Academy (Termez, Uzbekistan, email: criptolione7777@gmail.com)
Malika Khasanova, Tashkent Medical Academy, Tashkent, Uzbekistan

Khasanova Malika Akhramovna – Teaching Assistant of the Department of Hospital Therapy of Faculty No. 2 of Tashkent Medical Academy (Tashkent, Uzbekistan, email: malikabonuxasanova@gmail.com)
Gulshan Doniyorova, Denau Institute of Entrepreneurship and Pedagogy, Denau, Uzbekistan

Doniyorova Gulshan Toshmirzayevna – Teacher of the Department of Information Technology, Denau Institute of Entrepreneurship and Pedagogy (Denau, Uzbekistan, email: gulshandoniyorova68@gmail.com)
