Lasso principal component analysis process monitoring

11/11/2023

Due to the growing problem of heart diseases, computer-aided improvement of their diagnostics is of great importance. One of the most common heart diseases is cardiac arrhythmia. It is usually diagnosed by measuring heart activity with an electrocardiograph (ECG) and collecting the data as multidimensional medical datasets. However, their storage, analysis and knowledge extraction become highly complex issues. Feature reduction not only saves storage and computing resources, but primarily makes data interpretation more comprehensible. In this paper a new igPCA (in-group Principal Component Analysis) method for feature reduction is proposed. We assume that the set of attributes can be split into subgroups of similar characteristics, each of which is then subjected to principal component analysis. The presented method transforms the feature space into a lower dimension and gives insight into the intrinsic structure of the data. The method has been verified by experiments on a dataset of ECG recordings. The results have been evaluated with regard to the number of retained features and the classification accuracy for arrhythmia types. The experimental results showed the advantage of the presented method over the base PCA approach.

Progress in health technologies and the growing capabilities of diagnostic equipment make medical analysis and diagnosis highly challenging because of large and multidimensional datasets. Automated knowledge extraction from massive data and medical inference based on data analysis pose highly complex issues. This is mainly due to the limitations imposed by the performance of computer systems, but also because of the methodological problems inherent in multidimensional data analysis. The problem is therefore often called "the curse of dimensionality". Large datasets can be reduced by reducing the number of analyzed parameters (dimensions) or by decreasing the number of analyzed cases. Dimensionality reduction can be carried out through statistical methods, primarily Principal Component Analysis (PCA), or by using feature selection techniques. Dataset cardinality reduction can be achieved by sampling, grouping or instance selection methods.

In this research, we propose a modification to the application of the PCA method, called igPCA (in-group Principal Component Analysis). It introduces a preprocessing phase that arranges related features into groups of similar distribution. We compare the performance of the considered algorithm in arrhythmia classification with the accuracy attained for the original set of features and for a dataset passed through unchanged PCA. We applied our method to reduce data derived from ECG signals in order to improve storage and the inference process in solving the arrhythmia classification problem. In the research we use the reference “ARRHYTHMIA” dataset, derived from the UCI repository. However, the proposed method is intended to be applied to real datasets of similar structure.

The remainder of this paper is organized as follows. Section 2 (Background) describes the problem of feature cardinality reduction in terms of intrinsic data characteristics. It also presents a literature review introducing feature selection techniques based on data characteristics. Section 3 (Method overview) presents the proposed procedure for in-group principal component analysis. Section 4 describes the medical problem of arrhythmia and a dataset resulting from ECG recordings. In Section 5 (Experimental results and discussion) we describe the studies that were conducted. We introduce the data characteristics and discuss the results. Finally, in Section 6 (Conclusions) we summarize our research and describe further work.

The development of computer technologies and their wide use in medicine have significantly increased the size of medical data repositories. The process of extracting information from these huge datasets, which is essential for a given treatment, becomes more and more complex and requires analysis of different types of data. Even though collecting a greater amount of data may contribute to a more comprehensive diagnosis, it requires more resources for storage and processing and increases computing costs. For this reason, a wide range of methods for data complexity reduction are considered. They follow two basic approaches: instance reduction and feature reduction. The difference between them is shown in Figure 1.
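To make the in-group idea more concrete, the following is a minimal sketch of how PCA might be applied separately to each group of related features, with the retained components then concatenated. It is not the authors' implementation: the text does not specify how the groups are formed or how many components are kept, so the hand-picked `groups` list, the `var_threshold` parameter and the helper names `pca_reduce` and `ig_pca` are all assumptions made for illustration. The data are synthetic and do not come from the ARRHYTHMIA dataset.

```python
import numpy as np


def pca_reduce(X, var_threshold=0.95):
    """Project X onto the fewest principal components whose cumulative
    explained variance reaches var_threshold (an assumed stopping rule)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2                                 # proportional to component variances
    total = var.sum()
    if total == 0:                               # constant features carry no information
        return np.zeros((X.shape[0], 0))
    ratio = np.cumsum(var) / total
    k = min(int(np.searchsorted(ratio, var_threshold)) + 1, len(s))
    return Xc @ Vt[:k].T                         # scores on the first k components


def ig_pca(X, groups, var_threshold=0.95):
    """Run PCA inside each feature group and concatenate the results.
    `groups` is a list of column-index lists; how to build it is assumed,
    not taken from the source."""
    parts = [pca_reduce(X[:, idx], var_threshold) for idx in groups]
    return np.hstack(parts)


# Synthetic example: 5 features driven by 2 hidden signals.
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 1))
b = rng.normal(size=(50, 1))
X = np.hstack([a * [1.0, 2.0, -1.0],            # group A: columns 0-2
               b * [1.0, -2.0]])                 # group B: columns 3-4
groups = [[0, 1, 2], [3, 4]]                     # hypothetical grouping

Z = ig_pca(X, groups)
print(Z.shape)
```

Because each synthetic group is driven by a single hidden signal, one component per group is enough to pass the variance threshold, so five features collapse to two. On real data the grouping step would have to be derived from the feature distributions, which this sketch leaves open.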
