3.1 Objects and search criteria
General information mainly included gender, age, underlying disease, and contact history. Clinical data mainly included early symptoms and signs, MuLBSTA score, critical time points and intervals during diagnosis and treatment (including onset of dyspnea, first diagnosis, admission, mechanical ventilation and death, as well as the time between first diagnosis and admission), laboratory examinations, complications, main treatments, and cause of death. The frequency of drug use was counted, and an association rule algorithm was used to analyze the effect of drug treatment. The results may provide a reference for the rational use of drugs in COVID-19 patients, reduce the cost of treating the disease, and reduce the suffering it causes patients.
A retrospective analysis was carried out on 49 cases of COVID-19 death diagnosed in our hospital between January 29, 2020 and March 6, 2020. Inclusion criteria: all included patients met the diagnostic criteria for confirmed cases in the protocol for the diagnosis and treatment of novel coronavirus pneumonia (seventh trial edition) published by the National Health Commission on March 3, 2020. Clinical manifestations: fever and/or respiratory symptoms. COVID-19 imaging features: multiple small patchy shadows and interstitial changes in the early stage, most evident in the outer lung zones, progressing to bilateral ground-glass opacities and infiltrates; lung consolidation may appear in severe cases. Exclusion criteria: successfully treated cases and cases without a definitive diagnosis. The study was non-interventional and did not require patients to sign informed consent.
3.2 Observation indices
According to the medical records, the results of laboratory examinations on admission (D1 ± 1), on day 4 ± 1 (D4 ± 1), on day 7 ± 1 (D7 ± 1) and on day 14 ± 2 (D14 ± 2) were recorded, including routine blood tests, blood gas analysis, procalcitonin (PCT), hypersensitive C-reactive protein (HSCRP), myocardial enzymes, liver enzymes, kidney function, coagulation, electrolytes and etiologic data.
3.3 Data pre-processing
In this study, data were obtained from health records in the regional health information platform. These records ultimately belong to a medical information system and are therefore closely linked to the real world. To improve the efficiency of data mining, the data must be pre-processed. In the basic personal information table, besides the past history records, there are other fields unrelated to the study, such as the person who created the file, the filing date, and the medical institution. Only the past history records are needed in this application, so these irrelevant fields are not pre-processed; only the past history fields are processed.
The main objective of this data mining application is to extract association rules for COVID-19 complications, so the attributes used for mining are the individual diseases, and the different types of disease must be classified. A person’s stored illness history often consists of multiple illnesses, so it needs to be split and labeled. For example, in the database, the history column for “Zhang San” is “hypertension, COVID-19”, indicating that “Zhang San” had previously suffered from hypertension and COVID-19. Therefore, in the record of “Zhang San”, the column “Hypertension” is marked with “A” and the column “COVID-19” is marked with “B”.
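The labeling step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `label_diseases`, the comma-separated storage format, the letter codes assigned in first-seen order, and the second record ("Li Si") are all assumptions made for the example.

```python
from string import ascii_uppercase


def label_diseases(histories):
    """Split free-text history strings and assign one letter code per disease.

    Assumption: diseases are comma-separated and at most 26 distinct
    diseases occur, so single uppercase letters suffice as labels.
    """
    labels = {}        # disease name -> letter code
    transactions = {}  # person -> sorted list of letter codes
    for person, text in histories.items():
        codes = set()
        for disease in text.split(","):
            disease = disease.strip()
            if not disease:
                continue
            if disease not in labels:
                # Assign the next unused letter in first-seen order
                labels[disease] = ascii_uppercase[len(labels)]
            codes.add(labels[disease])
        transactions[person] = sorted(codes)
    return labels, transactions


# Hypothetical records; only "Zhang San" appears in the text above
histories = {
    "Zhang San": "hypertension, COVID-19",
    "Li Si": "diabetes, COVID-19",
}
labels, transactions = label_diseases(histories)
print(labels)
print(transactions)
```

Each person's list of codes can then serve as one transaction for association rule mining.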
3.3.1 Data cleaning
Data cleaning removes noise from the original data, discards data that is not relevant to association rule mining, and deals with missing data. It mainly comprises handling missing data, handling erroneous data, and some data type conversion.
Because the electronic health records contain a large amount of data, generated in different places through a complicated process, data loss, duplication and even erroneous data are inevitable. The data is therefore cleaned.
Filling in blank values: some attributes of a record may be related to the novel coronavirus to some degree but have been left blank, so the blank values must be filled in. Blank values can be handled in the following ways:
- Ignore the record: when a data row lacks the class label required for classification, the row can be ignored and deleted. If a very large number of tuples lack a class label, this approach is difficult to use.
- Fill in missing values manually: this method is time-consuming, especially if the dataset is very large.
- Global constant fill: the missing attributes of a record are filled with a uniform constant. Although this is easy to do, it is not reliable.
- Average fill: the average value of an attribute is calculated, and records with a missing value for that attribute are filled with that average.
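The three automatable strategies above (ignoring records, constant fill, and average fill) can be sketched as below. This is only an illustrative sketch under assumed names: the record layout, the field names `crp` and `class_label`, and the sample values are hypothetical and are not taken from the study data.

```python
from statistics import mean


def drop_unlabeled(records, label_key):
    """Ignore record: remove rows whose class label is missing."""
    return [r for r in records if r.get(label_key) is not None]


def fill_constant(records, key, constant):
    """Global constant fill: replace missing values with one fixed constant."""
    return [{**r, key: constant if r.get(key) is None else r[key]}
            for r in records]


def fill_mean(records, key):
    """Average fill: replace missing values with the mean of observed values."""
    observed = [r[key] for r in records if r.get(key) is not None]
    average = mean(observed)
    return [{**r, key: average if r.get(key) is None else r[key]}
            for r in records]


# Hypothetical records; None marks a blank value
records = [
    {"id": 1, "crp": 10.0, "class_label": "A"},
    {"id": 2, "crp": None, "class_label": "B"},
    {"id": 3, "crp": 30.0, "class_label": None},
]

labeled = drop_unlabeled(records, "class_label")
constant_filled = fill_constant(records, "crp", -1.0)
mean_filled = fill_mean(records, "crp")
print(labeled, constant_filled, mean_filled, sep="\n")
```

Each function returns a new list and leaves the input untouched, which makes it easy to compare the strategies on the same data.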
Correcting erroneous values: because much of the data in the medical information system is entered manually by medical staff, some values contain errors and need to be corrected. Values of attributes that have a standard range can be corrected against that range.
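A range check of this kind might look like the sketch below. The text does not say how out-of-range values are corrected, so the example only flags them for review; the field names and the ranges themselves are illustrative assumptions, not clinical reference values from the study.

```python
def find_out_of_range(records, ranges):
    """Return (row index, field, value) for every value outside its range."""
    errors = []
    for index, record in enumerate(records):
        for field, (low, high) in ranges.items():
            value = record.get(field)
            # Missing values are handled separately, so skip them here
            if value is not None and not (low <= value <= high):
                errors.append((index, field, value))
    return errors


# Hypothetical plausibility ranges and records
ranges = {"age": (0, 120), "temperature_c": (30.0, 45.0)}
records = [
    {"age": 67, "temperature_c": 38.5},
    {"age": 670, "temperature_c": 37.2},  # likely a typing error
]

errors = find_out_of_range(records, ranges)
print(errors)
```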
3.3.2 Data conversion
After cleaning, the original data still cannot be used directly; some attributes must be converted into the required form. The original data does not store an individual’s age, only the date of birth, so the age is determined from the date of birth and the date of filing. However, these two dates are not formatted consistently across records, so for convenience all dates are converted to a single “year, month, day” format. The age of an individual is then calculated from the difference between the date of birth and the date of filing. The calculated age is a continuous attribute, which is not suitable for methods that work on discrete attributes, so it must be discretized. The transformation of the age attribute is presented in Table 1.
|Coding of age level||Interval|
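The conversion described above (normalizing dates, computing age, then discretizing it) can be sketched as follows. The accepted date formats in `DATE_FORMATS` and the intervals in `AGE_BINS` are placeholders chosen for illustration; they are assumptions and do not reproduce the formats in the source data or the intervals of Table 1.

```python
from datetime import date, datetime

# Assumed input formats; the actual formats in the records are not known
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d")

# Hypothetical (low, high, code) age intervals, NOT the ones in Table 1
AGE_BINS = ((0, 17, 1), (18, 44, 2), (45, 64, 3), (65, 200, 4))


def parse_date(text):
    """Parse a date string in any of the assumed formats into a date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def age_in_years(birth, filing):
    """Completed years between the date of birth and the date of filing."""
    years = filing.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the filing year
    if (filing.month, filing.day) < (birth.month, birth.day):
        years -= 1
    return years


def age_level(age):
    """Map a continuous age to a discrete level code."""
    for low, high, code in AGE_BINS:
        if low <= age <= high:
            return code
    raise ValueError(f"age out of range: {age}")


birth = parse_date("1950/03/15")
filing = parse_date("2020-02-01")
age = age_in_years(birth, filing)
print(birth.isoformat(), filing.isoformat(), age, age_level(age))
```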
3.4 Construction of the association rules database
- Regional Health Information Platform Database: This is a basic database for storing health records, and its information is sourced from medical institutions at all levels. It includes basic personal information, physical examination information, maternal and child health information, as well as disease control, disease management and medical service information content.
- Data extraction and processing: Data from the health archive database of the regional health information platform is extracted into the data warehouse according to the subject areas of the data warehouse. At the same time, non-standardized data must be processed. This process is called ETL processing. We can either write a corresponding handler to process the data as needed, or extract and load the data through the ETL tool of SSAS.
- Data warehouse: Health archive data stored over many years forms the underlying database of the decision support system. The data is aggregated by subject. The data warehouse is a multidimensional database organized into fact tables and dimension tables. Decision makers can examine the fact table data across dimensions, which supports statistical analysis and lets them analyze the data from multiple perspectives.
- Data mining application interface: This layer performs online analytical processing of the data in the data warehouse, together with association rule mining and analysis. The results are stored in the knowledge base for decision support. For example, decision tree prediction results can be displayed at this level. After the data mining model is built, a display interface can be developed in which the decision maker enters the corresponding attribute values and obtains a prediction.
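To make the association rule step concrete, the sketch below computes support and confidence for rules between pairs of items. It is a deliberately restricted illustration and not the algorithm used in the study: only two-item rules are considered, the thresholds are arbitrary, and the transactions are hypothetical toy data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Find rules lhs -> rhs between single items.

    support(lhs, rhs) = fraction of transactions containing both items
    confidence(lhs -> rhs) = count(lhs and rhs) / count(lhs)
    """
    total = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / total
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical disease histories, one list per patient
transactions = [
    ["hypertension", "ARDS"],
    ["hypertension", "ARDS", "diabetes"],
    ["diabetes"],
    ["hypertension"],
]

rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A full Apriori-style miner would extend the same counting to larger itemsets, pruning candidates whose subsets fall below the support threshold.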