Abstract
Introduction. The aim of this paper is to present the initial data analyses based on the electronic health records of patients of general practitioners in Tartu. It is a preliminary study demonstrating the potential of analysing such datasets. Estonian electronic health record databases have not previously been studied on such a large scale.
Methods. Patient health data collected in 1995–2011 were compiled into a database and examined in anonymised form using various methods of distribution and frequency analysis.
Results. Most visits to general practitioners took place in autumn and winter, and Monday was the busiest weekday. The visit records contain a total of 18 462 524 words, 14% of which are abbreviations. Altogether, 190 789 unique lemmas (base forms) were used in the visit texts; only 78 909 (41.36%) of them were used more than once, and 25 437 (13.33%) were used at least 10 times. The most frequent words/abbreviations in the visit records were rr, ravi (treatment), x, olema (to be), and mg. The dataset included 5389 different ICD-10 diagnostic codes, of which the 425 (7.9%) most frequent accounted for 90% of all diagnoses. The prescriptions comprised 718 different pharmaceutical substances (ATC codes), of which the 144 (20%) most frequent accounted for 90% of all prescribed substances.
Conclusion. The results of the study can inform organizational and regulatory decision-making in healthcare. The dataset is very large and has great potential for further analysis, including with data mining tools.
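
Illustration. The coverage figures reported in the Results (for example, the share of distinct codes needed to account for 90% of all diagnoses) can be reproduced in principle with a simple cumulative frequency count. The following is a minimal sketch under stated assumptions, not the procedure actually used in the study: the function name items_for_coverage and the toy list of codes are hypothetical, and the real records would first need to be reduced to one code per recorded diagnosis or prescription.

```python
from collections import Counter
from typing import Hashable, Iterable


def items_for_coverage(codes: Iterable[Hashable], share: float = 0.9) -> int:
    """Return how many of the most frequent distinct codes are needed
    to account for at least `share` of all occurrences.

    Hypothetical helper for illustration; not taken from the study.
    """
    counts = Counter(codes)
    total = sum(counts.values())
    if total == 0:
        return 0  # no occurrences, so nothing needs to be covered

    covered = 0
    # most_common() yields (code, count) pairs in descending order of count
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= share * total:
            return rank
    return len(counts)


# Toy data (invented for illustration, not drawn from the dataset):
# three distinct codes with 6, 3, and 1 occurrences.
toy_codes = ["J06"] * 6 + ["I10"] * 3 + ["M54"] * 1

n_half = items_for_coverage(toy_codes, share=0.5)
n_all = items_for_coverage(toy_codes, share=1.0)
print(n_half, n_all)
```

Dividing the returned rank by the number of distinct codes gives the kind of percentage quoted above. The same function applies unchanged to ATC codes or to lemmas.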