Estimating missing values in China’s official socioeconomic statistics using progressive spatiotemporal Bayesian hierarchical modeling

Authors: Yanchen Bo; Chao Song; Xiu Yang; Xun Shi; Jinfeng Wang
Source:
Scientific Reports, Vol 8, Iss 1, Pp 1-13 (2018)
Subject:
0301 basic medicine; Multidisciplinary; Data collection; Computer science; lcsh:R; lcsh:Medicine; Missing data; Article; Computer Science::Computers and Society; Random forest; 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Statistics; Singular value decomposition; Covariate; Bayesian hierarchical modeling; lcsh:Q; 030212 general & internal medicine; Imputation (statistics); lcsh:Science
Language:
English
Online access:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fc2ee1c14b54e80190d8b041f0cb4e8c
http://link.springer.com/article/10.1038/s41598-018-28322-z

Additional information
- Publication details:
  Nature Publishing Group, 2018.
- Subject:
  2018
- Abstract:
  Due to a large number of missing values, both spatially and temporally, China has not published a complete official socioeconomic statistics dataset at the county level, which is the country’s basic scale of official statistics data collection. We developed a procedure to impute the missing values under the Bayesian hierarchical modeling framework. The procedure incorporates two novelties. First, it takes into account spatial autocorrelations and temporal trends for the easier-to-impute variables, those with small missing percentages. Second, it uses the variables completed in the first step as covariate information to improve the modeling of the more-difficult-to-impute variables, those with large missing percentages. We applied this progressive spatiotemporal (PST) method to China’s official socioeconomic statistics for 2002–2011 and compared it with four other widely used imputation methods: k-nearest neighbors (kNN), expectation-maximization (EM), singular value decomposition (SVD) and random forest (RF). The results show that the PST method outperforms these methods, demonstrating the benefit of incorporating the additional spatial and temporal information and of progressively utilizing the covariate information. The outcome of this study allows China to construct a complete socioeconomic dataset, and the methodology can be generally useful for estimating missing values in large spatiotemporal datasets.
- ISSN:
  2045-2322
- Rights:
  OPEN
- Accession number:
  edsair.doi.dedup.....fc2ee1c14b54e80190d8b041f0cb4e8c
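
The two-step "progressive" idea described in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' Bayesian hierarchical model: it swaps in a per-region linear time trend, a neighbor mean, and ordinary least squares so that the flow of information is easy to follow. The function names, the `neighbors` mapping, and the 0.2 missing-rate threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def fill_easy(x, neighbors):
    """Step 1 (simplified): fill NaNs in x (regions x years) from each
    region's own linear time trend, then from the neighbor mean."""
    out = x.copy()
    years = np.arange(x.shape[1])
    for r in range(x.shape[0]):
        seen = ~np.isnan(x[r])
        if 2 <= seen.sum() < seen.size:
            # Temporal information: linear trend fitted to observed years.
            slope, intercept = np.polyfit(years[seen], x[r, seen], 1)
            out[r, ~seen] = intercept + slope * years[~seen]
    # Spatial information: same-year neighbor mean for what is still missing.
    for r, t in zip(*np.where(np.isnan(out))):
        vals = out[np.asarray(neighbors[r], dtype=int), t]
        vals = vals[~np.isnan(vals)]
        if vals.size:
            out[r, t] = vals.mean()
    return out

def fill_hard(y, covariates):
    """Step 2 (simplified): fill NaNs in y (regions x years) by least-squares
    regression on completed covariates (regions x years x k).
    Assumes the covariates contain no NaNs."""
    X = covariates.reshape(-1, covariates.shape[-1])
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    yv = y.reshape(-1)
    seen = ~np.isnan(yv)
    beta, *_ = np.linalg.lstsq(X[seen], yv[seen], rcond=None)
    out = yv.copy()
    out[~seen] = X[~seen] @ beta
    return out.reshape(y.shape)

def progressive_impute(data, neighbors, threshold=0.2):
    """data: regions x years x variables, with NaN marking missing values.
    threshold is a hypothetical cut-off between 'easy' and 'hard' variables."""
    miss = np.isnan(data).mean(axis=(0, 1))  # missing fraction per variable
    easy = np.where(miss <= threshold)[0]
    hard = np.where(miss > threshold)[0]
    out = data.copy()
    for v in easy:
        out[:, :, v] = fill_easy(data[:, :, v], neighbors)
    for v in hard:
        # Variables completed in step 1 serve as covariates in step 2.
        out[:, :, v] = fill_hard(data[:, :, v], out[:, :, easy])
    return out
```

The ordering is the point of the sketch: variables with few gaps are completed first from spatial and temporal structure alone, and only then used as predictors for the variables with many gaps. The method in the record replaces both simplified steps with Bayesian hierarchical models.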
