Integrating data and analysis: On bridging data publishers and computational environments

  • Additional Information
    • Contributors:
      10th International Conference on Ecological Informatics- Translating Ecological Data into Knowledge and Decisions in a Rapidly Changing World. 24-28 September, 2018. Jena, Germany.
    • Subject:
      2018
    • Collection:
      Digital Library Thüringen
    • Abstract:
      Prior to analysing data, researchers today need to perform the ‘janitorial’ step of the data life cycle. This step involves cleaning, harmonizing, or integrating data and typically relies on loading data from one or multiple sources into a computational environment and one of its native data structures. The ‘janitorial’ step is estimated to consume 80% of the time spent on data analysis, and loading data accounts for only a small fraction of it. Yet it is baffling how much effort it can take to load data into a native data structure of a computational environment. Ideally, this would be as straightforward as passing a DOI to a specialized function that returns the corresponding data (and metadata) in a data structure native to the computational environment. In reality, it generally amounts to resolving the DOI in a browser, navigating a landing page to identify data and metadata, downloading the data to a file, and finally loading the data from that file using one of several specialized functions, each reading one of many file formats. The matter is further complicated by Web APIs that, while easing access and download, generally require prior knowledge of how to retrieve data. Such knowledge must be encoded in program code in the computational environment of choice. Surely the required pieces of technology exist to access data directly given a DOI and to negotiate content between data provider and consumer, so that the computational environment can automatically load data into a native data structure. Yet we still have some way to go before the subtask of loading data into a computational environment is truly easy. Using PANGAEA as a data publisher, a couple of other data sources, and Jupyter as a computational environment, in this talk we highlight the problem and delineate a solution.
      Specifically, we will demonstrate how, given a DOI name, PANGAEA data can be automatically loaded into a Python Data Analysis Library (pandas) DataFrame with a single call to a specialized function. We will also discuss ...
    • File Description:
      22 pages
    • Relation:
      ICEI 2018 : 10th International Conference on Ecological Informatics- Translating Ecological Data into Knowledge and Decisions in a Rapidly Changing World; https://doi.org/10.22032/dbt.37805; https://www.db-thueringen.de/receive/dbt_mods_00037805; https://www.db-thueringen.de/servlets/MCRFileNodeServlet/dbt_derivate_00043931/Stocker_S1.4_ICEI2018.pdf
    • Identifier:
      10.22032/dbt.37805
    • Rights:
      https://creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/openAccess
    • Identifier:
      edsbas.100C7379
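The single-call loading idea in the abstract (give a DOI, get back data plus metadata) can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' implementation: the names `fetch_via_doi`, `parse_tabular` and `load_doi` are illustrative. It assumes (a) that resolving the DOI at https://doi.org/ with an `Accept: text/tab-separated-values` header yields tab-delimited data rather than an HTML landing page, which a given publisher may or may not support, and (b) a PANGAEA-style layout in which a `/* ... */` metadata block precedes the table. It uses only the standard library and returns plain rows.

```python
"""Hypothetical sketch: one call from a DOI to (metadata, rows)."""

import csv
import io
import urllib.request
from collections.abc import Callable

# Assumption: the publisher honours this media type via DOI content
# negotiation. If it does not, the response is likely an HTML landing page.
DEFAULT_ACCEPT = "text/tab-separated-values"


def fetch_via_doi(doi: str, accept: str = DEFAULT_ACCEPT) -> str:
    """Resolve a DOI through https://doi.org/ and return the decoded body."""
    request = urllib.request.Request(
        f"https://doi.org/{doi}", headers={"Accept": accept}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def parse_tabular(text: str) -> tuple[str, list[dict[str, str]]]:
    """Split an optional leading /* ... */ metadata block from a TSV table.

    Assumes a PANGAEA-style export: comment block first, then a header row
    and tab-separated data rows.
    """
    metadata = ""
    body = text.lstrip()
    if body.startswith("/*"):
        end = body.find("*/")
        if end == -1:
            raise ValueError("unterminated metadata comment block")
        metadata = body[2:end].strip()
        body = body[end + 2:].lstrip("\r\n")
    # The first remaining line supplies the column names.
    reader = csv.DictReader(io.StringIO(body), delimiter="\t")
    return metadata, list(reader)


def load_doi(
    doi: str, fetch: Callable[[str], str] = fetch_via_doi
) -> tuple[str, list[dict[str, str]]]:
    """Fetch the data behind a DOI and parse it; `fetch` is swappable for tests."""
    return parse_tabular(fetch(doi))
```

In a notebook, `metadata, rows = load_doi(doi)` followed by `pandas.DataFrame(rows)` would give the DataFrame the talk targets, although all values arrive as strings. Keeping the fetcher injectable separates the network step, where content negotiation may fail, from the format-specific parsing, which can be replaced per publisher.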