Saving and Applying Data Lineage

Instructions for use

Data lineage refers to the relationships formed between data as it is generated, processed, and circulated, analogous to bloodline relationships in human society. It is a data governance concept: put simply, it records where data comes from and which processing and analysis steps it has undergone. Knowledge graphs provide a way to extract, store, query, and visualize data lineage.

SuperMap can write the data lineage of a Geo-Processing Automation (GPA) execution to a graph database. The resulting data graph displays the data processing flow visually and lets you trace a dataset both upstream and downstream:

  • Tracing upstream from the queried dataset yields its production chain. When the data has quality problems, this helps explain where the data came from and locate the cause.
  • Tracing downstream from the queried dataset shows the subsequent impact of the data.

When GPA data lineage is saved to the graph database, the GPA tools, the datasets, and their attributes become graph entities, and the data processing steps become graph relationships. As shown in the figure below, the input dataset is precipitation data, which the Update Column tool processes to produce a new precipitation dataset. The graph nodes are the input data, the output data, and the GPA tool; the relationships are input and output.
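The graph model described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not SuperMap's storage schema or API: the `LineageGraph` class, its method names, and the node names (`precipitation`, `Update Column`, `precipitation_new`) are hypothetical and chosen to mirror the precipitation example.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage graph: datasets and tools are nodes, input/output are edges."""

    def __init__(self):
        self.downstream = defaultdict(set)  # node -> nodes it feeds into
        self.upstream = defaultdict(set)    # node -> nodes it was derived from

    def add_step(self, inputs, tool, outputs):
        """Record one tool execution: inputs -> tool -> outputs."""
        for src in inputs:
            self.downstream[src].add(tool)
            self.upstream[tool].add(src)
        for dst in outputs:
            self.downstream[tool].add(dst)
            self.upstream[dst].add(tool)

    def _walk(self, start, edges):
        """Breadth-first traversal returning every node reachable from start."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def trace_up(self, node):
        """Everything the node was derived from (its production chain)."""
        return self._walk(node, self.upstream)

    def trace_down(self, node):
        """Everything derived from the node (its subsequent impact)."""
        return self._walk(node, self.downstream)


# Hypothetical example mirroring the precipitation workflow.
graph = LineageGraph()
graph.add_step(["precipitation"], "Update Column", ["precipitation_new"])
print(sorted(graph.trace_up("precipitation_new")))
print(sorted(graph.trace_down("precipitation")))
```

Keeping two adjacency maps, one per direction, makes upstream and downstream tracing the same traversal over different edges, which is roughly what a graph database does when it follows relationships in either direction.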

Operation steps

  1. Enable data lineage saving in either of two ways:
    • Geo-Processing Automation (GPA) tab -> Data Lineage -> Save Data Lineage. When this option is enabled, the execution process is written to the graph database.
    • In Workspace Manager, select a dataset -> right-click -> Data Lineage -> Save Data Lineage.
  2. If no graph database is connected when you check Save Data Lineage, connect to one before saving and applying data lineage.
  3. Execute the Geo-Processing Automation (GPA) model.
  4. Query or explore the lineage of the executed result dataset:
    • Knowledge Graph tab -> Graph Query, using a Cypher query.
    • In Workspace Manager, select the dataset -> right-click -> Data Lineage -> Traceability/Trace.
    • After the data graph opens, click a data entity in the Graph Window to view its Node Properties, which include the data address, record count, coordinates, extent, execution time, and other attributes. Changes in these attributes along the chain help you explore how the data itself changed.
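For the Cypher query option in step 4, the following Python sketch builds a variable-length path query for tracing a dataset upstream or downstream. It rests on assumptions not confirmed by this document: that dataset nodes carry a `name` property, that lineage relationships point in the direction of data flow (input -> tool -> output), and that the connected graph database accepts standard Cypher. The `build_lineage_query` helper is hypothetical; check the actual labels and properties in Node Properties before adapting the query.

```python
def build_lineage_query(dataset_name, direction="up", max_hops=10):
    """Return (cypher_text, parameters) for an upstream or downstream trace.

    Assumes nodes have a `name` property and edges follow the data flow;
    both are illustrative assumptions, not a documented schema.
    """
    hops = int(max_hops)  # inlined as a validated integer literal
    if hops < 1:
        raise ValueError("max_hops must be at least 1")
    if direction == "up":
        # Paths that end at the dataset: everything it was produced from.
        pattern = f"(other)-[*1..{hops}]->(d {{name: $name}})"
    elif direction == "down":
        # Paths that start at the dataset: everything derived from it.
        pattern = f"(d {{name: $name}})-[*1..{hops}]->(other)"
    else:
        raise ValueError("direction must be 'up' or 'down'")
    return f"MATCH p = {pattern} RETURN p", {"name": dataset_name}


query, params = build_lineage_query("precipitation_new", "up", max_hops=5)
print(query)
print(params)
```

The dataset name is passed as a query parameter rather than pasted into the text, which avoids quoting problems with unusual dataset names; the generated query text can then be run in Graph Query, substituting the name for `$name` if parameters are not supported there.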