Point Trajectory Similarity Measurement

Feature Description

Point trajectory similarity measurement uses the trajectories in a search dataset as references and finds, in a point trajectory dataset, the trajectories most similar to each search trajectory.

Trajectory similarity is a key metric in moving-object analysis. Comparing trajectories is one of the most fundamental ways to analyze trajectory data and mine hidden information, and it plays an important role across many areas of trajectory computing. In traffic management, similarity measurement can reveal areas where trajectories concentrate, helping infer congestion and enabling early traffic control. In urban planning, similarities in human activity patterns can reveal a city's functional zones and inform urban development. In recommendation systems, finding similar activity trajectories that satisfy given spatiotemporal constraints enables personalized recommendations, improving user experience and loyalty. In smart travel, recommending similar trajectories helps users plan trips sensibly. In air quality prediction, comparison with historical air trajectory data, combined with meteorological and traffic flow information, helps predict air quality across regions in support of environmental protection.

During an epidemic, for example, this method can compare large-scale trajectory data against the movement tracks of confirmed cases to quickly identify people who had spatiotemporal contact with them.

The result contains, for each search trajectory, the most similar trajectories from the trajectory dataset. The result dataset retains all attribute fields of the trajectory dataset and adds two fields, "QueryIdentityID" and "Similarity". "QueryIdentityID" identifies the search trajectory that each result was matched against. "Similarity" holds the similarity (distance) value between the two trajectories. All points of a trajectory returned for the same search trajectory share the same Similarity value.
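To make the output format concrete, here is a minimal Python sketch of how such a result could be assembled. It is not the product's implementation. Assumptions: points are dicts with hypothetical "x" and "y" coordinate keys, the distance function is supplied by the caller, smaller values are treated as more similar, and "QueryIdentityID" is taken to hold the search trajectory's identity value.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]            # one point with all of its attribute fields
XY = List[Tuple[float, float]]        # a trajectory reduced to its coordinates
DistanceFn = Callable[[XY, XY], float]


def group_trajectories(points: Sequence[Record], id_field: str,
                       time_field: str) -> Dict[object, List[Record]]:
    """Group points by the identity field and sort each trajectory by time."""
    groups: Dict[object, List[Record]] = defaultdict(list)
    for p in points:
        groups[p[id_field]].append(p)
    return {k: sorted(v, key=lambda r: r[time_field]) for k, v in groups.items()}


def coords(points: Sequence[Record]) -> XY:
    # "x" and "y" are hypothetical coordinate keys used only in this sketch.
    return [(float(p["x"]), float(p["y"])) for p in points]


def most_similar(trajectory_points: Sequence[Record], search_points: Sequence[Record],
                 traj_id: str, traj_time: str, search_id: str, search_time: str,
                 distance: DistanceFn, top_n: int = 5) -> List[Record]:
    """Return the points of the top_n most similar trajectories per search trajectory."""
    if top_n <= 0:
        raise ValueError("top_n must be greater than 0")
    trajs = group_trajectories(trajectory_points, traj_id, traj_time)
    searches = group_trajectories(search_points, search_id, search_time)
    result: List[Record] = []
    for qid, query in searches.items():
        q_xy = coords(query)
        # Score every candidate trajectory; smaller means more similar here.
        scored = sorted(((distance(q_xy, coords(pts)), tid) for tid, pts in trajs.items()),
                        key=lambda s: s[0])
        for sim, tid in scored[:top_n]:
            for p in trajs[tid]:
                row = dict(p)                  # keep every original attribute field
                row["QueryIdentityID"] = qid   # identity value of the search trajectory
                row["Similarity"] = sim        # same value for every point of this trajectory
                result.append(row)
    return result
```

Any function that takes two coordinate lists and returns a number can be passed as `distance`, for example one of the measures listed below. Every point of a selected trajectory is copied with the same `Similarity`, which mirrors the behavior described above.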

The following trajectory similarity measurement methods are available:

  • Hausdorff Distance: A shape-based method. For each point on one trajectory, take the distance to the nearest point on the other trajectory; the Hausdorff distance is the largest of these closest-point distances, taken in both directions. The two trajectories should not differ greatly in their number of points. A smaller Hausdorff distance indicates greater similarity.
  • Frechet Distance: A dynamic-programming method, often described as the "dog leash" distance. Both trajectories are traversed forward in order, and the distance is the smallest possible maximum separation between corresponding positions over all such traversals. It accounts for point order but is sensitive to noise. A smaller distance indicates greater similarity.
  • Dynamic Time Warping (DTW): Measures the similarity of two time series by locally stretching and compressing their time axes to find the alignment with the smallest cumulative point-to-point distance. It places no restriction on trajectory length but is sensitive to noise.
  • Max Similar Length: Measures similarity by calculating the shortest distance between two trajectories within the specified spatial and temporal tolerances. Unlike the other methods, which consider all points of the trajectories, it applies time constraints and only searches trajectories within the same time period as the search trajectory. A smaller result value indicates greater similarity.
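The three distance-based measures above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the product's implementation: points are (x, y) pairs in a projected coordinate system, distance is Euclidean, and Fréchet is computed in its discrete, point-to-point form. Max Similar Length is omitted because its exact computation is not specified here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), assumed to be in a projected coordinate system


def _dist(p: Point, q: Point) -> float:
    """Euclidean distance between two planar points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def hausdorff(a: List[Point], b: List[Point]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances.
    A directed distance is the largest closest-point distance from s to t."""
    def directed(s: List[Point], t: List[Point]) -> float:
        return max(min(_dist(p, q) for q in t) for p in s)
    return max(directed(a, b), directed(b, a))


def discrete_frechet(a: List[Point], b: List[Point]) -> float:
    """Discrete Frechet distance by dynamic programming: the shortest 'leash'
    that lets both walkers move forward from start to end."""
    n, m = len(a), len(b)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = _dist(a[i], b[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


def dtw(a: List[Point], b: List[Point]) -> float:
    """Dynamic Time Warping: minimum cumulative distance over monotone alignments,
    so one point may be matched to several points of the other trajectory."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = _dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

For two parallel three-point tracks one unit apart, Hausdorff and Fréchet both give 1.0 and DTW gives 3.0 (three matched pairs). Reversing one track leaves Hausdorff unchanged but raises Fréchet, because Fréchet respects the direction of travel while Hausdorff compares only point sets.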

Parameter Description

  • Point Trajectory Dataset (FeatureRDD; required): The trajectory dataset in which to search for the trajectories most similar to each search trajectory. Must be point data.
  • Search Dataset (FeatureRDD; required): The reference dataset whose trajectories are used to find the most similar trajectories in the trajectory dataset. Must be point data.
  • Trajectory Identity Field in the Trajectory Dataset (String; required): The field that identifies trajectories in the trajectory dataset. Points with the same identifier are grouped into one trajectory, e.g., phone number or license plate number.
  • Time Field in the Trajectory Dataset (String; required): The field in the trajectory dataset that stores the time value of each trajectory point.
  • Trajectory Identity Field in the Search Dataset (String; required): The field that identifies trajectories in the search dataset. Points with the same identifier are grouped into one trajectory, e.g., phone number or license plate number.
  • Time Field in the Search Dataset (String; required): The field in the search dataset that stores the time value of each trajectory point.
  • Trajectory Similarity Measurement Method (JavaSimilarityAlgorithm; optional; default: Max Similar Length): The trajectory similarity measurement method. See the Feature Description above.
  • Number of Most Similar Trajectories to Return (Int; optional; default: 5): The number of most similar trajectories to return. Must be greater than 0.
  • Spatial Distance Tolerance (JavaDistance; optional; default: 50 Meters): For the Max Similar Length method, the maximum allowable distance between two points; points farther apart than this cannot be considered similar. For the other methods, a tolerance applied to the minimum bounding rectangles (MBRs) of the trajectories: the distance between two trajectories is calculated only if their MBRs intersect within this tolerance; otherwise the pair is skipped.
  • Time Tolerance (JavaDuration; optional; no default): Two points can be considered similar only if their timestamps fall within this tolerance of each other. When a time tolerance is set, only the Max Similar Length method is supported. The value must specify both a duration length and a unit. Supported units: Seconds, Minutes, Hours, Days, Weeks, Months, Years.
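The two tolerance checks described in the parameter list can be sketched as follows. Assumptions not confirmed by this page: coordinates are projected so that distances are in meters, "timestamps within the tolerance" is read as an absolute time difference no greater than the tolerance, and duration strings use a hypothetical "<length> <Unit>" form. Months and Years are rejected in this sketch because they have no fixed length.

```python
import math
from datetime import datetime, timedelta
from typing import List, Tuple

TPoint = Tuple[float, float, datetime]  # (x, y, time); x/y assumed to be projected meters

# Fixed-length units only; Months and Years are intentionally left out of this sketch.
_UNITS = {"Seconds": "seconds", "Minutes": "minutes", "Hours": "hours",
          "Days": "days", "Weeks": "weeks"}


def parse_duration(text: str) -> timedelta:
    """Parse a hypothetical '<length> <Unit>' string such as '10 Minutes'."""
    value, unit = text.split()
    if unit not in _UNITS:
        raise ValueError(f"unit not supported in this sketch: {unit}")
    return timedelta(**{_UNITS[unit]: float(value)})


def mbr(points: List[TPoint]) -> Tuple[float, float, float, float]:
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def mbrs_intersect_within(a: List[TPoint], b: List[TPoint], tolerance_m: float) -> bool:
    """Pre-filter for the non-Max-Similar-Length methods: compare two trajectories
    only if a's MBR, grown by the tolerance on every side, intersects b's MBR."""
    ax0, ay0, ax1, ay1 = mbr(a)
    bx0, by0, bx1, by1 = mbr(b)
    return (ax0 - tolerance_m <= bx1 and bx0 <= ax1 + tolerance_m
            and ay0 - tolerance_m <= by1 and by0 <= ay1 + tolerance_m)


def points_match(p: TPoint, q: TPoint, tolerance_m: float,
                 time_tolerance: timedelta) -> bool:
    """Point-level test for Max Similar Length: close in space and in time."""
    close = math.hypot(p[0] - q[0], p[1] - q[1]) <= tolerance_m
    concurrent = abs(p[2] - q[2]) <= time_tolerance
    return close and concurrent
```

Growing only one rectangle is enough: two rectangles are within the tolerance of each other along both axes exactly when one of them, expanded by the tolerance, intersects the other.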