English separates words with spaces (e.g., "Thank you."), so software can identify individual words easily. Chinese text, by contrast, is written as a continuous sequence of characters with no delimiter between words. In "北京市东城区长安街", for example, a computer has no inherent way to know that "长安街" is a single, indivisible term. A Dictionary Library (*.dct) addresses this by defining the terms used for Chinese word segmentation: segmentation splits a sequence of Chinese characters into meaningful words, and the dictionary library is the collection of those words. In address matching, the accuracy of segmentation directly affects both the correctness and the relevance ranking of search results. Examples:
- 【北京市东城区长安街】: If the dictionary contains "北京市", "东城区", and "长安街", this address will be segmented as "北京市/东城区/长安街". Only by searching with keywords like "北京", "东城区", or "长安街" can this address be retrieved.
- 【和服】: In a well-designed dictionary, "和服" should be treated as an indivisible term, yielding search results related to clothing (e.g., "日本和服产业"). If segmented into "和" and "服", incorrect results like "产品和服务" may appear. Similar examples include common place names (e.g., "北京"), personal names (e.g., "徐志摩"), company names (e.g., "超图"), and product names (e.g., "可口可乐").
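The effect of the dictionary on segmentation can be illustrated with a minimal sketch. It uses forward maximum matching, a common textbook technique, and is not necessarily the algorithm iDesktopX applies internally; the function name `segment` and the small dictionaries are hypothetical and exist only for this illustration.

```python
def segment(text, dictionary):
    """Forward maximum matching: at each position, take the longest
    dictionary term; fall back to a single character if none matches.
    Illustrative only, not the iDesktopX implementation."""
    max_len = max((len(term) for term in dictionary), default=1)
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                result.append(candidate)
                i += size
                break
    return result


# All three address terms are in the dictionary.
print("/".join(segment("北京市东城区长安街", {"北京市", "东城区", "长安街"})))
# 北京市/东城区/长安街

# With "和服" in the dictionary it stays whole.
print(segment("日本和服产业", {"日本", "和服", "产业"}))
# ['日本', '和服', '产业']

# Without it, the term falls apart into single characters.
print(segment("日本和服产业", {"日本", "产业"}))
# ['日本', '和', '服', '产业']
```

The last call shows why a missing term hurts search quality: once "和服" is split into "和" and "服", a query for it can no longer be distinguished from unrelated text that merely contains those characters.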
SuperMap iDesktopX provides dictionary library management features for adding, deleting, and modifying terms. It converts between text files (*.txt) and dictionary files (*.dct) in both directions, and can merge multiple dictionary files.
- Create dictionary libraries by adding terms individually or importing text files (*.txt).
- Edit existing libraries by adding, modifying, or deleting terms.
- Export libraries as dictionary files (*.dct) or text files (*.txt).
Feature Entry
- Transport Analysis Tab -> Geocoding -> Dictionary Library
In the Dictionary Library dialog, the toolbar allows editing and import/export operations. The left panel displays the current default dictionary entries, while the right panel provides an input box for new entries.
- Set dictionary (.dct) file: Choose the dictionary file whose entries you want to add, delete, modify, or query. By default, the dialog reads the empty dictionary file at ..\support\Geocoding\DefaultDictionary.dct. You can add entries to create a new library or modify an existing one.
- Add entries: Input new terms in the right text box, separated by spaces or line breaks. Click "Update Entries" or use Ctrl+Enter to confirm.
Note: Terms cannot start with a digit or an English letter, and duplicate entries are not added.
- Import txt: Import terms from a txt file. The imported terms appear in the input area, where they can be reviewed and then added in the same way as manually typed entries (see "Add entries").
- Edit entries: Modify entries by double-clicking cells in the left list. Select one or multiple rows to delete.
- Import dictionary (.dct) file: Merge an existing dictionary with the current one. Duplicate entries will be combined.
- Export dictionary (.dct) file: Export current entries as a dct file.
- Export txt: Export current entries as a txt file.
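When preparing a long term list outside the dialog, it can help to pre-check it against the rules stated in the note above (terms separated by spaces or line breaks, no leading digit or English letter, no duplicates). The following is a minimal sketch of such a check; `filter_terms` is a hypothetical helper, not part of iDesktopX, and it assumes that "numbers" and "English letters" mean ASCII digits and letters.

```python
import string

# Assumption: the leading-character rule refers to ASCII digits and letters.
ASCII_ALNUM = set(string.ascii_letters + string.digits)


def filter_terms(raw_text, existing=()):
    """Split raw_text on whitespace (spaces or line breaks) and separate
    terms that satisfy the stated rules from those that do not."""
    seen = set(existing)          # terms already in the dictionary
    accepted, rejected = [], []
    for term in raw_text.split():
        if term[0] in ASCII_ALNUM or term in seen:
            rejected.append(term)  # bad leading character or duplicate
        else:
            seen.add(term)
            accepted.append(term)
    return accepted, rejected


accepted, rejected = filter_terms(
    "长安街 东城区\n3号线 ABC大厦 长安街", existing={"北京市"}
)
print(accepted)  # ['长安街', '东城区']
print(rejected)  # ['3号线', 'ABC大厦', '长安街']
```

The accepted terms can then be pasted into the input box or saved to a txt file for import.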
Dictionary-Tabular Conversion
Dictionary files (*.dct) and tabular datasets can be converted in both directions.
Dictionary Library to Attribute Table
Since dictionary files are saved in binary format, converting them to attribute tables allows more intuitive editing.
Feature Entry
- Transport Analysis Tab -> Geocoding -> Dictionary Library -> "Dictionary Library -> Attribute Table"
- Toolbox -> Geocoding -> "Dictionary Library -> Attribute Table"
Parameter Description
- Dictionary file: Select a dictionary file.
- Result Data: Specify the target datasource and dataset name (default: result_DictionaryToTabular).
Attribute Table to Dictionary Library
Feature Entry
- Transport Analysis Tab -> Geocoding -> Dictionary Library -> "Attribute Table -> Dictionary Library"
- Toolbox -> Geocoding -> "Attribute Table -> Dictionary Library"
Parameter Description
- Source Dataset: Select a tabular dataset from the current workspace.
- Term field: Field containing segmentation terms.
- Frequency field: Field recording term usage frequency (higher-frequency terms are prioritized).
- Result Data: Set the output path and filename for the dictionary file.
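The role of the frequency field can be pictured with a small sketch. The description above states only that higher-frequency terms are prioritized; how iDesktopX actually weighs frequencies is not documented here, so the scoring rule below (pick the segmentation whose dictionary terms have the highest total frequency) is an assumption, and `best_segmentation` and the sample frequencies are hypothetical.

```python
def best_segmentation(text, freq):
    """Choose the segmentation with the highest summed term frequency.
    Single characters missing from the dictionary are allowed and score 0.
    Illustrative assumption, not the iDesktopX scoring rule."""
    max_len = max((len(term) for term in freq), default=1)
    # best[i] holds (score, pieces) for the prefix text[:i].
    best = [(0, [])] + [None] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in freq:
                gain = freq[piece]
            elif len(piece) == 1:
                gain = 0            # unknown single character
            else:
                continue            # unknown multi-character piece
            score = best[start][0] + gain
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [piece])
    return best[len(text)][1]


# Hypothetical term/frequency pairs, as they might appear in the two fields.
freq = {"产品": 50, "和服": 5, "服务": 80, "日本": 30, "产业": 40}

print(best_segmentation("产品和服务", freq))    # ['产品', '和', '服务']
print(best_segmentation("日本和服产业", freq))  # ['日本', '和服', '产业']
```

In the first call the higher frequency of "服务" outweighs "和服", so the ambiguous characters are resolved in favor of "服务"; in the second call no competing term exists, so "和服" is kept whole.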