English text is built from words, and words are separated by spaces. In "Thank you", for example, a computer can easily identify "you" as a word from the space before it. Chinese text, by contrast, is written as an unbroken sequence of characters with no delimiter between words. Given an address such as "Chang'an Street, Dongcheng District, Beijing", a computer cannot tell on its own that "Chang'an Street" is a single word that must not be split, so a Dictionary Library (*.dct) is needed. Word segmentation divides a sequence of Chinese characters into meaningful words; the Dictionary Library is the collection of those words and therefore defines how text is segmented. Match Address relies on the Dictionary Library, so the accuracy of Chinese word segmentation directly affects both the correctness of the Search Results and their relevance ranking. For example:
- Chang'an Street, Dongcheng District, Beijing: If the address dictionary contains the words "Beijing", "Dongcheng District" and "Chang'an Street", the address is segmented as "Beijing/Dongcheng District/Chang'an Street". Searching with one or more of the words "Beijing", "Dongcheng District" or "Chang'an Street" as keywords then returns "Beijing Dongcheng District Chang'an Street".
- Kimono: A good address dictionary treats "kimono" (和服) as a single word that cannot be divided further, so that the Search Results relate to the garment, for example "Japanese kimono industry". If the address dictionary instead splits "kimono" into the two words "and" and "service", the Search Results will include incorrect matches such as "products and services" (产品和服务). Words that need the same treatment include place names, personal names, organization names and product names, for example "Beijing", "Xu Zhimo", "Chaotu" and "Coca-Cola".
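The effect of the dictionary on segmentation can be illustrated with a minimal sketch. It uses forward maximum matching, a common dictionary-based technique chosen here only for illustration; the source does not say which algorithm Match Address uses. The function name `segment` and the Chinese strings (renderings of the two examples above) are assumptions.

```python
def segment(text, dictionary):
    """Split text into words using forward maximum matching.

    At each position the longest word found in the dictionary is taken.
    If no dictionary word matches, a single character is emitted.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical dictionary holding the three address words from the example.
address_dict = {"北京市", "东城区", "长安街"}
print("/".join(segment("北京市东城区长安街", address_dict)))
# prints 北京市/东城区/长安街

# With "和服" (kimono) in the dictionary the word stays whole.
print(segment("日本和服行业", {"日本", "和服", "行业"}))
# Without it, the word falls apart into the single characters "和" and "服".
print(segment("日本和服行业", {"日本", "行业"}))
```

The last call shows why a missing entry hurts search quality: once "和服" is split, the fragment "和" can match unrelated text.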
SuperMap iDesktopX provides tools for managing address dictionary libraries, including adding, deleting and modifying words in an address dictionary. It also supports conversion in both directions between text files (*.txt) and address dictionary files (*.dct), and can merge multiple address dictionary files.
- Create an address dictionary library by adding words one at a time or by importing a Text File (*.txt);
- Edit an existing dictionary library by adding or deleting entries;
- Export the address dictionary library as a dictionary file (*.dct) or as a Text File (*.txt).
Operating instructions
Function Entry
- Traffic Analysis tab -> Geocoding -> Dictionary Library;
In the Dictionary Library dialog box, the toolbar lets you specify the dictionary library to edit and import or export dictionary libraries. The left side of the dialog box lists the entries of the current dictionary library (the default one when the dialog opens), and the right side contains the input box for new entries.
- Set the dictionary library (.dct) file whose entries you want to add, delete, modify or query. By default, the dialog reads the default dictionary library file under the root directory of the product package: .\support\Geocoding\DefaultDictionary.dct. This default file is empty, so you can add entries to it to build a new dictionary library. You can also specify an existing dictionary library file and delete or modify its entries.
- New entry: Enter the entries to add in the text box on the right, separating them with spaces or new lines, then click the "Update Entry" button or press Ctrl + Enter.
Note: An entry cannot start with a digit or an English letter. Duplicate entries are not added.
- Import txt: Read a txt file to load its entries into the entry input area, then add them as described under New entry.
- Add, delete, modify and query: In the entry list on the left side of the dialog box, double-click a cell to modify an entry, or select one or more rows to delete them.
- Import dictionary library (.dct) file: Import an existing dictionary library file and merge it with the current dictionary library to produce a new dictionary. The new dictionary contains all the entries of the two source dictionaries; an entry that appears in both is merged into a single entry.
- Export dictionary library (.dct) file: Export the contents of the currently displayed dictionary library as a .dct file;
- Export txt: Export the contents of the currently displayed dictionary library as a txt file.
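The entry rules and the merge behavior described above can be sketched in a few lines. This is only an illustration of the stated rules on plain Python lists, not the product's implementation: the function names are hypothetical, "digit" and "English letter" are assumed to mean ASCII characters, and the binary .dct format is not touched.

```python
import re


def parse_entries(raw_text):
    """Split typed input into candidate entries on spaces or new lines."""
    return raw_text.split()


def is_valid_entry(entry):
    """Reject entries that start with an ASCII digit or an English letter."""
    return bool(entry) and re.match(r"[0-9A-Za-z]", entry) is None


def add_entries(dictionary, raw_text):
    """Append valid, non-duplicate entries; return (added, rejected)."""
    added, rejected = [], []
    for entry in parse_entries(raw_text):
        if is_valid_entry(entry) and entry not in dictionary:
            dictionary.append(entry)
            added.append(entry)
        else:
            rejected.append(entry)
    return added, rejected


def merge_dictionaries(current, imported):
    """Combine two entry lists; entries present in both are kept once."""
    return list(dict.fromkeys(current + imported))


entries = ["北京市"]
added, rejected = add_entries(entries, "东城区 长安街\n北京市 3号楼 A座")
print(added)     # entries that passed both rules
print(rejected)  # the duplicate, plus entries starting with a digit or letter
print(merge_dictionaries(entries, ["长安街", "王府井"]))
```

Keeping the rejected entries in a separate list makes it easy to report why an import added fewer entries than the text file contained.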
Conversion between Dictionary Library and Attribute Table
Dictionary library files (*.dct) and Tabular Datasets can be converted in both directions.
Dictionary Library to Attribute Table
Because a dictionary library file is stored in binary format, its contents cannot be inspected directly. The Dictionary Library to Attribute Table function converts the file into an attribute table, where the entries can be viewed and modified more intuitively.
Function Entry
- Traffic Analysis tab -> Geocoding -> Dictionary Library -> Dictionary Library to Attribute Table;
- Toolbox -> Geocoding -> Dictionary Library to Attribute Table;
Parameter Description
- Dictionary Library File: Select a dictionary library file.
- Result Data: Select the Datasource that will store the resulting Tabular Dataset and set the name of the Result Dataset. The default name is result_DictionaryToTabular.
Attribute Table to Dictionary Library
Function Entry
- Traffic Analysis tab -> Geocoding -> Dictionary Library -> Attribute Table to Dictionary Library;
- Toolbox -> Geocoding -> Attribute Table to Dictionary Library;
Parameter Description
- Source data: Select a Tabular Dataset in the Current Workspace.
- Entries field: The field in the attribute table that holds the entries used for word segmentation.
- Word frequency field: The field that records how many times each entry is used. Words with a higher word frequency are selected as entries first.
- Result Data: Set the storage path and the file name of the resulting address dictionary file.
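As a rough illustration of how the Entries field and the Word frequency field work together, the sketch below orders the rows of a table by descending frequency so that frequently used words come first. The row layout, the field names "Entry" and "Frequency", and the function name are all hypothetical; how the tool itself weighs word frequency is not described in the source.

```python
def rows_to_entries(rows, entry_field, freq_field, limit=None):
    """Return unique entries ordered by descending word frequency.

    rows is a list of dicts, one per attribute-table record. When an entry
    occurs more than once, its highest frequency is used.
    """
    best = {}
    for row in rows:
        entry = str(row[entry_field]).strip()
        if not entry:
            continue  # skip records without an entry
        freq = int(row.get(freq_field) or 0)
        best[entry] = max(freq, best.get(entry, 0))
    # Sort by frequency (high to low); ties are ordered by the entry text.
    ordered = sorted(best, key=lambda e: (-best[e], e))
    return ordered if limit is None else ordered[:limit]


table = [
    {"Entry": "长安街", "Frequency": 12},
    {"Entry": "北京市", "Frequency": 40},
    {"Entry": "东城区", "Frequency": 25},
    {"Entry": "北京市", "Frequency": 3},
]
print(rows_to_entries(table, "Entry", "Frequency"))
print(rows_to_entries(table, "Entry", "Frequency", limit=2))
```

The optional `limit` argument shows one way a frequency field can be used to keep only the most common words when a dictionary has to stay small.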