Understanding the Master Index Standardization Engine: Business Patterns File
The business patterns file (bizpatterns.dat) defines the formats expected in the business name
input fields, along with the standardized output for each format. Each input pattern and
its output pattern appear as a two-row pair in this file, as shown below.
4 PNT AST SEP-GLC ORT
PNT AST DEL ORT
The first line describes the input pattern and the second describes the output pattern using tokens to denote each component. The supported tokens are described in Business Name Tokens. A number at the beginning of the first line indicates the number of components in the given business name format. You can modify this file using the following syntax.
length input-pattern
output-pattern
The following table lists and describes the components in the above syntax.
Business Patterns File Components
Below is an excerpt from the business patterns file.
4 PNT AST SEP-GLC ORT
PNT AST DEL ORT
4 NFG AJT SEP-GLC ORT
PNT PNT DEL ORT
4 NF AJT SEP-GLC ORT
PNT PNT DEL ORT
4 CST IDT NF ORT
PNT PNT PNT ORT
4 PNT AJT SEP-GLC ORT
PNT PNT DEL ORT
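The two-row layout described above can be read with a few lines of Python. This is a minimal sketch, not the engine's own parser: it assumes tokens are separated by whitespace and that blank lines between pairs can be ignored, neither of which is confirmed here. The names BusinessPattern and parse_patterns are hypothetical.

```python
from typing import List, NamedTuple


class BusinessPattern(NamedTuple):
    """One two-row pair from the patterns file (hypothetical type)."""
    length: int
    input_tokens: List[str]
    output_tokens: List[str]


def parse_patterns(text: str) -> List[BusinessPattern]:
    """Parse rows of the form 'length input-pattern' followed by 'output-pattern'."""
    # Assumption: blank lines carry no meaning and can be skipped.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected input and output rows in pairs")

    patterns = []
    for first, second in zip(rows[0::2], rows[1::2]):
        length = int(first[0])      # leading number: component count
        input_tokens = first[1:]    # remaining tokens: input pattern
        if len(input_tokens) != length:
            raise ValueError(
                f"length {length} does not match {len(input_tokens)} input tokens"
            )
        patterns.append(BusinessPattern(length, input_tokens, second))
    return patterns


# Two pairs taken from the excerpt of the business patterns file.
SAMPLE = """\
4 PNT AST SEP-GLC ORT
PNT AST DEL ORT
4 CST IDT NF ORT
PNT PNT PNT ORT
"""

patterns = parse_patterns(SAMPLE)
for p in patterns:
    print(p.length, p.input_tokens, "->", p.output_tokens)
```

Checking the leading number against the input token count catches a common editing mistake: adding or removing a token without updating the length.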
The business patterns file uses tokens to denote different components in a business name, such as the primary name, alias type key, URL, and so on. The file uses one set of tokens for input fields and another set for output fields. The tokens indicate the type key files to use to determine the appropriate values for each output field. You can use only the predefined tokens to represent business name components; the standardization engine does not recognize custom tokens.
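Because only predefined tokens are recognized, it can help to check a pattern's tokens before editing the file. The sketch below is illustrative only. The two sets contain just the tokens that appear in the excerpt from the business patterns file, not the complete lists from the input and output token tables, and the helper unknown_tokens is a hypothetical name.

```python
from typing import Iterable, List, Set

# Assumption: these sets hold only the tokens visible in the excerpt.
# The full lists belong to the input and output token tables.
EXCERPT_INPUT_TOKENS: Set[str] = {
    "PNT", "AST", "AJT", "NFG", "NF", "CST", "IDT", "SEP-GLC", "ORT",
}
EXCERPT_OUTPUT_TOKENS: Set[str] = {"PNT", "AST", "DEL", "ORT"}


def unknown_tokens(tokens: Iterable[str], allowed: Set[str]) -> List[str]:
    """Return the tokens that are not in the allowed set, in their original order."""
    return [token for token in tokens if token not in allowed]


# "XYZ" stands in for a custom token that is not predefined.
bad_input = unknown_tokens(["PNT", "XYZ", "ORT"], EXCERPT_INPUT_TOKENS)
bad_output = unknown_tokens(["PNT", "AST", "DEL", "ORT"], EXCERPT_OUTPUT_TOKENS)
print(bad_input, bad_output)
```

Input and output tokens are kept in separate sets because the file uses one set of tokens for input fields and another for output fields.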
Table 13 lists and describes each input token.
Business Name Input Pattern Tokens
Table 14 lists and describes each output token.
Business Name Output Pattern Tokens