Understanding the Master Index Standardization Engine: Business Patterns File
 

Business Patterns File

The business patterns file (bizpatterns.dat) defines multiple formats expected from the business name input fields along with the standardized output of each format. The patterns and output appear in two-row pairs in this file, as shown below.

4 PNT AST SEP-GLC ORT
PNT AST DEL ORT

The first line describes the input pattern and the second describes the output pattern using tokens to denote each component. The supported tokens are described in Business Name Tokens. A number at the beginning of the first line indicates the number of components in the given business name format. You can modify this file using the following syntax.

length input-pattern
output-pattern

The following table lists and describes the components in the above syntax.

Business Patterns File Components

| Component | Description |
|---|---|
| length | The number of business name components in the input field. |
| input-pattern | Tokens that represent a possible input pattern from the unparsed business name fields. Each token represents one component. For more information about business name tokens, see Business Name Tokens. |
| output-pattern | Tokens that represent the output pattern for the specified input pattern. Each token represents one component. For more information about business name tokens, see Business Name Tokens. |
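The syntax described above can be checked mechanically. The following is a minimal sketch in Python, not part of the standardization engine, that reads two-row pairs and verifies that the declared length matches the number of input tokens. The names `PatternPair` and `parse_patterns` are hypothetical, and the sketch assumes that tokens are separated by whitespace and that blank lines between pairs can be ignored.

```python
from typing import NamedTuple


class PatternPair(NamedTuple):
    """One two-row entry: declared length, input tokens, output tokens."""
    length: int
    input_pattern: list[str]
    output_pattern: list[str]


def parse_patterns(text: str) -> list[PatternPair]:
    """Parse two-row pattern pairs (assumption: blank lines are ignorable)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of non-blank rows")
    pairs = []
    for first, second in zip(rows[0::2], rows[1::2]):
        # The first row starts with the component count, then the input tokens.
        length, tokens = int(first[0]), first[1:]
        if length != len(tokens):
            raise ValueError(
                f"declared length {length} but found {len(tokens)} "
                f"input tokens: {' '.join(first)}"
            )
        pairs.append(PatternPair(length, tokens, second))
    return pairs


sample = """\
4 PNT AST SEP-GLC ORT
PNT AST DEL ORT
"""
pairs = parse_patterns(sample)
print(pairs[0])
```

Running a check like this after editing the file can catch a length value that no longer matches its input pattern.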

Below is an excerpt from the business patterns file.

4 PNT AST SEP-GLC ORT
PNT AST DEL ORT

4 NFG AJT SEP-GLC ORT
PNT PNT DEL ORT

4 NF AJT SEP-GLC ORT
PNT PNT DEL ORT

4 CST IDT NF ORT
PNT PNT PNT ORT

4 PNT AJT SEP-GLC ORT
PNT PNT DEL ORT
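To illustrate how an input pattern relates to its output pattern, the sketch below pairs each input component with the output token in the same position. This positional correspondence is an assumption drawn from the excerpt, where every output row has as many tokens as its input row; it is not a description of the engine's internal logic. The function name `label_components`, the sample business name, and the input tokens assigned to it are hypothetical, since the real classification depends on the type key files.

```python
# Pairs from the excerpt, keyed by input pattern; values are output patterns.
PATTERNS = {
    ("PNT", "AST", "SEP-GLC", "ORT"): ("PNT", "AST", "DEL", "ORT"),
    ("NFG", "AJT", "SEP-GLC", "ORT"): ("PNT", "PNT", "DEL", "ORT"),
    ("NF", "AJT", "SEP-GLC", "ORT"): ("PNT", "PNT", "DEL", "ORT"),
    ("CST", "IDT", "NF", "ORT"): ("PNT", "PNT", "PNT", "ORT"),
    ("PNT", "AJT", "SEP-GLC", "ORT"): ("PNT", "PNT", "DEL", "ORT"),
}


def label_components(components, input_tokens):
    """Pair each component with the output token at the same position.

    Returns None when no pattern matches the given input tokens.
    Assumption: input and output patterns correspond position by position.
    """
    if len(components) != len(input_tokens):
        raise ValueError("one input token is required per component")
    output = PATTERNS.get(tuple(input_tokens))
    if output is None:
        return None
    return list(zip(output, components))


# Hypothetical name and token assignment, for illustration only.
labeled = label_components(
    ["Boston", "Software", "Zyx", "Inc"],
    ["CST", "IDT", "NF", "ORT"],
)
print(labeled)
```

Under these assumptions, the first three components are labeled PNT and the last is labeled ORT, matching the fourth pair in the excerpt.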
Business Name Tokens

The business patterns file uses tokens to denote different components in a business name, such as the primary name, alias type key, URL, and so on. The file uses one set of tokens for input fields and another set for output fields. The tokens indicate the type key files to use to determine the appropriate values for each output field. You can use only the predefined tokens to represent business name components; the standardization engine does not recognize custom tokens.

Table 13 lists and describes each input token; Table 14 lists and describes each output token.

Business Name Input Pattern Tokens

| Pattern Identifier | Description |
|---|---|
| CTT | A connector token |
| PNT | A primary name of a business |
| PN-PN | A hyphenated primary name of a business |
| BCT | A common business term |
| URL | The URL of the business's website |
| ALT | A business alias type key (usually an acronym) |
| CNT | A country name |
| NAT | A nationality |
| CST | A city or state type key |
| IDT | An industry type key |
| IDT-AJT | Both an industry and an adjective type key |
| AJT | An adjective type key |
| AST | An association type key |
| ORT | An organization type key |
| SEP | A separator key |
| NFG | A generic term with an internal hyphen, not recognized as a specific business name component |
| NF | A generic term, not recognized as a specific business name component |
| NFC | A single character, not recognized as a specific business name component |
| SEP-GLC | A joining comma (a glue type separator) |
| SEP-GLD | A joining hyphen (a glue type separator) |
| AND | The text “and” |
| GLU | A glue type key, such as a forward slash, connecting two parts of a business name component |
| PN-NF | A business primary name followed by a hyphen and a generic term that is not recognized as a specific business name component |
| NF-PN | A generic term that is not recognized as a specific business name component, followed by a hyphen and a recognized business primary name |
| NF-NF | Two generic terms, not recognized as specific business name components and separated by a hyphen |

Table 14 lists and describes each output token.

Business Name Output Pattern Tokens

| Pattern Identifier | Description |
|---|---|
| PNT | The primary name of the business |
| URL | The URL of the business |
| ALT | The alias type key of the business (usually an acronym) |
| IDT | The industry type key of the business |
| AST | The association type key of the business |
| ORT | The organization type key of the business |
| NF | A generic term not recognized as a business name component |
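Because the standardization engine recognizes only the predefined tokens, it can help to check a new pattern pair against the two token sets before adding it to the file. The following Python sketch is one way to do that. The function name `unknown_tokens` is hypothetical. Note that the excerpt from the patterns file uses DEL in output rows although DEL is not listed in the output token table; the sketch accepts it on that basis alone and makes no claim about its meaning.

```python
# Input tokens as listed in the input pattern token table.
INPUT_TOKENS = {
    "CTT", "PNT", "PN-PN", "BCT", "URL", "ALT", "CNT", "NAT", "CST",
    "IDT", "IDT-AJT", "AJT", "AST", "ORT", "SEP", "NFG", "NF", "NFC",
    "SEP-GLC", "SEP-GLD", "AND", "GLU", "PN-NF", "NF-PN", "NF-NF",
}

# Output tokens as listed in the output pattern token table.
OUTPUT_TOKENS = {"PNT", "URL", "ALT", "IDT", "AST", "ORT", "NF"}

# Assumption: DEL is accepted because it appears in the file excerpt,
# even though the output token table does not list it.
EXCERPT_ONLY_OUTPUT = {"DEL"}


def unknown_tokens(input_pattern, output_pattern):
    """Return (bad_input, bad_output): tokens not in the predefined sets."""
    allowed_output = OUTPUT_TOKENS | EXCERPT_ONLY_OUTPUT
    bad_input = [t for t in input_pattern if t not in INPUT_TOKENS]
    bad_output = [t for t in output_pattern if t not in allowed_output]
    return bad_input, bad_output


# A pair from the excerpt passes; a pair with a custom token does not.
ok = unknown_tokens(["PNT", "AST", "SEP-GLC", "ORT"],
                    ["PNT", "AST", "DEL", "ORT"])
bad = unknown_tokens(["PNT", "XYZ"], ["PNT", "CST"])
print(ok)
print(bad)
```

In the second call, XYZ is flagged as a custom token, and CST is flagged because it is defined only as an input token.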