FVC-APP Help for the CIHM
Please click [To Be Done] on a field name to go to a description of the data element for that field.
Details: FVC Application Client Identification Hashing Module (in progress)
Terms listed below in ALL CAPS correspond to the field names on the
CIHM screen of the FVC-APP.
CDI-10 – CIHM
Product Analysis
August 31, 1999
AUTHOR: Corey Ellsworth
REVISION: 1
as there have been some revisions to the CIHM since 8/99. Updates
will be posted when available.]
Table of Contents:
1 Introduction
2 Product Resolution
2.1 CIHM Requirements
2.1.A Hashing Function
2.1.B Naming Convention
2.1.C Aliases
2.1.D Missing IDs
2.2 CIHM Solutions
2.2.A Hashing Function
2.2.B Naming Convention
2.2.C Aliases
2.2.D Missing IDs
3 Conclusion
1 Introduction – The Client Identification Hashing Module (CIHM)
The Garrett Pilot Data Mart project will consist of two OLTP databases and one OLAP database. True names will not be stored in any of the databases. To ensure data continuity and individual anonymity across the three databases, a standard data entry protocol will need to be implemented. The CIHM incorporates this protocol into a set of functions that will be used throughout the Pilot Data Mart project. This document details the CIHM.
2 Product Resolution
2.1 CIHM Requirements
The following sections detail the requirements of the CIHM as defined in the document “Performance Requirements Specification – Section 4”.
2.1.A Hashing Function
Since the databases will not store true names, a way of maintaining individual anonymity while keeping the data useful will be needed. To achieve this, the SHA-1 hashing algorithm, published by the National Institute of Standards and Technology (NIST), will be implemented in the CIHM. The SHA-1 algorithm will hash an individual’s name, date of birth and gender into a 160-bit binary number. This number will ensure individual anonymity while still allowing valuable analytical data to be retrieved from the OLAP database.
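The hashing step can be illustrated with a minimal sketch. It is written in Python rather than the Visual Basic of the prototype and relies on the standard hashlib module instead of a hand-written SHA-1. The function name sha1_digest and the UTF-8 encoding are assumptions made for illustration; the source does not specify a character encoding.

```python
import hashlib

def sha1_digest(message: str) -> bytes:
    """Return the 160-bit (20-byte) SHA-1 digest of a text message."""
    # UTF-8 is an assumed encoding; the source document does not name one.
    return hashlib.sha1(message.encode("utf-8")).digest()

digest = sha1_digest("Terrance Corey Ellsworth 4/9/75 M")
print(len(digest) * 8)  # 160 (digest size in bits)
print(digest.hex())     # the same digest as 40 hexadecimal characters
```

The same input always produces the same digest, which is what lets records for one person be matched across databases without storing the name itself.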
2.1.B Naming Convention
For the CIHM to identify two separate entries as the same person, a standard naming convention will have to be adopted and enforced. The naming convention that has been chosen is First Name, Middle Name, Last Name, suffix (Jr, Sr, 2nd, etc.), Date of Birth and Gender. These data items will be concatenated into one string, for example: Terrance Corey Ellsworth 4/9/75 M. This string will be hashed with the SHA-1 algorithm. To enforce data integrity, the data being fed to the CIHM will need to be run through stringent error-checking code. This code will probably not be part of the CIHM itself, but it must be mentioned because of the importance of enforcing data entry protocols.
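As a rough illustration of the concatenation step, the sketch below joins the fields in the order the convention lists them. The function name build_identity_string, the single-space separator, and the choice to skip empty fields are assumptions; the source only shows the resulting example string.

```python
def build_identity_string(first, middle, last, suffix, dob, gender):
    """Join the identity fields, in the documented order, into one string."""
    parts = [first, middle, last, suffix, dob, gender]
    # Assumed behavior: empty fields are skipped and one space separates fields.
    return " ".join(p.strip() for p in parts if p and p.strip())

print(build_identity_string("Terrance", "Corey", "Ellsworth", "", "4/9/75", "M"))
# Terrance Corey Ellsworth 4/9/75 M
```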
2.1.C Aliases
A person’s name may change over time. Similarly, one department entering data into the data mart might not have access to the person’s first, middle, and last name, while another does. To account for this, the CIHM will need to gather and store (in the appropriate database) a list of alias names for each individual. These aliases will allow for more accurate reporting and more reliable statistical output.
2.1.D Missing IDs
During the transfer of legacy data to their respective databases, the CIHM will be required to handle incomplete identification input. The CIHM will accept this input and handle it in a consistent fashion.
2.2 CIHM Solutions
The following sections describe the proposed solutions to each of the requirements of the CIHM. Features and functionality discussed in these sections will be contained in the prototype unless specifically stated otherwise.
2.2.A Hashing Function
The prototype CIHM will be implemented as a Visual Basic class called “CIHM”. The CIHM class will also contain one instance each of two other classes, the “ALIAS” class and the “HASH” class. This section details the interface provided to the CIHM class by the “HASH” class.
The HASH class will provide the CIHM with one method and two properties. The method is called Hash() and requires one parameter, strMessage. The strMessage parameter is a string that consists of the information to be hashed. The Hash() method makes calls to various private functions that hash the message into the 160-bit message digest (hash).
Once the Hash() method has been executed, the two properties of the HASH class will become available. The first and most commonly used property is the HexDigest property. This property returns the hashed message digest in hexadecimal (base 16). In this format, the hashed message can be inserted into the SQL Server binary data field. The second property is the BinaryDigest property. This property returns the message digest in binary. The BinaryDigest property is provided for easier debugging of CIHM applications. Both the HexDigest and BinaryDigest properties are read-only.
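The interface described above (one method, two read-only properties) could look roughly like the following Python stand-in. This is not the Visual Basic prototype: hashlib takes the place of the private hashing functions, the UTF-8 encoding is assumed, and rendering BinaryDigest as a string of 160 '0'/'1' characters is one interpretation of "in binary".

```python
import hashlib

class Hash:
    """Illustrative Python stand-in for the HASH class interface."""

    def __init__(self):
        self._digest = None  # populated by hash(); unavailable before that

    def hash(self, str_message: str) -> None:
        # hashlib replaces the prototype's private functions (assumption).
        self._digest = hashlib.sha1(str_message.encode("utf-8")).digest()

    def _require_digest(self) -> bytes:
        if self._digest is None:
            raise RuntimeError("call hash() before reading a digest property")
        return self._digest

    @property
    def hex_digest(self) -> str:
        """Read-only: the digest as 40 hexadecimal characters."""
        return self._require_digest().hex()

    @property
    def binary_digest(self) -> str:
        """Read-only: the digest as 160 '0'/'1' characters (assumed format)."""
        return "".join(f"{byte:08b}" for byte in self._require_digest())

hasher = Hash()
hasher.hash("Terrance Corey Ellsworth 4/9/75 M")
print(len(hasher.hex_digest), len(hasher.binary_digest))  # 40 160
```

Because the properties define no setter, assigning to them raises an AttributeError, which mirrors the read-only behavior described in the text.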
2.2.B Naming Convention
To ensure that correct data is being fed to the CIHM, a naming convention will be enforced on the data input screens of each application employing the CIHM. The data input screens will ensure that at least the first and last name of the individual are entered. They will also convert all letters in the name to upper case. This will eliminate the possibility that one person might type John McAfee and another might type John Mcafee.
In addition to checking name input, the input forms will also verify that the date of birth is a valid date. All dates will be converted to a standard MM/DD/YYYY format to eliminate inconsistencies in date input.
Finally, the gender field will be a dropdown box. This will ensure that gender input will be standard throughout each database.
Implementing these input-cleansing routines will better enable the anonymous tracking of cases throughout each database.
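The input-cleansing rules above can be sketched as three small Python functions. All function names are hypothetical, the accepted gender codes ("M", "F") are an assumption because the source does not list the dropdown values, and the sketch assumes dates arrive as month/day/four-digit-year text.

```python
from datetime import datetime

ALLOWED_GENDERS = ("M", "F")  # assumed dropdown values; not given in the source

def normalize_name(first: str, last: str, middle: str = "") -> str:
    """Require first and last name, then upper-case the full name."""
    if not first.strip() or not last.strip():
        raise ValueError("first and last name are required")
    parts = [first, middle, last]
    return " ".join(p.strip().upper() for p in parts if p.strip())

def normalize_dob(text: str) -> str:
    """Validate a date and return it in MM/DD/YYYY form."""
    # strptime raises ValueError for impossible dates such as 2/30/1975.
    parsed = datetime.strptime(text.strip(), "%m/%d/%Y")
    return parsed.strftime("%m/%d/%Y")

def normalize_gender(value: str) -> str:
    """Accept only the values a dropdown would offer."""
    code = value.strip().upper()
    if code not in ALLOWED_GENDERS:
        raise ValueError(f"gender must be one of {ALLOWED_GENDERS}")
    return code

print(normalize_name("John", "McAfee") == normalize_name("john", "Mcafee"))  # True
print(normalize_dob("4/9/1975"))  # 04/09/1975
```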
2.2.C Aliases
The ALIAS class contained in the CIHM will provide two methods. The first method is the Initialize() method. This method will initialize the class for use with a specific database, based on the parameters passed to it. The parameters passed to the Initialize() method are as follows:
adoActiveConn : ADO Connection Type : Provides the class with a reference to an active ADO connection object (database interface)
strTName : String Type : Contains the name of the table to manipulate
strFNID : String Type : Contains the DB field name of the field that stores IDs
strFNHash : String Type : Contains the DB field name of the field that stores hashed identification information
The second method will be the ID() method. The ID() method will return the ID number of the person whose identification is being hashed. This method will take the following parameters:
strName : String Type : Contains the true name of the individual
datDOB : Date Type : Contains the birth date of the individual
strGender : String Type : Contains the gender of the individual
The ID() method will first hash the individual’s identification and check to see if the individual already exists in the database. If the individual exists, ID() will simply return the ID of that individual to the calling procedure. If the individual does not exist, ID() will invoke a private function that will prompt for aliases of the individual. Each alias will be hashed and checked against the database for existing instances. If none exist, the hashed alias will be inserted into the database. If a hashed alias does exist in the database, the existing ID will be used for all entries for this individual. If the individual is female, the maiden name will also be prompted for, and the same procedure will be applied to the hashed maiden name.
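The lookup-then-alias flow of ID() can be sketched as follows. This is a loose Python analogue, not the prototype: sqlite3 stands in for ADO and SQL Server, aliases (including any maiden name) are passed in as a list instead of being prompted for, new IDs are allocated as MAX(id) + 1, and the helper hash_identity and its message format are assumptions.

```python
import hashlib
import sqlite3

def hash_identity(name: str, dob: str, gender: str) -> str:
    """Hash name, date of birth and gender (message format is assumed)."""
    return hashlib.sha1(f"{name} {dob} {gender}".encode("utf-8")).hexdigest()

class Alias:
    def initialize(self, conn, table, id_field, hash_field):
        # SQL identifiers cannot be bound as parameters, so they are
        # interpolated below and must come from trusted configuration.
        self.conn = conn
        self.table, self.id_field, self.hash_field = table, id_field, hash_field

    def _lookup(self, digest):
        row = self.conn.execute(
            f"SELECT {self.id_field} FROM {self.table} "
            f"WHERE {self.hash_field} = ?", (digest,)).fetchone()
        return row[0] if row else None

    def _next_id(self):
        # Assumed allocation scheme: one more than the largest ID on file.
        return self.conn.execute(
            f"SELECT COALESCE(MAX({self.id_field}), 0) + 1 "
            f"FROM {self.table}").fetchone()[0]

    def _insert(self, person_id, digest):
        self.conn.execute(
            f"INSERT INTO {self.table} ({self.id_field}, {self.hash_field}) "
            f"VALUES (?, ?)", (person_id, digest))

    def id(self, name, dob, gender, aliases=()):
        primary = hash_identity(name, dob, gender)
        existing = self._lookup(primary)
        if existing is not None:
            return existing  # individual already on file
        alias_hashes = [hash_identity(a, dob, gender) for a in aliases]
        # Reuse the ID of any alias already stored; otherwise allocate one.
        person_id = None
        for digest in alias_hashes:
            person_id = self._lookup(digest)
            if person_id is not None:
                break
        if person_id is None:
            person_id = self._next_id()
        # Store every hash not yet on file under the chosen ID.
        for digest in [primary, *alias_hashes]:
            if self._lookup(digest) is None:
                self._insert(person_id, digest)
        return person_id

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client_alias (client_id INTEGER, id_hash TEXT)")
alias = Alias()
alias.initialize(conn, "client_alias", "client_id", "id_hash")
print(alias.id("JANE ANN DOE", "04/09/1975", "F", aliases=["JANE ANN SMITH"]))  # 1
print(alias.id("JANE ANN SMITH", "04/09/1975", "F"))                            # 1
```

The table and column names in the usage lines are hypothetical; in the prototype they would be supplied through strTName, strFNID and strFNHash.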
2.2.D Missing IDs
During the transfer of legacy data to their respective databases, the CIHM will be required to handle incomplete identification input. To account for this, the CIHM will treat each unidentifiable legacy record as new input and assign an ID accordingly.
The hash field for unidentifiable records will contain the hashed value of the current date/time in MM/DD/YYYY HH:MM:SS format. This will keep each hash value unique, provided that no two such records are hashed within the same second. An identifying database field will distinguish all legacy records from current records.
Legacy records with partial or full identification will be hashed. For tracking and reporting purposes, these records will be marked as legacy data as well.
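A minimal Python sketch of the placeholder hash for unidentifiable records follows. The function name placeholder_hash and the optional now parameter (added so the behavior can be demonstrated with fixed times) are assumptions. The sketch also shows that the timestamp has one-second resolution, so two records hashed in the same second would receive the same value.

```python
import hashlib
from datetime import datetime

def placeholder_hash(now=None) -> str:
    """Hash the current date/time formatted as MM/DD/YYYY HH:MM:SS."""
    now = now or datetime.now()
    stamp = now.strftime("%m/%d/%Y %H:%M:%S")
    # UTF-8 is an assumed encoding for the timestamp text.
    return hashlib.sha1(stamp.encode("utf-8")).hexdigest()

first = placeholder_hash(datetime(1999, 8, 31, 12, 0, 0))
second = placeholder_hash(datetime(1999, 8, 31, 12, 0, 1))
print(first != second)  # True: different seconds give different hashes
print(first == placeholder_hash(datetime(1999, 8, 31, 12, 0, 0)))  # True: same second collides
```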
3 Conclusion
This document was generated from the working CIHM prototype. All features covered in this document have been implemented in the CIHM prototype. Revisions to this document will be reflected in the CIHM prototype as well.