Definitions
Speaker recognition (also known as voice recognition)
“is a biometric modality that uses an individual's voice for verification and/or identification. For recognition purposes, speaker recognition uses models developed from an individual's speech, a feature influenced by both the physical structure of an individual’s vocal tract and the behavioral characteristics of the individual.”
“takes a voice sample of a user via the mobile device's microphone to identify and authenticate a user.”[1]
Overview
Speaker recognition can be used to verify a person's claimed identity or to identify a particular person. It is often used where voice is the only available biometric identifier, such as over the telephone and in call centers.
“A popular choice for remote authentication due to the availability of devices for collecting speech samples (e.g., telephone network and computer microphones) and its ease of integration, speaker recognition is different from some other biometric methods in that speech samples are captured dynamically or over a period of time, such as a few seconds. Analysis occurs on a model in which changes over time are monitored, which is similar to other behavioral biometrics such as dynamic signature, gait, and keystroke recognition.”[2]
Speaker recognition should not be confused with speech recognition, which recognizes the words being said and is not a biometric technology.
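The difference between verifying a claimed identity (a one-to-one comparison) and identifying a particular person (a one-to-many search) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the three-element feature vectors, the use of cosine similarity as the matching score, and the 0.95 threshold are hypothetical and do not come from the sources cited in this article.

```python
import math

def cosine_similarity(a, b):
    """Similarity score between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector cannot match anything
    return dot / (norm_a * norm_b)

def verify(sample, claimed_template, threshold=0.95):
    """Verification (1:1): does the sample match the claimed identity?"""
    return cosine_similarity(sample, claimed_template) >= threshold

def identify(sample, templates, threshold=0.95):
    """Identification (1:N): best-matching enrolled speaker, or None."""
    best_name, best_score = None, threshold
    for name, template in templates.items():
        score = cosine_similarity(sample, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical enrolled templates and a new sample (illustrative values only).
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}
sample = [0.88, 0.12, 0.31]

print(verify(sample, enrolled["alice"]))  # claimed identity: alice
print(identify(sample, enrolled))         # search all enrolled speakers
```

In this sketch, verification compares the sample against a single stored template, while identification must score the sample against every enrolled template, which is why identification becomes more costly as the enrolled population grows.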
How it works
During enrollment, a speaker recognition system captures samples of a person’s speech by having the person speak some predetermined information into a microphone or telephone a number of times. This information, known as a passphrase, can be something such as a name, birth month, birth city, favorite color, or a sequence of numbers. A system that relies on a predetermined passphrase is known as a "text-dependent system." "Text-independent systems," which recognize a speaker without using a predefined phrase, are also available. Text-dependent systems perform more efficiently. Text-independent systems are more flexible and are more effective in situations where the individual may be unaware of the collection or unwilling to cooperate, or where spoofing is a concern.
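One way to picture why enrollment asks for the passphrase several times is that the repetitions can be combined into a single, more stable reference. The following Python sketch assumes each repetition has already been reduced to a fixed-length feature vector and simply averages them. The `enroll` function, the two-element vectors, and the averaging step itself are hypothetical choices for illustration, not a method described by the sources.

```python
def enroll(repetitions):
    """Average per-utterance feature vectors into one enrollment template.

    Assumption: every repetition of the passphrase has been converted
    into a feature vector of the same length.
    """
    if not repetitions:
        raise ValueError("at least one utterance is required")
    length = len(repetitions[0])
    if any(len(r) != length for r in repetitions):
        raise ValueError("all feature vectors must have the same length")
    count = len(repetitions)
    return [sum(r[i] for r in repetitions) / count for i in range(length)]

# Three hypothetical repetitions of the same passphrase
# (illustrative values, e.g. an average pitch in Hz and a tone measure).
utterances = [
    [120.0, 0.5],
    [124.0, 0.75],
    [122.0, 0.25],
]
template = enroll(utterances)
print(template)
```

Averaging smooths out variation between individual repetitions; real systems may combine samples in more sophisticated ways.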
The spoken phrase or phrases are converted from analog to digital format, the distinctive vocal characteristics (such as pitch, cadence, and tone) are extracted, and a speaker model is established. A template is then generated and stored for future comparisons. Voice templates, typically 10,000 to 20,000 bytes, are much larger than templates generated from other biometric technologies.
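The conversion and extraction steps described above can be sketched in Python for a single characteristic, pitch. This is a simplified illustration under stated assumptions: the 8,000 samples-per-second rate, the 16-bit quantization, the 50 to 400 Hz search range, the function names, and the use of autocorrelation to estimate pitch are all choices made for this example, and a synthetic 200 Hz tone stands in for recorded speech.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed, telephone-like quality)

def digitize(signal, duration_s, sample_rate=SAMPLE_RATE):
    """Sample a continuous signal(t) in [-1, 1] and quantize to 16-bit integers."""
    n = int(duration_s * sample_rate)
    return [int(round(signal(i / sample_rate) * 32767)) for i in range(n)]

def estimate_pitch(samples, sample_rate=SAMPLE_RATE, min_hz=50, max_hz=400):
    """Estimate the fundamental frequency from the lag with the strongest
    autocorrelation within the assumed pitch range."""
    min_lag = sample_rate // max_hz
    max_lag = sample_rate // min_hz
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag

def tone(t):
    """Synthetic stand-in for a voiced sound: a pure 200 Hz sine wave."""
    return math.sin(2 * math.pi * 200 * t)

samples = digitize(tone, duration_s=0.1)
pitch = estimate_pitch(samples)
print(f"{len(samples)} samples, estimated pitch {pitch:.1f} Hz")
```

For the synthetic tone the estimate should land on 200 Hz. Real speech is far less regular, and a practical system would extract many more characteristics than pitch before building the speaker model.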
"The required sensors currently exist within mobile phones, but this may not hold true for all mobile devices such as wearables and certain tablets."[3]
References
- NISTIR 8080, at 17.
- FBI, Biometric Center of Excellence, "Voice Recognition."
- NISTIR 8080, at 17.
Sources
- Information Security: Challenges in Using Biometrics, at 9.
- Privacy and Biometrics: Building a Conceptual Foundation, at 18-19.