VR technology not (yet) ready for prime time

MUSC oncologist Larry Afrin, M.D., says he hears it often: “What about voice recognition? Can’t I use that stuff to dictate patient notes directly into my computer?”

Afrin, a self-described technology enthusiast, has given the top three voice recognition software packages the physician’s road test.

His answer to the frequently asked question? “Yes, you can. But you’re not going to like it.” And to medical transcriptionists, his message is clear: “Relax. You’re still in demand.”

Voice recognition (VR) technology in concept is simple. Digitize (turn into numbers) the audio wave that is speech, teach a computer to associate different patterns in the stream of numbers with different words, and display those words on a screen.
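To make the "turn into numbers" step concrete, here is a minimal sketch of digitizing a sound wave. It is not taken from any of the packages Afrin tested. The function names, the 440 Hz test tone, the 8,000-samples-per-second rate and the 256 loudness levels are all assumptions chosen for illustration.

```python
import math

def digitize(signal, duration_s, sample_rate_hz, levels=256):
    """Sample a signal (a function of time returning values in [-1, 1])
    at regular intervals and round each sample to an integer level."""
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz              # time of this sample, in seconds
        amplitude = signal(t)               # value between -1 and 1
        # Map [-1, 1] onto the integer range 0 .. levels - 1.
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples

def tone(t):
    """A hypothetical stand-in for speech: a pure 440 Hz tone."""
    return math.sin(2 * math.pi * 440 * t)

# One hundredth of a second of sound becomes a list of 80 integers.
samples = digitize(tone, duration_s=0.01, sample_rate_hz=8000)
```

The resulting list of integers is the "stream of numbers" in which recognition software would then look for patterns that correspond to words.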

So the concept is simple and the market is huge. But while strides have been made, VR technology still has a way to go to achieve a level of sophistication that will meet the average physician’s needs, according to Afrin.

“Take a look at what we’re dealing with here,” said Afrin. “The average person out of high school has a vocabulary of about 30,000 words, 10,000 of which are used on a daily basis. The physician out of medical school has a vocabulary nearly 10 times that—200,000 words or more. Granted, most of those words are the products of prefixes, suffixes and root words in various combinations. Still, each word is a different sound pattern, and searching a larger pattern database takes more computing horsepower.”

Afrin further explained that to truly achieve accurate voice recognition by a computer, programmers first have to understand how people hear and, more importantly, understand words. The algorithm (the steps in the process) becomes extremely complicated when the computer is asked to listen to the way we naturally talk, separate the individual words out of that continuous stream of sound, deal with accents and changing voice inflections, search a 200,000-word database, and check the context to sort out homophones (words that sound the same but have different meanings and/or spellings).

“There’s ‘to,’ ‘too,’ and ‘two,’” Afrin said. “You and I know which one it is nearly every time, but for a computer to do so, the computer programmer has to know how you and I do it. But even language scientists don’t fully understand how we understand speech, so it’s not surprising that the computers we humans program don’t fully understand it either.”
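The idea of checking context to sort out homophones can be sketched in a few lines. This is a toy illustration only, not how any commercial package works. The score table and the function name are hypothetical, and a real system would need scores for a far larger vocabulary.

```python
# Hypothetical, hand-made scores: how plausible each spelling is
# when it is followed by a given word. Real software would derive
# such numbers from large amounts of text.
CONTEXT_SCORES = {
    "to":  {"the": 3, "see": 3, "be": 3},
    "too": {"much": 3, "late": 3, "many": 3},
    "two": {"patients": 3, "doses": 3, "weeks": 3},
}

def pick_homophone(candidates, next_word):
    """Return the candidate spelling that scores highest before next_word.
    If no candidate has a score, the first candidate is returned."""
    return max(
        candidates,
        key=lambda word: CONTEXT_SCORES.get(word, {}).get(next_word, 0),
    )

homophones = ["to", "too", "two"]
choice_a = pick_homophone(homophones, "patients")  # picks "two"
choice_b = pick_homophone(homophones, "late")      # picks "too"
choice_c = pick_homophone(homophones, "see")       # picks "to"
```

Even this tiny table shows the difficulty Afrin describes: every rule a person applies without thinking has to be written down, or learned from data, before the computer can use it.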

Editor's note: As MUSC rolls out its Electronic Medical Records systems, the prospect of using computers to turn spoken words into electronically stored data to be retrieved in printed form is an attractive one for physicians. For the next few issues The Catalyst will examine this technology, how it's being used and what it promises for the future.
