EMDAT - medical voice recognition

MEDICAL VOICE RECOGNITION

Introduction

Medical Transcription Service Organizations (MTSOs) provide a valuable service to the healthcare industry: they increase physician efficiency and decrease medical documentation costs for hospitals, clinics, and private practices, because dictation remains the fastest, easiest, and most descriptive medical documentation method currently available. In spite of this, many healthcare providers and HIM managers view transcription as a cost or expense, not as a value-added service. As a result, hospitals, clinics, and private practices are continually trying to get the lowest price possible for transcription.

At the same time, demand for transcription is increasing. In many organizations, there are not enough qualified transcriptionists to handle all the transcription work available. Additionally, the implementation of electronic medical records (EMR) in hospitals, clinics, and private practices has the potential to further increase transcription volume. As healthcare providers adopt EMR systems, physicians will move from handwriting notes to populating EMR systems. This may create demand for transcriptionists who enter data directly into EMR systems from dictation. As a result, MTSOs must compete for highly qualified MTs and are unable to reduce the most significant part of transcription costs – labor.

The result of this situation is smaller margins and limited profits. Productivity of MTs is strong, but many of the tools used in the past to increase productivity (workflow, keyword expanders, etc.) are now in common use. As a result, productivity gains have stagnated.

What the transcription industry needs is a breakthrough that will dramatically boost the productivity of medical transcriptionists (MTs) while enhancing the value of transcription by preparing medical reports that are ready to import into an EMR system. This page is about a technology partnership between Emdat and M*Modal that can provide this breakthrough.

Speech Recognition Technologies

Application of speech recognition technology is one option often discussed as a potential solution to many healthcare documentation problems. However, a common misconception is that the use of speech recognition technologies will lead to the elimination of transcription.

The availability of front-end speech recognition (which uses a computer to convert speech to text in real time) has not eliminated or reduced the need for transcription in hospitals, clinics, and private practices. In fact, the number of lines of transcription produced continues to rise each year.

Back-end speech technology converts a complete recording into text, using a server, rather than the speaker’s computer. The result is a text document that represents the spoken narrative of the recording. This technology is available to create efficiencies in transcription by providing a draft of the transcription for an MT to edit.

M*Modal’s unique approach to speech technology starts with an effort to understand “overheard” conversation. The goal is to allow the speaker to speak in a natural manner – as they would when talking with another person – not as they would speak to a computer. The unique combination of speech recognition and natural language processing creates the Always Understanding™ technology platform for Transforming Words into Action™.

M*Modal offers on-demand conversational documentation services that enable healthcare providers to capture specific clinical information directly from dictation narratives to generate complete and timely medical records. Our proprietary speech understanding technology platform, AnyModal™ CDS, is a vital tool that empowers physicians to capture clinical facts and orders (as opposed to just text) from dictation, without requiring any change to their current dictation routine. Our system will recognize and understand natural speech patterns, automatically creating structured and encoded HL7 CDA documents.

Role of EMR

EMR systems store patient information and clinical reports in a format that can be accessed, reviewed, analyzed, and reported on. Healthcare providers can share data about an individual in an EMR system so that the primary care physician can review information created by a specialist and vice versa. Aggregate, non-identifiable clinical data from EMR systems can also be mined to follow healthcare trends. This sharing of information will be a powerful tool to influence outcomes.

However, physician acceptance of EMR systems has grown slower than many expected. This, M*Modal believes, is primarily because most EMR vendors propose that physicians directly enter the data into an EMR system themselves. This process is time consuming and takes the physician much longer than dictating a report – whether the data is entered by typing or via front-end speech recognition. It also removes much of the narrative of dictation, often forcing the physician to pick choices from predefined lists.

On the other hand, transcription does not capture information in structured and encoded forms. Transcription output is typically in the form of a text document (often in Microsoft Word) or printed format (arriving at the physician’s office in the form of a fax). These formats cannot populate an EMR system, and having office staff re-enter the narrative and capture the coded data after transcription has occurred is too costly.

What is needed is a technology that assists in converting dictation into a structured and encoded format (while preserving the richness of narrative language) that can be verified and edited in the transcription process and delivered in an electronic format ready for import into an EMR system.

That is what M*Modal’s speech understanding technology is about. It goes beyond basic speech recognition by converting dictation recordings into structured documents that an MT can verify and edit for correctness, and then produces output in multiple formats based on physician requirements – including both text-based and standards-based electronic formats.

Speech Understanding in Transcription Workflow

The picture below depicts a typical transcription workflow.



The healthcare provider dictates into the phone, a dictation recorder, or a computer. The recording and the ADT (admission, discharge, and transfer) feed are sent to the medical transcription service via Emdat’s ADT/scheduler interface, where the InCommand application assigns work according to source, skills, and the availability of MTs. ADT information, if available, is verified and a transcription of the dictation is typed. It is important to note that excellent transcription is not a matter of typing every word and audible sound. Sometimes physicians include instructions to the MT when dictating. These instructions are not included in the report, but may direct the MT to use a table, list, or “normal.” Such templates save time and increase productivity.

Once the document is typed, it is sent back to InCommand, which makes decisions about assignment to QA or delivery of the document. When the document is deemed ready for delivery, it is sent via the internet to Emdat’s InQuiry application.

M*Modal’s AnyModal CDS provides back-end speech technology that can be incorporated into the transcription workflow without physician participation or training. The graphic below represents the new workflow that includes speech technology.



The workflow is altered so that the Emdat system, after receiving the audio file, pushes it to M*Modal for generation of the draft clinical document. When the draft is ready, the Emdat system downloads the draft and assigns it to an MT for editing. Using Emdat’s InScribe and AnyModal Edit, the MT reviews and makes corrections to the draft. The edited clinical document is saved and sent back to M*Modal. Tools are then available from M*Modal for outputting the final clinical document in a variety of formats.

An important aspect of this process is that as the MT makes corrections to the draft, AnyModal CDS tracks those changes and continuously learns from the edits. If a correction is made several times, AnyModal CDS learns that the corrected form is desired and changes future drafts to match the recurring edits. As time goes on, AnyModal CDS becomes more accurate and produces more precise draft documents.
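The idea of learning from recurring edits can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name `CorrectionMemory`, the repeat threshold of 3, and the plain string replacement are all hypothetical, and the source does not describe how AnyModal CDS actually learns.

```python
from collections import Counter

# Hypothetical threshold: how many times a correction must recur
# before it is applied automatically to future drafts.
REPEAT_THRESHOLD = 3


class CorrectionMemory:
    """Toy model of learning from recurring MT edits (illustration only)."""

    def __init__(self, threshold=REPEAT_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()  # (drafted text, corrected text) -> times seen

    def record(self, drafted, corrected):
        """Note that an MT changed `drafted` into `corrected`."""
        if drafted != corrected:
            self.counts[(drafted, corrected)] += 1

    def apply(self, draft):
        """Apply every correction seen at least `threshold` times."""
        for (drafted, corrected), seen in self.counts.items():
            if seen >= self.threshold:
                draft = draft.replace(drafted, corrected)
        return draft


memory = CorrectionMemory()
for _ in range(3):
    memory.record("hyper tension", "hypertension")
print(memory.apply("history of hyper tension"))
```

A correction recorded only once or twice leaves later drafts untouched, which mirrors the point that only recurring edits change future output.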

In this process, the dictating physician continues to use Emdat’s InTouch or InSync and does not “train” the system by reading a pre-defined set of text. M*Modal’s technology is designed to learn by watching. When a new physician is added to the workflow, the system adds the physician to AnyModal CDS in “training mode.” Subsequently, voice files are sent to M*Modal, but the clinical document is typed from scratch in the AnyModal Edit component. When enough typing has occurred in training mode, AnyModal CDS builds a voice model that allows it to begin recognizing dictation from that physician. Physicians can be added at any time and do not need to be aware that speech understanding technology is being used in the transcription process.
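The switch from training mode to draft editing can be sketched as a simple routing rule. The names `PhysicianProfile` and `route`, and the figure of 60 minutes, are assumptions made for illustration; the source says only that "enough typing" must occur and gives no number.

```python
from dataclasses import dataclass

# Hypothetical amount of typed dictation needed before drafts begin.
MINUTES_NEEDED = 60.0


@dataclass
class PhysicianProfile:
    name: str
    trained_minutes: float = 0.0

    @property
    def in_training_mode(self) -> bool:
        return self.trained_minutes < MINUTES_NEEDED

    def add_typed_job(self, audio_minutes: float) -> None:
        """Record a job that was typed from scratch during training."""
        self.trained_minutes += audio_minutes

    def route(self) -> str:
        """Decide how the next job from this physician is handled."""
        return "type_from_scratch" if self.in_training_mode else "edit_draft"


doctor = PhysicianProfile("Dr. Example")
doctor.add_typed_job(45.0)
print(doctor.route())  # still in training mode
doctor.add_typed_job(20.0)
print(doctor.route())  # enough audio has been typed
```

Nothing in this rule involves the physician, which matches the claim that the dictator does not need to know speech understanding is in use.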

Benefiting from Speech Understanding

Integrating speech understanding into transcription workflow offers several benefits. The foremost for MTSOs is the dramatic increase in productivity for MTs when editing rather than typing. Productivity also increases with AnyModal CDS because the MT is no longer responsible for formatting reports. Productivity continues to increase as AnyModal CDS learns from corrections and fewer edits are required. These productivity increases are the primary driver behind increased transcription margins.

MT Productivity

As mentioned, AnyModal CDS produces a draft version of the medical report, which resides in InScribe, for an MT to validate and correct. For many medical transcriptionists, the factor that most controls their productivity is typing speed. Physicians can speak 200 – 300 words per minute, but most typists average only about 65 – 90 words per minute (including time spent correcting typos). This means that on average a transcriptionist will spend about 4.5 minutes producing a transcript for every 1 minute of dictation.

The goal of M*Modal’s AnyModal CDS is to change that ratio from 4.5:1 to an average of 2.2:1. That is, an MT should spend about 2.2 minutes (on average) editing a document for every one minute of dictation. Another way to measure productivity is lines per hour. On average, MTs produce about 110-130 lines of transcription per hour. With AnyModal CDS, this rate increases to an average of 220-250 lines per hour, roughly a 100% increase in productivity.
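The figures quoted above can be checked with a few lines of arithmetic. The function name `percent_gain` is ours; the numbers are the ones given in the text.

```python
def percent_gain(before: float, after: float) -> float:
    """Percentage increase from `before` to `after`."""
    return (after - before) / before * 100


# Lines per hour: 110-130 typed versus 220-250 edited.
low_end = percent_gain(110, 220)     # low ends of both ranges
high_end = percent_gain(130, 250)    # high ends of both ranges

# MT minutes per dictated minute: 4.5 typed versus 2.2 edited.
time_cut = (4.5 - 2.2) / 4.5 * 100   # reduction in MT time
speedup = 4.5 / 2.2                  # throughput multiple

print(f"gain at low end:  {low_end:.0f}%")    # 100%
print(f"gain at high end: {high_end:.0f}%")   # 92%
print(f"MT time reduced by {time_cut:.0f}%")  # 51%
print(f"throughput multiple: {speedup:.1f}x") # 2.0x
```

Both measures agree: halving the minutes spent per dictated minute and doubling the lines per hour describe the same gain.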

It is important to recognize that this productivity increase takes time. The MT must learn a new skill, editing, in order to become efficient. The chart below depicts the typical improvement in performance over an initial 11-week period.

Another reason that using AnyModal CDS improves the productivity of MTs is that the system separates capturing the right meaning from getting the report into the right format. The purpose of AnyModal Edit is to validate and correct the contents of a medical report. Tools provided by AnyModal CDS add the formatting from a template after the MT validates and corrects the report. This virtually eliminates the edits made by MTs to format the document according to account specifications.

Continuous Learning

As dictation flows from AnyModal CDS for generation, to the MT for editing and back to AnyModal CDS for formatting, the system learns. AnyModal CDS is trained through corrections to improve the recognition and understanding of an individual physician’s dictation. This continuous learning process results in constant improvements in understanding what the speaker actually means.

In addition, AnyModal CDS learns across speakers and specialties. The system periodically reviews large volumes of transcription looking for edit patterns across physicians and specialties. New medical terminology is recognized and added to the language models. As AnyModal CDS learns, it will more accurately recognize and understand worktypes, improving draft quality.

Improve Transcription Margins

The future of transcription will be a combination of editing and typing because not all typing from dictation will disappear with the implementation of speech technologies. Some speakers are not a good fit for speech technology because they correct themselves and backtrack too often, or they habitually dictate with excessive background noise (for example, speaking on a cell phone in a convertible with the top down). Dictation from those speakers will continue to go through traditional transcription.

It is M*Modal’s goal to provide speech understanding services that are able to work for 80 percent of an MTSO’s transcription volume. It is important to recognize that the overall operational savings to your organization depend upon the volume of transcription processed by AnyModal CDS and edited rather than transcribed. The chart below shows the approximate operational savings an MTSO will experience based on the volume of their transcription processed by AnyModal CDS.
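How savings scale with volume can be approximated from the two ratios quoted earlier (4.5 minutes typed and 2.2 minutes edited per minute of dictation). This sketch covers MT labor time only. It is not a reconstruction of the chart, since operational savings also depend on wages and service fees that the text does not give, and the function names are hypothetical.

```python
def blended_ratio(speech_share: float,
                  typed_ratio: float = 4.5,
                  edited_ratio: float = 2.2) -> float:
    """MT minutes per dictated minute when `speech_share` of volume is edited."""
    if not 0.0 <= speech_share <= 1.0:
        raise ValueError("speech_share must be between 0 and 1")
    return speech_share * edited_ratio + (1.0 - speech_share) * typed_ratio


def labor_time_saved(speech_share: float) -> float:
    """Fraction of MT time saved relative to typing everything."""
    return 1.0 - blended_ratio(speech_share) / blended_ratio(0.0)


for share in (0.0, 0.4, 0.8):
    print(f"{share:.0%} edited: {labor_time_saved(share):.1%} of MT time saved")
```

At the 80 percent target the blended ratio is 2.66:1, which is about 41% less MT time than typing every report.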

Increase Document Quality and Enforce Consistency

One additional benefit of AnyModal CDS is that it helps MTSOs with report consistency and the application of account specifications. AnyModal CDS allows the construction of account specifications for each MTSO customer. These account specifications define and enforce consistency. They contain rules for the medical reports such as allowed or required report sections, date and time formats, numeric formats, abbreviation rules, and spacing rules. Rules can be applied by section. These account specifications are defined for AnyModal CDS, which then applies the rules to all transcription for the customer. Exceptions are permitted and can be defined by department, physician, or worktype.
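An account specification with exceptions might be modeled along these lines. Everything here is hypothetical: the `AccountSpec` class, its field names, and in particular the precedence order, which the source does not state. The sketch assumes department exceptions are applied first, then worktype, then physician, so the most specific one wins.

```python
from dataclasses import dataclass, field


@dataclass
class AccountSpec:
    """Hypothetical account specification for one MTSO customer."""
    required_sections: list[str]
    date_format: str = "%m/%d/%Y"
    # Overrides keyed by (scope, name), where scope is
    # "department", "worktype", or "physician".
    exceptions: dict[tuple[str, str], dict] = field(default_factory=dict)

    def resolve(self, department: str, worktype: str, physician: str) -> dict:
        """Merge the base rules with any matching exceptions."""
        rules = {
            "required_sections": list(self.required_sections),
            "date_format": self.date_format,
        }
        scopes = (("department", department),
                  ("worktype", worktype),
                  ("physician", physician))
        for key in scopes:  # later scopes override earlier ones
            rules.update(self.exceptions.get(key, {}))
        return rules


def missing_sections(report_sections: list[str], rules: dict) -> list[str]:
    """Return required sections that the report does not contain."""
    present = {name.upper() for name in report_sections}
    return [s for s in rules["required_sections"] if s.upper() not in present]


spec = AccountSpec(
    required_sections=["HISTORY", "ASSESSMENT", "PLAN"],
    exceptions={
        ("worktype", "Operative Note"): {
            "required_sections": ["PROCEDURE", "FINDINGS"],
        },
    },
)
rules = spec.resolve("Surgery", "Operative Note", "Dr. Example")
print(missing_sections(["Procedure"], rules))  # ['FINDINGS']
```

Keeping the rules as data rather than leaving them to each MT's memory is what makes consistent enforcement possible.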

Many of our customers discover, after deploying AnyModal CDS, that their MTs have been inconsistent in applying account specifications or that the specifications are not fully defined for certain customers. Using AnyModal CDS to create the draft results in consistent documents that follow clearly defined account specification rules.

AnyModal CDS also provides consistency with respect to the formatting of medical reports. As mentioned above, AnyModal CDS uses templates to create reports. Multiple templates can exist; our customers embed the tools into their workflow and decide when to generate the appropriate document. An MTSO customer can have templates based on specialty, worktype, or physician. The workflow can determine the appropriate template to apply to the approved report.

The M*Modal Difference

M*Modal is stringently focused on the healthcare industry. It is our goal to provide the industry with the most comprehensive yet adaptable solution for creating highly accurate, structured, encoded, and shareable medical reports. To meet that goal, M*Modal offers a unique combination of back-end speech technology, client-side editing tools and integration support.

Service-based model

AnyModal CDS is an “on-demand” service. This service is provided on a per minute basis via the Internet. This means that customers only pay for speech understanding when they need it. There is no fee for dictation that the workflow system decides should be transcribed. There is also no fee for dictation that goes through speech understanding but is not sent to an MT for editing because it does not meet the agreed upon score for editing effort.
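The billing rules described above can be expressed as a short filter. This is a sketch, not M*Modal's billing logic: the `Job` fields are hypothetical, and the assumption that a lower score means less editing effort is ours, since the source does not say how the score is defined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    audio_minutes: float
    sent_to_speech: bool                       # False: workflow chose plain transcription
    edit_effort_score: Optional[float] = None  # hypothetical: lower means less editing


def billable_minutes(jobs: list[Job], max_score: float) -> float:
    """Sum the minutes charged under the rules described in the text.

    A job is charged only if it went through speech understanding and
    its draft met the agreed score, so it was sent to an MT for editing.
    """
    total = 0.0
    for job in jobs:
        if not job.sent_to_speech:
            continue  # transcribed traditionally: no fee
        if job.edit_effort_score is None or job.edit_effort_score > max_score:
            continue  # draft not good enough to edit: no fee
        total += job.audio_minutes
    return total


jobs = [
    Job(5.0, sent_to_speech=True, edit_effort_score=0.2),   # charged
    Job(3.0, sent_to_speech=False),                         # typed, not charged
    Job(4.0, sent_to_speech=True, edit_effort_score=0.9),   # rejected, not charged
]
print(billable_minutes(jobs, max_score=0.5))  # 5.0
```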

Providing speech understanding as a service also means that our customers do not need to install additional hardware (servers) in order to take full advantage of speech understanding. AnyModal CDS is hosted in a secure data center and M*Modal manages the hardware, software, hardware maintenance, software updates, back-ups, and disaster recovery. We update our software frequently to ensure all our customers are using the best speech understanding technology available.

Integrated editor

In order to get the full benefit of M*Modal’s speech understanding technology, AnyModal Edit will be incorporated into Emdat’s InScribe application. This editing component cannot stand alone, but is designed to be embedded into InScribe to provide seamless access to editing and typing from your Emdat system. There are several reasons why this editing component adds significant value to an MTSO.

First, training and continuous learning occur when the AnyModal Edit component is used. As mentioned above, training occurs when the document is sent to AnyModal CDS, but transcribed in the editor. AnyModal CDS uses the transcribed report to learn how the MT creates a report from the physician’s dictation. Both automated training and learning function best with the use of AnyModal Edit.

Second, AnyModal Edit captures the keystrokes used for editing a draft document. This is very useful because editing is a learned skill that is different from transcribing. An MT must learn how to edit efficiently in order to realize the full advantages of speech understanding. Capture of keystrokes by the Edit control allows for analysis of keystrokes and additional training based upon usage patterns to optimize MT performance.

Third, using AnyModal Edit allows M*Modal to provide audio cues that link the recognized text to audio. This provides the ability to link the cursor to the text while the audio is playing – moving the cursor from word to word as it is played back; thereby positioning the cursor for editing when the transcriptionist sees a recognition error and needs to correct it. Over time, MTs will learn to play the audio while editing – significantly enhancing their productivity.
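Linking the cursor to the audio amounts to looking up which word is being played at a given moment. A minimal sketch, assuming the recognizer supplies a start time for each word: the `WORDS` data and both function names are hypothetical, and the source does not describe how AnyModal Edit stores these links.

```python
from bisect import bisect_right

# Hypothetical recognizer output: (start time in seconds, word).
WORDS = [
    (0.0, "blood"),
    (0.4, "pressure"),
    (0.9, "was"),
    (1.1, "130"),
    (1.6, "over"),
    (1.9, "90"),
]
STARTS = [start for start, _ in WORDS]


def word_index_at(seconds: float) -> int:
    """Index of the word being spoken at playback position `seconds`."""
    # bisect_right finds the first start time greater than `seconds`;
    # the word just before it is the one currently playing.
    return max(bisect_right(STARTS, seconds) - 1, 0)


def seek_time_for(word_index: int) -> float:
    """Playback position to jump to when the MT selects a word."""
    return WORDS[word_index][0]


print(WORDS[word_index_at(1.2)][1])  # 130
print(seek_time_for(4))              # 1.6
```

Because the start times are sorted, the lookup is a binary search and stays fast even for long dictations.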

Last, AnyModal Edit provides robust typing capabilities in addition to editing features. This allows use of AnyModal Edit for both editing and transcribing and means that Medical Transcriptionists can learn and utilize a single system for both editing and transcribing. AnyModal Edit has many features designed for transcription and our customers report that their MTs experience an increase in transcription productivity using AnyModal Edit as opposed to Microsoft Word.

Easy Integration

Emdat MTSOs use our suite of applications for conducting business. That platform consists of software and other technology for capturing voice, assigning work to MTs, downloading and uploading files, typing, determining which transcripts go to QA, and distributing reports.

AnyModal CDS will fit seamlessly into Emdat’s transcription workflow. Our Web services API provides access to our on-demand services via the Internet. Use of Web services means that the API is language and platform independent and can be seamlessly integrated into Emdat’s system.

At the beginning of integration, customers send developers to a two- to three-day integration workshop. The goal of the workshop is to leave with a prototype of their workflow that is able to submit jobs for speech understanding and retrieve draft documents for editing. M*Modal provides ongoing developer support throughout the workflow integration process.

In addition, if your platform consists of custom software using a basic editing component, AnyModal Edit simply replaces the editing component. You can keep the custom code for special functions and update it to use AnyModal Edit. As with the Web services, developer assistance is available from M*Modal to help with the integration process. At the end of the integration, the software application provided to the MTs functions very much like the original, but with new capabilities for editing draft documents.

Training Programs

Customers about to implement M*Modal’s speech understanding technology should spend some time training their MTs to edit rather than transcribe. Transcription training programs provide the right background for editing – medical vocabulary, following physician directions, etc., but do not yet include information on editing vs. transcribing. The transition from the current typing platform to AnyModal Edit provides an opportunity to train editing skills while initially training MTs to use the new typing/editing environment.

To assist our customers with training MTs to edit, M*Modal has developed a “Train the Trainer” program in which our trainers teach your MT trainers how to edit efficiently and how to train others to do the same. We provide training materials for your internal training program and a shortcut guide for MTs. We also provide post-training support as your trainers gain experience with AnyModal CDS and have questions.

As mentioned above, M*Modal can provide reports on MT editing habits and provide input on additional training for individuals to help optimize their performance.

Meaningful Clinical Documents

Perhaps the most important difference regarding AnyModal CDS is its Meaningful Clinical Documents™. Meaningful clinical documents refer to the ability of AnyModal CDS to produce a document that is much more than just text. Our documents are an extension of the Health Level 7 (HL7) Clinical Document Architecture (CDA). HL7 CDA defines a structured format for clinical documents.

What is important to understand here is the “structured format.” Most transcriptions are text documents with formatting, usually stored as RTF or MS Word documents. While the text documents may have section titles and data in them, the information is not structured in a way that a computer can easily identify it. For example, the text may say “blood pressure was 130 over 90” but a computer would not be able to identify the blood pressure and the numeric values in order to act upon them or send them to an application that understood the data.
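The gap between text and structured data can be shown with the blood pressure example. The pattern below is a toy: it handles only a couple of phrasings, the names `BP_PATTERN` and `extract_blood_pressure` are ours, and real speech understanding is far more involved than a regular expression.

```python
import re
from typing import Optional

# Matches spoken forms such as "130 over 90" and typed forms such as "130/90".
BP_PATTERN = re.compile(
    r"blood pressure\s+(?:was|is|of)?\s*(\d{2,3})\s*(?:over|/)\s*(\d{2,3})",
    re.IGNORECASE,
)


def extract_blood_pressure(text: str) -> Optional[dict]:
    """Return systolic and diastolic values found in `text`, or None."""
    match = BP_PATTERN.search(text)
    if match is None:
        return None
    return {
        "systolic": int(match.group(1)),
        "diastolic": int(match.group(2)),
        "unit": "mm[Hg]",
    }


print(extract_blood_pressure("On exam, blood pressure was 130 over 90."))
```

Once the values are numbers with a unit rather than words in a sentence, another program can compare, chart, or store them.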

HL7 CDA defines how to “structure” information so that it can be clearly identified and shared. It defines how a file should store important measurements and medical facts so that other software programs can read and use them.
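As a rough illustration, a blood pressure reading of 130 over 90 could be written as two CDA-style observation entries. This is a simplified fragment, not a schema-valid CDA document and not M*Modal's actual output: a real document wraps entries in a header and sections and carries further required elements. The LOINC codes and UCUM unit are standard, and the helper name `observation` is ours.

```python
import xml.etree.ElementTree as ET

HL7 = "urn:hl7-org:v3"                             # HL7 v3 / CDA namespace
XSI = "http://www.w3.org/2001/XMLSchema-instance"
LOINC_OID = "2.16.840.1.113883.6.1"                # identifies the LOINC code system

ET.register_namespace("", HL7)
ET.register_namespace("xsi", XSI)


def observation(loinc_code: str, display: str, value: int, unit: str) -> ET.Element:
    """Build a simplified observation with a coded name and a physical quantity."""
    obs = ET.Element(f"{{{HL7}}}observation",
                     {"classCode": "OBS", "moodCode": "EVN"})
    ET.SubElement(obs, f"{{{HL7}}}code", {
        "code": loinc_code,
        "codeSystem": LOINC_OID,
        "displayName": display,
    })
    ET.SubElement(obs, f"{{{HL7}}}value", {
        f"{{{XSI}}}type": "PQ",   # PQ: physical quantity (number plus unit)
        "value": str(value),
        "unit": unit,
    })
    return obs


systolic = observation("8480-6", "Systolic blood pressure", 130, "mm[Hg]")
diastolic = observation("8462-4", "Diastolic blood pressure", 90, "mm[Hg]")
print(ET.tostring(systolic, encoding="unicode"))
print(ET.tostring(diastolic, encoding="unicode"))
```

The code attribute tells a receiving system what was measured, and the typed value tells it the number and unit, which is the information a plain text report leaves implicit.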

This is particularly relevant as EMR systems become more popular. EMR systems store structured data that comes from a patient visit. An EMR system can import a meaningful clinical document in HL7 CDA format to create the patient visit record. The physician could accurately record measurements such as temperature and blood pressure through dictation.

By using HL7 CDA as the foundation for our clinical documents, M*Modal is prepared to help MTSOs capture transcription business that must populate EMR systems.

Conclusion

The medical transcription business is both rewarding and challenging. Because hospitals, clinics, and private practices see transcription as a pure expense and because there are not enough qualified transcriptionists to complete the current workload, MTSOs are facing smaller margins and limited profits. What is needed is a breakthrough that will dramatically boost the productivity of medical transcriptionists while enhancing the value of transcription by preparing structured clinical documents.

M*Modal’s AnyModal CDS service will allow MTSOs to improve production time, reduce per line labor costs, and improve the quality and consistency of medical reports. Using speech understanding technology to capture dictation and convert it into a structured and encoded format allows the MT to focus on the structure and content of the draft clinical document, rather than transcribing the entire report. Speech recognition will never eliminate the need for qualified MTs, but will help them evolve into “Medical Language Specialists.”

Incorporating AnyModal CDS into the existing workflow means that there is no “training” of the physician. In addition, the groundwork for training the MTs to edit, rather than transcribe, is already in place. MTs have the necessary background for editing (medical vocabulary, following physician directions, etc.). Since AnyModal Edit can be placed into any existing platform, the transition to AnyModal CDS is seamless.

Additionally, AnyModal CDS can provide clinical documents in a variety of formats – even for the same dictation. This allows for the generation of patient visit reports, discharge summaries, referring physician letters and other documentation from a single dictation.

AnyModal CDS helps MTSOs respond to the growing adoption of EMR systems because it can deliver the final medical report in an electronic format that can easily be imported into an EMR system; thus increasing the physician’s ability to care for patients without increasing his or her workload or schedule.

Copyright 2008 EMDAT - medical voice recognition