Medical Transcription Service Organizations (MTSOs) provide a
valuable service to the healthcare industry that increases
physician efficiency and decreases medical documentation costs
for hospitals, clinics, and private practices because dictation
remains the fastest, easiest, most descriptive medical
documentation method currently available. In spite of this, many
healthcare providers and HIM managers view transcription as a
cost or expense, not as a value-added service. As a result,
hospitals, clinics, and private practices are continually trying
to get the lowest price possible for transcription.
At the same time, demand for transcription is increasing. In
many organizations, there are not enough qualified
transcriptionists to handle all the transcription work
available. Additionally, the implementation of electronic
medical records (EMR) in hospitals, clinics, and private
practices has the potential to further increase transcription
volume. As healthcare providers adopt EMR systems, physicians
will move from handwriting notes to populating EMR systems. This
may create demand for transcriptionists who enter data directly
into EMR systems from dictation. As a result, MTSOs must compete
for highly qualified MTs and are unable to reduce the most
significant part of transcription costs – labor.
The result of this situation is smaller margins and limited
profits. Productivity of MTs is strong, but many of the tools
used in the past to increase productivity (workflow, keyword
expanders, etc.) are now in common use. As a result, productivity
gains have become stagnant.
What the transcription industry needs is a breakthrough that
will dramatically boost the productivity of medical
transcriptionists (MTs) while enhancing the value of
transcription by preparing medical reports that are ready to
import into an EMR system. This page is about a technology
partnership between Emdat and M*Modal that can provide this
breakthrough.
Speech Recognition Technologies
Application of speech recognition technology is one option often
discussed as a potential solution to many healthcare
documentation problems. However, a common misconception is that
the use of speech recognition technologies will lead to the
elimination of transcription.
The availability of front-end speech recognition (which uses a
computer to convert speech to text in real time) has not
eliminated or reduced the need for transcription in hospitals,
clinics, and private practices. In fact, the number of lines of
transcription produced continues to rise each year.
Back-end speech technology converts a complete recording into
text, using a server, rather than the speaker’s computer. The
result is a text document that represents the spoken narrative
of the recording. This technology is available to create
efficiencies in transcription by providing a draft of the
transcription for an MT to edit.
M*Modal’s unique approach to speech technology starts with an
effort to understand “overheard” conversation. The goal is to
allow the speaker to speak in a natural manner – as they would
when talking with another person – not as they would speak to a
computer. The unique combination of speech recognition and
natural language processing creates the Always Understanding™
technology platform for Transforming Words into Action™.
M*Modal offers on-demand conversational documentation services
that enable healthcare providers to capture specific clinical
information directly from dictation narratives to generate
complete and timely medical records. Our proprietary speech
understanding technology platform, AnyModal™ CDS, is a vital
tool that empowers physicians to capture clinical facts and
orders (as opposed to just text) from dictation, without
requiring any change to their current dictation routine. Our
system will recognize and understand natural speech patterns,
automatically creating structured and encoded HL7 CDA documents.
Role of EMR
EMR systems store patient information and clinical reports in a
format that you can access, review, analyze, and report on.
Healthcare providers can share data about an individual in an EMR system so that the primary care physician can review
information created by a specialist and vice-versa. Aggregate,
non-identifiable clinical data from EMR systems can also be mined
to follow healthcare trends. This sharing of information will be
a powerful tool to influence outcomes.
However, physician acceptance of EMR systems has grown slower
than many expected. This, M*Modal believes, is primarily because
most EMR vendors propose that physicians directly enter the data
into an EMR system themselves. This process is time consuming
and takes the physician much longer than dictating a report –
whether the data is entered by typing or via front-end speech
recognition. It also removes much of the narrative of dictation,
often forcing the physician to pick choices from predefined
lists.
On the other hand, transcription does not capture information in
structured and encoded forms. Transcription output is typically
in the form of a text document (often in Microsoft Word) or
printed format (arriving at the physician’s office in the form
of a fax). These formats cannot populate an EMR system, and the
cost of typing the narrative and capturing the coded data is too
high for office staff to enter the data after transcription has
occurred.
What is needed is a technology that assists in converting
dictation into a structured and encoded format (while preserving
the richness of narrative language) that can be verified and
edited in the transcription process and delivered in an
electronic format ready for import into an EMR system.
That is what M*Modal’s speech understanding technology is about,
going beyond basic speech recognition technology to converting
dictation recordings into structured documents that an MT can
verify and edit for correctness and then producing output in
multiple formats based on physician requirements – including
both text-based and standards-based electronic formats.
Speech Understanding in Transcription Workflow
The picture below depicts a typical transcription workflow.
The healthcare provider dictates into the phone, a dictation
recorder, or a computer. The recording and ADT feed is sent to
the medical transcription service via Emdat’s ADT/scheduler
interface where an InCommand application assigns work according
to source, skills, and the availability of MTs. ADT information,
if available, is verified and a transcription of the dictation
is typed. It is important to note that excellent transcription
is not a matter of typing every word and audible sound.
Sometimes physicians include instructions to the MT when
dictating. This text is not to be included in the report, but
may give instructions about using a table, list, or normal.
These templates will save time and increase productivity.
Once the document is typed, it is sent back to InCommand, which
makes decisions about assignment to QA or delivery of the
document. When the document is deemed ready for delivery, it is
sent via the internet to Emdat’s InQuiry application.
M*Modal’s AnyModal CDS provides back-end speech technology that
can be incorporated into the transcription workflow without
physician participation or training. The graphic below
represents the new workflow that includes speech technology.
The workflow is altered so that the Emdat system, after
receiving the audio file, pushes it to M*Modal for generation of
the draft clinical document. When the draft is ready, the Emdat
system downloads the draft and assigns it to an MT for editing.
Using Emdat’s InScribe and AnyModal Edit, the MT reviews and
makes corrections to the draft. The edited clinical document is
saved and sent back to M*Modal. Tools are then available from
M*Modal for outputting the final clinical document in a variety
of formats.
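The sequence of steps in the altered workflow can be sketched as a small in-memory simulation. The class, function names, status labels, and file name below are hypothetical stand-ins chosen for illustration; they are not the Emdat or M*Modal interfaces, and a real draft would of course be produced from the audio.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """One dictation moving through the workflow (hypothetical model)."""
    audio_file: str
    draft: str = ""
    final: str = ""
    history: list = field(default_factory=list)

    def advance(self, status):
        # Record each workflow step in the order it happens.
        self.history.append(status)


def generate_draft(job):
    # Stand-in for pushing the audio to the speech service and
    # downloading the draft clinical document.
    job.draft = f"[draft generated from {job.audio_file}]"
    job.advance("draft_ready")


def edit_draft(job, corrected_text):
    # Stand-in for an MT reviewing and correcting the draft.
    job.final = corrected_text
    job.advance("edited")


def deliver(job):
    # Stand-in for sending the edited document back for final output.
    job.advance("delivered")


job = Job(audio_file="dictation_0001.wav")
generate_draft(job)
edit_draft(job, "Patient seen for follow-up of hypertension.")
deliver(job)
print(job.history)
```

The point of the sketch is only the ordering: the draft exists before an MT is involved, and the MT's corrected text, not the draft, is what is delivered.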
An important aspect of this process is that as the MT makes
corrections to the draft, AnyModal CDS tracks those changes and
continuously learns from the edits. If a correction is made
several times, AnyModal CDS learns that the corrected form is
desired and changes future drafts to match the recurring edits.
As time goes on, AnyModal CDS becomes more accurate and produces
more precise draft documents.
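As a rough illustration of learning from recurring edits, the sketch below counts how often an MT replaces one phrase with another and starts applying that replacement to new drafts once it has been seen several times. The class name, the threshold of three, and the plain string substitution are assumptions made for illustration; this is not a description of how AnyModal CDS is implemented.

```python
from collections import Counter


class CorrectionMemory:
    """Toy model of learning from repeated MT corrections (hypothetical)."""

    def __init__(self, threshold=3):
        # Assumed value: how many times a correction must recur before
        # it is applied automatically to future drafts.
        self.threshold = threshold
        self.counts = Counter()

    def record(self, draft_phrase, corrected_phrase):
        """Remember that an MT changed draft_phrase into corrected_phrase."""
        self.counts[(draft_phrase, corrected_phrase)] += 1

    def apply(self, draft_text):
        """Apply every correction that has recurred often enough."""
        for (wrong, right), seen in self.counts.items():
            if seen >= self.threshold:
                draft_text = draft_text.replace(wrong, right)
        return draft_text


memory = CorrectionMemory(threshold=3)
for _ in range(3):
    memory.record("b.i.d.", "twice daily")
memory.record("p.o.", "by mouth")  # seen only once, so not yet applied

new_draft = memory.apply("Take 5 mg p.o. b.i.d.")
print(new_draft)
```

Here the correction made three times is applied to the new draft, while the correction made once is left for the MT to handle.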
In this process, the dictating physician continues to use
Emdat’s InTouch or InSync and does not “train” the system by
reading a pre-defined set of text. M*Modal’s technology is
designed to learn by watching. When a new physician is added to
the workflow, the system adds the physician to AnyModal CDS in
“training mode.” Subsequently, voice files are sent to M*Modal,
but the clinical document is typed from scratch in the AnyModal
Edit component. When enough typing has occurred in training
mode, AnyModal CDS builds a voice model that allows it to
begin recognizing dictation from that physician. Physicians
can be added at any time and do not need to be aware that speech
understanding technology is being used in the transcription
process.
Benefiting from Speech Understanding
Integrating speech understanding into transcription workflow
offers several benefits. The foremost for MTSOs is the dramatic
increase in productivity for MTs when editing rather than
typing. Productivity also increases with AnyModal CDS because
the MT is no longer responsible for formatting reports.
Productivity continues to increase as AnyModal CDS learns from
corrections and fewer edits are required. These productivity
increases are the primary driver behind increased transcription
margins.
MT Productivity
As mentioned, AnyModal CDS produces a draft version of the
medical report, which resides in InScribe, for an MT to validate
and correct. For many medical transcriptionists, the
factor that most controls their productivity is typing speed.
Physicians can speak 200 – 300 words per minute, but most
typists can only average about 65 – 90 words per minute
(including time spent correcting typos). This means that on average a
transcriptionist will spend about 4.5 minutes producing a
transcript for every 1 minute of dictation.
The goal of M*Modal’s AnyModal CDS is to change that ratio from
4.5:1 to an average of 2.2:1. That is, an MT should spend about
2.2 minutes (on average) editing a document for every one minute
of dictation. Another way to measure productivity is using lines
per hour. On average, MTs produce about 110-130 lines of
transcription per hour. With AnyModal CDS, this rate increases
to an average of 220-250 lines per hour for a 100% increase in
productivity!
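The arithmetic behind these figures can be checked with a few lines of Python. The calculation uses only the numbers quoted above; taking the midpoint of each lines-per-hour range is an assumption made to get a single figure.

```python
def percent_increase(before, after):
    """Percentage change from before to after."""
    return (after - before) / before * 100


# Lines per hour: midpoints of the ranges quoted above (assumed summary).
typing_lph = (110 + 130) / 2
editing_lph = (220 + 250) / 2
lph_gain = percent_increase(typing_lph, editing_lph)

# Minutes of MT effort per minute of dictation.
typing_ratio = 4.5
editing_ratio = 2.2
# Output per hour is the inverse of effort per dictated minute,
# so the throughput gain is typing_ratio / editing_ratio - 1.
throughput_gain = (typing_ratio / editing_ratio - 1) * 100

print(f"Lines-per-hour gain: {lph_gain:.0f}%")
print(f"Throughput gain from effort ratio: {throughput_gain:.0f}%")
```

Both measures come out close to a doubling of output (about 96% and 105% respectively), which is consistent with the rounded 100% figure.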
It is important to recognize that this productivity increase
takes time. The MT must learn a new skill, editing, in order to
become efficient. The chart below depicts the typical
improvement in performance over an initial 11-week period.
Another reason that using AnyModal CDS improves the
productivity of MTs is that the system separates capturing the
right meaning from getting the report into the right format. The
purpose of AnyModal Edit is to validate and correct the contents
of a medical report. Tools provided by AnyModal CDS add the
formatting from a template after the MT validates and corrects
the report. This virtually eliminates the edits made by MTs to
format the document according to account specifications.
Continuous Learning
As dictation flows from AnyModal CDS for generation, to the MT
for editing and back to AnyModal CDS for formatting, the system
learns. AnyModal CDS is trained through corrections to improve
the recognition and understanding of an individual physician’s
dictation. This continuous learning process results in constant
improvements in understanding what the speaker actually means.
In addition, AnyModal CDS learns across speakers and
specialties. The system periodically reviews large volumes of
transcription looking for edit patterns across physicians and
specialties. New medical terminology is recognized and added to
the language models. As AnyModal CDS learns, it will more
accurately recognize and understand worktypes, improving draft
quality.
Improve Transcription Margins
The future of transcription will be a combination of editing and
typing because not all typing from dictation will disappear with
the implementation of speech technologies. Some speakers are not
a good fit for speech technology because they correct themselves
and backtrack too often, or they habitually dictate with
excessive background noise prevalent (for example, speaking on a
cell phone in a convertible with the top down). Those dictators
will continue to use traditional transcription.
It is M*Modal’s goal to provide speech understanding services
that are able to work for 80 percent of an MTSO’s transcription
volume. It is important to recognize that the overall
operational savings to your organization is dependent upon the
volume of transcription processed by AnyModal CDS and edited
rather than transcribed. The chart below shows the approximate
operational savings an MTSO will experience based on the volume
of their transcription processed by AnyModal CDS.
Increase Document Quality and Enforce Consistency
One additional benefit of AnyModal CDS is that it helps MTSOs
with report consistency and the application of account
specifications. AnyModal CDS allows the construction of account
specifications for each MTSO customer. These account
specifications define and enforce consistency; they contain
rules for the medical reports such as allowed or required report
sections, date and time formats, numeric formats, abbreviation
rules, and spacing rules. Rules can be applied by section. These
account specifications are defined for AnyModal CDS, which then
applies the rules to all transcription for the customer.
Exceptions are permitted and can be defined by department,
physician, or worktype.
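One way to picture an account specification with exceptions is as a base set of rules plus overrides keyed by department, worktype, or physician. The field names, example values, and the order in which overrides are applied are all assumptions made for this sketch; the actual AnyModal CDS specification format is not described here.

```python
# Base rules for one customer account (hypothetical fields and values).
BASE_SPEC = {
    "date_format": "MM/DD/YYYY",
    "required_sections": ["HISTORY", "ASSESSMENT", "PLAN"],
    "expand_abbreviations": True,
}

# Exceptions keyed by (scope, name); all entries are made up.
EXCEPTIONS = {
    ("department", "Cardiology"): {
        "required_sections": ["HISTORY", "ECG", "PLAN"],
    },
    ("worktype", "Discharge Summary"): {"date_format": "DD-MMM-YYYY"},
    ("physician", "Dr. Lee"): {"expand_abbreviations": False},
}


def resolve_spec(department=None, worktype=None, physician=None):
    """Return the rules that apply to one report."""
    spec = dict(BASE_SPEC)
    # Assumed precedence: physician overrides worktype, which
    # overrides department.
    for scope, name in (("department", department),
                        ("worktype", worktype),
                        ("physician", physician)):
        spec.update(EXCEPTIONS.get((scope, name), {}))
    return spec


spec = resolve_spec(department="Cardiology", physician="Dr. Lee")
print(spec)
```

A report with no matching exception simply gets the base rules, which is what keeps output consistent across MTs.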
Many of our customers discover, after deploying AnyModal CDS,
that their MTs have been inconsistent in applying account
specifications or that the specifications are not fully defined for certain
customers. Using AnyModal CDS to create the draft results in
consistent documents that follow clearly defined account
specification rules.
AnyModal CDS also provides consistency with respect to the
formatting of medical reports. As mentioned above, AnyModal uses
templates to create reports. It allows multiple templates to
exist and our customers embed the tools into their workflow and
decide when to generate the appropriate document. An MTSO
customer can have templates based on specialty, worktype, or
physician. The workflow can determine the appropriate template
to apply to the approved report.
The M*Modal Difference
M*Modal is stringently focused on the healthcare industry. It is
our goal to provide the industry with the most comprehensive yet
adaptable solution for creating highly accurate, structured,
encoded, and shareable medical reports. To meet that goal,
M*Modal offers a unique combination of back-end speech
technology, client-side editing tools and integration support.
Service-based model
AnyModal CDS is an “on-demand” service. This service is provided
on a per minute basis via the Internet. This means that
customers only pay for speech understanding when they need it.
There is no fee for dictation that the workflow system decides
should be transcribed. There is also no fee for dictation that
goes through speech understanding but is not sent to an MT for
editing because it does not meet the agreed upon score for
editing effort.
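The fee rules above can be expressed as a short calculation. In this sketch the job records, the "route" labels, and the idea that the editing-effort score is a number where lower means less editing are all assumptions for illustration; the actual scoring and billing terms are set by agreement with M*Modal.

```python
def billable_minutes(jobs, max_edit_score):
    """Sum the dictation minutes that incur a speech understanding fee."""
    total = 0.0
    for job in jobs:
        if job["route"] == "transcribe":
            # The workflow chose plain transcription: no fee.
            continue
        if job["edit_score"] > max_edit_score:
            # The draft did not meet the agreed score, so it was not
            # sent to an MT for editing: no fee.
            continue
        total += job["minutes"]
    return total


jobs = [
    {"minutes": 3.0, "route": "speech", "edit_score": 0.2},
    {"minutes": 5.0, "route": "transcribe"},
    {"minutes": 2.5, "route": "speech", "edit_score": 0.9},
]
print(billable_minutes(jobs, max_edit_score=0.5))
```

Of the 10.5 minutes of dictation in this made-up batch, only the 3.0 minutes that produced an editable draft are billed.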
Providing speech understanding as a service also means that our
customers do not need to install additional hardware (servers)
in order to take full advantage of speech understanding.
AnyModal CDS is hosted in a secure data center and M*Modal
manages the hardware, software, hardware maintenance, software
updates, back-ups, and disaster recovery. We update our software
frequently to ensure all our customers are using the best speech
understanding technology available.
Integrated editor
In order to get the full benefit of M*Modal’s speech
understanding technology, AnyModal Edit will be incorporated
into Emdat’s InScribe application. This editing component cannot
stand alone, but is designed to be embedded into InScribe to
provide seamless access to editing and typing from your Emdat
system. There are several reasons why this editing component
adds significant value to an MTSO.
First, training and continuous learning occur when the AnyModal
Edit component is used. As mentioned above, training occurs when
the document is sent to AnyModal CDS, but transcribed in the
editor. AnyModal CDS uses the transcribed report to learn how
the MT creates a report from the physician’s dictation. Both
automated training and learning function best with the use of
AnyModal Edit.
Second, AnyModal Edit captures the keystrokes used for editing a
draft document. This is very useful because editing is a learned
skill that is different from transcribing. An MT must learn how
to edit efficiently in order to realize the full advantages of
speech understanding. Capture of keystrokes by the Edit control
allows for analysis of keystrokes and additional training based
upon usage patterns to optimize MT performance.
Third, using AnyModal Edit allows M*Modal to provide audio cues
that link the recognized text to audio. This provides the
ability to link the cursor to the text while the audio is
playing – moving the cursor from word to word as it is played
back; thereby positioning the cursor for editing when the
transcriptionist sees a recognition error and needs to correct
it. Over time, MTs will learn to play the audio while editing –
significantly enhancing their productivity.
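Linking the cursor to the audio amounts to looking up which word was being spoken at the current playback time. A minimal sketch, assuming each recognized word carries a start time in seconds (the timings below are made up, and the real AnyModal Edit data format is not described here):

```python
from bisect import bisect_right

# (start_time_in_seconds, word) pairs; values are illustrative only.
WORDS = [(0.0, "blood"), (0.4, "pressure"), (0.9, "was"),
         (1.1, "130"), (1.6, "over"), (1.9, "90")]
START_TIMES = [start for start, _ in WORDS]


def word_index_at(playback_seconds):
    """Return the index of the word being spoken at the given time."""
    # bisect_right finds the first word that starts after the playback
    # position; the word before it is the one currently playing.
    index = bisect_right(START_TIMES, playback_seconds) - 1
    return max(index, 0)


cursor = word_index_at(1.2)
print(WORDS[cursor][1])
```

Because the lookup is a binary search over sorted start times, the cursor can be repositioned on every playback tick even in a long report.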
Last, AnyModal Edit provides robust typing capabilities in
addition to editing features. This allows use of AnyModal Edit
for both editing and transcribing and means that Medical
Transcriptionists can learn and utilize a single system for both
editing and transcribing. AnyModal Edit has many features
designed for transcription and our customers report that their
MTs experience an increase in transcription productivity using
AnyModal Edit as opposed to Microsoft Word.
Easy Integration
Emdat MTSOs use our suite of applications for conducting
business. That platform consists of software and other
technology for capturing voice, assigning work to MTs,
downloading and uploading files, typing, determining which
transcripts go to QA, and report distribution, etc.
AnyModal CDS will fit seamlessly into Emdat’s transcription
workflow. Our Web services API provides access to our on-demand
services via the Internet. Use of Web services means that the
API is language and platform independent and can be seamlessly
integrated into Emdat’s system.
At the beginning of integration, customers send developers to a
two to three day integration workshop. The goal of the workshop
is to leave with a prototype of their workflow that is able to
submit jobs for speech understanding and retrieve draft
documents for editing. M*Modal provides on-going developer
support throughout the workflow integration process.
In addition, if your platform consists of custom software using
a basic editing component, AnyModal Edit simply replaces the
editing component. You can keep the custom code for special
functions and update it to use AnyModal Edit. As with the Web
services, developer assistance is available from M*Modal to help
with the integration process. At the end of the integration, the
software application provided to the MTs functions very much
like the original, but with new capabilities for editing draft
documents.
Training Programs
Customers about to implement M*Modal’s speech understanding
technology should spend some time training their MTs to edit
rather than transcribe. Transcription training programs provide
the right background for editing – medical vocabulary, following
physician directions, etc., but do not yet include information
on editing vs. transcribing. The transition from the current
typing platform to AnyModal Edit provides an opportunity to
train editing skills while initially training MTs to use the new
typing/editing environment.
To assist our customers with training MTs to edit, M*Modal has
developed a “Train the Trainer” program in which our trainers
teach your MT trainers how to edit efficiently and how to train
others to do the same. We provide training materials for your
internal training program and a shortcut guide for MTs. We also
provide post-training support as your trainers gain experience
with AnyModal CDS and have questions.
As mentioned above, M*Modal can provide reports on MT editing
habits and provide input on additional training for individuals
to help optimize their performance.
Meaningful Clinical Documents
Perhaps the most important difference regarding AnyModal CDS is
its Meaningful Clinical Documents™. Meaningful clinical
documents refer to the ability of AnyModal CDS to produce a
document that is much more than just text. Our documents are an
extension of the Health Level 7 (HL7) Clinical Document
Architecture (CDA). HL7 CDA defines a structured format for
clinical documents.
What is important to understand here is the “structured format.”
Most transcriptions are text documents with formatting, usually
stored as RTF or MS Word documents. While the text documents may
have section titles and data in them, the information is not
structured in a way that a computer can easily identify it. For
example, the text may say “blood pressure was 130 over 90” but a
computer would not be able to identify the blood pressure and
the numeric values in order to act upon them or send them to an
application that understood the data.
HL7 CDA defines how to “structure” information so that it can be
clearly identified and shared. It defines how a file should
store important measurements and medical facts so that the
information can be shared with and used by other software
programs.
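To make the difference between text and structured data concrete, the sketch below pulls the blood pressure reading from the example sentence into named fields. The regular expression, field names, and unit are assumptions for illustration only; a speech understanding system does far more than pattern matching, and the dictionary shown is not the HL7 CDA format itself.

```python
import re

# Matches phrases such as "blood pressure was 130 over 90".
BP_PATTERN = re.compile(
    r"blood pressure (?:was|is|of) (\d{2,3}) over (\d{2,3})",
    re.IGNORECASE,
)


def extract_blood_pressure(text):
    """Return the reading as named fields, or None if none is found."""
    match = BP_PATTERN.search(text)
    if match is None:
        return None
    return {
        "observation": "blood pressure",
        "systolic": int(match.group(1)),
        "diastolic": int(match.group(2)),
        "unit": "mmHg",
    }


reading = extract_blood_pressure("The blood pressure was 130 over 90.")
print(reading)
```

Once the values are held as numbers in named fields rather than as words in a sentence, another program can compare, chart, or import them without having to interpret the narrative.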
This is particularly relevant as EMR systems become more
popular. EMR systems store structured data that comes from a
patient visit. An EMR system can import a meaningful clinical
document in HL7 CDA format to create the patient visit record.
The physician could accurately record measurements such as
temperature and blood pressure through dictation.
By using HL7 CDA as the foundation for our clinical documents,
M*Modal is prepared to help MTSOs capture transcription business
that must populate EMR systems.
Conclusion
The medical transcription business is both rewarding and
challenging. Because hospitals, clinics, and private practices
see transcription as a pure expense and because there are not
enough qualified transcriptionists to complete the current
workload, MTSOs are facing smaller margins and limited profits.
What is needed is a breakthrough that will dramatically boost
the productivity of medical transcriptionists while enhancing
the value of transcription by preparing structured clinical
documents.
M*Modal’s AnyModal CDS service will allow MTSOs to improve
production time, reduce per line labor costs, and improve the
quality and consistency of medical reports. Using speech
understanding technology to capture dictation and convert it
into a structured and encoded format allows the MT to focus on
the structure and content of the draft clinical document, rather
than transcribing the entire report. Speech recognition will
never eliminate the need for qualified MTs, but will help them
evolve into “Medical Language Specialists.”
Incorporating AnyModal CDS into the existing workflow means that
there is no “training” of the physician. In addition, the
groundwork for training the MTs to edit, rather than transcribe,
is already in place. MTs have the necessary background for
editing (medical vocabulary, following physician directions,
etc.). Since AnyModal Edit can be placed into any existing
platform, the transition to AnyModal CDS is seamless.
Additionally, AnyModal CDS can provide clinical documents in a
variety of formats – even for the same dictation. This allows
for the generation of patient visit reports, discharge
summaries, referring physician letters and other documentation
from a single dictation.
AnyModal CDS helps MTSOs respond to the growing adoption of EMR
systems because it can deliver the final medical report in an
electronic format that can easily be imported into an EMR
system, thus increasing the physician’s ability to care for
patients without increasing his or her workload or schedule.