Automatic Detection and Rating of Dementia of
Alzheimer Type through Lexical Analysis of
Spontaneous Speech
Calvin Thomas, Vlado Kešelj, and Nick Cercone
Faculty of Computer Science
Faculty of Medicine
Saint Mary's University
Abstract— Current methods of assessing dementia of Alzheimer type (DAT) in older adults involve structured interviews that attempt to capture the complex nature of deficits suffered. One of the most significant areas affected by the disease is the capacity for functional communication as linguistic skills break down. These methods often do not capture the true nature of language deficits in spontaneous speech. We address this issue by exploring novel automatic and objective methods for diagnosing patients through analysis of spontaneous speech. We detail several lexical approaches to the problem of detecting and rating DAT. The approaches explored rely on character n-gram-based techniques, shown recently to perform successfully in a different but related task: automatic authorship attribution. We also explore the correlation between DAT and the usage frequency of different parts of speech. We achieve 95% accuracy in detecting dementia when patients are compared with a control group, 70% accuracy in rating dementia in two classes, and 50% accuracy in rating dementia in four classes. Our results show that purely computational solutions offer a viable alternative to standard approaches to diagnosing the level of impairment in patients. These results are a significant step toward automatic and objective means of identifying early symptoms of DAT in older adults.

Index Terms— Automatic diagnostics, machine learning, natural language processing

I. INTRODUCTION

Current methods of assessing dementia of Alzheimer type (DAT) in older adults involve structured interviews that attempt to capture the complex nature of deficits suffered. One of the most significant areas affected by the disease is the capacity for functional communication as linguistic skills break down. With this fact in mind, interviews are designed to test linguistic abilities, including confrontation naming [1], single word production [2] or word generation given context [3]. However, these methods sometimes fail to identify early symptoms observed by family members during normal conversation [4], and often fail to describe adequately the level of impairment in low scoring patients, unless similarities exist between performance during exams and in normal conversation [5]. In developing new tests, researchers should look for automatic and objective methods for rating dementia through analysis of spontaneous speech that overcome the shortfalls of current methods [6]. Research advances in discourse analysis, language modeling and text classification may be applicable to this area and may lead to such progress.

In this paper, we detail several lexical approaches to the problem of detecting and rating DAT in patients from our corpus. The large corpus used in our research consists of transcripts from the Atlantic Canada Alzheimer's Disease Investigation of Expectations (ACADIE) study of the drug donepezil [7]. The goal of this research is to explore whether automatic techniques based on the analysis of spontaneous speech can provide objective measures of dementia levels in Alzheimer's disease (AD) patients. It is our hope that improvements in automatic techniques will extend what is understood about the effects of dementia in Alzheimer's patients and the breakdown of language faculties.

The research discussed in this paper applies natural language processing and machine learning techniques to the problem of rating DAT in older adults. This interdisciplinary area brings opportunities for novel research to be conducted with generic text classification algorithms. Also explored are novel extensions to existing techniques, developed to address specific qualities inherent to the corpus analyzed.

In short, we found that purely computational solutions offer a viable alternative to standard approaches to diagnosing the level of impairment in patients. Although more work needs to be done to improve the accuracy of these methods, these results are a significant step toward automatic and objective means of identifying early symptoms of DAT in older adults.

II. BACKGROUND AND RELATED WORK
Dementia of Alzheimer type. A significant component of the dementia of Alzheimer type (DAT) that accompanies Alzheimer's disease (AD) is aphasia, a loss of written and oral communicative ability [8], [9]. Symptoms of aphasia include breakdowns in semantic processing, shallow vocabularies and word-finding difficulties leading to the deterioration of spontaneous speech [10]. This deterioration begins early in the onset of the disease and is often observed by family members during conversational situations [4]. Further, recent studies of oral and written spelling have shown marked differences in language ability between AD patients and healthy older adults [11], [12].

For example, Ronald Reagan, former president of the United States, exhibited signs of AD from the outset of his presidency. Reagan's speeches suffered from word-finding difficulties, inappropriate phrases and uncorrected sentences that were obvious signs of his deterioration, but the fact that he had AD was not made public until 1994 [13].

Current methods of assessing DAT levels in patients involve structured interviews that attempt to capture the breakdown of communicative capacity by testing specific linguistic abilities, including confrontation naming [1], single word production [2] or word generation given context [3]. However, these methods sometimes fail to identify early symptoms observed by family members during normal conversation [4], and often fail to describe adequately the level of impairment in low scoring patients, unless similarities exist between performance during exams and in normal conversation [5].

Mini-Mental State Exam. The Mini-Mental State Exam (MMSE) is a cognitive grading scale used in the assessment of patients, first described by Folstein et al. [14] in 1975. This test addressed the need for a relatively short screening exam that could reliably identify cognitive impairment in a clinical setting. Here, "mini" refers to the fact that the exam concentrates only on the cognitive aspects of mental function and excludes mental deficits covered by comprehensive exams, including mood and abnormal mental functions [14].

The MMSE involves a patient responding to 17 questions that cover a wide range of cognitive domains: orientation, registration, short-term memory, attention, calculation, visuospatial skills and praxis. Testing of these areas is divided into two sections. The first requires verbal responses to orientation, memory, and attention questions. The second requires reading and writing and covers the ability to name, follow verbal and written commands, write a sentence, and draw intersecting pentagons. Testing time varies with impairment level, ranging between 5 and 10 minutes, and the exam can be administered with limited training by clinicians, nurses, psychologists, paramedical staff and lay interviewers.

Since its introduction, the MMSE has been widely used in clinical applications as an aid to diagnosis and in monitoring the progression of dementia in individuals. The exam is also standardly used in the clinical and therapeutic research community as a basis for discretizing populations into normal, mild, moderate and severe dementia levels according to the DSM-IV [8]. Less standard, however, is the selection of boundary points in a community setting, since performance has been linked to level of education and other characteristics of the population. With that said, "a variety of cutpoints have been suggested over the years, with 17/18 for clear-cut cases, 21/22, 23/24 and even 25/26" [15].

Verbal Picture Descriptions. Verbal picture descriptions can be used to assess the level of cognitive impairment and "are among the most sensitive measures for assessing spontaneous speech in AD" [10]. In these exams, the patient is given a simple or complex line drawing that he or she must describe verbally. These narratives are recorded on tape and later analyzed according to a variety of speech attributes, including articulation, grammar, phrase length, paraphasias, word-finding difficulties, themes and information content. While simple pictures may be useful in identifying patients with moderate deficiencies, more complex drawings may be helpful for screening patients with mild dementia [10], [13].

III. A LEXICAL APPROACH

Research in the area of automatic dementia detection in Alzheimer's patients has been quite limited, with few results found in a search of the literature [6], [16]. Bucks et al. [6] conducted a small study with 24 individuals: 8 patients and 16 healthy controls. The authors collected 8 lexical statistics over the first 1000 words of spontaneous speech during interviews, namely noun (N), pronoun (P), adjective (A) and verb (V) rates, type-token ratio (TTR), Brunét's Index (W), Honoré's Statistic (R) and the Clause-like Semantic Unit (CSU) rate. The results showed that these stylometric attributes had sufficient discriminating power to distinguish between the language models of AD sufferers and control subjects.

N-rate, P-rate, A-rate and V-rate are the average rates of occurrence for each respective part-of-speech (POS) category. These measures capture the lexical distribution of the spoken words and were selected heuristically. Bucks et al. found that AD patients had "higher mean P-rate, A-rate, V-rate scores, but lower N-rate scores compared with normal older controls".

The next three statistical attributes were selected to capture the lexical richness of the participant's speech. TTR is the ratio of the total vocabulary $V$ to the overall text length $N$, $\mathrm{TTR} = V/N$, and is sensitive to the length of text collected.
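Algorithm 1 Profile dissimilarity

TABLE I
ATTRIBUTE SET DESCRIBED IN BUCKS ET AL. [6]
Attribute   Description
N-rate      Noun rate
P-rate      Pronoun rate
A-rate      Adjective rate
V-rate      Verb rate
TTR         Type-token ratio
W           Brunét's Index
R           Honoré's Statistic
CSU rate    Clause-like semantic unit rate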
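Most of the attributes in Table I can be computed directly from a tokenized, POS-tagged transcript. The Python sketch below is an illustration under stated assumptions, not the implementation used in [6] or in this work. It takes a list of (word, tag) pairs whose coarse tags `NOUN`, `PRON`, `ADJ` and `VERB` are hypothetical placeholders (a real tagger such as Connexor uses its own tag set), it expresses each POS rate as a fraction of all tokens (the normalisation in [6] may differ), and it uses the standard definitions $\mathrm{TTR} = V/N$, $W = N^{V^{-0.165}}$ and $R = 100 \log N / (1 - V_1/V)$ with a natural logarithm, where $N$ is the number of tokens, $V$ the vocabulary size and $V_1$ the number of words used exactly once. The CSU rate is left out because it requires hand tagging.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical coarse POS tags; real taggers (e.g. Connexor) use their own tag sets.
POS_TAGS = ("NOUN", "PRON", "ADJ", "VERB")


def lexical_attributes(tagged: List[Tuple[str, str]]) -> Dict[str, float]:
    """Compute Table I style attributes from (word, tag) pairs.

    N = number of tokens, V = vocabulary size, V1 = words used exactly once.
    """
    words = [w.lower() for w, _ in tagged]
    n = len(words)
    if n == 0:
        raise ValueError("empty transcript")
    freq = Counter(words)
    v = len(freq)
    v1 = sum(1 for c in freq.values() if c == 1)

    # POS rates, assumed here to be fractions of all tokens.
    tag_counts = Counter(tag for _, tag in tagged)
    attrs = {f"{tag}-rate": tag_counts[tag] / n for tag in POS_TAGS}

    attrs["TTR"] = v / n  # type-token ratio; sensitive to text length
    attrs["W"] = n ** (v ** -0.165)  # Brunet's index; lower means richer
    # Honore's statistic; higher means richer. Undefined when every word is
    # used exactly once (V1 == V), so return infinity in that case.
    attrs["R"] = 100 * math.log(n) / (1 - v1 / v) if v1 < v else math.inf
    return attrs
```

For example, `lexical_attributes([("the", "DET"), ("dog", "NOUN"), ("saw", "VERB"), ("the", "DET"), ("cat", "NOUN")])` gives a TTR of 0.8 (4 types over 5 tokens) and a NOUN-rate of 0.4. On samples this short W and R carry little meaning; [6] computed its statistics over the first 1000 words of each interview.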
Higher values of TTR are associated with a broader vocabulary. Brunét's Index $W$ is a length-insensitive version of TTR, calculated using the following equation:

$$W = N^{V^{-0.165}}$$

The resulting value of $W$ typically ranges between 10 and 20, with richer speech producing lower values [17]. Honoré's Statistic $R$ is also insensitive to length and is calculated as

$$R = \frac{100 \log N}{1 - V_1 / V}$$

where $V_1$ is the number of words in the vocabulary spoken only once. Higher values of $R$ indicate a richer vocabulary [18]. The CSU rate is a "measure of semantic cohesion in phrases . . . and characterizes the participant's ability to form noun and verb phrases and gives an indication of the flow of speech" [19]. To calculate this value, the corpus must first be hand-tagged according to a set of 13 rules that identify cohesion boundaries in phrases. The CSU rate is the average number of units found per 100 words. Patients suffering from dysphasia find it difficult to formulate long phrases, which leads to higher CSU rates than in normal speakers and makes this variable "the most important discriminator between normal and dysphasic speech" [16]. Bucks et al. [6] confirmed that AD patients use a less rich vocabulary according to the three lexical richness measures (TTR, $W$ and $R$), but significant differences in CSU rates between AD patients and controls were not found in their data. Table I gives a summary of the attributes detailed above.

Common N-Grams (CNG) approach. The Common N-Grams (CNG) approach to authorship attribution uses character n-grams to model consistencies in author style. Traditional n-gram language models intuitively treat documents as a sequence of words and rely on word n-grams to capture consistencies, with state-of-the-art performance [20]. However, several difficulties arise when working with word-based methods, including language dependencies explicitly built into the model, word segmentation concerns and sparsity of data due to the large vocabulary. Overcoming these obstacles is particularly difficult for Asian languages such as Chinese or Japanese, which do not have explicit word boundaries. By using byte-level n-grams, the authors of [21] dramatically reduce the vocabulary, clearly define boundaries between units and make no use of language-dependent information, including word boundaries, character case, white-space characters or punctuation [21]. However, because of their frequency and the consistency with which authors use them, white-space and punctuation characters implicitly play a significant role in classifier performance.

Author models are represented by CNG profiles, defined as "a set of the $L$ most frequent n-grams with their normalized frequencies generated from training data" [21]; hence, the two parameters of importance to the CNG method are the n-gram size $n$ and the profile length $L$. Due to the small, fixed vocabulary of ASCII characters used, the CNG method does not suffer from the sparse data problems of word n-gram approaches at low values of $n$. Indeed, the work in [21] indicates that considerably larger values of $n$ can be employed before computational limitations and performance decreases are encountered. This contrasts with word-based approaches, which are computationally feasible only with $n$ up to 3 or 4 [20]. The profile length $L$ limits the number of n-grams considered during the similarity calculation and keeps profiles small when large values of $n$ are used. Small profile lengths not only improve computational performance but also reduce model overfitting; this is supported by the finding that pruning profiles to length $L$ improved accuracy, with the optimal values of $L$ reported in [21].

Fig. 1. Histogram of MMSE score
Fig. 2. Histogram of MMSE-based classes

Classification via Common Word Frequencies. Using common word frequencies as style markers has been studied extensively by Burrows [22], [23], [24] and further investigated by Stamatatos et al. [25]. Both of these approaches use the most frequent words in a text corpus as style markers. The primary difference between them is the training corpus from which these style markers are selected. Burrows argues for frequent terms selected from the target corpus itself and has shown effective classification results over a wide variety of literature domains [23], [24]. Stamatatos et al. [25] improved on previous results by extracting these style markers from the British National Corpus rather than from the target corpus.
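Algorithm 1 computes the dissimilarity between two CNG profiles. The Python sketch below illustrates the CNG approach and the common-word-frequency representation under stated assumptions. It uses the dissimilarity measure published in [21], $\sum_{x \in P_1 \cup P_2} \left( \frac{2(f_1(x) - f_2(x))}{f_1(x) + f_2(x)} \right)^2$, where an n-gram absent from a profile has frequency 0, and assigns a transcript to the class with the least dissimilar profile. The function names, the illustrative defaults $n = 4$ and $L = 500$, building each class profile from the concatenated training transcripts of that class, and whitespace tokenization for the frequent-word ratios are all assumptions for illustration, not the configuration used in the experiments reported here.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence

Profile = Dict[bytes, float]


def cng_profile(text: str, n: int, length: int) -> Profile:
    """Keep the `length` (L) most frequent byte n-grams, each normalised
    by the total number of n-grams in the text."""
    data = text.encode("utf-8")  # byte-level n-grams: no tokenisation needed
    grams = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    total = sum(grams.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in grams.most_common(length)}


def dissimilarity(p1: Mapping[bytes, float], p2: Mapping[bytes, float]) -> float:
    """Sum over the union of both profiles of (2 (f1 - f2) / (f1 + f2))^2."""
    d = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)  # f1 + f2 > 0 for every g here
        d += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return d


def cng_classify(text: str, class_texts: Mapping[str, str],
                 n: int = 4, length: int = 500) -> str:
    """Return the class whose profile is least dissimilar to the text's profile.

    Assumption: each class profile is built from that class's concatenated
    training transcripts.
    """
    doc = cng_profile(text, n, length)
    profiles = {label: cng_profile(t, n, length) for label, t in class_texts.items()}
    return min(profiles, key=lambda label: dissimilarity(doc, profiles[label]))


def top_words(reference_text: str, k: int) -> List[str]:
    """Select the k most frequent words of a reference corpus as style markers."""
    return [w for w, _ in Counter(reference_text.lower().split()).most_common(k)]


def frequent_word_ratios(text: str, markers: Sequence[str]) -> List[float]:
    """Relative frequency of each style-marker word in `text`."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[m] / total for m in markers]
```

With this sketch, `cng_classify("aaa", {"x": "aaaa aaaa", "y": "bbbb bbbb"}, n=2)` returns `"x"`, and every n-gram found in only one of two profiles contributes the maximum term of 4. Passing the target corpus to `top_words` mirrors Burrows' choice of markers, while passing a general reference corpus mirrors the approach of Stamatatos et al.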
IV. PROBLEM, DATA AND SOLUTION

The research in this paper explores several approaches to the problem of automatically diagnosing the dementia level of Alzheimer's patients through analysis of spontaneous speech captured in a transcript. Each of these approaches assumes that recognizable language artifacts, which are a function of the dementia level in patients, exist. Further, we are interested in attributes that can be extracted automatically from patient transcripts and used to model the dementia level of AD patients reliably and consistently.

ACADIE Dataset. The dataset used during analysis and experimentation contains the language spoken in Goal Attainment Scaling interviews between field researchers, Alzheimer patients, and care-givers, compiled within the Atlantic Canada Alzheimer's Disease Investigation of Expectations (ACADIE) study of donepezil [7]. The dataset includes two interviews per patient, conducted at assessment visits 12 weeks apart to examine the effects of the drug administered during the interim. Interviews were conducted at six sites across Atlantic Canada [...] scoring on the MMSE scale. MMSE scores are provided with the interview transcripts, with discretized scores in the ranges 0–15, 16–20, 21–24, and 25–30, according to [14].

SUMMARY OF THE RESULTS

Each of the figures in this section gives the classification performance, in terms of the maximum accuracy obtained, of each explored approach on a specific classification task. Importantly, each chart also includes the results from a naïve ZeroR rule-based classifier, which assigns every test instance the modal class observed during training. Overall, these results show that intelligent machine learning approaches performed better on the corpus than the naïve ZeroR baseline. This indicates that pairing spontaneous speech data with machine learning techniques is a viable approach to the task of predicting dementia levels. Further, the results suggest that classification accuracy improves when large lexical categories are broken into their smaller constituents by including modifier relationships.

Figure 3 illustrates the classification accuracies of the explored methods on the two-class prediction task. In this task the classification algorithm must label test instances as [...], indicating a severe or moderate level of DAT impairment, while [...] indicates that the patient should be placed in the mild or normal dementia classes. The ZeroR rule-based classifier produced a baseline accuracy of [...] for this task. For the other classifiers explored, an accuracy range of [...] was observed. On this task, the best accuracy was shared by [...], while trailing close behind was the ordinal CNG method with an accuracy of [...].

Fig. 3. Summary of accuracy on two-class task
Fig. 4. Summary of accuracy on four-class task
Fig. 6. Summary of accuracy on mild/normal task

The second classification task required the algorithm to predict one of four class labels for a test instance: [...].
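The classification tasks above are defined over MMSE-based classes, and each chart includes a ZeroR baseline. The Python sketch below is illustrative only. It assumes the four classes correspond to the MMSE ranges 0–15, 16–20, 21–24 and 25–30 listed with the ACADIE dataset, labels them severe, moderate, mild and normal (hypothetical names chosen for illustration), groups them into severe/moderate versus mild/normal for the two-class task, and scores ZeroR by predicting the modal training class for every test instance.

```python
from collections import Counter
from typing import Sequence

# Assumed mapping of the four MMSE ranges to hypothetical class names.
MMSE_BINS = (
    (0, 15, "severe"),
    (16, 20, "moderate"),
    (21, 24, "mild"),
    (25, 30, "normal"),
)


def mmse_class(score: int) -> str:
    """Map an integer MMSE score (0-30) to one of the four classes."""
    for lo, hi, label in MMSE_BINS:
        if lo <= score <= hi:
            return label
    raise ValueError(f"MMSE score out of range: {score}")


def two_class(label: str) -> str:
    """Collapse the four classes into the two-class task's labels."""
    return "severe/moderate" if label in ("severe", "moderate") else "mild/normal"


def zeror_accuracy(train_labels: Sequence[str], test_labels: Sequence[str]) -> float:
    """ZeroR baseline: predict the modal training class for every test instance.

    Ties between equally frequent classes go to the class seen first in training.
    """
    if not train_labels or not test_labels:
        raise ValueError("need non-empty training and test labels")
    modal = Counter(train_labels).most_common(1)[0][0]
    return sum(1 for y in test_labels if y == modal) / len(test_labels)


if __name__ == "__main__":
    train = [mmse_class(s) for s in (12, 18, 19, 22, 27)]
    test = [mmse_class(s) for s in (17, 23, 29)]
    print(zeror_accuracy(train, test))
```

In the usage example the modal training class is "moderate", which matches one of the three test instances, so the baseline accuracy is 1/3. Because ZeroR ignores the transcript entirely, its accuracy depends only on the class distribution, which is why it serves as the floor against which the other methods are compared.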
The results from this task are shown in Figure 4. On this task, a baseline accuracy of [...] was set by the ZeroR classifier, and a range of [...] was observed for the other methods. The highest accuracy was achieved by classifiers using the [...] attribute selection method [...]. The next best classifier was standard CNG with [...], closely followed by [...].

Figure 5 compares the prediction accuracy of the algorithms on a third task. This task involved predicting class labels for instances from the severe and normal groups only. The naïve baseline method produced an accuracy of [...] on this task. All of the intelligent methods examined in these experiments produced significantly higher classification accuracies, with [...]. Again, on this task the most accurate classifier was built over an attribute set consisting of frequent word ratios. Interestingly, both the [...] produced the same classification accuracy at [...]. One other approach produced an accuracy above [...]. A particularly noteworthy observation is that the [...] attribute set beat out [...] on this task.

Fig. 5. Summary of accuracy on severe/normal task

Figure 6 contains results from the mild/normal classification task. This task requires the algorithm to label test instances from the mild and normal groups only. A baseline accuracy of [...] was posted for this task by the ZeroR classifier. The observed accuracy range for the other methods was [...]. [...] performed the worst here and was only narrowly more accurate than the baseline. The best classification accuracy was achieved by the [...] attribute selection method at [...]. The next closest method in terms of classification accuracy was the other frequent-words-based method at [...].

The thrust of this work was to examine the potential use of natural language processing and machine learning techniques in the diagnosis of dementia of Alzheimer type (DAT) in older adults. Framing this problem as a text classification task, we present several viable approaches based on mature algorithms and implementations. The main contributions are:
• a detailed statistical analysis of the lexical features exhibited in the spontaneous speech of older adults with Alzheimer's disease,
• a novel application of several machine learning and natural language processing techniques to rating DAT,
• a novel classification algorithm, Ordinal CNG, and
• positive results in detecting DAT through an extensive exploration of classification methods.

1) Lexical analysis: A detailed statistical analysis was conducted on transcripts of spontaneous conversational speech collected from Alzheimer's patients. Analysis of spontaneous speech has the potential to offer many clues to the ties between linguistic ability and the extent of DAT. We chose to approach attribute selection from a statistical standpoint rather than rely on heuristics as in Bucks et al. [6]. We also believed that the detail of the Connexor part-of-speech (POS) tagger should be exploited to narrow the lexical categories analyzed. Our experiments confirmed the validity of these assumptions, leading to higher accuracies and a better understanding of the data. During our lexical analysis of the data we found that closed class words were particularly
helpful in predicting the level of language deficit in patients. Additionally, we found that lexical richness measures were not powerful discriminators for our purposes.

2) Novel application: Applying the CNG algorithm, which was originally developed for authorship attribution, to our DAT classification problem showed that the algorithm is robust with respect to application. The standard algorithm was applied without modification and achieved some of the most accurate results observed. We attribute this robustness to the byte-level n-grams used to construct the class profiles.

During our lexical analysis of the data we found that closed class words were helpful in predicting the level of language deficit in patients. Naturally, this led us to examine these classes of words in more detail to determine whether deeper relationships exist between the statistics and the observed effect in patients. Previous work in text classification had used commonly used words as style markers. Our experiments showed that this approach to detecting deficit and the novel application of these generic text classification algorithms were well suited to each other, producing some of the most accurate models.

3) Algorithm extension: In addition to the standard CNG algorithm, an ordinal CNG extension was developed and tested. This algorithm was designed to take advantage of the natural ordering of classes, leveraging the training instances within the extreme groups. Our results showed that classification accuracy was not affected by the exclusion of [...] training instances. This observation leads us to believe that our method effectively generates models using fewer training instances, but with better discriminating characteristics.

4) Positive results: The positive results reported in this work were arrived at after an extensive exploration of classification methods. This research showed that several standard classification algorithms could produce classification accuracies significantly higher than our naïve rule-based classifier, which always selects the modal class.

REFERENCES

[1] J. Hodges, D. Salmon, and N. Butters, "The nature of the naming deficit in Alzheimer's and Huntington's disease," Brain, vol. 114, pp. 1547–1558, 1991.
[2] A. Martin and P. Fedio, "Word production and comprehension in Alzheimer's disease: the breakdown of semantic knowledge," Brain and Language, vol. 35, pp. 394–397, 1983.
[3] L. Phillips, S. D. Sala, and C. Trivelli, "Fluency deficits in patients with Alzheimer's disease and frontal lobe lesions," European Journal of Neurology, vol. 3, pp. 102–108, 1996.
[4] C. Crockford and R. Lesser, "Assessing functional communication in aphasia: Clinical utility and time demands of three methods," European Journal of Disorders of Communication, vol. 29, pp. 165–182, 1994.
[5] S. Sabat, "Language function in Alzheimer's disease: a critical review of selected literature," Language and Communication, vol. 14, pp. 331–
[6] R. Bucks, S. Singh, J. M. Cuerden, and G. Wilcock, "Analysis of spontaneous, conversational speech in dementia of Alzheimer type: Evaluation of an objective technique for analyzing lexical performance," Aphasiology, vol. 14, no. 1, pp. 71–91, 2000.
[7] K. Rockwood, J. Graham, and S. Fay, "Goal setting and attainment in Alzheimer's disease patients treated with donepezil," Journal of Neurology, Neurosurgery and Psychiatry, vol. 73, pp. 500–507, 2002.
[8] American Psychiatric Association, Diagnostic and Statistical Manual of Mental Disorders, 4th ed., Washington, DC, 1994.
[9] J. Cummings, F. Benson, M. Hill, and S. Read, "Aphasia in dementia of the Alzheimer type," Neurology, vol. 35, pp. 394–397, 1985.
[10] K. Forbes, A. Venneri, and M. Shanks, "Distinct patterns of spontaneous speech deterioration: an early predictor of Alzheimer's disease," Brain and Cognition, vol. 48, no. 2–3, pp. 356–361, 2002.
[11] S. Pestell, M. Shanks, J. Warrington, and A. Venneri, "Quality of spelling breakdown in Alzheimer's disease is independent of disease progression," Journal of Clinical and Experimental Neuropsychology, vol. 22, pp. 599–612, 2000.
[12] H. Platel, J. Lambert, F. Eustache, B. Cadet, M. Dary, F. Viader, and B. Lechevalier, "Characteristics and evolution of writing impairment in Alzheimer's disease," Journal of Clinical and Experimental Neuropsychology, vol. 22, pp. 599–612, 1993.
[13] A. Venneri, O. Turnbull, and S. Della Sala, "The taxonomic perspective: the neuropsychological diagnosis of dementia," Revue Européenne de Psychologie Appliquée, vol. 46, pp. 81–86, 1996.
[14] M. Folstein, S. Folstein, and P. McHugh, "Mini-mental state. A practical method for grading the cognitive state of patients for the clinician," Journal of Psychiatric Research, vol. 12, pp. 189–198, 1975.
[15] C. Brayne, "The mini-mental state examination, will we be using it in 2001?" International Journal of Geriatric Psychiatry, vol. 13, pp. 285–294, 1998.
[16] D. Holmes and S. Singh, "A stylometric analysis of conversational speech of aphasic patients," Literary and Linguistic Computing, vol. 11, pp. 45–60, 1996.
[17] E. Brunét, "Le vocabulaire de Jean Giraudoux," Structure et Evolution.
[18] A. Honoré, "Some simple measures of richness of vocabulary," Association of Literary and Linguistic Computing Bulletin, vol. 7, pp. 172–177, 1979.
[19] S. Singh, "Computational analysis of conversational speech in dysphasic patients," Ph.D. dissertation, University of the West of England.
[20] F. Peng, D. Schuurmans, V. Keselj, and S. Wang, "Automated authorship attribution with character level language models," in Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003), 2003.
[21] V. Keselj, F. Peng, N. Cercone, and C. Thomas, "N-gram-based author profiles for authorship attribution," in Proceedings of the Pacific Association for Computational Linguistics (PACLING'03), 2003.
[22] J. Burrows, "Word-patterns and story-shapes: The statistical analysis of narrative style," Literary and Linguistic Computing, vol. 2, no. 2, pp. 61–70, 1987.
[23] J. Burrows, "Not unless you ask nicely: The interpretative nexus between analysis and information," Literary and Linguistic Computing, vol. 7, no. 2, pp. 91–109, 1992.
[24] J. Burrows, "'Delta': a measure of stylistic difference and a guide to likely authorship," Literary and Linguistic Computing, vol. 17, no. 3, pp. 267–287, 2002.
[25] E. Stamatatos, N. Fakotakis, and G. Kokkinakis, "Text genre detection using common word frequencies," in Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), vol. 2, 2000, pp. 808–814.
Source: http://vlado.cs.dal.ca/papers/icma05.pdf