A pilot study to determine whether machine learning methodologies using pre-treatment electroencephalography can predict the symptomatic response to clozapine therapy
Clinical Neurophysiology
Ahmad Khodayari-Rostamabad, Gary M. Hasey, Duncan J. MacCrimmon, James P. Reilly, Hubert de Bruin
a Electrical and Computer Eng. Dept., McMaster University, Hamilton, ON, Canada L8S 4K1
b Dept. of Psychiatry and Behavioral Neurosciences, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada L8S 4L8
c Mood Disorders Program, Centre for Mountain Health Services, St. Joseph Hospital, Hamilton, ON, Canada L8N 3K7
d School of Biomedical Engineering, McMaster University, Hamilton, ON, Canada L8S 4K1
Accepted 11 May 2010
Available online 17 June 2010
Objective: To investigate whether applying advanced machine learning (ML) methodologies to pre-treatment electroencephalography (EEG) data can predict the response to clozapine therapy in adult subjects suffering from chronic schizophrenia.
Methods: Pre-treatment EEG data are collected in 23 + 14 schizophrenic adults. Treatment outcome, after
at least one year follow-up, is determined using clinical ratings by a trained clinician blind to EEG results.
First, a feature selection scheme is employed to select a reduced subset of features extracted from the
subjects' EEG that is most statistically relevant to our treatment-response prediction. These features
are then entered into a classifier, which is realized in the form of a kernel partial least squares regression
method that performs response prediction. Various scales, including the positive and negative syndrome
scale (PANSS), are used as treatment-response indicators.
Results: We determined that a set of discriminating EEG features does exist. A low-dimensional representation of the feature space showed significant clustering into clozapine responder and non-responder groups. The minimum level of performance of the proposed prediction methodology, tested over a range of conditions using the leave-one-out cross-validation method using the original 23 subjects, with further testing in an independent sample of 14 subjects, was 85%.
Conclusions: These findings indicate that analysis of pre-treatment EEG data can predict the clinical response to clozapine in treatment-resistant schizophrenia.
Significance: If replicated in a larger population, this novel approach to EEG analysis may assist the clinician in determining treatment-efficacy.
© 2010 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Compared with other antipsychotic medications the atypical antipsychotic medication clozapine is recognized to have superior therapeutic effectiveness in the treatment of chronic medication-resistant schizophrenia (e.g., ). However, clozapine may produce serious side effects such as seizures, cardiac arrhythmias or bone marrow suppression with neutropenia ( ). According to a recent Cochrane review, about 34% of treatment-resistant patients respond to clozapine while 3.2% develop blood problems ( ). As the hematological side effects can be life threatening, blood samples to monitor the white blood cell count must be collected as long as the drug is used, at weekly to monthly intervals. The logistic difficulties for the patient and the treatment team are substantial. A method that could reliably determine, before the onset of therapy, whether a given patient will or will not respond to clozapine would greatly assist the clinician in determining whether the risks and logistic complexity of clozapine are outweighed by the potential benefits.

Quantitative electroencephalography (QEEG or EEG) may offer some promise in this regard. EEG abnormalities in schizophrenic subjects and EEG changes due to clozapine therapy have been the focus of a number of clinical studies (see e.g., ).
* Corresponding author. Tel.: +1 905 525 9140x22895; fax: +1 905 521 2922.
E-mail address: (J.P. Reilly).
1388-2457/$36.00 © 2010 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
A. Khodayari-Rostamabad et al. / Clinical Neurophysiology 121 (2010) 1998–2006
Based on findings in 17 schizophrenic subjects, found that the clozapine-induced improvement of psychopathology symptom ratings using the Positive and Negative Syndrome Scale (PANSS) was correlated with pre-treatment QEEG inter- and intra-hemispheric spectral power asymmetry. Greater pre-treatment anterior to posterior asymmetry in the delta frequency range was associated with greater improvement in negative symptoms while greater pre-treatment anterior to posterior theta asymmetry predicted improvement of positive symptoms and global improvement. Larger inter-hemispheric asymmetry in the theta and beta frequencies in the central and anterior temporal regions were, respectively, predictive of greater improvement in positive and negative symptoms. also found that changes in the theta frequency in QEEG with clozapine treatment, particularly in the midline electrodes over the fronto-central scalp area, were a more sensitive indicator for the evaluation of clozapine treatment efficacy than the serum clozapine level. Though these methods reveal important relationships between QEEG variables and clinical outcome, a series of simple correlational analyses do not readily yield a "responder" or "non-responder" dichotomous categorization for an individual patient.

The above analyses employed standard statistical methods. On the other hand, a more mathematically sophisticated analysis including pattern recognition and dimensionality reduction methods (which together may be categorized as machine learning techniques) can perform a more comprehensive data analysis. Machine learning techniques are finding increasing application in psychiatry, particularly when multi-dimensional, noisy, highly complex data or multi-modal data sets are analyzed together (see e.g., ). For example, support vector machine (SVM) techniques that select spectro-temporal patterns from multichannel magnetoencephalogram (MEG) data collected during a verbal working memory task have been used to distinguish schizophrenic from control subjects ( ). Machine learning algorithms using structural brain magnetic resonance (MRI) images ( ), functional MRI (fMRI) data ( ) and combined genomic and clinical data have been employed to separate schizophrenic, bipolar and healthy control subjects.

Machine learning approaches have also been applied to prediction of clozapine treatment-efficacy. describes a study in which a feed-forward multilayer perceptron network (with a back-propagation error training technique) is employed using clinical and pharmacogenetic data to predict clozapine response in schizophrenic subjects. Five pharmacogenetic variables and five clinical variables (including gender, age, height, baseline body weight, and baseline body mass index) were collated from 93 schizophrenic subjects taking clozapine, including 26 responders. Using this method, they obtained an overall prediction accuracy rate of 83.3%.

describes a Bayesian hierarchical model using pre-treatment fMRI and positron emission tomography (PET) information coupled with patient characteristics (e.g. medical or family history and genotype) as training data to predict changes in brain activity in 16 schizophrenic subjects following treatment with two atypical antipsychotics (risperidone or olanzapine). The authors postulated that predicting drug-induced changes in brain activity would assist the clinician in determining optimal drug choice.

However, the clinical utility of these previous approaches is negatively impacted by the expense and unavailability of complex methods such as fMRI, PET, genetic screening and MEG. In contrast, electroencephalography (EEG) is an inexpensive, non-invasive technique widely available in smaller hospitals and in community laboratories. Therefore, predictive algorithms dependent on EEG measurements are more practical. Furthermore, since the required EEG data is acquired during the resting state, only minimal cooperation is required from the patient. Thus, an EEG based method of predicting treatment response would have many advantages over imaging methods such as MRI, PET or MEG.

The goal of the present pilot study is to examine the utility of machine learning (ML) methods for processing EEG signals to predict the response of schizophrenic subjects to clozapine.

2.1. Quantitative EEG recordings

We collected pre-clozapine resting EEG data from chronically ill, treatment-resistant schizophrenic subjects prior to beginning clozapine therapy. The data were collected without change to the patient's current medication regimen. EEG was recorded with the patient in a semi-recumbent position in a sound attenuated, electrically shielded room by an experienced technician who prompted patients on signs of drowsiness. Sessions were arranged in the mornings and patients were requested to avoid coffee, drugs, alcohol and smoking immediately prior to the recording. A maximum of ten and a half minutes of eyes-closed (EC) and of eyes-open (EO) data respectively were collected in up to three separate 3.5 min runs using a QSI-9500 system, giving a total of 3 EO and 3 EC files. Electrodes were placed in the 10/20 configuration referenced to linked ears with impedances below 5 kΩ. The signals were band pass filtered between 0.5 and 80 Hz and notch filtered at 60 Hz by the QSI system during the recording. Data were digitized at a rate of 204.8 Hz. Since our selected features were either intra- or inter-hemispherical, we discarded the data from the midline electrodes (FZ, CZ, PZ, and OZ) in the interests of saving computational resources. The 16 remaining EEG electrodes used in our study were Fp1, Fp2, F3, F4, F7, F8, T3, T4, C3, C4, T5, T6, P3, P4, O1 and O2.

For de-artifacting, the data were partitioned into segments of 1 s duration. If the input signal on any electrode saturated the acquisition hardware at approximately plus or minus 160 µV, the entire segment was rejected. The signals were then digitally bandpass filtered after recording between 4 and 42 Hz to partially mitigate the effects of eye movement and muscle artifacts. For each EEG file, the first 60 segments of the de-artifacted part of the 3.5 min of data were used, since several segments were heavily artifacted, leaving only this number of segments that were uncorrupted on all electrodes. The selected data in each of the three files for both the EO and EC cases were divided into 2 epochs of 40 s duration with 50% overlap, to give a nominal 12 epochs per subject. These epochs were used to extract statistical quantities (such as absolute powers, power spectral densities, coherences, etc.) that became the candidate features as described below. When estimating these statistical quantities, each epoch was divided into overlapping 1 s windows with 60% overlap between adjacent windows. The respective statistical quantity was then calculated over each window and the desired result obtained by averaging over all windows. In the experimental results which follow, all EO and EC epochs were combined, to make maximum use of the available data.

2.2. Description of subjects and the clinical assessment procedures

Subjects, comprising both in-patients and out-patients, were recruited from the schizophrenia program at St. Joseph's Hospital, Centre for Mountain Health Services, Hamilton, Ontario. All subjects met both DSM-IV criteria for schizophrenia and the
Table 1. Demographic information of the 23 subjects (denoted Group A) who participated in the study. The lower 4 items in the table are scales related to the PANSS clinical rating score.

Characteristic                                             Average   Std    Min    Max
Age at start of treatment [years]                          41.2      8.4    28.8   57
Educational level (note 1)                                 3.1       1.4    2      7
Age at symptom onset [years]                               21.2      5      14     32
Total # of hospitalizations (Pre-clozapine)                9.7       13     0      63
Duration total of hospitalization (Pre-clozapine) [days]   615.7     928    0      3789
Chlorpromazine equivalents (Pre-clozapine) [mg/day]        726.6     636    40     2485
Clozapine dose [mg/day]                                    344.6     157    50     600
Post-treatment Positive Symptoms Scale                     17.8      3.4    11     24
Post-treatment Negative Symptoms Scale                     23        3.9    12     32
Post-treatment General Symptoms Rank (GR)                  46.3      5.7    32     56
Post-treatment Total Rank (PSS + NSS + GR)                 87.2      10.9   58     101

1 Education level rating: 1: grade 6 or less, 2: grade 7 to 12 without graduating, 3: graduated high school, 4: part college, 5: graduated 2 years college, 6: graduated 4 years college, 7: part graduate/professional school, 8: completed graduate/professional school.
criteria for treatment resistance. Patients meeting these criteria may be considered to be "severely symptomatic", i.e., as suffering acutely from schizophrenia. All subjects gave informed consent.

Data from two groups of schizophrenic subjects were used in this retrospective study. The first group (Group A) consists of 23 subjects. Group B is an independent sample of 14 subjects. Available socio-demographic and clinical information for Groups A and B are shown in Tables 1 and 2. Symptom severity after clozapine treatment is measured in Group A using the positive and negative syndrome scale (PANSS) score ( ). PANSS evaluations are not available for Group B subjects. As PANSS scores were not available for Group A subjects prior to clozapine treatment, pre-treatment symptom severity was assessed through a quantitative clinical assessment (QCA) conducted by review of the clinical record guided by the structure of the PANSS. The QCA procedure is outlined in Appendix A. As all QCA ratings were completed before initiation of this study, raters were blind to the machine learning outcome predictions. QCA was used to assess psychopathology both pre and post clozapine treatment in Group B.

We now discuss how we determine whether a patient is a responder (R) or non-responder (NR). In this retrospective pilot study quantifying clinical response is complicated by the absence of pre-treatment PANSS scores. We were therefore obliged to define response on the basis of a single post-treatment PANSS score. To do this we created post-treatment PANSS thresholds d1 to assess response: first we rank-ordered all subjects by post-treatment PANSS score, then chose a value of d1 (88.5) such that our 23 subjects were divided into responder (R) and non-responder (NR) classes with roughly equal number of subjects (R = 12, NR = 11).

Having R and NR groups of similar size has advantages with respect to the machine learning process; however, this assumes that clinically significant improvement is seen in about 50% of those treated with clozapine. Others have reported that, on average, only 34% of treatment-resistant schizophrenic patients will respond to clozapine. For this reason we also reanalyzed our data using a value d1 = 83.5 which yields a 30% response rate (i.e. with 7 R and 16 NR subjects in group A).

We must confirm that the pre-treatment QCA means of the R and NR subgroups of group A subjects are not significantly different, so that the post-treatment PANSS rating alone accurately indicates the effect of the treatment on the subject. To this end, we conducted a hypothesis test on the means, assuming the QCA data points are independent and normally distributed, and that the variances of the R and NR groups are identical. It is straightforward to show that the respective likelihood ratio is F-distributed. In this case, df = 10, 11 for the numerator and denominator, respectively, with F = 1.1056 and p = 0.43. Thus, there is no evidence to suggest the pre-treatment QCA means of the two groups are significantly different.

Group B subjects are defined as responders to clozapine therapy if there is an improvement of at least 25% between the pre- and post-QCA scores. This level of relative change represents a clinically significant improvement in symptom severity considering the fact that all the subjects in our study were in the treatment-resistant population. See e.g., who used a 20% relative change as response indicator.

2.3. Overview of the machine learning process

We now present a brief overview of the machine learning process used for prediction of clozapine response. A necessary component of this process is the collection of a training set. In our case, the training set consists of Mp EEG epochs from each of M subjects, for a total of Mt epochs altogether. In our study, M = 23, Mp = 12 and Mt = 270. The training set also includes the set of response outcomes $y_i,\; i = 1, \ldots, M_t$ corresponding to each epoch; i.e., if the subject corresponding to the i-th EEG epoch is a responder (non-responder), then the value of $y_i$ is R (NR), determined by the response criterion discussed previously.

There are three phases in a machine learning procedure. These are the design, operational and evaluation phases, as outlined in Fig. 1. The design phase, which consists of the feature extraction, feature selection and classification components, is now described.

The Design Process is depicted in Fig. 1(a). The first step is to extract candidate features from each epoch of pre-treatment EEG data. In our study, these features are statistical quantities including coherence between all electrode pairs at various frequencies, correlation and cross-correlation coefficients, mutual information between all sensor pairs ( ), absolute and relative power levels at various frequencies, the left-to-right hemi-

1 Using the PANSS data, the 'total rank' (TR) score is used as the clinical assessment in our experiments. TR is the sum of three scales in PANSS: 1. general rank (GR), 2. positive (or productive) symptoms scale (PSS), 3. negative (or deficit) symptoms scale (NSS). This means that TR = GR + PSS + NSS.
2 The total number of epochs is nominally 12 x 23 = 276. However, there are only 8 and 10 available epochs for 2 of the subjects, leaving only 270 net epochs.
3 We calculated the magnitude squared coherence estimate using the averaged periodogram method of Welch by the MathWorks MATLAB software, ver. 7.1. See
4 Using power spectral density (PSD) estimate via Welch's averaged modified periodogram method in MATLAB.
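The epoch bookkeeping of Section 2.1 and footnote 2 can be checked with a few lines of Python. This is only an illustrative sketch: the function and variable names are hypothetical, and it assumes that the 60 s of retained data per file are cut into 40 s epochs with a 20 s step (50% overlap) and that only complete windows are counted.

```python
def count_windows(total_len, win_len, overlap):
    """Number of complete windows of length win_len that fit into total_len
    when consecutive windows overlap by the given fraction."""
    if total_len < win_len:
        return 0
    step = win_len * (1.0 - overlap)
    # the small tolerance guards against floating-point round-off in the division
    return int((total_len - win_len) / step + 1e-9) + 1

# 60 one-second segments kept per file, 40 s epochs, 50% overlap
epochs_per_file = count_windows(60, 40, 0.5)

files_per_condition = 3   # up to three runs each for EO and EC
conditions = 2            # eyes-open and eyes-closed
epochs_per_subject = epochs_per_file * files_per_condition * conditions

n_subjects = 23
nominal_total = epochs_per_subject * n_subjects
# footnote 2: two subjects contribute only 8 and 10 epochs
net_total = nominal_total - (epochs_per_subject - 8) - (epochs_per_subject - 10)

# 1 s analysis windows with 60% overlap inside one 40 s epoch
windows_per_epoch = count_windows(40, 1, 0.6)

print(epochs_per_subject, nominal_total, net_total, windows_per_epoch)
```

The first three values reproduce the 12 epochs per subject, the nominal 276 and the net 270 epochs quoted in the text; the number of analysis windows per epoch is not stated in the paper and follows only from the assumption above.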
Table 2. Available demographic information of the 14 subjects denoted by Group B.

Characteristic                                             Average   Std    Min    Max
Age at start of treatment [years]                          35.7      10     22     55.5
Educational level (note 1)                                 3.3       1.64   2      7
Age at symptom onset [years]                               21.3      5.28   15     31
Total # of hospitalizations (Pre-clozapine)                6.43      6.9    0      18
Duration total of hospitalization (Pre-clozapine) [days]   470.8     627    0      1879
Chlorpromazine equivalents (Pre-clozapine) [mg/day]        628       404    40     1169
Clozapine dose [mg/day]                                    396.4     101    200    500

1 See Table 1 for definition.
sphere power ratio, the anterior/posterior power gradient across many frequencies and between electrodes (calculated using logarithm difference of power spectral density values). These quantities can all be readily calculated from the measured EEG signal. The number Nc of such candidate features can be quite large. In our experiments, using 1 Hz frequency resolution and considering all possible electrode pairs, in addition to various electrode combinations used in the power ratio group of features, we have Nc = 8468. The feature extraction process is applied over all epochs from all subjects. The result of the feature extraction process is a set of Mt vectors $x_i,\; i = 1, \ldots, M_t$, each of dimension Nc.

Notice that the majority of these candidate features are statistical characterizations of the measured EEG process and as such at least partially describe the underlying statistical behaviour of the EEG signal. Many of these quantities have been used as features in previous related work; e.g., mutual information was used by and coherences were used by .

After extracting candidate features, the second step in the design phase is feature reduction, or 'feature selection', which is critical to the performance of the resulting classifier or predictor. Feature selection is an ongoing topic of research in the machine learning community. Typically, only a relatively small number of the above candidate features bear any significant statistical relationship with the post-treatment response. We therefore identify those features which share the strongest statistical dependencies with the post-treatment-response variable. The result of the feature selection process is to reduce the number Nc of candidate features to a much smaller number Nr of most-relevant features. Our proposed prediction procedure uses the "regularized feature selection" of . This procedure proceeds in a sequence of Nr steps, where one feature is selected in each step. At each step, the feature which is selected from the list of (remaining) candidate features is the one which has the best combination of maximum statistical dependence with the treatment-response variable, and minimal statistical dependence with respect to the set of features already chosen in previous steps. In Peng's method, statistical dependence is quantified using mutual information. Further details are provided in the reference. The output of the feature selection process is a set of indices that identify which of the Nc candidate features are to be included in the set of Nr most relevant features. In this study, the useful range for Nr is between 8 and 14.

The feature selection process yields a set of reduced Nr-dimensional vectors, $x_i,\; i = 1, \ldots, M_t$. Each of these vectors corresponds to a point in an Nr-dimensional feature space. Ideally, these points should cluster into two distinct non-overlapping regions in the feature space, corresponding to the R and NR groups, respectively. A feature vector $x_i$ maps into the R region if the subject corresponding to the i-th epoch is a responder, and into the NR region otherwise. In practice however, the clusters overlap somewhat, so that feature vectors from a few epochs of the R subjects map into the NR region, and vice versa. As we demonstrate in Section , these mislocated points result in a prediction error for that subject.

An example of such clustering behaviour (shown in only two dimensions) for the current prediction problem is shown in , where it is seen that the feature vectors corresponding to the R and NR subjects indeed lie in distinct (although slightly overlapping) regions of the feature space. The selection of "better" features; i.e., features with greater statistical dependence on the outcome variable, leads to the formation of tighter clusters with smaller variances and with greater separation between the means of the clusters of different classes, resulting in improved performance.

We normalized feature values to improve performance. Certain feature values, such as coherence and correlation, are inherently limited to an interval [−1, 1] and so normalization is not required in these cases. However, for other feature values, such as spectral power levels, normalization is desirable. In this study the "z-score" normalization method was used. The EEG data of 91 normal (or healthy) adult subjects were measured and the means $\mu_l$ and standard deviations $\sigma_l$, $l = 1, \ldots, N_c$, for each feature are calculated over the healthy subject sample. Then for schizophrenic subjects, the corresponding l-th feature value $x_l$ is replaced with its normalized z-score value $z_l = (x_l - \mu_l)/\sigma_l$ before being fed to the feature selection and classifier processes.

Because many of the candidate features are highly correlated, there are many possible subsets of features that may be selected by our proposed feature selection algorithm, resulting in approximately equivalent prediction performance. The set of selected features is dependent on the normalization method used, the feature selection process, the response criterion and the definition of the target values y in the training data.

The next step in the design phase of the prediction process is the specification of the classifier. The job of the classifier is to input a reduced feature vector x and output the corresponding predicted response value y, which has a discrete value corresponding to either R or NR. In this way, the classifier output gives us the predicted response of the subject to the clozapine therapy. In this study, the classification process was implemented using a kernelized partial least squares regression (KPLSR) procedure ( ). The kernel matrix required by the KPLSR method was chosen to have a Gaussian structure. The KPLSR method determines a regression function using the available training data that approximates the value 1 over the region of the feature space corresponding to non-responders (i.e., the non-responder cluster), and the value 2 over the responder cluster. (The numerical values 1 and 2 are chosen arbitrarily). In the proposed method, all available Mp reduced feature vectors corresponding to the epochs available

The power ratio is calculated via the difference of natural logarithm of PSD values.
This method is also referred to as the "minimum-redundancy maximal relevance"
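The greedy selection rule described in this section (at each step, pick the candidate with high dependence on the response and low dependence on the features already chosen) can be illustrated with a small sketch. This is not the authors' implementation: it assumes features that are already discretized, uses a plain plug-in estimate of mutual information, and scores candidates by relevance minus mean redundancy, which is one common form of the minimum-redundancy maximal-relevance idea. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(a, b):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())

def mrmr_select(features, target, n_select):
    """Greedy selection: maximise relevance to the target minus the mean
    redundancy with the features chosen so far.
    features: list of discrete feature columns; returns chosen column indices."""
    relevance = [mutual_information(f, target) for f in features]
    selected, remaining = [], list(range(len(features)))
    while remaining and len(selected) < n_select:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_information(features[j], features[s])
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: columns 0 and 1 are identical (fully redundant),
# column 2 carries complementary information about the response.
f0 = [0, 0, 1, 1]
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
response = [0, 0, 0, 1]   # toy labels: 1 = responder, 0 = non-responder
chosen = mrmr_select([f0, f1, f2], response, n_select=2)
print(chosen)
```

In the toy data all three columns are equally relevant on their own, so the redundancy term is what steers the second pick away from the duplicate column.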
Fig. 1. A simplified schematic drawing of the data analysis steps: (a). The design phase, (b). The operational phase, (c). The L1O cross-validation procedure.
for a given subject are fed into the regression function, which outputs values $y_j,\; j = 1, \ldots, M_p$. Ideally, these quantities are exactly 1 or 2, but in practice, they only approximate these values. The mean of these y-values is evaluated and then quantized to the closest integer 1 or 2, to yield the corresponding NR or R prediction value.

The operational phase is depicted in Fig. 1(b). Once the machine learning prediction process is designed, it may be applied e.g., in an operational mode in a clinical setting, or, in this context, on Group B subjects. Here, EEG recordings are taken from the patient, and the set of reduced features identified in the design phase is computed from the EEG data, to give a sequence of feature vectors $x_j,\; j = 1, \ldots, M_p$. These feature vectors are fed into the classifier or regression function which is specified from the classifier parameters determined in the design phase. The classifier outputs the predicted response of the subject to the proposed clozapine treatment, in the manner described above.

In the current situation however, we are interested in evaluating the performance of the machine learning prediction procedure resulting from the design phase, using the available training data. This is the evaluation phase, depicted in Fig. 1(c). In this respect, a leave-one-out (L1O) cross-validation procedure is used, where the data from one subject at a time is sequentially removed from the training set. The feature selection and classifier design processes are then executed using all remaining data. The resulting machine learning structure is then tested using the omitted subject. The classifier output is then compared to the known response of the subject, and a performance tally is recorded. The process repeats, each time omitting a different subject, until all subjects have been omitted once. The overall performance figure for the prediction process is then the aggregate performance over all iterations (or folds) of the L1O cross-validation process. With this method, we test over all available data and in each trial we use the largest possible training set. Further, the method is "fair", since the tested data is not part of the training set used in the design phase. The number of latent variables in the KPLSR approach and the variance parameter associated with the Gaussian kernel are determined using a simple multi-dimensional grid search optimization within the cross-validation loop, in a manner consistent with the methodology of .

Since in effect a different training set is used in each L1O iteration, the set of selected reduced features may vary from one
Fig. 2. A demonstration of the clustering behaviour of the proposed ML procedure. The Nr = 8 dimensional feature space compressed into 2 dimensions using the KPCA method. There are nominally 12 data points corresponding to multiple EEG epochs from each subject. The subject index corresponding to each point is indicated on the plot.
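The subject-wise leave-one-out procedure of Fig. 1(c), combined with the per-subject decision rule (average the per-epoch regression outputs and round to the nearer of 1 = NR and 2 = R), can be sketched as below. The regressor is a deliberately simple stand-in (interpolation between class centroids), not the KPLSR model used in the study, and the feature selection step is omitted. All names and the toy data are hypothetical.

```python
def fit_centroids(X, y):
    """Stand-in 'design phase': one mean vector per class label (1 = NR, 2 = R)."""
    cents = {}
    for label in (1, 2):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def regress(cents, x):
    """Continuous output between 1 and 2, closer to the label of the nearer centroid."""
    d1 = sum((a - b) ** 2 for a, b in zip(x, cents[1])) ** 0.5
    d2 = sum((a - b) ** 2 for a, b in zip(x, cents[2])) ** 0.5
    if d1 + d2 == 0:
        return 1.5
    return 1.0 + d1 / (d1 + d2)

def predict_subject(cents, epochs):
    """Average the per-epoch outputs, then quantize to 1 (NR) or 2 (R)."""
    mean_out = sum(regress(cents, x) for x in epochs) / len(epochs)
    return 2 if mean_out >= 1.5 else 1

def leave_one_subject_out(data):
    """data: {subject_id: (list_of_epoch_feature_vectors, label)}.
    Returns the fraction of subjects predicted correctly."""
    correct = 0
    for held_out in data:
        X, y = [], []
        for sid, (epochs, label) in data.items():
            if sid != held_out:   # every epoch of the test subject is excluded
                X.extend(epochs)
                y.extend([label] * len(epochs))
        cents = fit_centroids(X, y)
        epochs, label = data[held_out]
        correct += predict_subject(cents, epochs) == label
    return correct / len(data)

toy = {
    "s1": ([[0.1, 0.2], [0.0, 0.3]], 1),
    "s2": ([[0.2, 0.1], [0.3, 0.0]], 1),
    "s3": ([[1.0, 0.9], [0.9, 1.1]], 2),
    "s4": ([[1.1, 1.0], [0.8, 0.9]], 2),
}
accuracy = leave_one_subject_out(toy)
print(accuracy)
```

The point of the sketch is the data split: the held-out unit is a subject rather than an epoch, so no epoch from the test subject can leak into the training set.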
iteration to another. In the operational phase discussed above, we
pre-treatment EEG data and then reduced into a set of Nr = 8 most-
need a single set of Nr features that best represents the entire train-
relevant features using the available training set data, as discussed
ing set. We could identify a single set of reduced features simply by
in Section . The prediction performance was then evaluated
applying the feature selection process once on the entire training
using the leave-one-out cross-validation procedure discussed pre-
set. The difficulty with this approach however, is that it is possible
viously. The performance evaluation results using the combined
that the data from a subset of subjects can dominate the feature
EO and EC EEG data sets together for the 23 subjects, for a response
selection process. A convenient method of avoiding this possibility
threshold value d1 ¼ 88:5 and Nr ¼ 8 are summarized in (i),
is, at each L1O iteration, to select a list of k Nr features, where k is a
where it is seen that the overall prediction performance is 87.12%.
constant greater than unity, typically greater than 3 in our exper-
When d1 is reduced to 83.5 corresponding to a 30% responder rate,
iments. Then the desired single set of Nr features is chosen as those
the overall performance becomes 89.7%. Two major latent vari-
which occur most frequently amongst the lists generated over all
ables are used for the kernel PLSR method. These results indicate
L1O iterations. In this way, the features are selected on an equita-
that it is indeed possible to predict the response to clozapine ther-
ble basis from different combinations of the data. To find a proper
apy using the proposed methods. Further experiments were per-
value for k, this procedure is repeated with increasing values of k,
formed using a range of d1 from 83.5 to 92.5; prediction
until at least Nr common features (out of the available k Nr fea-
performance was above 85% in all cases.
tures) can be found among all iterations of the L1O test.
We now present results using data from both subject groups A and B. For this second experiment, we train the classifiers using
For optimal performance of the proposed scheme, the classifier must operate in an Nr-dimensional feature space, where in our experiments the value of Nr is 8. However, if we wish to visualize the feature space on a plane, it is necessary to compress the feature space. It is readily verified that an optimal linear basis for dimensionality compression is the set of principal components of the feature space, obtained by principal component analysis (PCA). Better visualization performance can sometimes be obtained through a nonlinear principal component method, in which case kernelization techniques are applied to PCA. We refer to the nonlinearized version of PCA as kernel PCA (KPCA). In our study, the KPCA method is used only for the purposes of displaying the clustering results, and is not used in the prediction process.
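As an illustration of the kernel PCA projection used for display, the sketch below computes a two-dimensional embedding with a Gaussian kernel using NumPy. It is a minimal sketch under stated assumptions: the function name `kernel_pca_2d` and the kernel width `gamma` are hypothetical, and the kernel parameters actually used in the study are not given in this section.

```python
import numpy as np


def kernel_pca_2d(X, gamma=0.1):
    """Project the rows of X onto the first two kernel principal components."""
    # Gaussian (RBF) kernel matrix from pairwise squared distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)

    # Centre the kernel matrix in feature space.
    n = K.shape[0]
    ones = np.full((n, n), 1.0 / n)
    Kc = K - ones @ K - K @ ones + ones @ K @ ones

    # Leading eigenpairs of the centred kernel matrix.
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:2]
    vals, vecs = vals[order], vecs[:, order]

    # Coordinates of the input points along PC1 and PC2.
    return vecs * np.sqrt(np.maximum(vals, 0.0))
```

Given a matrix with one row of Nr selected features per EEG epoch, the function returns one (PC1, PC2) pair per epoch, which can then be plotted or averaged per subject.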
Fig. 3. Same as Fig. 2, except that all data points belonging to each subject in Fig. 2 are averaged to provide one point per subject. The clustering behaviour between the R and NR groups is clearly evident.
3.1. Treatment-efficacy prediction performance
The first set of results uses data from Group A, which consists of 23 subjects. The set of candidate features was extracted from the
A. Khodayari-Rostamabad et al. / Clinical Neurophysiology 121 (2010) 1998–2006
Table 3. Performance results predicting the response to clozapine therapy in Group A subjects using Nr = 8. Subjects with a post-treatment PANSS score of less than or more than d1 are considered responders (R) and non-responders (NR), respectively.
(i) d1 = 88.5 (corresponds to 52% response rate)
    Sensitivity = 83.33%
    Specificity = 90.91%
    Average = 87.12%
(ii) d1 = 83.5 (corresponds to 30% response rate)
    Sensitivity = 85.7%
    Specificity = 93.75%
    Average = 89.7%
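The averages in Table 3 are the mean of sensitivity and specificity. The short sketch below reproduces them from classification counts. The counts for (i), 10 of 12 responders and 10 of 11 non-responders, are those quoted in the discussion of chance performance. The counts for (ii), 6 of 7 and 15 of 16, are an assumption inferred here from the reported percentages and the 30% response rate among 23 subjects. The function name `summarize` is hypothetical.

```python
def summarize(true_pos, n_responders, true_neg, n_nonresponders):
    """Return (sensitivity, specificity, average) in percent."""
    sensitivity = 100.0 * true_pos / n_responders
    specificity = 100.0 * true_neg / n_nonresponders
    return sensitivity, specificity, (sensitivity + specificity) / 2.0


# (i) d1 = 88.5: 12 responders, 11 non-responders.
case_i = summarize(10, 12, 10, 11)
# (ii) d1 = 83.5: counts inferred from the percentages (assumption).
case_ii = summarize(6, 7, 15, 16)

print("(i)  sens=%.2f spec=%.2f avg=%.2f" % case_i)
print("(ii) sens=%.2f spec=%.2f avg=%.2f" % case_ii)
```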
only Group A as training data, and then test the prediction performance over Group B. A Group B responder in this case is defined as a subject having an improvement of at least 25% between the pre- and post-QCA scores. The average treatment-efficacy prediction performance for this experiment was 85.7%, as reflected in the table of independent test performance. This shows a satisfactory prediction performance under different conditions when the classifier is trained on one set, and then tested on another independent set.
We now show an example illustrating the clustering behaviour for the proposed scheme, using Group A data. Fig. 2 shows a scatter plot containing 270 points corresponding to the Mt = 270 available epochs of EEG data from the Group A subjects. This figure was generated using the kernel PCA method with a Gaussian kernel. Filled circles correspond to responders and squares to non-responders. In this figure, there are nominally 12 points associated with each subject; however, there are 2 subjects that have only 10 or 8 points. The number written beside each point is the corresponding subject index, which is assigned arbitrarily. Averaging the location of all points corresponding to each subject results in Fig. 3, in which each subject is shown with one point. The clustering between the R and NR groups is clearly evident in this figure. The clustering performance shown in this figure is indicative that the proposed machine learning procedure will perform well, as the prediction results suggest.
Independent test performance using subjects in group A as training data (with d1 = 88.5 and Nr = 8), and group B as test subjects. Response to clozapine therapy is defined as more than a 25% improvement in the QCA score. Subjects with a post-treatment QCA score of less than or more than d1 are considered responders (R) and non-responders (NR), respectively.
    Sensitivity = 85.7%
    Specificity = 85.7%
3.2. A list of discriminating features
We show a list of the 20 most relevant EEG features of interest in the table of discriminating features. These are the features that are most strongly discriminative of response to clozapine. Each of the features listed in the table is selected at least once over all L1O iterations. Fig. 4 is a depiction of the most-relevant features selected in the table. A connection between two electrode sites in the figure corresponds to a selected feature which involves those two locations. It roughly indicates any relations between EEG sensors that convey relevant information for our prediction problem. This figure depicts how the selected features could give clues about the locality and interconnection of neurological mechanisms associated with a positive response to clozapine. Further investigation of this matter remains a promising topic for future work.
A list of discriminating features for treatment-efficacy prediction using pre-treatment EEG information. Note that the discriminative feature subset is not unique and there is statistical dependence among them.
Selected EEG-driven Feature
    Mutual Information between T3 & P3
    Mutual Information between T3 & O1
    Mutual Information between C3 & P3
    Correlation between F8 & T4
    Coherence at f = 6 Hz between T3 & O1
    Coherence at f = 6 Hz between T3 & P3
    Coherence at f = 6 Hz between C3 & O1
    Coherence at f = 7 Hz between F3 & P3
    Coherence at f = 8 Hz between T6 & P3
    Coherence at f = 9 Hz between T3 & O1
    Coherence at f = 10 Hz between T3 & T5
    Coherence at f = 10 Hz between T3 & P3
    Left to right PSD-ratio at f = 10 Hz, T5/T6
    Left to right PSD-ratio at f = 11 Hz, T5/T6
    Coherence at f = 11 Hz between C3 & P3
    Coherence at f = 11 Hz between T3 & P3
    Left to right PSD-ratio at f = 12 Hz, T5/T6
    Coherence at f = 12 Hz between T3 & T5
    Coherence at f = 13 Hz between F7 & F3
    Left to right PSD-ratio at f = 16 Hz, T5/T6
4. Discussion
Our findings support the potential utility of machine learning methods in clinical psychiatry. In the current example we have been able to predict, in advance of the first dose, whether a treatment-resistant patient will or will not respond to a powerful but potentially toxic medication. In various experiments, we evaluated the performance of advanced prediction models in conjunction with kernelization methods to analyze pre-treatment EEG to predict the responsiveness to clozapine. These results support the idea that resting EEG data contains embedded salient information related to clozapine treatment-outcome that can be extracted using machine learning techniques.
We can provide some further evidence of the validity of the proposed prediction method, as follows. First, the clustering behaviour in the kernel PCA plots shows clean separation of the clusters, which is a strong indication that the reduced features can indeed discriminate long-term response. Also, with the L1O cross-validation procedure, different test and training samples are used in each iteration, and yet overall, a reasonable performance level is attained. This suggests the proposed machine learning procedure is consistent across variations of the input data. A final argument to suggest validity of the proposed method is with regard to the results of the independent test. Here, the prediction procedure is trained on Group A data and tested on a completely independent set of Group B data. Even though performance degrades somewhat, the resulting performance of 85.7% is still quite satisfactory.
We can further examine the integrity of the proposed prediction procedure by evaluating the probability that our demonstrated prediction performance would have been due to chance alone.
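The probability of obtaining a given classification result by chance can be illustrated with the binomial probability mass function. The sketch below is a hedged illustration rather than the authors' code. It assumes the Group A counts behind Table 3 (12 responders and 11 non-responders for case (i); 7 and 16 for case (ii), the latter inferred from the 30% response rate), and it takes the chance probability of each label to be its proportion among the 23 subjects. The helper name `chance_probability` is hypothetical.

```python
from math import comb


def chance_probability(n_correct, n_group, p_label):
    """Binomial probability of exactly n_correct successes in n_group trials."""
    n_wrong = n_group - n_correct
    return comb(n_group, n_correct) * p_label ** n_correct * (1.0 - p_label) ** n_wrong


n_total = 23

# Case (i): 12 responders (10 classified R), 11 non-responders (10 classified NR).
p_r_i = chance_probability(10, 12, 12 / n_total)
p_nr_i = chance_probability(10, 11, 11 / n_total)

# Case (ii): 7 responders (6 classified R), 16 non-responders (15 classified NR).
# These counts are inferred from the reported percentages (assumption).
p_r_ii = chance_probability(6, 7, 7 / n_total)
p_nr_ii = chance_probability(15, 16, 16 / n_total)

print(f"(i)  R: {p_r_i:.4f}  NR: {p_nr_i:.4f}")
print(f"(ii) R: {p_r_ii:.4f}  NR: {p_nr_ii:.4f}")
```

Under these assumptions the four values round to 0.0226, 0.0036, 0.0039 and 0.0211.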
With reference to Table 3(i), there are 12 responders and 11 non-responders, so the probability p of a responder may be taken as 12/23 = 0.5217. Assuming all subjects are independent, the probability of a prediction error is governed by a binomial distribution, which is parameterized by N, the number of samples, and p, in this case the probability of a responder. Therefore, the probability of this level of performance (10 classifications as R and 2 as NR out of N = 12 true responders) occurring due to chance alone is evaluated from the binomial distribution as 0.0226. Similarly, the value of p for the non-responder case is 0.4783, so the probability of estimating 10 NR and 1 R out of 11 non-responders due to chance alone is 0.0036. Similarly, for the case of Table 3(ii), the corresponding figures are 0.0039 and 0.0211 for the R and NR groups, respectively. Thus we see that these figures are negligibly small and we can conclude the prediction results are almost certainly a consequence of the distinguishing characteristics of the EEG measurements obtained from the two groups.
Fig. 4. A rough schematic drawing which shows a list of some relevant features by connections, as reflected in the table of discriminating features. Connections are shown by solid thick lines. Electrodes A1 and A2 represent the linked ears reference.
By employing more advanced analytical models, the present study was designed to extend and improve upon the utility of the EEG in predicting the responsiveness to clozapine as investigated in other studies. Although earlier work found that changes in EEG features correlated with outcome, post-treatment EEG data were required. Our methodology is potentially more useful to the clinician as prediction is possible using EEG data collected before this potentially toxic treatment is initiated. Further, even though previous investigators were successful at identifying features which were indicative of response, they did not incorporate their findings into a quantitative prediction algorithm. We have therefore been able to extend their work by accomplishing this purpose.
Our proposed feature selection method is novel in the respect that a small number of maximally discriminative features are automatically identified from a very large list of candidate features. This is in contrast to the previous approaches, which inherently require a trial-and-error procedure. The previous approach consists of hypothesizing that a single feature may be discriminative, and then verifying or rejecting the hypothesis by experiment. Thus our method can identify salient features that could easily be missed using previous methods.
It is gratifying to note that our proposed feature selection procedure did select some features that were identified from previous studies. This serves as a verification of our method and provides a useful connection with the previous research. Nevertheless, the mathematical structure produced by our ML methods was created from the training data alone without an a priori model or previous research findings (e.g. regarding QEEG differences between responders and non-responders). As such it has the advantage of not being constrained by the theoretical constructs derived from previous studies. Without devaluing previous work, or discounting the importance of replication, limiting feature selection to only a group made up of those reported to be useful in previous studies decreases the probability that new and highly salient features will be discovered. Also we have not employed traditional EEG frequency bands and instead used frequency components individually within a 1 Hz resolution window. This maximizes the possibility of detecting potentially important EEG features that might otherwise be obscured when power is integrated over a broad range of frequencies in a given band, e.g. a 10 Hz signal might be lost in the 8–12 Hz alpha band.
The goal of this paper is to propose a new clinical data analysis method and derive an empirical set of EEG features predictive of response to clozapine, not to derive neurological information regarding the pathophysiology of schizophrenia. Nevertheless the clustering of relevant EEG features in the temporo-parietal area of the dominant hemisphere, as seen in the table of discriminating features and in Fig. 4, may be of some interest to those studying regional brain activity patterns in patients with schizophrenia. Others have described bilateral reduced grey matter volume in the temporal lobes and electrophysiological abnormalities in the left temporo-parietal region on EEG in schizophrenia.
This retrospective study suffers from some weaknesses. Most notably our QCA clinical rating is based on chart review and therefore likely to be less accurate than a standardized PANSS. However, our raters were clinicians expert in the treatment of schizophrenia and familiar with the subjects being evaluated. The QCA would therefore have reasonable clinical validity. The high predictive accuracy of our algorithm in both Group A and B subjects even in the face of this source of outcome variance may speak to the robustness of this methodology. As QCA and PANSS ratings were completed years before this project they could not have been influenced by the machine learning assignment into responder and non-responder groups.
It must be noted the results of this pilot study are derived using a relatively small quantity of data. Our findings must be replicated in a much larger sample of training and test subjects before they can be accepted with confidence. Notwithstanding these issues, our data suggest that machine learning methods of analyzing EEG signal may be employed to create a useful psychiatric management tool. Furthermore, the methodology described in this paper could be extended to construct models that predict the response to various other treatments available for patients with schizophrenia or with other psychiatric conditions. Finally, it may be possible to incorporate a range of other clinical and laboratory data beyond EEG measurements, such as personality inventory scores, personal and demographic information and treatment history to improve clustering behaviour and prediction performance.
An additional topic for future consideration is to investigate the minimum number of channels needed to yield adequate prediction performance. It may be that a reduced configuration of electrodes concentrated over the left side (as suggested by the selected features) will still yield an acceptable level of performance, but at a reduced cost.
5. Appendix A. The QCA clinical rating procedure
The QCA clinical rating procedure was devised in the context of an un-related earlier naturalistic retrospective un-published clinical study of treatment-resistant schizophrenic patients being considered for clozapine treatment. The subjects in the present study were included in this previous study. An experienced clinician reviewed all the available clinical descriptive information of the patient's symptomatology prior to beginning a course of clozapine.
Reported symptoms, corresponding to those described in the PANSS, were rated as: present, moderate or severe on a one to six point scale. Only explicitly described symptoms were scored and the clinical rater was instructed not to infer the presence of potential symptoms. The same rating was repeated, based on case records describing current symptoms at the time (usually after approximately six months) when the decision was made to either discontinue or continue with on-going maintenance clozapine.
Acknowledgements
The authors would like to thank Margarita Criollo, Joy Fournier, and Eleanor Bard for their help in clinical experiments. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).
References
Adler G, Grieshaber S, Faude V, Thebaldi B, Dressing H. Clozapine in patients with chronic schizophrenia: serum level, EEG and memory performance.
Birca A, Carmant L, Lortie A, Lassonde M. Interaction between the flash evoked SSVEPs and the spontaneous EEG activity in children and adults. Clin.
Boutros NN, Arfken C, Galderisi S, Warrick J, Pratt G, Iacono W. The status of spectral EEG abnormality as a diagnostic test for schizophrenia. Schizophr. Res.
Coburn KL, Lauterbach EC, Boutros NN, Black KJ, Arciniegas DB, Coffey CE. The Value of Quantitative Electroencephalography in Clinical Psychiatry: A Report by the Committee on Research of the American Neuropsychiatric Association. J. Neuropsychiatry Clin. Neurosci. 2006;18:460–500.
Cover TM, Thomas JA. Elements of Information Theory, 2nd Ed. John Wiley & Sons,
Dunki RM, Dressel M. Statistics of biophysical signal characteristics and state specificity of the human EEG. Physica A 2006;370:632–50.
Essali A, Haj-Hasan NA, Li C, Rathbone J. Clozapine versus typical neuroleptic medication for schizophrenia. Cochrane Database of Systematic Reviews 2009; John Wiley and Sons Ltd, Art No.CD000059.
Fan Y, Shen D, Gur RC, Gur RE, Davatzikos C. COMPARE: classification of morphological patterns using adaptive regional elements. IEEE Trans. Medi.
Faux SF, Shenton ME, McCarley RW, Torello MW, Duffy FH. P200 topographic alterations in schizophrenia: evidence for left temporal-centroparietal region deficits. Electroencephalogr. Clin. Neurophysiol. Suppl. 1987;40:681–7.
Freudenreich O, Weiner RD, McEvoy JP. Clozapine-induced electroencephalogram changes as a function of clozapine serum levels. Biol. Psychiatry
Gallinat J, Heinz A. Combination of multimodal imaging and molecular genetic information to investigate complex psychiatric disorders. Pharmacopsychiatry
Gross A, Joutsiniemi SL, Rimon R, Appelberg B. Clozapine-induced QEEG changes correlate with clinical response in schizophrenic patients: a prospective, longitudinal study. Pharmacopsychiatry 2004;37:119–22.
Gunther W, Baghai T, Naber D, Spatz R, Hippius H. EEG alterations and seizures during treatment with clozapine: a retrospective study of 283 patients.
Guo Y, Bowman FD, Kilts C. Predicting the brain response to treatment using a Bayesian hierarchical model with application to a study of schizophrenia. Hum. Brain Mapp. 2008;29:1092–109.
Hughes JR, John ER. Conventional and Quantitative Electroencephalography in Psychiatry. J. Neuropsychiatry Clin. Neurosci. 1999;11:190–208.
Ince NF, Goksu F, Pellizzer G, Tewfik A, Stephane M. Selection of spectro-temporal patterns in multichannel MEG with support vector machines for schizophrenia classification. Proc. Annual Int. Conf. IEEE Eng. in Medicine and Biology Society 2008; 3554–3557.
Kay SR, Fiszbein A, Opler LA. The positive and negative syndrome scale (PANSS) for schizophrenia. Schizophr. Bull. 1987;13:261–76.
Kane J, Honigfeld G, Singer J, Meltzer H and the Clozaril Collaborative Study Group. Clozapine for the treatment-resistant schizophrenic: A double-blind comparison
Kim D, Burge J, Lane T, Pearlson GD, Kiehl KA, Calhoun VD. Hybrid ICA-Bayesian network approach reveals distinct effective connectivity differences in schizophrenia. Neuroimage 2008;42:1560–8.
Knott V, Labelle A, Jones B, Mahoney C. EEG hemispheric asymmetry as a predictor and correlate of short-term response to clozapine treatment in schizophrenia. Clin. Electroencephalogr. 2000;31:145–52.
Knott V, Labelle A, Jones B, Mahoney C. Quantitative EEG in schizophrenia and in response to acute and chronic clozapine treatment. Schizophr. Res. 2001;50:41–53.
Knott VJ, LaBelle A, Jones B, Mahoney C. EEG coherence following acute and chronic
Kwak N, Choi C-H. Input feature selection by mutual information based on Parzen
Leucht S, Kane JM, Kissling W, Hamann J, Etschel E, Engel RR. What does the PANSS mean? Schizophr. Res. 2005;79:231–8.
Lin C, Wang Y, Chen J, Liou Y, Bai Y, Lai I, Chen T, Chiu H, Li Y. Artificial neural network prediction of clozapine response with combined pharmacogenetic and clinical data. Comput. Methods Programs Biomed. 2008;91:91–9.
Malow BA, Reese KB, Sato S, Bogard PJ, Malhotra AK, Tung-Ping S, Pickar D. Spectrum of EEG abnormalities during clozapine treatment. Electroencephalogr. Clin. Neurophysiol. 1994;91:205–11.
Muller KR, Mika S, Ratsch G, Tsuda K, Scholkopf B. An introduction to kernel-based learning algorithms. IEEE Trans Neural Networks 2001;12:181–201.
Oikonomou T, Sakkalis V, Tollis IG, Micheloyannis S. Searching and visualizing brain networks in Schizophrenia. Springer Lecture Notes in Computer Science. Biological and Medical Data Analysis 2006;4345:172–82.
Okugawa G, Sedvall GC, Agartz I. Reduced grey and white matter volumes in the temporal lobe of male patients with chronic schizophrenia. Eur. Arch. Psychiatry Clin. Neurosci. 2002;252:120–3.
Peng H, Long F, Ding C. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Analysis and Machine Intelligence 2005;27:1226–38.
Rosipal R, Kramer N. Overview and recent advances in partial least squares. In: Saunders C, Grobelnik M, Gunn S, Shawe-Taylor J, editors. Subspace, latent structure and feature selection techniques. Lecture Notes in Computer Science: Springer; 2006. p. 34–51.
Sakkalis V, Oikonomou T, Pachou E, Tollis I, Micheloyannis S, Zervakis M. Time-significant wavelet coherence for the evaluation of Schizophrenic brain activity using a graph theory approach. Proceedings Int Conference of the IEEE Engineering in Medicine and Biology 2006:4265–8.
Struyf J, Dobrin S, Page D. Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia. BMC Genomics 2008; 9:(531).
Varma S, Simon R. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics 2006; 7:(91).
Young CR, Bowers Jr. MB, Mazure CM. Management of the adverse effects of clozapine. Schizophrenia Bulletin 1998;24:381–90.
Source: http://recherchesantementale.qc.ca/wp-content/uploads/2012/11/eeg2010.pdf