A pilot study to determine whether machine learning methodologies using pre-treatment electroencephalography can predict the symptomatic response to clozapine therapy

Clinical Neurophysiology

Ahmad Khodayari-Rostamabad, Gary M. Hasey, Duncan J. MacCrimmon, James P. Reilly, Hubert de Bruin

a Electrical and Computer Eng. Dept., McMaster University, Hamilton, ON, Canada L8S 4K1
b Dept. of Psychiatry and Behavioral Neurosciences, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada L8S 4L8
c Mood Disorders Program, Centre for Mountain Health Services, St. Joseph Hospital, Hamilton, ON, Canada L8N 3K7
d School of Biomedical Engineering, McMaster University, Hamilton, ON, Canada L8S 4K1

Accepted 11 May 2010
Available online 17 June 2010

Objective: To investigate whether applying advanced machine learning (ML) methodologies to pre-treatment electroencephalography (EEG) data can predict the response to clozapine therapy in adult subjects suffering from chronic schizophrenia.
Methods: Pre-treatment EEG data are collected in 23 + 14 schizophrenic adults. Treatment outcome, after at least one year follow-up, is determined using clinical ratings by a trained clinician blind to EEG results.
First, a feature selection scheme is employed to select a reduced subset of features extracted from the subjects' EEG that is most statistically relevant to our treatment-response prediction. These features are then entered into a classifier, which is realized in the form of a kernel partial least squares regression method that performs response prediction. Various scales, including the positive and negative syndrome scale (PANSS) are used as treatment-response indicators.
Results: We determined that a set of discriminating EEG features does exist. A low-dimensional representation of the feature space showed significant clustering into clozapine responder and non-responder groups. The minimum level of performance of the proposed prediction methodology, tested over a range of conditions using the leave-one-out cross-validation method on the original 23 subjects, with further testing in an independent sample of 14 subjects, was 85%.
Conclusions: These findings indicate that analysis of pre-treatment EEG data can predict the clinical response to clozapine in treatment-resistant schizophrenia.
Significance: If replicated in a larger population, this novel approach to EEG analysis may assist the clinician in determining treatment-efficacy.
1388-2457/$36.00 © 2010 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
* Corresponding author. Tel.: +1 905 525 9140x22895; fax: +1 905 521 2922.
E-mail address: (J.P. Reilly).

Compared with other antipsychotic medications, the atypical antipsychotic medication clozapine is recognized to have superior therapeutic effectiveness in the treatment of chronic medication-resistant schizophrenia. However, clozapine may produce serious side effects such as seizures, cardiac arrhythmias or bone marrow suppression with neutropenia. According to a recent Cochrane review, about 34% of treatment-resistant patients respond to clozapine while 3.2% develop blood problems. As the hematological side effects can be life threatening, blood samples to monitor the white blood cell count must be collected as long as the drug is used, at weekly to monthly intervals. The logistic difficulties for the patient and the treatment team are substantial. A method that could reliably determine, before the onset of therapy, whether a given patient will or will not respond to clozapine would greatly assist the clinician in determining whether the risks and logistic complexity of clozapine are outweighed by the potential benefits.

Quantitative electroencephalography (QEEG or EEG) may offer some promise in this regard. EEG abnormalities in schizophrenic subjects and EEG changes due to clozapine therapy have been the focus of a number of clinical studies.
A. Khodayari-Rostamabad et al. / Clinical Neurophysiology 121 (2010) 1998–2006

Based on findings in 17 schizophrenic subjects, it was found that the clozapine-induced improvement of psychopathology symptom ratings using the Positive and Negative Syndrome Scale (PANSS) was correlated with pre-treatment QEEG inter- and intra-hemispheric spectral power asymmetry. Greater pre-treatment anterior to posterior asymmetry in the delta frequency range was associated with greater improvement in negative symptoms, while greater pre-treatment anterior to posterior theta asymmetry predicted improvement of positive symptoms and global improvement. Larger inter-hemispheric asymmetry in the theta and beta frequencies in the central and anterior temporal regions were, respectively, predictive of greater improvement in positive and negative symptoms. It was also found that changes in the theta frequency in QEEG with clozapine treatment, particularly in the midline electrodes over the fronto-central scalp area, were a more sensitive indicator for the evaluation of clozapine treatment efficacy than the serum clozapine level. Though these methods reveal important relationships between QEEG variables and clinical outcome, a series of simple correlational analyses does not readily yield a "responder" or "non-responder" dichotomous categorization for an individual patient.

The above analyses employed standard statistical methods. On the other hand, a more mathematically sophisticated analysis including pattern recognition and dimensionality reduction methods (which together may be categorized as machine learning techniques) can perform a more comprehensive data analysis. Machine learning techniques are finding increasing application in psychiatry, particularly when multi-dimensional, noisy, highly complex data or multi-modal data sets are analyzed together. For example, support vector machine (SVM) techniques that select spectro-temporal patterns from multichannel magnetoencephalogram (MEG) data collected during a verbal working memory task have been used to distinguish schizophrenic from control subjects. Machine learning algorithms using structural brain magnetic resonance (MRI) images, functional MRI (fMRI) data, and combined genomic and clinical data have been employed to separate schizophrenic, bipolar and healthy control subjects.

Machine learning approaches have also been applied to prediction of clozapine treatment-efficacy. One study employed a feed-forward multilayer perceptron network (with a back-propagation error training technique) using clinical and pharmacogenetic data to predict clozapine response in schizophrenic subjects. Five pharmacogenetic variables and five clinical variables (including gender, age, height, baseline body weight, and baseline body mass index) were collated from 93 schizophrenic subjects taking clozapine, including 26 responders. Using this method, the authors obtained an overall prediction accuracy rate of 83.3%.

Another study describes a Bayesian hierarchical model using pre-treatment fMRI and positron emission tomography (PET) information coupled with patient characteristics (e.g. medical or family history and genotype) as training data to predict changes in brain activity in 16 schizophrenic subjects following treatment with two atypical antipsychotics (risperidone or olanzapine). The authors postulated that predicting drug-induced changes in brain activity would assist the clinician in determining optimal drug choice.

However, the clinical utility of these previous approaches is negatively impacted by the expense and unavailability of complex methods such as fMRI, PET, genetic screening and MEG. In contrast, electroencephalography (EEG) is an inexpensive, non-invasive technique widely available in smaller hospitals and in community laboratories. Therefore, predictive algorithms dependent on EEG measurements are more practical. Furthermore, since the required EEG data is acquired during the resting state, only minimal cooperation is required from the patient. Thus, an EEG based method of predicting treatment response would have many advantages over imaging methods such as MRI, PET or MEG.

The goal of the present pilot study is to examine the utility of machine learning (ML) methods for processing EEG signals to predict the response of schizophrenic subjects to clozapine.

2.1. Quantitative EEG recordings

We collected pre-clozapine resting EEG data from chronically ill, treatment-resistant schizophrenic subjects prior to beginning clozapine therapy. The data were collected without change to the patient's current medication regimen. EEG was recorded with the patient in a semi-recumbent position in a sound attenuated, electrically shielded room by an experienced technician who prompted patients on signs of drowsiness. Sessions were arranged in the mornings and patients were requested to avoid coffee, drugs, alcohol and smoking immediately prior to the recording. A maximum of ten and a half minutes of eyes-closed (EC) and of eyes-open (EO) data respectively were collected in up to three separate 3.5 min runs using a QSI-9500 system, giving a total of 3 EO and 3 EC files. Electrodes were placed in the 10/20 configuration referenced to linked ears with impedances below 5 kΩ. The signals were band pass filtered between 0.5 and 80 Hz and notch filtered at 60 Hz by the QSI system during the recording. Data were digitized at a rate of 204.8 Hz. Since our selected features were either intra- or inter-hemispherical, we discarded the data from the midline electrodes (FZ, CZ, PZ, and OZ) in the interests of saving computational resources. The 16 remaining EEG electrodes used in our study were Fp1, Fp2, F3, F4, F7, F8, T3, T4, C3, C4, T5, T6, P3, P4, O1 and O2.

For de-artifacting, the data were partitioned into segments of 1 s duration. If the input signal on any electrode saturated the acquisition hardware at approximately plus or minus 160 μV, the entire segment was rejected. The signals were then digitally bandpass filtered after recording between 4 and 42 Hz to partially mitigate the effects of eye movement and muscle artifacts. For each EEG file, the first 60 segments of the de-artifacted part of the 3.5 min of data were used, since several segments were heavily artifacted, leaving only this number of segments that were uncorrupted on all electrodes. The selected data in each of the three files for both the EO and EC cases were divided into 2 epochs of 40 s duration with 50% overlap, to give a nominal 12 epochs per subject. These epochs were used to extract statistical quantities (such as absolute powers, power spectral densities, coherences, etc.) that became the candidate features as described below. When estimating these statistical quantities, each epoch was divided into overlapping 1 s windows with 60% overlap between adjacent windows. The respective statistical quantity was then calculated over each window and the desired result obtained by averaging over all windows. In the experimental results which follow, all EO and EC epochs were combined, to make maximum use of the available data.

2.2. Description of subjects and the clinical assessment procedures

Table 1
Demographic information of the 23 subjects (denoted Group A) who participated in the study. The lower 4 items in the table are scales related to the PANSS clinical rating score.

Item                                                        Average   Std     Min    Max
Age at start of treatment [years]                           41.2      8.4     28.8   57
Educational level (1)                                       3.1       1.4     2      7
Age at symptom onset [years]                                21.2      5       14     32
Total # of hospitalizations (pre-clozapine)                 9.7       13      0      63
Duration total of hospitalization (pre-clozapine) [days]    615.7     928     0      3789
Chlorpromazine equivalents (pre-clozapine) [mg/day]         726.6     636     40     2485
Clozapine dose [mg/day]                                     344.6     157     50     600
Post-treatment Positive Symptoms Scale                      17.8      3.4     11     24
Post-treatment Negative Symptoms Scale                      23        3.9     12     32
Post-treatment General Symptoms Rank (GR)                   46.3      5.7     32     56
Post-treatment Total Rank (PSS + NSS + GR)                  87.2      10.9    58     101

(1) Education level rating: 1: grade 6 or less, 2: grade 7 to 12 without graduating, 3: graduated high school, 4: part college, 5: graduate 2 years college, 6: graduate 4 years college, 7: part graduated/professional school, 8: completed graduated professional school.

Subjects, comprising both in-patients and out-patients, were recruited from the schizophrenia program at St. Joseph's Hospital, Centre for Mountain Health Services, Hamilton, Ontario. All subjects met both DSM-IV criteria for schizophrenia and the
criteria for treatment resistance. Patients meeting these criteria may be considered to be "severely symptomatic", i.e., as suffering acutely from schizophrenia. All subjects gave informed consent.

Data from two groups of schizophrenic subjects were used in this retrospective study. The first group (Group A) consists of 23 subjects. Group B is an independent sample of 14 subjects. Available socio-demographic and clinical information for Groups A and B are shown in Tables 1 and 2. Symptom severity after clozapine treatment is measured in Group A using the positive and negative syndrome scale (PANSS) score. PANSS evaluations are not available for Group B subjects. As PANSS scores were not available for Group A subjects prior to clozapine treatment, pre-treatment symptom severity was assessed through a quantitative clinical assessment (QCA) conducted by review of the clinical record guided by the structure of the PANSS. The QCA procedure is outlined in Appendix A. As all QCA ratings were completed before initiation of this study, raters were blind to the machine learning outcome predictions. QCA was used to assess psychopathology both pre and post clozapine treatment in Group B.

Table 2
Available demographic information of the 14 subjects denoted by Group B.

Item                                                        Average   Std     Min    Max
Age at start of treatment [years]                           35.7      10      22     55.5
Educational level (1)                                       3.3       1.64    2      7
Age at symptom onset [years]                                21.3      5.28    15     31
Total # of hospitalizations (pre-clozapine)                 6.43      6.9     0      18
Duration total of hospitalization (pre-clozapine) [days]    470.8     627     0      1879
Chlorpromazine equivalents (pre-clozapine) [mg/day]         628       404     40     1169
Clozapine dose [mg/day]                                     396.4     101     200    500

(1) See Table 1 for definition.

We now discuss how we determine whether a patient is a responder (R) or non-responder (NR). In this retrospective pilot study quantifying clinical response is complicated by the absence of pre-treatment PANSS scores. We were therefore obliged to define response on the basis of a single post-treatment PANSS score. To do this we created a post-treatment PANSS threshold δ1 to assess response: first we rank-ordered all subjects by post-treatment PANSS score, then chose a value of δ1 (88.5) such that our 23 subjects were divided into responder (R) and non-responder (NR) classes with roughly equal number of subjects (R = 12, NR = 11).

1 Using the PANSS data, the 'total rank' (TR) score is used as the clinical assessment in our experiments. TR is the sum of three scales in PANSS: 1. general rank (GR), 2. positive (or productive) symptoms scale (PSS), 3. negative (or deficit) symptoms scale (NSS). This means that TR = GR + PSS + NSS.

Having R and NR groups of similar size has advantages with respect to the machine learning process; however, this assumes that clinically significant improvement is seen in about 50% of those treated with clozapine. Others have reported that, on average, only 34% of treatment-resistant schizophrenic patients will respond to clozapine. For this reason we also reanalyzed our data using a value δ1 = 83.5 which yields a 30% response rate (i.e. with 7 R and 16 NR subjects in Group A).

We must confirm that the pre-treatment QCA means of the R and NR subgroups of Group A subjects are not significantly different, so that the post-treatment PANSS rating alone accurately indicates the effect of the treatment on the subject. To this end, we conducted a hypothesis test on the means, assuming the QCA data points are independent and normally distributed, and that the variances of the R and NR groups are identical. It is straightforward to show that the respective likelihood ratio is F-distributed. In this case, df = 10, 11 for the numerator and denominator, respectively, with F = 1.1056 and p = 0.43. Thus, there is no evidence to suggest the pre-treatment QCA means of the two groups are significantly different.

Group B subjects are defined as responders to clozapine therapy if there is an improvement of at least 25% between the pre- and post-QCA scores. This level of relative change represents a clinically significant improvement in symptom severity considering the fact that all the subjects in our study were in the treatment-resistant population. Other work has used a 20% relative change as response indicator.

2.3. Overview of the machine learning process

We now present a brief overview of the machine learning process used for prediction of clozapine response. A necessary component of this process is the collection of a training set. In our case, the training set consists of Mp EEG epochs from each of M subjects, for a total of Mt epochs altogether. Here, M = 23, Mp = 12 and Mt = 270. The training set also includes the set of response outcomes yi, i = 1, ..., Mt corresponding to each epoch; i.e., if the subject corresponding to the i-th EEG epoch is a responder (non-responder), then the value of yi is R (NR), determined by the response criterion discussed previously.

2 The total number of epochs is nominally 12 x 23 = 276. However, there are only 8 and 10 available epochs for 2 of the subjects, leaving only 270 net epochs.

There are three phases in a machine learning procedure. These are the design, operational and evaluation phases, as outlined in Fig. 1. The design phase, which consists of the feature extraction, feature selection and classification components, is now described.

Fig. 1. A simplified schematic drawing of the data analysis steps: (a) the design phase, (b) the operational phase, (c) the L1O cross-validation procedure.

The design process is depicted in Fig. 1(a). The first step is to extract candidate features from each epoch of pre-treatment EEG data. In our study, these features are statistical quantities including coherence between all electrode pairs at various frequencies, correlation and cross-correlation coefficients, mutual information between all sensor pairs, absolute and relative power levels at various frequencies, the left-to-right hemisphere power ratio, and the anterior/posterior power gradient across many frequencies and between electrodes (calculated using logarithm difference of power spectral density values). These quantities can all be readily calculated from the measured EEG signal. The number Nc of such candidate features can be quite large. In our experiments, using 1 Hz frequency resolution and considering all possible electrode pairs, in addition to various electrode combinations used in the power ratio group of features, we have Nc = 8468.

3 We calculated the magnitude squared coherence estimate using the averaged periodogram method of Welch in the MathWorks MATLAB software, ver. 7.1.
4 Using power spectral density (PSD) estimate via Welch's averaged modified periodogram method in MATLAB.
The power ratio is calculated via the difference of the natural logarithms of PSD values.

The feature extraction process is applied over all epochs from all subjects. The result of the feature extraction process is a set of Mt vectors xi, i = 1, ..., Mt, each of dimension Nc.

Notice that the majority of these candidate features are statistical characterizations of the measured EEG process and as such at least partially describe the underlying statistical behaviour of the EEG signal. Many of these quantities, including mutual information and coherences, have been used as features in previous related work.

We normalized feature values to improve performance. Certain feature values, such as coherence and correlation, are inherently limited to an interval [-1, 1] and so normalization is not required in these cases. However, for other feature values, such as spectral power levels, normalization is desirable. In this study the "z-score" normalization method was used. The EEG data of 91 normal (or healthy) adult subjects were measured and the means $\mu_l$ and standard deviations $\sigma_l$, l = 1, ..., Nc for each feature are calculated over the healthy subject sample. Then for schizophrenic subjects, the corresponding l-th feature value $x_l$ is replaced with its normalized z-score value $z_l = (x_l - \mu_l)/\sigma_l$ before being fed to the feature selection and classifier processes.

After extracting candidate features, the second step in the design phase is feature reduction, or 'feature selection', which is critical to the performance of the resulting classifier or predictor. Feature selection is an ongoing topic of research in the machine learning community. Typically, only a relatively small number of the above candidate features bear any significant statistical relationship with the post-treatment response. We therefore identify those features which share the strongest statistical dependencies with the post-treatment-response variable. The result of the feature selection process is to reduce the number Nc of candidate features to a much smaller number Nr of most-relevant features. Our proposed prediction procedure uses the "regularized feature selection" method of Peng. This procedure proceeds in a sequence of Nr steps, where one feature is selected in each step. At each step, the feature which is selected from the list of (remaining) candidate features is the one which has the best combination of maximum statistical dependence with the treatment-response variable, and minimal statistical dependence with respect to the set of features already chosen in previous steps. In Peng's method, statistical dependence is quantified using mutual information. Further details are provided in the reference. The output of the feature selection process is a set of indices that identify which of the Nc candidate features are to be included in the set of Nr most relevant features. In this study, the useful range for Nr is between 8 and 14.

This method is also referred to as the "minimum-redundancy maximal relevance"

Because many of the candidate features are highly correlated, there are many possible subsets of features that may be selected by our proposed feature selection algorithm, resulting in approximately equivalent prediction performance. The set of selected features is dependent on the normalization method used, the feature selection process, the response criterion and the definition of the target values y in the training data.

The feature selection process yields a set of reduced Nr-dimensional vectors, xi, i = 1, ..., Mt. Each of these vectors corresponds to a point in an Nr-dimensional feature space. Ideally, these points should cluster into two distinct non-overlapping regions in the feature space, corresponding to the R and NR groups, respectively. A feature vector xi maps into the R region if the subject corresponding to the i-th epoch is a responder, and into the NR region otherwise. In practice however, the clusters overlap somewhat, so that feature vectors from a few epochs of the R subjects map into the NR region, and vice versa. As we demonstrate later, these mislocated points result in a prediction error for that subject.

An example of such clustering behaviour (shown in only two dimensions) for the current prediction problem is shown in Fig. 2, where it is seen that the feature vectors corresponding to the R and NR subjects indeed lie in distinct (although slightly overlapping) regions of the feature space. The selection of "better" features, i.e., features with greater statistical dependence on the outcome variable, leads to the formation of tighter clusters with smaller variances and with greater separation between the means of the clusters of different classes, resulting in improved performance.

The next step in the design phase of the prediction process is the specification of the classifier. The job of the classifier is to input a reduced feature vector x and output the corresponding predicted response value y, which has a discrete value corresponding to either R or NR. In this way, the classifier output gives us the predicted response of the subject to the clozapine therapy. In this study, the classification process was implemented using a kernelized partial least squares regression (KPLSR) procedure. The kernel matrix required by the KPLSR method was chosen to have a Gaussian structure. The KPLSR method determines a regression function using the available training data that approximates the value 1 over the region of the feature space corresponding to non-responders (i.e., the non-responder cluster), and the value 2 over the responder cluster. (The numerical values 1 and 2 are chosen arbitrarily.) In the proposed method, all available Mp reduced feature vectors corresponding to the epochs available
for a given subject are fed into the regression function, which outputs values yj, j = 1, ..., Mp. Ideally, these quantities are exactly 1 or 2, but in practice, they only approximate these values. The mean of these y-values is evaluated and then quantized to the closest integer 1 or 2, to yield the corresponding NR or R prediction value.

The operational phase is depicted in Fig. 1(b). Once the machine learning prediction process is designed, it may be applied, e.g., in an operational mode in a clinical setting, or, in this context, on Group B subjects. Here, EEG recordings are taken from the patient, and the set of reduced features identified in the design phase are computed from the EEG data, to give a sequence of feature vectors xj, j = 1, ..., Mp. These feature vectors are fed into the classifier or regression function which is specified from the classifier parameters determined in the design phase. The classifier outputs the predicted response of the subject to the proposed clozapine treatment, in the manner described above.

In the current situation however, we are interested in evaluating the performance of the machine learning prediction procedure resulting from the design phase, using the available training data. This is the evaluation phase, depicted in Fig. 1(c). In this respect, a leave-one-out (L1O) cross-validation procedure is used, where the data from one subject at a time is sequentially removed from the training set. The feature selection and classifier design processes are then executed using all remaining data. The resulting machine learning structure is then tested using the omitted subject. The classifier output is then compared to the known response of the subject, and a performance tally is recorded. The process repeats, each time omitting a different subject, until all subjects have been omitted once. The overall performance figure for the prediction process is then the aggregate performance over all iterations (or folds) of the L1O cross-validation process. With this method, we test over all available data and in each trial we use the largest possible training set. Further, the method is "fair", since the tested data is not part of the training set used in the design phase. The number of latent variables in the KPLSR approach and the variance parameter associated with the Gaussian kernel are determined using a simple multi-dimensional grid search optimization within the cross-validation loop, consistent with previously published methodology.

Fig. 2. A demonstration of the clustering behaviour of the proposed ML procedure. The Nr = 8 dimensional feature space is compressed into 2 dimensions using the KPCA method. There are nominally 12 data points corresponding to multiple EEG epochs from each subject. The subject index corresponding to each point is indicated on the plot.

Since in effect a different training set is used in each L1O iteration, the set of selected reduced features may vary from one
iteration to another. In the operational phase discussed above, we need a single set of Nr features that best represents the entire training set. We could identify a single set of reduced features simply by applying the feature selection process once on the entire training set. The difficulty with this approach however, is that it is possible that the data from a subset of subjects can dominate the feature selection process. A convenient method of avoiding this possibility is, at each L1O iteration, to select a list of kNr features, where k is a constant greater than unity, typically greater than 3 in our experiments. Then the desired single set of Nr features is chosen as those which occur most frequently amongst the lists generated over all L1O iterations. In this way, the features are selected on an equitable basis from different combinations of the data. To find a proper value for k, this procedure is repeated with increasing values of k, until at least Nr common features (out of the available kNr features) can be found among all iterations of the L1O test.

For optimal performance of the proposed scheme, the classifier must operate in an Nr-dimensional feature space, where in our experiments the value of Nr is 8. However, if we wish to visualize the feature space on a plane, it is necessary to compress the feature space. It is readily verified that an optimal linear basis for dimensionality compression is the set of principal components of the feature space, obtained by principal component analysis (PCA). Better visualization performance can sometimes be obtained through a nonlinear principal component method, in which case kernelization techniques are applied to PCA. We refer to the nonlinearized version of PCA as kernel PCA (KPCA). In our study, the KPCA method is used only for the purposes of displaying the clustering results, as in Figs. 2 and 3, and is not used in the prediction process.

Fig. 3. Same as Fig. 2, except that all data points belonging to each subject in Fig. 2 are averaged to provide one point per subject. The clustering behaviour between the R and NR groups is clearly evident.

3.1. Treatment-efficacy prediction performance

The first set of results uses data from Group A which consists of 23 subjects. The set of candidate features were extracted from the pre-treatment EEG data and then reduced into a set of Nr = 8 most-relevant features using the available training set data, as discussed previously. The prediction performance was then evaluated using the leave-one-out cross-validation procedure discussed previously. The performance evaluation results using the combined EO and EC EEG data sets together for the 23 subjects, for a response threshold value δ1 = 88.5 and Nr = 8, are summarized in Table 3(i), where it is seen that the overall prediction performance is 87.12%. When δ1 is reduced to 83.5, corresponding to a 30% responder rate, the overall performance becomes 89.7%. Two major latent variables are used for the kernel PLSR method. These results indicate that it is indeed possible to predict the response to clozapine therapy using the proposed methods. Further experiments were performed using a range of δ1 from 83.5 to 92.5; prediction performance was above 85% in all cases.

We now present results using data from both subject groups A and B. For this second experiment, we train the classifiers using
only Group A as training data, and then test the prediction performance over Group B. A Group B responder in this case is defined as a subject having an improvement of at least 25% between the pre- and post-QCA scores. The average treatment-efficacy prediction performance for this experiment was 85.7% as reflected in . This shows a satisfactory prediction performance under different conditions when the classifier is trained on one set, and then tested on another independent set.

A. Khodayari-Rostamabad et al. / Clinical Neurophysiology 121 (2010) 1998–2006

Table 3
Performance results predicting the response to clozapine therapy in Group A subjects using Nr = 8. Subjects with a post-treatment PANSS score of less than or more than d1 are considered responders (R) and non-responders (NR), respectively.

(i) d1 = 88.5 (corresponds to 52% response rate)
    Sensitivity = 83.33%
    Specificity = 90.91%
    Average = 87.12%
(ii) d1 = 83.5 (corresponds to 30% response rate)
    Sensitivity = 85.7%
    Specificity = 93.75%
    Average = 89.7%

Independent test performance using subjects in Group A as training data (with d1 = 88.5 and Nr = 8), and Group B as test subjects. Response to clozapine therapy is defined as more than a 25% improvement in the QCA score. Subjects with a post-treatment QCA score of less than or more than d1 are considered responders (R) and non-responders (NR), respectively.

    Sensitivity = 85.7%
    Specificity = 85.7%

We now show an example illustrating the clustering behaviour for the proposed scheme, using Group A data. shows a scatter plot containing 270 points corresponding to the Mt = 270 available epochs of EEG data from the Group A subjects. This figure was generated using the kernel PCA method with a Gaussian kernel. Filled circles correspond to responders and squares to non-responders. In this figure, there are nominally 12 points associated with each subject; however, there are 2 subjects that have only 10 or 8 points. The number written beside each point is the corresponding subject index, which is assigned arbitrarily. Averaging the location of all points corresponding to each subject results in , in which each subject is shown with one point. The clustering between the R and NR groups is clearly evident in this figure. The clustering performance shown in this figure is indicative that the proposed machine learning procedure will perform well, as the results of suggest.

3.2. A list of discriminating features

We show a list of the 20 most relevant EEG features of interest in . These are the features that are most strongly discriminative of response to clozapine. Each of the features listed in the table is selected at least once over all L1O iterations. is a depiction of the most-relevant features selected in . A connection between two electrode sites in the figure corresponds to a selected feature which involves those two locations. It roughly indicates any relations between EEG sensors that convey relevant information for our prediction problem. This figure depicts how the selected features could give clues about the locality and interconnection of neurological mechanisms associated with a positive response to clozapine. Further investigation of this matter remains a promising topic for future work.

A list of discriminating features for treatment-efficacy prediction using pre-treatment EEG information. Note that the discriminative feature subset is not unique and there is statistical dependence among them.

Selected EEG-driven feature
    Mutual Information between T3 & P3
    Mutual Information between T3 & O1
    Mutual Information between C3 & P3
    Correlation between F8 & T4
    Coherence at f = 6 Hz between T3 & O1
    Coherence at f = 6 Hz between T3 & P3
    Coherence at f = 6 Hz between C3 & O1
    Coherence at f = 7 Hz between F3 & P3
    Coherence at f = 8 Hz between T6 & P3
    Coherence at f = 9 Hz between T3 & O1
    Coherence at f = 10 Hz between T3 & T5
    Coherence at f = 10 Hz between T3 & P3
    Left to right PSD-ratio at f = 10 Hz, T5/T6
    Left to right PSD-ratio at f = 11 Hz, T5/T6
    Coherence at f = 11 Hz between C3 & P3
    Coherence at f = 11 Hz between T3 & P3
    Left to right PSD-ratio at f = 12 Hz, T5/T6
    Coherence at f = 12 Hz between T3 & T5
    Coherence at f = 13 Hz between F7 & F3
    Left to right PSD-ratio at f = 16 Hz, T5/T6

Our findings support the potential utility of machine learning methods in clinical psychiatry. In the current example we have been able to predict, in advance of the first dose, whether a treatment-resistant patient will or will not respond to a powerful but potentially toxic medication. In various experiments, we evaluated the performance of advanced prediction models in conjunction with kernelization methods to analyze pre-treatment EEG to predict the responsiveness to clozapine. These results support the idea that resting EEG data contains embedded salient information related to clozapine treatment-outcome that can be extracted using machine learning techniques.

We can provide some further evidence of the validity of the proposed prediction method, as follows. First, the clustering behaviour shown in shows clean separation of the clusters, which is a strong indication that the reduced features can indeed discriminate long-term response. Also, with the L1O cross-validation procedure, different test and training samples are used in each iteration, and yet overall, a reasonable performance level is attained. This suggests the proposed machine learning procedure is consistent across variations of the input data. A final argument to suggest validity of the proposed method is with regard to the results of . Here, the prediction procedure is trained on Group A data and tested on a completely independent set of Group B data. Even though performance degrades somewhat, the resulting performance of 85.7% is still quite satisfactory.

We can further examine the integrity of the proposed prediction procedure by evaluating the probability that our demonstrated prediction performance would have been due to chance alone.
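As a hedged illustration, and not the authors' own analysis code, the chance-level probabilities quoted in this part of the text can be checked with a plain binomial probability mass function. The sketch below is in Python; the helper name `binom_pmf` and all variable names are hypothetical. The counts for threshold case (i) (12 responders, 11 non-responders, 10 correct in each group) come from the text. The counts for case (ii) (7 responders with 6 correct, 16 non-responders with 15 correct) are an assumption inferred from the reported 30% response rate and the 85.7% and 93.75% figures.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


N_SUBJECTS = 23

# Case (i): 12 responders (R) and 11 non-responders (NR), as stated in the text.
p_r_i = 12 / N_SUBJECTS          # chance probability of labelling a subject R
p_nr_i = 1.0 - p_r_i             # chance probability of labelling a subject NR
chance_r_i = binom_pmf(10, 12, p_r_i)    # 10 of 12 true responders labelled R
chance_nr_i = binom_pmf(10, 11, p_nr_i)  # 10 of 11 true non-responders labelled NR

# Case (ii): ASSUMED counts (7 R, 16 NR; 6 and 15 correct), inferred from
# the reported response rate, sensitivity and specificity.
p_r_ii = 7 / N_SUBJECTS
p_nr_ii = 1.0 - p_r_ii
chance_r_ii = binom_pmf(6, 7, p_r_ii)
chance_nr_ii = binom_pmf(15, 16, p_nr_ii)

if __name__ == "__main__":
    for label, value in [
        ("(i)  R ", chance_r_i),
        ("(i)  NR", chance_nr_i),
        ("(ii) R ", chance_r_ii),
        ("(ii) NR", chance_nr_ii),
    ]:
        print(f"{label}: {value:.4f}")
```

Under these assumptions the four values come out close to the figures quoted in the text (about 0.0226, 0.0036, 0.0039 and 0.0211). Note that this is the probability of exactly the observed split in each group taken separately, which is how the text frames the calculation; a tail probability or a joint test across both groups would be a different, and arguably stricter, check.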
Fig. 4. A rough schematic drawing which shows a list of some relevant features by connections, as reflected in . Connections are shown by solid thick lines. Electrodes A1 and A2 represent the linked ears reference.

With reference to (i), there are 12 responders and 11 non-responders, so the probability p of a responder may be taken as 12/23 = 0.5217. Assuming all subjects are independent, the probability of a prediction error is governed by a binomial distribution, which is parameterized by N, the number of samples, and p, in this case the probability of a responder. Therefore, the probability of this level of performance (10 classifications as R and 2 as NR out of N = 12 true responders) occurring due to chance alone is evaluated from the binomial distribution as 0.0226. Similarly, the value of p for the non-responder case is 0.4783, so the probability of estimating 10 NR and 1 R out of 11 non-responders due to chance alone is 0.0036. For the case of (ii), the corresponding figures are 0.0039 and 0.0211 for the R and NR groups, respectively. Thus we see that these figures are negligibly small and we can conclude the prediction results are almost certainly a consequence of the distinguishing characteristics of the EEG measurements obtained from the two groups.

By employing more advanced analytical models, the present study was designed to extend and improve upon the utility of the EEG in predicting the responsiveness to clozapine as investigated in other studies. Although found that changes in EEG features correlated with outcome, post-treatment EEG data was required. Our methodology is potentially more useful to the clinician as prediction is possible using EEG data collected before this potentially toxic treatment is initiated. Further, even though were successful at identifying features which were indicative of response, they did not incorporate their findings into a quantitative prediction algorithm. We have therefore been able to extend their work by accomplishing this purpose.

Our proposed feature selection method is novel in the respect that a small number of maximally discriminative features are automatically identified from a very large list of candidate features. This is in contrast to the previous approaches, which inherently require a trial-and-error procedure. The previous approach consists of hypothesizing that a single feature may be discriminative, and then verifying or rejecting the hypothesis by experiment. Thus our method can identify salient features that could easily be missed using previous methods.

It is gratifying to note that our proposed feature selection procedure did select some features that were identified from previous studies. This serves as a verification of our method and provides a useful connection with the previous research. Nevertheless, the mathematical structure produced by our ML methods was created from the training data alone without an a priori model or previous research findings (e.g. regarding QEEG differences between responders and non-responders). As such it has the advantage of not being constrained by the theoretical constructs derived from previous studies. Without devaluing previous work, or discounting the importance of replication, limiting feature selection to only a group made up of those reported to be useful in previous studies decreases the probability that new and highly salient features will be discovered. Also, we have not employed traditional EEG frequency bands and instead used frequency components individually within a 1 Hz resolution window. This maximizes the possibility of detecting potentially important EEG features that might otherwise be obscured when power is integrated over a broad range of frequencies in a given band, e.g. a 10 Hz signal might be lost in the 8–12 Hz alpha band.

The goal of this paper is to propose a new clinical data analysis method and derive an empirical set of EEG features predictive of response to clozapine, not to derive neurological information regarding the pathophysiology of schizophrenia. Nevertheless, the clustering of relevant EEG features in the temporo-parietal area of the dominant hemisphere, as seen in and in , may be of some interest to those studying regional brain activity patterns in patients with schizophrenia. Others have described bilateral reduced grey matter volume in the temporal lobes (e.g., ) and electrophysiological abnormalities in the left temporo-parietal region on EEG (e.g., ) in schizophrenia.

This retrospective study suffers from some weaknesses. Most notably, our QCA clinical rating is based on chart review and is therefore likely to be less accurate than a standardized PANSS. However, our raters were clinicians expert in the treatment of schizophrenia and familiar with the subjects being evaluated. The QCA would therefore have reasonable clinical validity. The high predictive accuracy of our algorithm in both Group A and B subjects, even in the face of this source of outcome variance, may speak to the robustness of this methodology. As QCA and PANSS ratings were completed years before this project, they could not have been influenced by the machine learning assignment into responder and non-responder groups.

It must be noted the results of this pilot study are derived using a relatively small quantity of data. Our findings must be replicated in a much larger sample of training and test subjects before they can be accepted with confidence. Notwithstanding these issues, our data suggest that machine learning methods of analyzing EEG signal may be employed to create a useful psychiatric management tool. Furthermore, the methodology described in this paper could be extended to construct models that predict the response to various other treatments available for patients with schizophrenia or with other psychiatric conditions. Finally, it may be possible to incorporate a range of other clinical and laboratory data beyond EEG measurements, such as personality inventory scores, personal and demographic information and treatment history, to improve clustering behaviour and prediction performance.

An additional topic for future consideration is to investigate the minimum number of channels needed to yield adequate prediction performance. It may be that a reduced configuration of electrodes concentrated over the left side (as suggested by ) will still yield an acceptable level of performance, but at a reduced cost.

5. Appendix A. The QCA clinical rating procedure

The QCA clinical rating procedure was devised in the context of an unrelated earlier naturalistic retrospective unpublished clinical study of treatment-resistant schizophrenic patients being considered for clozapine treatment. The subjects in the present study were included in this previous study. An experienced clinician reviewed all the available clinical descriptive information of the patient's symptomatology prior to beginning a course of clozapine.
Reported symptoms, corresponding to those described in the PANSS, were rated as: present, moderate or severe on a one to six point scale. Only explicitly described symptoms were scored and the clinical rater was instructed not to infer the presence of potential symptoms. The same rating was repeated, based on case records describing current symptoms at the time (usually after approximately six months) when the decision was made to either discontinue or continue with on-going maintenance clozapine.

Acknowledgements

The authors would like to thank Margarita Criollo, Joy Fournier, and Eleanor Bard for their help in clinical experiments. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

References

Adler G, Grieshaber S, Faude V, Thebaldi B, Dressing H. Clozapine in patients with chronic schizophrenia: serum level, EEG and memory performance.
Birca A, Carmant L, Lortie A, Lassonde M. Interaction between the flash evoked SSVEPs and the spontaneous EEG activity in children and adults. Clin.
Boutros NN, Arfken C, Galderisi S, Warrick J, Pratt G, Iacono W. The status of spectral EEG abnormality as a diagnostic test for schizophrenia. Schizophr. Res.
Coburn KL, Lauterbach EC, Boutros NN, Black KJ, Arciniegas DB, Coffey CE. The Value of Quantitative Electroencephalography in Clinical Psychiatry: A Report by the Committee on Research of the American Neuropsychiatric Association. J. Neuropsychiatry Clin. Neurosci. 2006;18:460–500.
Cover TM, Thomas JA. Elements of Information Theory, 2nd Ed. John Wiley & Sons,
Dunki RM, Dressel M. Statistics of biophysical signal characteristics and state specificity of the human EEG. Physica A 2006;370:632–50.
Essali A, Haj-Hasan NA, Li C, Rathbone J. Clozapine versus typical neuroleptic medication for schizophrenia. Cochrane Database of Systematic Reviews 2009; John Wiley and Sons Ltd, Art No. CD000059.
Fan Y, Shen D, Gur RC, Gur RE, Davatzikos C. COMPARE: classification of morphological patterns using adaptive regional elements. IEEE Trans. Medi.
Faux SF, Shenton ME, McCarley RW, Torello MW, Duffy FH. P200 topographic alterations in schizophrenia: evidence for left temporal-centroparietal region deficits. Electroencephalogr. Clin. Neurophysiol. Suppl. 1987;40:681–7.
Freudenreich O, Weiner RD, McEvoy JP. Clozapine-induced electroencephalogram changes as a function of clozapine serum levels. Biol. Psychiatry
Gallinat J, Heinz A. Combination of multimodal imaging and molecular genetic information to investigate complex psychiatric disorders. Pharmacopsychiatry
Gross A, Joutsiniemi SL, Rimon R, Appelberg B. Clozapine-induced QEEG changes correlate with clinical response in schizophrenic patients: a prospective, longitudinal study. Pharmacopsychiatry 2004;37:119–22.
Gunther W, Baghai T, Naber D, Spatz R, Hippius H. EEG alterations and seizures during treatment with clozapine: a retrospective study of 283 patients.
Guo Y, Bowman FD, Kilts C. Predicting the brain response to treatment using a Bayesian hierarchical model with application to a study of schizophrenia. Hum. Brain Mapp. 2008;29:1092–109.
Hughes JR, John ER. Conventional and Quantitative Electroencephalography in Psychiatry. J. Neuropsychiatry Clin. Neurosci. 1999;11:190–208.
Ince NF, Goksu F, Pellizzer G, Tewfik A, Stephane M. Selection of spectro-temporal patterns in multichannel MEG with support vector machines for schizophrenia classification. Proc. Annual Int. Conf. IEEE Eng. in Medicine and Biology Society 2008;3554–7.
Kay SR, Fiszbein A, Opler LA. The positive and negative syndrome scale (PANSS) for schizophrenia. Schizophr. Bull. 1987;13:261–76.
Kane J, Honigfeld G, Singer J, Meltzer H, and the Clozaril Collaborative Study Group. Clozapine for the treatment-resistant schizophrenic: A double-blind comparison
Kim D, Burge J, Lane T, Pearlson GD, Kiehl KA, Calhoun VD. Hybrid ICA-Bayesian network approach reveals distinct effective connectivity differences in schizophrenia. Neuroimage 2008;42:1560–8.
Knott V, Labelle A, Jones B, Mahoney C. EEG hemispheric asymmetry as a predictor and correlate of short-term response to clozapine treatment in schizophrenia. Clin. Electroencephalogr. 2000;31:145–52.
Knott V, Labelle A, Jones B, Mahoney C. Quantitative EEG in schizophrenia and in response to acute and chronic clozapine treatment. Schizophr. Res.
Knott VJ, LaBelle A, Jones B, Mahoney C. EEG coherence following acute and chronic
Kwak N, Choi C-H. Input feature selection by mutual information based on Parzen
Leucht S, Kane JM, Kissling W, Hamann J, Etschel E, Engel RR. What does the PANSS mean? Schizophr. Res. 2005;79:231–8.
Lin C, Wang Y, Chen J, Liou Y, Bai Y, Lai I, Chen T, Chiu H, Li Y. Artificial neural network prediction of clozapine response with combined pharmacogenetic and clinical data. Comput. Methods Programs Biomed. 2008;91:91–9.
Malow BA, Reese KB, Sato S, Bogard PJ, Malhotra AK, Tung-Ping S, Pickar D. Spectrum of EEG abnormalities during clozapine treatment. Electroencephalogr. Clin. Neurophysiol. 1994;91:205–11.
Muller KR, Mika S, Ratsch G, Tsuda K, Scholkopf B. An introduction to kernel-based learning algorithms. IEEE Trans Neural Networks 2001;12:181–201.
Oikonomou T, Sakkalis V, Tollis IG, Micheloyannis S. Searching and visualizing brain networks in Schizophrenia. Springer Lecture Notes in Computer Science. Biological and Medical Data Analysis 2006;4345:172–82.
Okugawa G, Sedvall GC, Agartz I. Reduced grey and white matter volumes in the temporal lobe of male patients with chronic schizophrenia. Eur. Arch. Psychiatry Clin. Neurosci. 2002;252:120–3.
Peng H, Long F, Ding C. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Analysis and Machine Intelligence 2005;27:1226–38.
Rosipal R, Kramer N. Overview and recent advances in partial least squares. In: Saunders C, Grobelnik M, Gunn S, Shawe-Taylor J, editors. Subspace, latent structure and feature selection techniques. Lecture Notes in Computer Science: Springer; 2006. p. 34–51.
Sakkalis V, Oikonomou T, Pachou E, Tollis I, Micheloyannis S, Zervakis M. Time-significant wavelet coherence for the evaluation of Schizophrenic brain activity using a graph theory approach. Proceedings Int Conference of the IEEE Engineering in Medicine and Biology 2006:4265–8.
Struyf J, Dobrin S, Page D. Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia. BMC Genomics 2008;9:531.
Varma S, Simon R. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics 2006;7:91.
Young CR, Bowers MB Jr, Mazure CM. Management of the adverse effects of clozapine. Schizophrenia Bulletin 1998;24:381–90.

Source: http://recherchesantementale.qc.ca/wp-content/uploads/2012/11/eeg2010.pdf

