## Annual Report 2014

April 2013 – March 2014
IAP-Network StUDyS
Developing crucial Statistical methods for Understanding
major complex Dynamic Systems in natural, biomedical
and social sciences
Coordinator: Irène Gijbels, KUL-1
Phase VII, Contract P7/06
General information and list of abbreviations
The network consists of 7 Belgian partners and 4 international partners. We list in Table 1 the names of the partners of the network, with their institution and research unit.

| Institution | Research unit |
|---|---|
| Katholieke Universiteit Leuven | Statistics |
| Katholieke Universiteit Leuven | Quantitative Psychology and Individual Differences |
| Université catholique de Louvain | Institut de statistique, biostatistique et sciences actuarielles |
| Universiteit Gent | Biometrics and Statistics |
| Universiteit Hasselt | Center for Statistics |
| Université libre de Bruxelles | ECARES and Department of Mathematics |
| Université de Liège | Statistique |
| Charles University at Prague | Statistics |
| Rijksuniversiteit Groningen | Social Science Statistics |
| Universidad de Santiago de Compostela | Statistics |
| London School of Hygiene and Tropical Medicine | Medical Statistics |

Table 1: Belgian and international partners of the network.

The research project is built up around five work packages (WPs), and one meta work package (MWP). Table 2 gives the *main* contributors to each work package and indicates per package the partner that is coordinating the work. Obviously each partner is invited to contribute to each work package, and actual contributions (from small to major) to each work package come from all partners together.

| Work package | Main contributing partners |
|---|---|
| WP1: The study of associations and dependencies in complex systems | KUL-1, UCL, ULB, ULG∗, CU, USC |
| WP2: The study of different dynamics in complex systems | KUL-1, KUL-2, ULB∗, CU, RUG |
| WP3: Multivariate modeling and hierarchically structured data | KUL-1, UCL, UG, UH∗, ULG, USC |
| WP4: Dynamics of a stochastic system and the impact of non-observed characteristics | KUL-1, KUL-2, UCL∗, UG, RUG |
| WP5: Variable and model selection and the study of (ultra-)high dimensional data | KUL-1, KUL-2, UG∗, UH, ULB, ULG |
| Meta WP: The developed statistical methods in full use | all partners (KUL-1∗) |

Table 2: Main contributors per work package, and coordinating partner per work package (indicated with a ∗).

In Section 3 we report on the research results obtained in the various work packages during the report period. In this section we briefly summarize per work package the main achievements. We also briefly comment on the most important network activities during the past year.

Most important network activities
The past year has been a special year, since it covers most of the year 2013, which was the International Year of Statistics. This special occasion acted as a catalyst for research activities and events in the statistical community. Many meetings were organized by network partners, within Belgium but also abroad. Several short courses and a wealth of statistical seminars were offered in the network. Moreover, two honorary doctorates were awarded with promoters from the network. In addition, the interuniversity platform for training of young researchers, financed via the Flemish government, gained further momentum with the development of new interuniversity training courses, and with the setting up of an initiative for a future PhD researchers event jointly with colleagues from the French-speaking universities.

Description of the research completed
In the subsections below we describe the progress that has been made in the various work packages. The reporting in each work package is divided mostly according to the main objectives formulated in the original proposal. For convenience we therefore list (in italics) the major objectives exactly as formulated in the original proposal. Since research is continuously evolving, some slight shifts in emphasis may result. For each work package, we also indicate interactions with research results in other work packages. The references mentioned in the text can be found at the end of this report (for published or accepted papers, books or chapters in books) and on the web site for submitted papers (see the item 'Technical papers').

Work package 1: The study of associations and dependencies in complex systems
Main goal: Measuring associations between characteristics (of scalar type, functional type, ...) of a stochastic system can enter at various levels (in a time evolution, in tails of distributions, ...). This work package studies the statistical modeling of associations and dependencies in complex stochastic systems, including testing for specific association structures.

Major objectives:
1.1. *Modeling of complex dependencies through copulas.*
*(1) developing methods for modeling dependencies between real-valued variables and*
*(2) developing and study tests for testing for specific dependency (copula) struc-*
*(3) developing methods for constrained copula estimation, and doing inference based*
*(4) studying the use of conditional copula estimators in vine copula models to deal with high-dimensional data;*
*(5) developing methods for conditional copula estimation for censored observations;*
*(6) developing nonparametric inference methods for extremal dependence within random vectors, with emphasis on the high-dimensional case;*
*(7) developing dimension-reduction techniques for analyzing extremal dependence.*
1.2. *Flexible regression models.*
*The general aim is to study the associations between a vector of responses and a vector of explanatory variables in regression by means of flexible regression models.*

More precisely, we plan to:
*(8) do inference in such flexible regression models, by using frequentist and Bayesian*
*(9) adapt regression methods that are developed for completely observed data, to the censored data case;*
*(10) develop statistical methods for estimating quantiles in advanced flexible regression models.*
1.3. *Qualitative constraints and goodness-of-fit testing.*
*Here we study the associations or dependencies in a collection of variables. The specific objectives:*
*(11) developing tests for testing that two groups are mutually independent;*
*(12) testing that a pair of variables are independent conditionally upon the other*
We next describe the obtained research results.

Modeling of complex dependencies through copulas
Conditional copulas
Researchers from KUL-1, UH and CU joined forces to develop bootstrap procedures for statistical inference for conditional copulas, see Omelka *et al.* (2013). Veraverbeke *et al.* (2014) improve upon available nonparametric estimators for conditional distribution functions. The proposed procedure can also be used in other contexts, for example, when censoring occurs. The link between conditional copula and conditional distribution function then also allows this technique to be used to estimate a conditional copula under the so-called simplifying assumption (such as in vine pair-copula constructions), see Gijbels *et al.* (2013).

Flexible estimation of conditional copulas was also performed at ULG for conditionally exchangeable random variables (Lambert, 2014). The starting point is a spline approximation to the generator of an Archimedean copula. Changes in the dependence structure with a covariate x are modelled by flexible regression of the spline coefficients on x.
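To illustrate how an Archimedean copula is determined by its generator, the following minimal sketch builds a bivariate copula through the standard construction C(u, v) = φ⁻¹(φ(u) + φ(v)). The Clayton generator and all function names are assumptions chosen for illustration; the spline-approximated generator and the covariate regression of Lambert (2014) are not reproduced here.

```python
def clayton_generator(t, theta):
    """Clayton generator phi(t) = (t**(-theta) - 1) / theta, for theta > 0 (illustrative choice)."""
    return (t ** (-theta) - 1.0) / theta


def clayton_generator_inverse(s, theta):
    """Inverse generator phi^{-1}(s) = (1 + theta * s)**(-1 / theta)."""
    return (1.0 + theta * s) ** (-1.0 / theta)


def archimedean_copula(u, v, theta):
    """Archimedean construction C(u, v) = phi^{-1}(phi(u) + phi(v)) for u, v in (0, 1]."""
    total = clayton_generator(u, theta) + clayton_generator(v, theta)
    return clayton_generator_inverse(total, theta)
```

Because the generator values are simply added, the resulting copula is symmetric in its arguments, which is the exchangeability property mentioned above. In a conditional version one could let `theta` depend on the covariate x; that step is left out of this sketch.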

Copulas: estimation and tests for specific dependence structures
The research work on testing for specific dependence structures and estimation under qualitative constraints has been continued. It involves testing for tail monotonicity, for positive quadrant dependency and for stochastic monotonicity, among others, see e.g. Gijbels and Sznajder (2013a,b).

Rayner *et al.* (2013) introduced an efficient algorithm for the construction of polynomials that are orthonormal on bivariate density functions: it can be used to build goodness-of-fit tests for copulas.

For multivariate Gaussian copula models with unknown margins and structured correlation matrices, a rank-based, semiparametrically efficient estimator is proposed in Segers *et al.* (2013) for the Euclidean copula parameter.

Copulas and extremal dependences
In Einmahl *et al.* (2014), tail dependence models for distributions attracted to a max-stable law are fitted using observations above a high threshold. To cope with spatial, high-dimensional data, a rank-based M-estimator is proposed relying on bivariate margins only. A data-driven weight matrix is used to minimize the asymptotic variance. An analysis of wind speed data from the Netherlands illustrates the method.

Copulas and applications
A review on different approaches for statistical inference when modelling dependencies through copulas was provided in Gijbels *et al.* (2014). In the PhD research work of Klaus Herrmann (KUL-1) one of the aims is to investigate the distribution (and consequently quantiles and expectations) of a sum of a finite number of dependent components. The dependence between the components is modelled through a copula, and applications in finance are looked at.

Copula-based measures of association between two random vectors are suggested in Grothe *et al.* (2013). The measures are applied to characterize the strength and direction of association of northern and southern European bond markets during the recent Euro crisis, as well as the association of stock markets with bond markets.

Statistical modelling of directional variables has also been addressed with contributions to density estimation for directional-linear data. García-Portugués *et al.* (2013a) study the relation between wind direction and SO2 concentration through circular-linear densities, with a copula approach. In addition, a kernel density estimator (García-Portugués *et al.*, 2013b) and bandwidth selectors (García-Portugués, 2013) were introduced.

Copulas were also used by Braeken *et al.* (2013) in structural equation models to specify dependencies between error variables in personality questionnaires.

Flexible regression models
Inference in flexible regression models
Techniques of approximation with P-splines have been studied intensively in the network in previous years. They are one of the basic ingredients in the study of grouped regularization methods for additive varying coefficient models in Antoniadis *et al.* (2014). Conditional quantile estimation in varying coefficient models was tackled in Andriyana *et al.* (2014) and testing procedures for qualitative features of the varying coefficients were developed in Ahkim and Verhasselt (2014). The doctoral research of Yudhie Andriyana further focuses on dealing with heteroscedasticity when estimating conditional quantiles. P-splines were also used in the modeling of survival data gathered on subjects sampled from a population where an unknown proportion of them is not susceptible to the monitored event: Bremhorst and Lambert (2014), in a collaboration between UCL and ULG, extend the promotion time model with covariates influencing simultaneously the probability of being cured and the latent survival distribution. Identification issues are carefully studied and discussed. P-spline models involve the selection of penalty parameters. Frasso and Eilers (2014) developed an innovative procedure based on L-curves and V-curves. It is computationally efficient and robust, in particular when the data are serially correlated (with standard methods tending to over-smooth the data).

Wavelets and kernel methods were also used in various contexts. Autin *et al.* (2013, 2014) studied flexible estimation by wavelets in multidimensional settings. Pardo-Fernández *et al.* (2014) test for the equality of K regression curves in a fully nonparametric context: the test is based on the comparison of empirical estimators of the characteristic functions of the regression residuals in the K sub-populations. Nonparametric tools were also developed to explore directional (environmental) data (Oliveira *et al.*, 2013, 2014).

Adapting regression methods that are developed for completely observed data, to the censored data case
In Lopez *et al.* (2013), a nonparametric single-index regression model is considered in which the response is subject to right censoring with completely observed covariates. With potentially interval censored responses, Lambert (2013) proposed a nonparametric additive model for the location and the dispersion of a continuous variable with an arbitrary smooth conditional distribution.

Estimation of quantiles in advanced flexible regression models
Many univariate robust estimators are based on quantiles. The variance and hence the mean squared error of some quantile-based estimators can be reduced by using smoothing techniques, as was shown in Hubert *et al.* (2013).

Noh *et al.* (2013c) use the characterization of the joint distribution of variables by their marginal distributions and the underlying copula to estimate conditional quantiles semi-parametrically. Mammen *et al.* (2013) compare nonparametric and parametric fits in regression quantile models to test parametric specifications.

Just as there is a strong connection between quantiles and depth in location models (Hallin *et al.*, 2014), regression depth and regression quantiles are intimately related. In the regression context, Paindaveine and Van Bever (2013) introduce a local regression depth concept that can cope with multimodal distributions.

Qualitative constraints and goodness-of-fit testing
A novel approach to testing for homogeneity of dispersions, not relying on parametric assumptions and based on the means of within-group distances, was developed in joint research by the KUL-1 and CU partners (Gijbels and Omelka, 2013).

Estimation of generalized additive models with functional covariates is tackled by Febrero-Bande and González-Manteiga (2013). García-Portugués *et al.* (2013) introduce a testing procedure for the functional linear model with scalar response. González-Manteiga and Crujeiras (2013) provide an overview on goodness-of-fit tests for regression models. New proposals for testing regression methods are also provided by Boente *et al.* (2013), for a generalized partially linear model, and by Ojeda *et al.* (2013), in the presence of selection-biased data. A test for assessing independence between a directional and a linear variable has been proposed by García-Portugués *et al.* (2014). Goodness-of-fit tests for directional densities (Boente *et al.*, 2014) and a test for directional-linear independence (García-Portugués *et al.*, 2013a) were also developed.

Interactions with other work packages
In all other work packages the study of relationships between variables (of different nature) is among the core tasks. The techniques developed in this work package therefore serve directly as a key input for the other work packages, where the focus is often on different aspects, such as complexity and high-dimensionality of the data (link to WP5), complete versus partial observational schemes (links to WP3 and WP4), and the time dynamical aspect of the dependence structure (link to WP2), to name some crucial elements.

Work package 2: The study of different dynamics in complex systems
Main goal: A stochastic system can exhibit different dynamics (time dynamics, spatial dynamics, ...). This work package is concerned with efficiently modeling these different layers of dynamics, and with the development of statistical methodology for these complex dynamic systems.

Major objectives:
(1) *developing methods for the joint modeling of many exchangeable stochastic systems that show sizeable variation in their dynamics;*
(2) *developing methods for the joint modeling of many correlated stochastic systems;*
(3) *developing efficient statistical methods for a broad range of complex dynamic systems, involving e.g. serially correlated functional data;*
(4) *developing sparse and robust estimators of models for high dimensional time series, in particular for large dimensional vector autoregressive models.*
We next describe the obtained research results.

State space modeling, Bayesian shrinkage, and sparse and robust representations
As last year, most emphasis in this part of the project was put on the analysis and prediction of high-dimensional time series, a field in which factor model methods so far have been the most efficient approach. A general form of dynamic factor models, containing all others as particular cases, was introduced in a paper in 2000 by Forni and co-authors, but their definitions were based on a spectral approach. Hallin and Lippi (2013) instead adopted a time-domain approach, showing that the general dynamic factor model, contrary to other factor models, follows from a general representation result, and therefore does not really place restrictions on the high-dimensional process under study. A weak feature of general dynamic factor model methods, however, is that they classically involve two-sided filters resulting from the factorization of spectral density matrices: they are therefore not well suited to forecasting problems. Forni *et al.* (2014) improved on this by showing how that two-sidedness issue can be handled by exploiting a specific property of reduced rank stochastic processes.

Still for the forecasting of high-dimensional time series, vector autoregressions (VARs) are flexible and useful time series models. In high dimensions, however, their dense parameterization leads to unstable inference and inaccurate forecasts. Informative priors can then be used to shrink the richly parameterized VAR towards a parsimonious benchmark. Giannone, Lenza and Primiceri (2014) studied the optimal choice of informativeness of these priors, which were treated as additional parameters. This reduces the importance of subjective prior choices and provides good out-of-sample forecasting performance. Fiecas and von Sachs (2013) adopted a similar approach and proposed shrinking a nonparametric spectral estimator (smoothed periodogram) towards a diagonal shrinkage target which constitutes a compromise between a fully parametric (VAR) fit and a fully non-structured regularising identity matrix. Still with a high-dimensional forecasting objective in mind, Conflitti *et al.* (2013) studied the problem of optimally combining individual forecasts of gross domestic product (GDP) and inflation and proposed algorithms for computing the optimal weights.

Beyond forecasting, another natural approach to analyze high-dimensional time series is to introduce sparsity assumptions. Further progress was made there, in particular for cointegrated time series. Nonstationary time series are called cointegrated if a linear combination of them is stationary; sparse cointegration then means that only few coefficients in that linear combination are non-zero. Cointegration theory relies heavily on canonical correlation analysis (CCA). Wilms and Croux (2013) developed a sparse approach to CCA that forms the basis of sparse cointegration analysis. More broadly, a comprehensive review and investigation of statistical process monitoring methods for high-dimensional time series was presented in De Ketelaere *et al.* (2013) and Rato *et al.* (2014).
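The definition of cointegration given above can be made concrete with a small simulation. The sketch below is only an illustration under assumed settings (a Gaussian random walk, a coefficient `beta = 2.0`, a fixed seed, and hypothetical function names); it does not implement the sparse CCA approach of Wilms and Croux (2013).

```python
import random


def simulate_cointegrated_pair(n, beta=2.0, seed=0):
    """Simulate two nonstationary series that share one stochastic trend."""
    rng = random.Random(seed)
    x, y = [], []
    level = 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)  # random walk: nonstationary
        x.append(level)
        y.append(beta * level + rng.gauss(0.0, 1.0))  # same trend plus stationary noise
    return x, y


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


x, y = simulate_cointegrated_pair(2000)
# The linear combination y - beta * x removes the common trend and is stationary.
spread = [yi - 2.0 * xi for xi, yi in zip(x, y)]
```

Comparing `sample_variance(x)` with `sample_variance(spread)` shows the contrast: the random walk wanders widely, while the combination stays close to zero. In a system with many series, a sparse cointegrating vector would have only a few non-zero entries of this kind.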

Finally, De Roover *et al.* (2014) contributed to the joint modeling of many exchangeable stochastic systems by developing a novel switching principal component analysis methodology to detect, in multivariate time series data, phases of consecutive time points with similar means and/or covariation structures. On the modeling of correlated stochastic systems, Verdonck and Tuerlinckx (2014) developed an analytically tractable multiple attractor network model of information accumulation for speeded two-choice decision making. This model accounts for a broad range of known psychophysical phenomena.

Functional data analysis
A crucial problem in most functional data analysis (FDA) procedures is the tuning of the dimension parameter. In functional linear models, for instance, one should determine how fast (compared to the sample size) the dimension of the estimated linear operator can grow while the estimator remains consistent for the (infinite-dimensional) population operator. In this context, Hörmann and Kidziński (2014) proposed a data-driven dimension selection that avoids the usual (unrealistic) assumptions. Moreover, minimax estimation of a linear functional evaluated at the slope also relies on the optimal choice of a tuning parameter. Johannes and Schenk (2013) proposed a data-driven selection procedure for this tuning parameter and showed that the resulting fully data-driven estimator still achieves minimax optimal rates of convergence.

For functional time series data, most statistical procedures work in the time domain. Hörmann *et al.* (2014) instead adopted the frequency domain approach to study optimal dimension reduction for such data. This is achieved by so-called *dynamic functional principal components (DFPCs)*; these coincide with standard FPCs for i.i.d. functional data but are considerably more effective as a tool for dimension reduction when serial dependence is present. As a side product, a rigorous mathematical framework for spectral density operators along with an asymptotic theory for corresponding lag-window estimators were provided. Still in a setting of functional time series data, Aue, Dubart Norinho and Hörmann (2014) introduced a new dimension and model selection criterion in order to simultaneously tune the dimension and order of the model. This resulted in automatic forecasting of functional time series.

Other significant contributions to FDA were developed. In particular, Slaets *et al.* (2013) constructed software for flexibly warping functional data in a Bayesian way. Claeskens *et al.* (2014) proposed a new depth function for multivariate functional data, that allows to order such data, identify a central curve, and measure dispersion of the curves. A joint PhD research is currently supervised by the research partners at KUL-1 and CU for studying probabilistic and statistical properties of depths, also in this FDA setup.

Quantile-based spectral analysis
Quantile- and copula-related spectral concepts have recently been considered. Dette *et al.* (2014) introduced a rank-based cross-periodogram involving Koenker and Bassett's celebrated check function, and established its pointwise asymptotic properties. Kley *et al.* (2014) provided an asymptotic analysis of a slightly different class of smoothed rank-based cross-periodograms associated with the *copula spectral density kernels* from Dette *et al.* (2014). They showed that, for a general class of (possibly non-linear) processes, properly scaled and centered smoothed versions of those cross-periodograms, indexed by couples of quantile levels, converge weakly to Gaussian processes. This leads to asymptotic confidence intervals for *copula spectral density kernels*. It also provides asymptotic distributions (under serial dependence) for a new class of rank-based spectral methods involving the Fourier transforms of rank-based serial statistics. Skowronek *et al.* (2014) provided a locally stationary extension of Dette *et al.* (2014), which was applied to financial time series.
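For readers unfamiliar with the check function of Koenker and Bassett mentioned above, a minimal sketch follows. It uses the standard definition ρ_τ(u) = u(τ − 1{u < 0}) and shows that minimizing the total check loss over the sample points yields an empirical τ-quantile. The function names are hypothetical, and the rank-based cross-periodogram itself is not implemented here.

```python
def check_loss(u, tau):
    """Koenker-Bassett check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_quantile_by_check_loss(sample, tau):
    """Return the sample point that minimizes the total check loss at level tau."""
    return min(sample, key=lambda q: sum(check_loss(x - q, tau) for x in sample))
```

For `tau = 0.5` the loss is proportional to the absolute deviation, so the minimizer is a sample median; other values of `tau` weight positive and negative deviations asymmetrically.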

Copulas and ranks were also considered in a (multivariate) serial context in Kojadinovic *et al.* (2013), for the problem of detecting changes in cross-sectional dependence structures. They introduced a test based on a recently studied variant of the sequential empirical copula process. Ranks there are computed with respect to relevant subsamples, with beneficial consequences for the sensitivity of the test.

Further contributions to the study of dynamics in complex systems
Extreme values (EVs) are often considered in time series. In this line of research, Bücher and Segers (2013) cast the classical EV method of annual maxima into an asymptotic framework involving a triangular array of block maxima. For absolutely regular stationary sequences, the empirical copula of the sample of vectors of block maxima was shown to be a consistent and asymptotically normal estimator for the limiting EV copula. EVs can also be analyzed nonparametrically by estimating tail processes; Drees *et al.* (2014) adopted this approach for stationary, regularly varying Markov chains. Similarly, EVs of a multivariate Markov chain with regularly varying stationary marginal distribution and asymptotically linear behavior were considered in Janssen and Segers (2014).
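The block maxima construction referred to above can be sketched as follows: a series is cut into consecutive blocks (for example, years) and only the maximum of each block is kept; for multivariate data the maxima are taken componentwise. The function names and the choice to drop an incomplete final block are assumptions made for this illustration, and the empirical copula estimator itself is not reproduced.

```python
def block_maxima(series, block_size):
    """Split a series into consecutive full blocks and return each block's maximum."""
    n_blocks = len(series) // block_size  # an incomplete final block is dropped
    return [max(series[b * block_size:(b + 1) * block_size]) for b in range(n_blocks)]


def componentwise_block_maxima(rows, block_size):
    """Componentwise block maxima for multivariate observations (rows of equal length)."""
    columns = list(zip(*rows))
    per_component = [block_maxima(list(col), block_size) for col in columns]
    return [list(vec) for vec in zip(*per_component)]
```

The resulting vectors of block maxima form the sample on which an empirical copula would then be computed.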

Modeling the dynamics in a dependence structure between time series is of particular interest, and a semiparametric approach using local polynomial approximation was presented in Gijbels, Herrmann and Sznajder (2014). The use of the proposed dynamical modelling approach is demonstrated in the analysis and forecast of wind speed data. Forest fire data were analyzed in Fuentes-Santos *et al.* (2013), by considering their spatial structure with spatial point pattern analysis and inference techniques recently developed in the spatstat package for R.

Finally, Jaeger and Lambert (2013) proposed a Bayesian ODE-penalized B-spline approach that enables inference in models specified by systems of linear ODEs when the error distribution is assumed Gaussian or adequately described by a mixture of Gaussians (Jaeger and Lambert, 2014). It is based on an approximation of the solution through B-spline basis functions and on a penalty term related to the ODE that needs to be solved. Similar ideas were used by Frasso *et al.* (2013) in frequentist and Bayesian settings to deal with multi-dimensional dynamics described by PDEs. Efficient tools for inference in systems ruled by nonlinear ODEs were also developed in Frasso *et al.* (2014).

Interactions with other work packages
All contributions above related to high-dimensional time series data (particularly those relying on shrinkage methods or sparsity assumptions) also naturally link to WP5. Similarly, the numerous papers above dealing with multivariate processes all model the dependence between the univariate processes involved, hence are naturally related to WP1 where the development of general methods for investigating dependency structures (among others via copulas) is among the main objectives.

Work package 3: Multivariate modeling and hierarchically structured data
Main goal: The aim of this work package is two-fold: analyzing hierarchically structured data, and using hierarchical modeling with the aim to unravel the dynamics of the underlying stochastic system. Survival data, for example, are often structured hierarchically, and exhibit complex association structures, possibly changing in time or space. Hierarchical non-linear spatial and/or temporal modeling of species in the environment (describing e.g. the biological behavior of animals) is an important modeling tool.

Major objectives:
(1) *developing statistical methods in hierarchical modeling for various endpoints of different data types (e.g. observations grouped in clusters, hierarchically structured data);*
(2) *developing statistical methods for joint (dynamic) modeling of several endpoints;*
(3) *developing statistical methods for spatial correlated time events, taking into account other spatial information;*
(4) *studying statistical methods for multivariate longitudinal profiles, allowing for incomplete data;*
(5) *developing goodness-of-fit tests for hierarchical models;*
(6) *studying the modeling of multivariate survival data via copula modeling, frailty modeling and transformation modeling techniques, including a comparative study.*
We next describe the obtained research results.

Hierarchical modeling of endpoints of different data types
In many applications, hierarchical modeling is used to study the complex underlying process of different endpoints. While hierarchical methods for normally distributed data are well understood, there is a lot of ongoing research for other types of outcomes. Building on earlier work for non-Gaussian data, Generalized Linear Mixed Model (GLMM) tools have been built, extended, and refined for complex hierarchical data, exhibiting correlation and/or overdispersion (Efendi and Molenberghs, 2013; Efendi, Molenberghs and Iddi, 2014; Ivanova *et al.*, 2014). Van der Elst *et al.* (2013) focused in particular on the application of such methods in a psychometric context. An additional phenomenon that is important with count data and requires further study is zero-inflation (Iddi and Molenberghs, 2013; Kassahun *et al.*, 2014). A difficulty with these models is the evaluation of a marginal likelihood integrating out the random effects. As an alternative, these models have been looked at from a Bayesian perspective (Ghebretinsae *et al.*, 2013; Aregay *et al.*, 2013, 2014). Molas *et al.* (2013) considered H-likelihood inference where the explicit evaluation of the integral is avoided.

A problem in hierarchical data is the occurrence of negative correlation, because this jeopardizes the typical hierarchical interpretation. Loeys and Molenberghs (2014) address this in the context of psychology. Molenberghs *et al.* (2013) and Kenward and Molenberghs (2014) study various ways of formulating hierarchical models and their relative advantages and disadvantages in terms of interpretable marginal functions and easy-to-use marginal parameters.

Item response theory (IRT) models can also be viewed as a particular set of GLMMs and Nonlinear Mixed Models (NLMMs), but with some particular features. San Martín *et al.* (in press) contributed to a better understanding of the (lack of) identifiability of a particular IRT model, whereas Magis (in press) and Magis and De Boeck (in press) contributed to various estimation and testing issues. Ip *et al.* (2013) extensively studied the problem of fitting unidimensional item response models to data that have a strong dimension in addition to a few minor nuisance dimensions. For their part, Kadengye *et al.* (in press) developed a novel mixture IRT model that is especially suitable to study the time dynamics of learning processes (along with individual differences therein).

Taverne and Lambert (2014) develop the inflated discrete beta regression model to deal with discrete choices on Likert scales in surveys. The mean and the dispersion of rates are jointly regressed on covariates using an underlying beta distribution with the acknowledgement that the profiles of some respondents invariably lead them to make one specific choice. A novel approach to differential item functioning was considered by introducing lasso penalization to the model fitting process and shrinking the item-group interaction parameters (Magis *et al.*, 2014). The comparison of Bayesian and weighted likelihood estimators of proficiency in polytomous item response theory models was further discussed (Magis, 2014b). Moreover, a formula for the asymptotic standard error of a broad class of robust estimators of proficiency was derived (Magis, 2014a).

Joint modeling of several endpoints
Different contributions have been made in the topic of joint modeling of several endpoints. Joint models for hierarchical data, possibly including time-to-event components, have been developed in Njeru Njagi *et al.* (2013, 2014) and Efendi *et al.* (2013). Molas *et al.* (2013) extended H-likelihood inference to multivariate GLMMs. Ceulemans *et al.* (2013) proposed a robust version of multilevel simultaneous component analysis which can withstand the effects of outliers when analysing multivariate data with more than one level.

Spatial correlated events
Several applications with spatially correlated events have been investigated. *(1)* The spatio-temporal dynamics of Bluetongue virus were investigated by Faes *et al.* (2013). Ensoy *et al.* (2014) explored the effect of cattle movements on the spread in a dynamic model and investigated the spatio-temporal prediction of disease spread. *(2)* Dental caries experiences are spatially referenced, since an active lesion on one surface may impact the decay process of the neighboring surfaces. Mutsvari *et al.* (2013) investigated the use of a spatially referenced multilevel autologistic model, correcting for misclassification. *(3)* Breast cancer risk is believed to be associated with several reproductive factors, such as early menarche and late menopause. Duarte *et al.* (2014) used Structured Additive Regression models to explore spatial and temporal correlations with a wide range of covariates. *(4)* Yehenew *et al.* (2013) studied the impact of hydroelectric dams on malaria incidence and demonstrated the equivalence between the frailty model and mixed Poisson regression models.

Multivariate longitudinal profiles, allowing for incomplete data
Incomplete data are inherent to longitudinal studies. Geva *et al.* (2013) gave an overview of how missing data are generally treated in publications. New guidance on preventing and treating missing data is promoted by Mallinckrodt *et al.* (2013a,b). Novel strategies for handling missing data are given by Bartlett *et al.* (2014) for the situation of missing covariate data, by Geerdens *et al.* (2013) for multivariate long-term follow-up data and by Donneau *et al.* (2013) for non-monotone missing ordinal data in longitudinal settings.

Clustering of multivariate longitudinal profiles, possibly subject to missing data, was investigated. Bruckers *et al.* (2013, submitted) proposed a clustering algorithm for multivariate longitudinal data. They also used techniques from functional data analysis, missing data analysis and ensemble clustering to reveal groups of similar patients when facing high-dimensional multivariate data with missing observations (Bruckers *et al.* 2014a,b, submitted).

Goodness-of-fit tests for hierarchical models
Verbeke and Molenberghs (2013) present model assessment tools for hierarchical data that are modeled using random effects. A gradient-based method to study goodness-of-fit issues associated with correct specification and misspecification of the random-effects distribution in mixed models was developed (Verbeke and Molenberghs, 2014). Varewyck *et al.* (2014) investigate the impact of power on the use of random effect models as an alternative to fixed effect models, when the fixed effect models suffer from convergence problems.

Modeling of multivariate survival data
Clustered survival data are often analysed using frailty models. Cetiny[...]bert (2014) propose a semiparametric Bayesian shared frailty model to analyze correlated interval-censored survival data. Flexible forms are considered jointly for the baseline hazard and the frailty distribution. Loeys *et al.* (2014) consider a semi-parametric proportional hazards model with crossed random effects in the context of visual recognition tasks. Munda *et al.* (2013) propose to relax the assumption that the frailty term acts multiplicatively on the hazard rate and consider frailty models with time-varying frailties. Munda and Legrand (2013a,b) investigate the effect of misspecification of the frailty model and propose a new diagnostic plot to guide the choice of the frailty distribution.

Copula models are also commonly used in this context, and several novel contributions have been made. In Sujica and Van Keilegom (2013) a nonparametric location-scale model is considered, in which the response is subject to random right censoring and censoring depends on the response via a known conditional copula function. In Segers and Uyttendaele (2013) a rank-based method is developed to estimate the tree structure of a nested Archimedean copula. Prenen *et al.* (2013) extended the Archimedean copula methodology to model multivariate survival data grouped in clusters of variable size. Finally, Rotolo *et al.* (2013) propose a simulation procedure based on a copula model that allows dependence to be introduced between times of different transitions and between those of grouped subjects.

Other flexible models for censored data have been developed as well. Heuchenne and Laurent (2014a,b) and Uña-Alvarez *et al.* (2013) developed a method based on imputation techniques to estimate nonparametrically the conditional variance in a location-scale regression model when the data are possibly right-censored and left-truncated. Talamakrouni *et al.* (2014) extended the method of parametrically guided non-parametric regression to the censored data case using an unbiased transformation of the data and a local linear fit. Yang *et al.* (2014) propose estimators of the coefficient functions for the varying coefficient model in the case where different coefficient functions depend on different covariates and the response is subject to random right censoring. Buyze and Goetghebeur (2013) studied properties and power of the cross-over design for a non-recurrent right-censored survival time. Braekers and Grouwels (2013) proposed a semi-parametric Cox regression model for zero-inflated left-censored time-to-event data.

Interactions with other work packages
Much of the research mentioned above is linked with research in other work packages. For example, the research on multivariate survival data is linked with WP1 via the concepts of frailties and copulas. Several papers also deal with the study of spatio-temporal dynamics of a process, and thus relate to research in WP2. The developed methods have been investigated in different contexts, such as infectious diseases, psychometrics, bioinformatics, and clinical trials. As such, there are clear interactions with the MWP.

Work package 4: Dynamics of a stochastic system and the impact of non-observed characteristics
Main goal: For complex systems it is often impossible to have observations on all important variables. In micro-econometrics for example, the price of a good in a country may also depend on the welfare of the people living in that country, and the latter is difficult to measure. This work package studies how to deal with non-observed (latent) characteristics in complex dynamic systems.

Major objectives:
4.1. *Boundary estimation problems.*
*This problem arises for example in economics when estimating a production or a cost frontier function, which can be viewed as the boundary of the support of multivariate random variables. The main objectives:*
*(1) developing non- and semi-parametric methods for estimating boundaries or frontiers, allowing for noisy data (robust estimators and stochastic frontier models);*
*(2) developing non- and semi-parametric methods for estimating frontiers in the presence of heterogeneous factors (observable and/or unobservable);*
*(3) developing non- and semi-parametric methods for estimating frontiers when panel data are available;*
4.2. *Deconvolution and inverse problems.*
*In (ill-posed) inverse problems one often assumes that the transformation that links the observed signal to the signal of interest is known. This assumption is however rather unrealistic. Our objectives:*
*(4) quantifying the influence of this noise on optimal convergence rates in a minimax sense;*
*(5) developing fully data-driven estimation procedures in this context;*
*(6) testing hypotheses regarding the unknown signal;*
*(7) developing Bayesian nonparametric estimation procedures.*
4.3. *Homogeneity, heterogeneity and endogeneity.*
*Identification and nonparametric estimation of the structural function in nonparametric instrumental regression with endogenous explanatory variables is a challenging task. The major objectives:*
*(8) constructing fully data-driven estimation procedures;*
*(9) developing hypothesis tests and Bayesian nonparametric estimators, optimal in a minimax sense.*
We next describe the obtained research results.

Boundary estimation problems
In Cuevas *et al.* (2014) and Pateiro-López and Rodríguez Casal (2013) the estimation of sets and its application in image analysis are considered. The authors study in particular shape reconstruction of a point cloud as well as the estimation of a medial axis and inner parallel body. Moreover, in order to handle data clouds in three dimensions, Lafarge *et al.* (2014) have implemented an R package.

The analysis of productivity of firms is intensively studied in the economics literature. Simar and Wilson (2013) summarize recent developments and perspectives for inference and estimation problems in nonparametric frontier models. A commonly used approach in this context is the estimation of productivity frontiers, typically assuming that outputs are observed with some homogeneous measurement error. Simar, Vanhems and Van Keilegom (2013) extend the concepts of conditional frontiers and conditional efficiency scores to the case of unobserved heterogeneity. The authors propose and analyze a model where the heterogeneity variable is linked to a particular input (or output). It is defined as the part of the input (or the output) that is independent of some instrumental variable, through a nonseparable nonparametric model. Particular attention is paid to the endogeneity issues involved in this model. In situations where firms face heterogeneous environmental conditions, the measurement of the impact of environmental factors given a nonparametric production model is studied in Bădin *et al.* (2013). In case their efficiency can be captured by a conditional efficiency, Florens *et al.* (2014) use a flexible nonparametric location-scale model to eliminate the dependence of inputs/outputs on these factors. These pre-whitened inputs/outputs define the optimal frontier function and a pure measure of efficiency that is more reliable for producing rankings, since the influence of external factors has been eliminated.

Deconvolution and inverse problems
In Autin, Freyermuth and von Sachs (2014), in the context of denoising curves by tree-structured wavelet thresholding, the authors show the superiority of using block-thresholding within the maxiset approach compared to classical unstructured wavelet threshold estimators. Considering a functional linear regression model, leading naturally to a linear inverse problem with noisy operator, Johannes and Schenk (2013) derive a minimax theory for the nonparametric estimation of a linear functional evaluated at the unknown slope function. A plug-in estimator which is based on a dimension reduction technique and additional thresholding is proposed. Strong points of the newly derived theory consist in its applicability to a wide range of possible functional regressors, slope parameters and linear functionals, covering in particular point-wise estimation as well as the estimation of weighted averages of the parameter. Considering a Gaussian inverse regression model, the influence of observational noise in the operator is studied in Johannes and Schwarz (2013) from a minimax-theory point of view. The authors develop a fully data-driven estimation procedure combining model selection and Lepski's method which can attain minimax-optimal rates. An extension of the aforementioned results to inverse problems with a nonlinear operator is a statistically demanding challenge. An iterative estimation of the solution of a nonlinear operator equation in the presence of noise in the operator is studied in Dunker *et al.* (2014).

Considering a Gaussian inverse regression model with known linear operator, Johannes *et al.* (2014) link the frequentist minimax theory to a nonparametric Bayesian approach. The proposed nonparametric Bayesian estimation procedure allows the fully data-driven choice of the smoothing parameter to be incorporated in a natural way. Moreover, it is shown that the rates of concentration of the posterior distribution coincide with the minimax rates established in Johannes and Schwarz (2013).

Homogeneity, heterogeneity and endogeneity
The identification in a semi-parametric transformation model, in which the regression function has an additive nonparametric structure and the transformation of the response belongs to some parametric family, with an additional presence of endogeneity in the explanatory variables, is studied in Vanhems and Van Keilegom (2013). Moreover, the authors propose a profile likelihood estimation method for the transformation and study its theoretical properties.

Tiwari and Heuchenne (2013) estimated an accelerated failure time model for the time between two transitions (for example, unemployment and employment). This model depends linearly on both exogenous and endogenous covariables and takes heterogeneity into account. Further, Tiwari (2013) also considered a control function approach to estimate a parametric single index model where, unlike other control function approaches, all regressors are correlated with the unobserved heterogeneity and a subset of them are endogenous.

Considering a nonparametric instrumental regression model, Breunig and Johannes (2013) develop a minimax theory for the nonparametric estimation of linear functionals of the structural function. A nonparametric estimation procedure is presented which is minimax-optimal up to a constant over a wider range of classes for the structural function. Furthermore, a fully data-driven estimation procedure is constructed which can attain minimax-optimal rates of convergence. Assuming the independence of the instrument and the regression error, the nonparametric estimation of the structural function leads naturally to a nonlinear operator equation. Iterative estimation procedures considering linear and nonlinear operator equations with noisy operator in nonparametric instrumental regression are studied in Johannes *et al.* (2013) and Dunker *et al.* (2014), respectively.

Estimation procedures for panel data with large dimensions n, T, and general forms of unobservable heterogeneous effects, are discussed in Bada and Liebl (2014). They give a description of their R package phtt. The package also provides a wide range of dimensionality criteria in order to estimate the number of unobserved factors simultaneously with the remaining model parameters.

Another source of endogeneity is the presence of censored or hierarchical data. Molenberghs and Lesaffre (2013), Mallinckrodt *et al.* (2014), Song *et al.* (2013), Grobler *et al.* (2014), and Geva *et al.* (2014) contribute towards the dissemination of proper missing-data methodology. Some focus on clinical trials in general, others on specific areas, such as geriatrics. Donneau *et al.* (2014) investigate the relative performance of missing-data methodology for incomplete clinical trials.

Loeys *et al.* (2013) consider flexible imputation-based strategies for mediation analysis based on natural direct and indirect effects. Loeys *et al.* (in press) and Moerkerke *et al.* (in press) consider the handling of latent variables in mediation analysis, and VanderWeele and Vansteelandt (2014) and VanderWeele, Vansteelandt and Robins (2014) develop strategies for handling multiple mediators. Vansteelandt *et al.* (2014) and Vansteelandt and Daniel (2014) study the impact of model misspecification in covariate-adjusted effect estimators and develop estimators with improved robustness against model misspecification based on propensity scores.

Understanding the various sources of variability in hierarchical settings often comes down to testing hypotheses about variance components in mixed models. Standard testing procedures do not apply because the null hypotheses lie on the boundary of the parameter space. Drikvandi *et al.* (2013) have proposed methods for such tests that do not require parametric assumptions about the unobserved latent variables in the models.
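
As a point of reference for why the boundary matters, the classical parametric result for a single variance component can be written out. This is the standard result for a correctly specified normal random-intercept model with independent clusters, not the method of Drikvandi *et al.* (2013); the symbols $\sigma_b^2$, $\lambda_n$ and $t$ are notation introduced here for illustration only.

$$
H_0:\ \sigma_b^2 = 0 \quad \text{versus} \quad H_1:\ \sigma_b^2 > 0, \qquad
\lambda_n \xrightarrow{d} \tfrac12\,\chi^2_0 + \tfrac12\,\chi^2_1 \ \ \text{under } H_0, \qquad
p = \tfrac12\,P\!\left(\chi^2_1 > t\right) \ \ \text{for an observed value } t > 0 .
$$

Here $\lambda_n$ denotes the likelihood ratio statistic and $\chi^2_0$ a point mass at zero, so referring $\lambda_n$ to a plain $\chi^2_1$ distribution doubles the p-value and yields a conservative test.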

Data on tooth emergence in children are usually gathered in an epidemiological study only at regular intervals. Therefore, the exact emergence time is not known but the interval of emergence is. In order to understand the dynamics in tooth emergence, Cecere *et al.* (2013) proposed a new exploratory and graphical method which is an extension of principal component analysis biplots that allows for interval-censoring. Another example where interval-censoring frequently occurs is the modeling of antimicrobial resistance data, for which Jaspers *et al.* (2014a,b) used semi-parametric mixture models.

In clinical trials, it is frequently of interest to estimate the time between the onset of two events (e.g. duration of response in oncology). Dejardin and Lesaffre (2013) review existing approaches and discuss their limitations in the case where subjects are assessed at fixed visits but the initial event and the terminating event occur in between visits and, hence, are doubly interval-censored. Furthermore, they propose a stochastic EM algorithm that overcomes the problems in the existing approaches. They show the finite sample properties of their approach by simulations.

Interactions with other work packages
The use of shape constraints and flexible regression models for the estimation of sets, boundaries and frontiers leads to common interests with WP1. The study of regression models with functional covariates is another link with WP1. Dimension reduction methods and smoothing parameter selection schemes are shared with WP5. Finally, the presence of homogeneity, heterogeneity and endogeneity due to censored and hierarchical data is a link with WP3.

Work package 5: Variable and model selection and the study of (ultra-)high dimensional data
Main goal: The number of observed characteristics (variables) can be large, high or even ultra-high when compared to the number of individuals (subjects) for which these variables are measured. Methods that can automatically select the important characteristics (possibly of different nature) in complex systems need to be developed, also for situations in which the model adopted specifies that the number of characteristics grows very fast (even at a polynomial rate) with the number of subjects. Even for a set of selected characteristics, several models may be plausible, and model selection comes into play.

Major objectives:
(1) *developing efficient and fast techniques for variable selection in a high dimensional setting;*
(2) *developing efficient variable selection methods in flexible regression models, including robust selection procedures;*
(3) *developing methods for combining multiple sorts of information from the same system: defining objective functions, constructing algorithms, determining optimal weights for different data blocks;*
(4) *developing model selection methods for graphical models;*
(5) *constructing new model complexity measures; and developing model selection procedures taking into account qualitative data patterns;*
(6) *developing methods for statistical inference for high-dimensional random matrices using Le Cam's approach.*
We next describe the obtained research results.

Developing efficient and fast techniques for variable selection in a high dimensional setting
Variable selection problems occur often in a bioinformatics setting. Sikorska, Lesaffre, Groenen, and Eilers (2013) showed how in genome-wide association studies a huge number of SNPs can be tested for their association with disease in a computationally efficient way using matrix operations in pure R code. In a similar setting, Sikorska, Rivadeneira *et al.* (2013) proposed a conditional two-step approach to explore the longitudinal relationship between the trait and the SNP. When some prior knowledge on a certain important set of variables is available, a natural assessment of the relative importance of the other predictors can be based on their conditional contributions to the response given the known set of variables. Barut *et al.* (2013) proposed such a conditional screening technique in generalized linear models. A variable selection technique in the specific case of mixture cure models was developed by Dirick *et al.* (2013).
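
The idea of testing many SNPs at once with matrix operations can be illustrated with a minimal sketch. It is written in Python with NumPy rather than R, uses a quantitative trait with one simple linear regression per SNP, and is not the implementation of Sikorska *et al.*; the function name `snp_slopes` and the simulated data are assumptions made purely for illustration.

```python
import numpy as np

def snp_slopes(genotypes, trait):
    """Slope and t-statistic of a simple linear regression for every SNP.

    genotypes: (n_subjects, n_snps) array of allele counts (0, 1, 2)
    trait:     (n_subjects,) array with a quantitative trait
    Assumes no SNP column is constant (otherwise sxx would be zero).
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(trait, dtype=float)
    n = G.shape[0]
    Gc = G - G.mean(axis=0)        # centre each SNP column
    yc = y - y.mean()              # centre the trait
    sxx = (Gc ** 2).sum(axis=0)    # per-SNP sum of squares
    sxy = Gc.T @ yc                # all cross-products in one matrix product
    slopes = sxy / sxx
    rss = (yc ** 2).sum() - slopes * sxy   # residual sum of squares per SNP
    se = np.sqrt(rss / (n - 2) / sxx)      # standard error of each slope
    return slopes, slopes / se

# Hypothetical data: 200 subjects, 1000 SNPs, one SNP with a true effect.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(200, 1000))
y = 0.5 * G[:, 10] + rng.normal(size=200)
slopes, tstats = snp_slopes(G, y)
```

No loop over SNPs is needed: a single matrix-vector product yields all cross-products, which is where the computational gain comes from.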

In a clinical setting, Varewyck *et al.* (2014) evaluated the quality of hospital care in settings with a large number of hospitals and a large set of baseline patient characteristics using Firth-corrected fixed effects regression models. It was shown how this frequentist bias correction on the scale of the estimating equations leads to stable convergence as well as better power in realistic hospital settings.

In the context of personalized medicine, Doove *et al.* (in press) contributed an in-depth review of a family of five recursive partitioning methods that have been recently developed for the selection of subgroups in randomized clinical trials. Dusseldorp and Van Mechelen (2014) proposed a novel member of this family that focuses on so-called qualitative treatment-subgroup interactions (i.e., for some subgroups of persons one treatment is better than the other while for other subgroups the reverse is true), which are of particular relevance for optimal treatment assignment.

Developing efficient variable selection methods in flexible regression models, including robust selection procedures
Flexible regression models based on P-splines in the context of variable selection methods were reviewed by Gijbels, Verhasselt and Vrinssen (2014). Generalized additive models (GAMs) were developed for discovering quantitative traits using bulk segregant analysis with next generation sequencing (Claesen *et al.* 2013). The methods were applied and further refined for finding major and minor QTLs that contribute to thermo-tolerance in yeast (Yang *et al.* 2013). Instead of polynomial interpolation, Jansen (2013) and Jansen (2014a,b) developed a scheme for sparse multiscale representation of data on irregular point sets, using statistical smoothing.

An important research topic in this section is robust techniques. Croux *et al.* (2013) and Oellerer *et al.* (2014) proposed and studied the robustness of the sparse least trimmed squares estimator and other penalized robust regression methods by computing the influence function. The asymptotic bias of the estimators makes the calculations non-standard. Furthermore, Oellerer *et al.* (2013) proposed a robust and sparse regression estimator, called the shooting S, that can cope with componentwise outliers in the explanatory variables. Van Aelst (2014) developed a robust procedure to estimate the center of high dimensional data by reducing the effect of both contaminated cells and structural outliers. Tharmaratnam and Claeskens (2013) studied variable selection methods using robust M, S and MM estimators. Focused model selection for quantile regression was the topic of Behl *et al.* (2014), with an example on minimum effective dose determination. An overview of model selection methods was given in the encyclopedia contribution by Claeskens and Jansen (2014).

Another robust procedure was developed by Gijbels and Vrinssen (2013) for a multiple regression model based on a specific adaptive lasso method. Sabbe *et al.* (2013) proposed a lasso logistic regression method in combination with an EM algorithm so as to allow for a correct model selection in the presence of a missing at random mechanism. An application to a study on the development of dysphagia following radiotherapy for head and neck cancer was published by De Ruyck *et al.* (2013).

In a nonparametric setting, De Neve *et al.* (2013b) adapted the Wilcoxon-Mann-Whitney test for variable selection in a high dimensional context and demonstrated the method for RT-qPCR experiments. The method was also implemented as an R package (De Neve *et al.*, 2014). Similar high dimensional hypothesis testing problems occur in epigenomics. Mensaert *et al.* (2014) gave an overview of the data analysis methods available today. Van Aelst and Willems (2013) developed routines and R software to perform robust inference and model selection based on the nonparametric bootstrap.

Developing methods for combining multiple sorts of information from the same system
Ballings' (2014) PhD dissertation focused on ensemble methods. Special attention goes to model fusion, an analogue variant of model selection. De Roover, Ceulemans *et al.* (2013) contributed a valuable clusterwise extension of the family of simultaneous component analysis (SCA) methods (Ceulemans, Hubert and Rousseeuw, 2013), which implies a clustering of the data blocks along with an SCA model per cluster. The same principle of clusterwise extension was also successfully applied by Wilderjans and Ceulemans (2013) and De Roover, Timmerman, Van Mechelen and Ceulemans (2013) to the PARAFAC model for multiway data. Relatedly, a novel clusterwise SCA variant with common and cluster-specific components was proposed by De Roover, Timmerman, Mesquita and Ceulemans (2013). Van Deun *et al.* (2013) further made an in-depth theoretical and empirical comparison of no fewer than five different methods to identify common and/or distinctive mechanisms underlying multiblock data.

Developing model selection methods for graphical models
Pircalabelu, Claeskens and Waldorp (2013) developed model selection methods for graphical models (see also Pircalabelu, Claeskens, Jahfari and Waldorp (2014) and Pircalabelu, Claeskens and Waldorp (2014)).

Constructing new model complexity measures
Information criteria for model selection for copula models were addressed by Geerdens *et al.* (2014) and a further overview of lack-of-fit tests and diagnostics for multilevel models was given in Claeskens (2013).

Multivariate relative dispersion measures (i.e. generalizations of the coefficient of variation to the multivariate setting) were investigated by Aerts *et al.* (2014a, 2014b) by means of influence functions.

Noh *et al.* (2013a, 2013b) developed methods, based on the coefficient of determination, to assess model misspecification in quantile regression models.

Developing methods for statistical inference for high-dimensional random matrices using Le Cam's approach
Testing for sphericity is one of the central problems in inference for high-dimensional data. While most contributions consist in providing testing procedures that are asymptotically valid under high-dimensional asymptotics, the power and efficiency properties of such tests remain unexplored. Onatski *et al.* (2013, 2014) provided the first Le Cam approach to this problem, by investigating local asymptotics under the so-called *spiked alternatives*, in the n/p → c case.

Interactions with other work packages
The work on blind deconvolution under positivity constraints relates to WP4. All work listed in Section 3.5.2 is also relevant for WP4. The study of high-dimensional covariance matrices as discussed in Section 3.5.5 is of immediate relevance for WP2, where the emphasis is overall more on the time dynamics. Some of the research in this package is based on survival analysis, which has a natural link with WP3.

Meta work package: The developed statistical methods in full use
Main goal: The developed statistical methods in full use. In this meta work package we aim, through continuous interactions with the other work packages, at answering specific questions in focused application areas, in particular in econometrics, biomedical sciences, and human and natural sciences. These questions on the one hand motivate the research questions and on the other hand serve to demonstrate the impact of the planned research on application areas.

Major objectives:
(1) *Understanding the dynamic aspects of affective disorders;*
(2) *Predicting dynamic behavior of economics;*
(3) *Infectious disease epidemiology;*
(4) *Quality and safety in food production;*
(5) *Data integration in systems biology.*
We next describe the obtained research results.

Understanding the dynamic aspects of affective disorders
The research contributed two new and significant insights into the study of depression. Firstly, making use of vector autoregressive models, Pe *et al.* (in press) showed that the emotion network of people suffering from Major Depressive Disorder is denser (i.e., displays a higher level of coherence over time). Secondly, making use of a dynamic model based on Lotka-Volterra equations, van de Leemput *et al.* (2014) revealed that mood systems may have tipping points, with mood dynamics in the proximity of these tipping points becoming subject to a phenomenon called critical slowing down; this implies an important early warning signal for the onset of depression.
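
Critical slowing down is commonly monitored through rising lag-1 autocorrelation and variance of a time series. The sketch below illustrates this with a first-order autoregressive process as a stand-in; it is not the Lotka-Volterra model of van de Leemput *et al.* (2014), and the function names and coefficient values are assumptions chosen for illustration.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return float((xc[:-1] * xc[1:]).sum() / (xc ** 2).sum())

def simulate_ar1(phi, n, rng):
    """AR(1) series x[t] = phi * x[t-1] + noise; phi near 1 means slow recovery."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

rng = np.random.default_rng(1)
far = simulate_ar1(0.2, 2000, rng)    # far from a tipping point: fast recovery
near = simulate_ar1(0.9, 2000, rng)   # close to a tipping point: slow recovery

ac_far, ac_near = lag1_autocorr(far), lag1_autocorr(near)
var_far, var_near = far.var(), near.var()
```

In this toy setting both the autocorrelation and the variance are larger for the slowly recovering series, which is the kind of signature an early warning indicator would track in a rolling window.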

A second noteworthy achievement pertains to the time dynamics of maladaptive anger regulation: using a tailor-made non-standard clustering methodology, Heylen *et al.* (in press) were able to show how maladaptive regulation relates to both the amplitude and the shape of anger intensity profiles across time.

Other significant results include new insights into the time-dynamic relation between valence and arousal (as obtained by means of a flexible nonparametric regression methodology by Kuppens *et al.*, 2013) and the successful development by Bulteel *et al.* (2014) of a tool for the study of intensive, within-person data on emotional responding. The latter tool, which is based on local robust PCA and outlier detection methods, allows the user to identify the timing and nature of latent changes in the mean level and covariation of multiple affective response channels.

Predicting dynamic behavior of economics
D'Agostino *et al.* (2013) assess whether explicitly modeling structural change increases the accuracy of macroeconomic forecasts. They produce real-time out-of-sample forecasts for inflation, the unemployment rate and the interest rate using a Time-Varying Coefficients VAR with Stochastic Volatility (TV-VAR) for the US. The model generates accurate predictions for the three variables. In particular for inflation, the TV-VAR outperforms, in terms of mean square forecast error, all the competing models: fixed coefficients VARs, Time-Varying ARs and the naive random walk model. These results are also shown to hold over the most recent period, in which it has been hard to forecast inflation.

Infectious disease epidemiology
Within the area of infectious disease epidemiology, the focus has been on: (1) the development and use of flexible regression models (link with WP1) to study prescriber determinants of antibiotics (Blommaert *et al.*, 2013), temporal patterns of influenza-like illness (Bollaerts *et al.*, 2013; Vandendijck *et al.*, 2013) and evolutions in co-payment for common medication in Belgium (Fraeyman *et al.*, 2013); (2) the development and use of (flexible) hierarchical models (link with WP3) to study antibiotic use in Belgium (Minalu Ayele *et al.*, 2013) and maternal mortality in Mozambique (Loquiha *et al.*, 2013); (3) the development and use of various statistical methods to tackle different research questions in infectious disease epidemiology (see e.g. Andraud *et al.*, 2013; Castro-Sanchez *et al.*, 2013; Potter and Hens, 2013; Van Kerckhove *et al.*, 2013).

Aregay *et al.* (2014) develop prediction methods in the setting of long-term vaccination.

The use of state-of-the-art and/or newly developed statistical techniques might lead to new insights in the disease dynamics of malaria. More specifically, Yewhalaw *et al.* (2013) investigated the effect of a hydro-electric dam in Ethiopia on malaria incidence. It was demonstrated that the dam itself does not have an impact on malaria incidence: during the dry season, transmission and mosquito abundance are low anyway, whereas in the rainy season the mosquito abundance is so high that the dam does not have an impact as an extra breeding ground for the mosquitoes. Furthermore, Yehenew *et al.* (2013) demonstrated that different sampling schemes having equal power can be used. For the current study, more than 2000 children were followed up on a weekly basis, which could eventually lead to community fatigue. Alternatively, more children could be followed on a monthly basis instead, leading to similar power under simple model conditions.

Quality and safety in food production
Rodríguez *et al.* (in press) showed, by means of suitably applied multiway dimension reduction procedures, how fast and efficient food quality control could be achieved using electronic noses.

In the food industry, the quality of meat produced for human consumption very much depends on the conditions under which animals were kept and/or transported. Studies in this context are often highly hierarchical since measurements are taken at various levels (unit level, batch level, ...), and this should be accounted for in the analysis. Permentier *et al.* (2013) showed a relation between diet and carcass measures at slaughter of crossbreds.

Other applications
Applications in finance and management
Hambuckers and Heuchenne (2014) model financial time series with heavy-tailed distributions: they compare sinh-arcsinh and generalized hyperbolic distributions computed on stock indices. Their model assumes a multiplicative heteroscedastic structure with a nonparametrically estimated conditional variance.

In the study of control charts and their applications in management science, Faraz *et al.* (2013) improved the power of variable ratio sampling control charts by using run rules, while Faraz *et al.* (2014) proposed a Shewhart-style control charting strategy (named bundling) which describes how to create and average related time variables to allow for a reduction in the number of charts needed for monitoring delivery chain systems. Next, Celano *et al.* (2013) developed a control chart that monitors the social quality loss to final customers. The advantage of this chart is that processes are monitored according to specification and upper control limits that are derived from the total price of a product in a market, which allows for a target costing philosophy. Seif *et al.* (2014a) evaluated the performance of the multiple variable sampling intervals scheme. Seif *et al.* (2014b) investigated the consequences of non-normality on adaptive sampling schemes, making it possible to monitor processes based on, for example, heavy-tailed distributions.
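As a generic illustration of what a Shewhart-style chart does (this is not the bundling strategy nor any of the adaptive schemes cited above), the following minimal Python sketch derives three-sigma control limits from in-control reference data and flags later observations that fall outside them. The function names and the delivery-time figures are hypothetical and serve only as an example.

```python
from statistics import mean, stdev


def shewhart_limits(reference, k=3.0):
    """Return (lower, center, upper) control limits from in-control reference data."""
    center = mean(reference)
    spread = stdev(reference)  # sample standard deviation of the reference period
    return center - k * spread, center, center + k * spread


def out_of_control(observations, limits):
    """Return the indices of observations falling outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(observations) if x < lower or x > upper]


# Hypothetical delivery times (in days) from a stable reference period.
reference = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9]
limits = shewhart_limits(reference)

# New observations to monitor; only the third one lies outside the limits.
signals = out_of_control([5.0, 5.3, 6.2, 4.9], limits)
print(limits, signals)
```

The multiplier `k` is the usual three-sigma default; run rules or adaptive sampling, as studied in the cited papers, would add further signalling conditions on top of this basic check.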

Applications in social sciences and health sciences
Van der Elst *et al.* (2014) examine the use of various statistical methods for the analysis of repeated cognition data.

Antimicrobial resistance has become one of the main public health burdens of the last decades, and monitoring the development and spread of non-wild-type isolates has therefore gained increased interest. Monitoring is based on the minimum inhibitory concentration (MIC) values, which are collected through dilution experiments. For a given antimicrobial, it is common practice to dichotomize the obtained MIC distribution according to a cut-off value, in order to distinguish between susceptible wild-type isolates and non-wild-type isolates exhibiting reduced susceptibility to the substance. However, this approach hampers the ability to further study the characteristics of the non-wild-type component of the distribution, as information on the MIC distribution above the cut-off value is lost. As an alternative, new methods based on mixture models are presented, allowing the estimation of the full continuous MIC distribution, thereby taking all available information into account. In current research these models are extended to the multivariate setting of multi- and co-resistance, and to trend models to examine possible time trends.
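To make the idea of a mixture model for MIC data concrete, the sketch below fits a two-component Gaussian mixture to log2 MIC values with the EM algorithm, one component standing for wild-type and one for non-wild-type isolates. It is a simplified illustration under stated assumptions, not the method of the network's researchers: it treats the log2 MIC values as exactly observed, whereas dilution experiments yield interval-censored values, and the data and function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(values, n_iter=200, min_sd=1e-3):
    """Fit a two-component Gaussian mixture by EM.

    Returns (weight of component 1, (mu1, sd1), (mu2, sd2)).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    # Crude starting values: means of the lower and upper halves of the data.
    mu1 = sum(ordered[:half]) / half
    mu2 = sum(ordered[half:]) / (len(ordered) - half)
    sd1 = sd2 = 1.0
    w = 0.5
    n = len(values)
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to component 1.
        resp = []
        for x in values:
            p1 = w * normal_pdf(x, mu1, sd1)
            p2 = (1.0 - w) * normal_pdf(x, mu2, sd2)
            resp.append(p1 / (p1 + p2))
        # M-step: update the weight, means and standard deviations.
        n1 = sum(resp)
        n2 = n - n1
        w = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, values)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, values)) / n2
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, values)) / n1
        var2 = sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, values)) / n2
        sd1 = max(math.sqrt(var1), min_sd)  # floor avoids a degenerate component
        sd2 = max(math.sqrt(var2), min_sd)
    return w, (mu1, sd1), (mu2, sd2)


# Hypothetical log2 MIC values: a wild-type cluster and a non-wild-type cluster.
log2_mic = [-3, -2.5, -2, -2, -1.5, -2, -2.5, -1.5, -2, -1, 2, 3, 3.5, 2.5, 4, 3]
weight, wild_type, non_wild_type = fit_two_component_mixture(log2_mic)
print(weight, wild_type, non_wild_type)
```

Unlike a dichotomization at a cut-off, the fitted second component describes the location and spread of the non-wild-type isolates, which is the information the paragraph above notes would otherwise be lost.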

Minalu *et al.* (2013) use mixed models with change-points to model antibiotic use.

To optimize the planning of blood donations, but also to continue motivating the volunteers, it is important to streamline the practical organization of the timing of donations.

While donors are asked to return for donation after a suitable period, a relevant proportion of blood donors is still deferred from donation each year because of a hemoglobin level that is too low. Rejection of donation may demotivate the candidate donor and implies an inefficient planning of the donation process. Hence, it is important to predict the future hemoglobin level to improve the planning of donors' visits to the blood bank. Nasserinejad *et al.* (2013) showed that transition models and mixed effects models can help in optimizing the planning of blood donations. In general, the transition model provides a somewhat better prediction than the mixed effects model, especially at high visit numbers. In addition, the transition model offers a better trade-off between sensitivity and specificity when varying the cut-off values for eligibility in predicted values. Hence transition models make the prediction of hemoglobin level more precise and may lead to less deferral from donation in the future.
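The trade-off between sensitivity and specificity when varying the eligibility cut-off can be illustrated with a short Python sketch. It assumes that a donor is classified as eligible when the predicted hemoglobin level reaches the cut-off, and that "positive" means truly eligible; the predicted values, outcomes, cut-offs and function name are hypothetical and do not come from the cited study.

```python
def sensitivity_specificity(predicted, eligible, cutoff):
    """Classify as eligible when predicted >= cutoff; return (sensitivity, specificity)."""
    tp = fp = tn = fn = 0
    for value, truth in zip(predicted, eligible):
        call = value >= cutoff
        if call and truth:
            tp += 1  # correctly predicted eligible
        elif call and not truth:
            fp += 1  # predicted eligible, but deferred in reality
        elif not call and truth:
            fn += 1  # predicted deferred, but eligible in reality
        else:
            tn += 1  # correctly predicted deferred
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted hemoglobin levels and observed eligibility at the visit.
predicted = [12.1, 12.8, 13.4, 13.9, 14.2, 12.5, 13.1, 14.8]
eligible = [False, False, True, True, True, True, False, True]

# Raising the cut-off lowers sensitivity and raises specificity.
results = {c: sensitivity_specificity(predicted, eligible, c) for c in (12.5, 13.0, 13.5)}
for cutoff, (sens, spec) in results.items():
    print(cutoff, round(sens, 2), round(spec, 2))
```

Comparing two prediction models then amounts to computing such pairs for each model over a range of cut-offs and checking which model attains the higher sensitivity at a given specificity.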

Lin *et al.* (2013) apply model selection methodology for hierarchical data to the context of genomic biomarkers; Demaerschalck *et al.* (2013) and Reynders *et al.* (2014) are applications to public health in this respect. Tomsin *et al.* (2013, 2014) apply mixed-model methodology to longitudinal data collected in women with problematic pregnancies.

Complex data structures are also encountered in nursing science. Often, measurements are taken at various units within a number of hospitals, and this hierarchical data structure needs to be accounted for in the statistical analysis. Examples can be found in Schubert *et al.* (2013), Bruyneel *et al.* (2013), and Van den Heede *et al.* (2013). A context in which between-centre variability is important to study is the development of quality of care indicators to evaluate the performance of medical units. This requires assessing between- and within-unit variability, adjusting for important confounding factors. See for example Penninckx *et al.* (2013).

Applications in ecology, earth and space sciences
In modern Microbial Ecology, high-throughput technologies (high-dimensional data) are frequently used to infer the effects of controlled and uncontrolled environmental and experimental conditions on the behavior of microbial communities, both in terms of their abundance and functionality. Statistics and statisticians often play a crucial role in the design and analysis stages of such studies. Within the scope of the StUDyS project, we have employed techniques that go beyond state-of-the-art methods in De Roy *et al.* (2013) and Ho *et al.* (2014). In particular, we have implemented complex optimal experimental design methods and semiparametric quantile regression techniques.

Aerts *et al.* (2014) apply multiple-imputation methodology to study astronomical observations between which complex relationships hold.

Applications in sport sciences
Liebl *et al.* (2014) propose testing procedures for differences in ankle plantarflexion strengths of habitually rearfoot and forefoot runners. To approach this issue, the problem of classifying different footfall patterns in human runners is revisited. A dataset of 119 subjects running shod and barefoot (speed 3.5 m/s) was analyzed. The footfall patterns were clustered by a novel statistical approach, which is motivated by advances in the statistical literature on functional data analysis.

Interactions with other work packages
Due to its special nature and role, this work package is linked to all others.

Network Activities
All activities of the IAP-statistics network can be followed closely from our web site. The address of the web site is
The web site contains, e.g., the following information:
• A brief history of the network referring to previous phases
• Composition of the network (including list of partners, members, personnel working under the IAP project, visitors, ...)
• List of members
• A brief description of the project (work packages and main objectives)
• Research activities (seminars, meetings, short courses, workshops, ...)
• Brief overview of ongoing PhD and Postdoc projects within the network
• Lists of technical reports, of publications and of books written by members of the network
• Contact details
New activities (seminars, short courses, meetings, ...) are announced via various (existing) mailing lists. They are also announced at the IAP web site, where a link to the appropriate web page is added for more details.

Scientific meetings
Given the large number of research activities organized by IAP-network members, mainly in the framework of the International Year of Statistics, the Annual Workshop will take place in the fall of 2014 (instead of spring 2014). The ULB partner will organize the next workshop, preliminarily announced for November 2014.

Many meetings were organized by members of the network in the period April 2013 – March 2014. The list below is restricted to meetings (co-)organized by IAP-members in their own country.

• *4th Simulation models of infectious diseases (SIMID) Workshop*, organised by UH, April
17 and 18, 2013.

• *2nd Dutch/Flemish Labmeeting Time Series and Dynamical Models*, at the University
of Amsterdam. Co-organized by C. Albers (RUG). May 2013.

• *New Developments in Econometrics and Time Series*, Workshop, September 12–13, 2013, at ULB, Brussels (Belgium). Co-organized by D. Giannone, M. Hallin and S. Hörmann (all ULB).

• The IAP-network organized a special lecture at the 21st Annual Meeting of the Belgian Statistical Society, October 9–11, 2013, at the University of Gent (main organizer: S. Van Aelst, UG). Invited speaker for the network at the conference was A. Komárek (CU), with a talk on "Clustering for multivariate continuous and discrete longitudinal data".

• *Statistics, your friend in daily life; whether you like it or not*, UCL, October 25, 2013. Main organizer (UCL, A. El Ghouch) and co-organizers from other partner universities (P. Janssen (UH), G. Molenberghs (KUL-1/UH)). Meeting organized in the framework of the International Year of Statistics 2013.

• *Moving beyond questionable research practices: Symposium on good research practice in behavioral sciences*, organized by IAP-members of KUL-2 (Francis Tuerlinckx and Wolf Vanpaemel), in co-organization with the University of Gent.

• *International Hexa-Symposium on Biostatistics, Bioinformatics, and Epidemiology*, organized by I-Biostat, at UH, November 14 and 15, 2013.

Awarding of an Honorary Doctorate to Anastasios Tsiatis (Professor of Biostatistics, North Carolina State University, North Carolina, USA). Laudatio by G. Verbeke (KUL-1).

• *Workshop on Applications of Modeling and Simulation in Drug Development*, organized
by I-Biostat, at UH, December 12 and 13, 2013.

• *International two-day workshop* on the occasion of the "25th Anniversary of LStat", with several sessions organized by IAP-members (KUL-1 and KUL-2), December 13 and 14, 2013.

Awarding of an Honorary Doctorate to Nate Silver (ESPN, USA). Laudatio by I. Gijbels (KUL-1) and G. Molenberghs (KUL-1).

Some members of the network also (co-)organize workshops and conferences abroad. Below is a selection of such meetings (related to topics in the IAP-network), attended by many members of the network.

• First Workshop on *Model Selection, Nonparametrics and Dependence Modeling* of the Working Group on "Asymptotic Theory for Multidimensional Statistics" (see also Section 4.4.2), July 8–9, 2013, Rennes, France. Co-organized by members of the KUL-1 (G. Claeskens and I. Gijbels) and UH (P. Janssen and A. Verhasselt) groups.

• *Mathematical Statistics and Limit Theorems*, Conference in honor of Paul Deheuvels,
June 21–22, 2013, Paris (France). Involved in the organization: M. Hallin (ULB).

Organization of the network: administrative meeting
An administrative meeting will be organized in the fall of 2014, on the occasion of the Annual Workshop.

Collaborations, working groups and seminars
The IAP network is working on a broad range of research topics in statistics. There is a large number of scientific collaborations within the network, as can be seen from the list of publications (see Section 5, and in particular Subsection 5.2, where all joint publications are collected). Below, we mention a few examples of ongoing collaborations between members of different teams of the network.

• Collaboration between KUL-1 (M. Giacofci, I. Gijbels, G. Claeskens) and ULB (M. Jansen) on flexible wavelet estimation for functional data.

• A collaboration in multivariate survival analysis, more speciﬁcally in frailty models,
exists amongst P. Janssen (UH), C. Legrand (UCL) and L. Duchateau (UG).

• J.-M. Freyermuth (KUL-1) and M. Jansen (ULB) together work on applying multiscale
local polynomial decompositions to anisotropic multidimensional data.

• S. Hörmann (ULB) and D. Hlubinka and M. Hušková (both CU) are collaborating on functional data problems, including change point detection in a functional data setup. These collaborations resulted in joint publications.

• Members of KUL-1 (among others I. Gijbels) are collaborating with D. Hlubinka (CU)
on research on data depth.

• M. Timmerman (RUG) is involved in an intensive research collaboration with several people from KUL-2 (most notably E. Ceulemans), on multiblock component analysis models, among others. This collaboration has, for the period 2013–2014, resulted in several peer-reviewed publications, including one in *PLoS ONE*.

• Collaboration between several members of KUL-2 with H. Kiers (RUG) on models for
data fusion.

• Collaboration between several members of KUL-2 with M. Hubert (KUL-1) on robust
dimension reduction methods.

• Members of ULG and UCL collaborate on semi-competing risks (ULG: C. Heuchenne, S. Laurent; UCL: C. Legrand and I. Van Keilegom), as well as on semiparametric transformation models (ULG: C. Heuchenne; UCL: I. Van Keilegom, among others).

• Members of the USC and UCL groups have been collaborating on nonparametric estimation and testing methods, and location-scale progressive three-state models (the latter involving C. Cadarso-Suárez, from USC; and I. Van Keilegom, from UCL).

Specifically, there is an ongoing project on directional-linear regression, which is part of the PhD thesis by E. García-Portugués, under the supervision of W. González-Manteiga and R.M. Crujeiras, and was developed during a research visit at UCL (September–December 2013), where the student worked with I. Van Keilegom.

• There are extensive collaborations between (1) UH and KUL-1 on longitudinal data, joint modeling of longitudinal and survival data, incomplete data methodology, sensitivity analysis for incomplete data and clinical trial methodology; (2) UH, KUL-1, and LSHTM on longitudinal and incomplete data and sensitivity analysis; (3) UH, KUL-1, and USC on longitudinal and incomplete data, sensitivity analysis and clinical trial methodology. Further topics of collaboration include epidemic modelling using social contact data and the role of (a)symptomatic infections (UH and LSHTM) and nonparametric estimation in flexible regression models and robust variable selection techniques (UH and KUL-1).

• S. Nagy has been a joint doctoral student of D. Hlubinka (co-supervisor, CU), I. Gijbels (supervisor, KUL-1) and M. Hubert (co-supervisor, KUL-1) since October 1, 2013. He has been awarded a PhD Scholarship from the Flemish Science Foundation enabling his long stay in Belgium.

• A number of people from the network, namely P. Janssen, Z. Shkedy and R. Braekers (UH) and L. Duchateau (UG), formed a group to coordinate a cross-cutting initiative in Statistics sponsored by VLIR-UOS to deliver statistical courses in developing countries.

Below are a few examples of active working groups in the network. They are an important tool to stimulate interactions between network partners, and to stay informed of the research achievements of other partners of the network.

• E. Ceulemans (KUL-2), and M. Timmerman and A. Stegeman (both RUG) are co-chairs of the ERCIM Specialized team on multi-set and multi-way modeling, with M. Hubert (KUL-1), H. Kiers (RUG), and I. Van Mechelen (KUL-2) as members.

• F. Tuerlinckx and L. Bringmann (both KUL-2) and C. Albers and M. Timmerman (both RUG) established a working group on time series in psychology, with a small group meeting every six months.

• Members of the KUL-1, UCL and UH groups are part of a scientific working group on "Asymptotic Theory for Multidimensional Statistics". One of the aims of this group is to bring together researchers working on asymptotic theory and mathematical statistics, and to stimulate collaborations in developing statistical theory.

• Members of the KUL-1 and UH groups are part of a scientific working group on "Longitudinal Data Analysis and Missing Data". One of the aims of this group is to bring together researchers working on aspects related to longitudinal data analysis, possibly subject to dropout, mixed models and sensitivity analysis.

• Members of the KUL-1 and UH groups are part of a scientific working group on "Sensitivity, Surrogacy and Hierarchical Data Meeting". One of the aims of this group is to bring together researchers working on correlated, multivariate, longitudinal data, and to stimulate collaborations in developing statistical methods, through discussion sessions and informal talks. The meetings are also frequently attended by colleagues from the Erasmus Universiteit Rotterdam, Maastricht University, and the International Drug Development Institute, among others.

• Members of UCL and ULG (C. Heuchenne and P. Lambert) are part of a scientific working group on "Semiparametric inference for survival and cure models".

• D. Magis (ULG) is a member of the Research Group of Quantitative Psychology and Individual Differences at KU Leuven, collaborating on research regarding Differential Item Functioning.

Each of the participating partners organizes statistics seminars at their universities on a regular basis. Announcements of these seminars are sent out to most Belgian statisticians, including those participating in the network.

Apart from the regular statistics seminars at the universities involved, several seminars have been organized by the network itself, e.g. around central themes of the network. They are on some occasions given by members of the network (at a different partner university), in order to foster research interactions and exchange of ideas. For brevity we only list these seminars below.

• April 14, 2013: Christel Faes (UH), "Spatial disease mapping based on survey data: the
Health Interview Survey in Belgium", at UCL, Belgium.

• November 7, 2013: Marek Omelka (CU), "Testing for homogeneity of multivariate dispersions using dissimilarity measures", at KU Leuven, Belgium.

• November 8, 2013: Davy Paindaveine (ULB), "On statistical depth and its local extensions", at UCL, Belgium.

• November 15, 2013: Daniel Hlubinka (CU), "Data Depth and Functional Data", at
ULB, Belgium.

• November 21, 2013: Daniel Hlubinka (CU), "Weighted halfspace data depths", at KUL-
• November 23, 2013: Eduardo García-Portugués (USC), "Smoothing-based tests for directional and linear data", at UCL, Belgium.

• November 28, 2013: Roel Braekers (UH), "Extending the Archimedean copula methodology to model multivariate survival data grouped in clusters of variable size", at KUL-1, Belgium.

• February 14, 2014: Yvik Swan (ULG), "Entropy and the fourth moment phenomenon",
at UCL, Belgium.

• March 5, 2014: Marc Aerts (UH), "Constrained generalised profiling inference for biomathematical models, with applications in microbial risk assessment", at CU, Czech Republic.

• March 7, 2014: Dominik Liebl (ULB), "Modelling Electricity Prices as Functional Data
on Random Domains", at UCL, Belgium.

• March 7, 2014: Laure Sansonnet (UCL): "A model of Poissonian interactions: detection
of dependence and nonparametric estimation", at ULB, Belgium.

For all other seminars on IAP-network topics organized during the reporting period,
we refer the reader to the web site.

Several short (intensive) courses have been organized. These courses are intended for all members of the network, and in particular (but not exclusively) for the PhD students and the postdoctoral researchers. The announcements were posted on the web site, and sent out via existing mailing lists. No (or reduced) registration fees were required for IAP-members.

A list of the short courses organized during the period April 2013 – March 2014 is
given below.

• May 21–22, 2013. Short course on "Copula-based Dependence Models", given by E.F. Acar (University of Manitoba, Canada), at UCL.

• August 3–4, 2013. Short course on "Foundations on recent advances in longitudinal and incomplete data and in joint modeling". Organized at the Joint Statistical Meetings, Montreal, Canada. Lectures given by G. Molenberghs (UH), G. Verbeke (KUL-1), and D. Rizopoulos (Erasmus Medical Center, Rotterdam).

• September 17–21, 2013. ECAS'2013 course on "Functional and Complex Structure Data Analysis". The ECAS 2013 session gave a general overview of the methodological and practical aspects of Functional Data Analysis (FDA). The course took place at Castro Urdiales (Cantabria, Spain) with lectures by A. Cuevas (Universidad Autónoma de Madrid, Spain), M. Febrero (USC), P. Hall (University of Melbourne, Australia) and V. Panaretos (Ecole Polytechnique Fédérale de Lausanne, Switzerland).

• September 23–27, 2013. Course on "Modelling Infectious Diseases and Health Economic Evaluation of Vaccines". Lectures by N. Hens and N. Goeyvaerts (UH), and further by P. Beutels (UA), J. Bilcke (UA), and M. Andraud (UA).

• November 5, 2013. Short course on "Dimension reduction in regression", given by F. Portier (UCL, Belgium), at UCL.

• November 21, 2013. Short course on "Using Machine Learning and Bayesian Methods to Analyze Large Supernovae Datasets", given by Dr. M. Varaghese (University of Cape Town, South Africa), at UCL.

• November 2013. Short course on "Causal inference in epidemiology: recent methodological developments", given by S. Vansteelandt (UG), at the London School of Hygiene and Tropical Medicine, London, UK.

• February 3–7, 2014. Short course on "Nonparametric Statistical Methods", by O. Thas
(UG), at the University of Wollongong, Australia.

Short courses (in particular of advanced type) that are organized on a regular basis (for example as part of a continuing education programme, or any other regular programme) are often followed by PhD students from the network. We list a few examples of such courses:
• November 21–22 and 28–29, 2013: Short course "Optimization and numerical methods in statistics: Concepts, models, and applications". The course took place at KU Leuven and was given by F. Tuerlinckx (KUL-2), G. Molenberghs (KUL-1/UH), K. van Deun (KUL-2), and T. Wilderjans (KUL-2).

• April 8–10, 2013: Short course on "Multilevel Analysis for Grouped and Longitudinal
Data", by L. Wijngaards-de Meij (University of Utrecht, NL), at UG.

Training of young researchers: PhDs and postdocs
Guidance of young researchers
Within the network there are many close collaborations. To foster collaborations and to stimulate exchange of knowledge, several PhD students in the network have a guidance committee that involves professors from at least two partners of the network (e.g. the promoter and another member of a team from one of the other partners in the network).

Below are a few examples of IAP members that were/are part of the guidance committee or were/are a member of a PhD jury at other universities of the network. This participation is a very useful way to become familiar with the research carried out in other groups of the network, and will be extended even more in the future. We give a selection of such participations in training in the next tables. Information on the (co-)promoters of the PhD research at the university with which the student is affiliated is not included. This and more information can be found on the web site.

IAP-members in the committee(s)
(+ date of defense) or ongoing
G. Claeskens (KUL-1):member guidance committee
Yudhie Andriyana,
A. Verhasselt (UH): member guidance committee
Mehreteab Fantahun
Z. Shkedy (UH): co-promoter
N. Hens (UH): member PhD jury
September 27, 2013
F. Tuerlinckx (KUL-2): member PhD jury
Isabelle Charlier,
C. Heuchenne: member guidance committee
doctoral examination committee
F. Tuerlinckx (KUL-2): member PhD jury
December 20, 2013
M. Hubert (KUL-1): member PhD jury
G. Molenberghs (UH): member PhD jury
G. Molenberghs (KUL-1/UH):
member guidance committee
R. Braekers (UH): co-promoter
September 27, 2013
Clement (UG): co-promoter
O. Thas (UG): member PhD jury
I. Van Mechelen (KUL-2): member PhD jury
M. Hubert (KUL-1): member PhD jury
doctoral examination committee
December 12, 2014
Yehenew Getachew,
P. Janssen (UH): co-promoter
G. Claeskens (KUL-1): guidance committee
M. Hubert (KUL-1): guidance committee
M. Aerts (UH): co-promoter
C. Faes (UH): member PhD jury
September 27, 2013
F. Tuerlinckx (KUL-2): member PhD jury
G. Molenberghs (KUL-1/UH):
Trevor Kadengye, KUL-2
C. Faes (UH): member PhD jury
Gonzalize Montoro, USC
Edmund Njeri Nagi,
M. Kenward (LSHTM): member PhD jury
September 26, 2013
M. Kenward (LSHTM): member PhD jury
September 26, 2013
D. Hlubinka (CU): co-promoter
Majda Talamakrouni,
I. Gijbels (KUL-1): member guidance committee
Germain Van Bever,
M. Hubert (KUL-1): member PhD jury
September 6, 2013
Robin Van Oirbeek,
F. Tuerlinckx (KUL-2): member PhD jury
Marlies Vervloet,
G. Verbeke (KUL-1): member guidance committee
A complete list of PhD theses currently in preparation in the network can be found on
Training network for young researchers
Several members of the IAP Network (including Els Goetghebeur (UG), Marc Aerts (UH), Irène Gijbels (KUL-1)) have cofounded and are members of the steering committee of flames, which stands for "Flemish Training Network for Methodology and Statistics".

This is an inter-university training platform that offers high-quality training in statistics and methodology to young researchers (across all disciplines) dealing with statistical analysis in their research work. An inter-university team coordinates, develops and optimizes the state-of-the-art training. The training offered makes use of the excellent competence and expertise of all Flemish researchers in statistics and methodology. The KUL-1, KUL-2, UG, and UH partners of the IAP network are very actively involved in this training initiative. Within this platform these partners collaborate intensively in setting up new inter-university (IU) training courses and events/activities, in which at least three partner universities participate actively in the organization. The activities set up within the framework of flames during the reporting period are listed below.

• May 23, 2013. Launching flames: Flanders Training Network for Methodology and Statistics, at the Arsenal in Brussels. With presentations by Marleen Temmerman (World Health Organization), Pieternel Verhoeven (University College Roosevelt, The Netherlands) and John Crombez (State Secretary), among others.

• September 16–20, 2013. flames Summer School (FSS) 2013, organized at the KU
• October 9, 2013. Special flames event (PhD day) at the Annual 2013 Meeting of the Belgian Statistical Society, at Gent. Short course by Andrew Gelman on "Statistical Education Part I: Tricks for Teaching Statistics" and "Part II: Interesting Statistics Example".

• March 26–28, 2014: flames IU Course on "Theory and Practice of Questionnaire Construction and Analysis", at UH.

In addition, many local training courses are offered. For more details we refer the reader to the web site of this training platform:
Prizes or special recognition obtained by network members
Several members of the IAP network were internationally recognized for their research work. A selective list of such recognitions is provided below.

• Gerda Claeskens (KUL-1) was selected as Associate Editor of the *Annals of Statistics*
starting January 2013.

• Christine De Mol (ULB) was Chair of the *SIAM *Activity Group on Imaging Science
(January 2012–December 2013).

• Chella Ensoy (UH) received a best student poster award at GEOVET 2013, London.

• Domenico Giannone (ULB) was appointed as Fellow of the *Centre for Economic Policy Research (CEPR)*.

• Irène Gijbels (KUL-1) was selected as Editor-in-Chief of the *Journal of Nonparametric Statistics*, starting January 1, 2013.

• Marc Hallin (ULB) was appointed as Co-Editor-in-Chief of *International Statistical Review* (2010–); and as Editor-in-Chief of *Statistical Inference for Stochastic Processes* (2013–).

• Marc Hallin (ULB) served as a member of the *Institute of Mathematical Statistics* (IMS) Nominating Committee (August 2012–July 2013), and was selected as chair of that committee (August 2013–July 2014).

• Niel Hens (UH) received a best poster award at the Epidemics Conference 2013,
• Philippe Lambert (ULg-UCL) has been Associate Editor of *Statistical Modelling: an International Journal* since 2007; and of *Advances in Statistical Analysis* since 2009.

• Philippe Lambert (ULg-UCL) is an elected member of the Executive Committee of
the *Statistical Modelling Society *since 2013.

• Christophe Ley's Ph.D. thesis (advisor: Davy Paindaveine (ULB)) won the *Prix Marie-Jeanne Laurent-Duhamel 2014* (this prize is awarded every three years by the French Statistical Society to the best Ph.D. in theoretical statistics defended in a French-speaking university).
• David Magis is "Statistical Software Reviews Editor" for the *British Journal of*
*Mathematical and Statistical Psychology *since 2007.

• Geert Molenberghs (KUL-1/UH) is co-editor of *Biostatistics* (2010–). He is a Series Editor for the *Wiley Series in Probability and Statistics*. He is an Editor of *StatRef*, the online statistical resource of Wiley.

• María Oliveira, a PhD student from USC, received the award for Best Young Researcher in Biostatistics at the Spanish Biometrics Conference held in May 2013 in Ciudad Real, Spain.

• Davy Paindaveine (ULB) was selected as Co-Editor-in-Chief of *Statistics & Probability Letters* (2014–).

• Francis Tuerlinckx is member of the Board of Trustees of the *Psychometric Society*
• Iven Van Mechelen (KUL-2) was President of the *International Federation of Classification Societies*, 2012–2013.

• Wolf Vanpaemel (KUL-2) won the 2013 William K Estes Early Career Award of the
*Society for Mathematical Psychology*.

• Geert Verbeke (KUL-1) is elected member of council of the *Royal Statistical Society*,
• The NPCirc package for R (statistical software), developed by M. Oliveira, R.M. Crujeiras and A. Rodríguez-Casal, members of the USC group, received the John Chambers 2014 Award from the *American Statistical Association*.

In this section we provide information on the scientific output of the IAP-statistics network.

We list the publications of network members in the period April 2013 – March 2014. The publication lists are restricted to (peer-reviewed) publications in international journals.

Refereed publications: We list all published papers in international journals (or edited books) in the period April 2013 – March 2014 (with refereeing system). We make the distinction between published papers and papers in press. See also the IAP-Statistics List of Published Papers on our web site:
Each published paper has a number of the form R13xxx or R14xxx. We do not mention these reference numbers below. For each published paper the web site provides a link to the published paper (for example, a link to the journal's page).

Books: These are books written by members of the network that are published by international publishers. They can also be found on the webpage
(reference numbers, provided on the web site, are of the form B13xxx and BP14xxx).

In the sections below we list the research output of the IAP-network for each of the categories described above. We start with separate lists for each partner in the network, followed by a list of publications that are co-signed by researchers from at least two different groups from the network.

Some summary statistics: the list includes
about 400 published papers,
about 120 papers to appear,
and about 25 books or chapters in books (published or to appear).

On the web site we also provide information regarding manuscripts written in the
period April 2013 – March 2014, and *submitted for publication to an international journal*.

These lists can be found at
Each Technical Report has a number of the form TR13xxx or TR14xxx.

List of publications per team
Source: https://iap-studys.be/documents/report-2014

