Statistics Seminars 2017 - 2018

Statistics Seminars will be in Room E0.01 Science East on Thursdays at 4.00pm.






Title:  Bayesian Nonparametric Modelling of Network Data

Speaker:  Dr. Daniele Durante
Post-doctoral Research Fellow in Statistical Science, University of Padova

Date: Thursday, 5th October 2017

Time: 3pm

Location:  Room 1.25, Science Centre North (J. K. Lab)


Many fields of research provide increasingly complex data along with novel motivating applications and new methodological questions. In approaching these data sets it is essential to rely on parsimonious representations which make the problem tractable and provide interpretable inference procedures to draw meaningful conclusions. However, in reducing complexity, it is important to avoid restrictive models that lead to an inadequate characterization of relevant patterns underlying the observed data. Within this framework, network data representing relationship structures among a set of nodes are a relevant example. Although there has been abundant focus on models for a single network, there is a lack of methods for replicated network-valued data collected from a common population distribution. These data open new avenues for studying underlying connectivity patterns, how they are distributed in the population, and whether this distribution changes with predictors of interest. Motivated by neuroscience studies on brain connectivity, I will discuss some issues associated with available statistical models, and I will outline recent methods I have proposed to cover some of the current gaps via Bayesian nonparametric models leveraging latent space representations.

Title:        Open and Reproducible Spatial Data Analysis in the Social Sciences – Ideas and Examples.

Speaker:      Prof. Christopher Brunsdon
Professor of Geocomputation and Director of the National Centre for Geocomputation, Maynooth University

Date:            Thursday, 12th October 2017

Time:         3pm

Location:        Room 1.25, Science Centre North

The idea of reproducible research has gained much recent attention. This is an approach to publishing reports, documents and web sites relating to data analysis in which complete information regarding the data used, and the programming scripts used to perform the analysis, is encapsulated in a single object. The idea is that third parties can not only read the report but also reproduce any analytical results or visualisations included in it. This allows scrutiny of the methods used, as well as their adaptation to different data sets or to similar but distinct statistical analyses.

In this talk the key ideas and justifications for reproducible research will be discussed, along with a description of a practical implementation of a reproducible research framework based on the R programming language, RStudio and RMarkdown. In addition, some examples of ongoing work using a reproducible paradigm will be given, including an open and reproducible geodemographic classification for the Republic of Ireland, and the production of tutorial materials for Bayesian spatial modelling using the Stan package.


Title:    Cost-effectiveness modelling to inform resource allocation decision-making in healthcare: Overview and some interesting examples

Speakers:     Dr. Ronan Mahon, Team Lead - Health Economic Modelling, Novartis
Dr. Andrii Danyliv, Senior Health Economic Modelling Manager, Novartis

Date:        Thursday, 26th October 2017

Time:         3pm

Location:        Room 1.25, O’Brien Centre for Science (North)

Healthcare system decision-makers are faced with unavoidable resource allocation decisions. In order to make decisions rationally and consistently, an evaluation framework has been developed, whereby the incremental benefit of a new health intervention is weighed against its opportunity cost (i.e. the benefit forgone elsewhere when an investment is made). In particular, cost-effectiveness models are used to estimate the incremental benefit and incremental cost associated with the new treatment. Such cost-effectiveness models have become increasingly mathematically complex and employ techniques such as Markov modelling and survival analysis in order to simulate disease progression. In this presentation, we give an overview of the field and focus on the use of survival analysis in cost-effectiveness modelling, which we hope will be of interest to those in the field of Actuarial Science.
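
The core calculation described above — weighing the incremental benefit of a new treatment against its incremental cost via a Markov model of disease progression — can be sketched as follows. This is a minimal illustration with made-up states, transition probabilities, costs and utilities, not a model from the talk:

```python
import numpy as np

# Illustrative 3-state Markov cohort model (Well, Sick, Dead).
# All transition probabilities, costs and utilities are invented numbers.

def run_model(P, costs, utilities, cycles=40, discount=0.035):
    state = np.array([1.0, 0.0, 0.0])   # whole cohort starts in Well
    total_cost = total_qaly = 0.0
    for t in range(cycles):
        df = 1.0 / (1.0 + discount) ** t   # discount factor for cycle t
        total_cost += df * state @ costs
        total_qaly += df * state @ utilities
        state = state @ P                  # advance the cohort one cycle
    return total_cost, total_qaly

# Rows: from-state; columns: to-state (Well, Sick, Dead).
P_standard = np.array([[0.85, 0.10, 0.05],
                       [0.00, 0.80, 0.20],
                       [0.00, 0.00, 1.00]])
P_new = np.array([[0.90, 0.07, 0.03],     # new treatment slows progression
                  [0.00, 0.85, 0.15],
                  [0.00, 0.00, 1.00]])

costs_std = np.array([500.0, 4000.0, 0.0])    # annual cost per state
costs_new = np.array([2500.0, 4000.0, 0.0])   # drug cost added while Well
utilities = np.array([0.85, 0.55, 0.0])       # QALY weight per state

c0, q0 = run_model(P_standard, costs_std, utilities)
c1, q1 = run_model(P_new, costs_new, utilities)
icer = (c1 - c0) / (q1 - q0)   # incremental cost per QALY gained
print(f"Incremental cost {c1 - c0:.0f}, incremental QALYs {q1 - q0:.2f}, ICER {icer:.0f}")
```

The resulting ICER (incremental cost-effectiveness ratio) is what a decision-maker would compare against a willingness-to-pay threshold.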


Title:        Polynomial correlations
(joint work with A.J. McNeil (York) and J. Nešlehová (McGill))

Speaker:      Andrew D. Smith HFIA, School of Mathematics and Statistics, UCD

Date:            Thursday, 2nd November 2017

Time:         3pm

Location:        Room 1.25, O’Brien Centre for Science (North)


Mathematically, the correlation of a bivariate random variable is the expected product of two polynomials. These polynomials are the first of two respective polynomial sequences, each of which is orthonormal with respect to the corresponding marginal distribution. This paper shows how expected products of higher-order orthonormal polynomials can capture further aspects of dependence. These include convexity, the tendency for high values of one variable to be associated with extreme (high or low) values of another, and arachnitude, the tendency of extreme (high or low) values of two variables to occur simultaneously. We also develop rank versions of these quantities: alongside the familiar rank correlation, we have rank convexity and rank arachnitude. Sample versions of these statistics can be useful in the calibration of copulas.
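
As a rough illustration of the rank versions of these quantities, one can evaluate expected products of orthonormal shifted Legendre polynomials of the normalised ranks; the first-order product recovers Spearman's rank correlation. The pairing of polynomial orders with "convexity" and "arachnitude" below is my reading of the abstract, not the paper's definitions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Orthonormal shifted Legendre polynomials on (0, 1).  The first-order
# expected product of rank transforms recovers Spearman's rank correlation;
# higher orders capture further aspects of dependence.
def p1(u):
    return np.sqrt(3.0) * (2.0 * u - 1.0)

def p2(u):
    return np.sqrt(5.0) * (6.0 * u**2 - 6.0 * u + 1.0)

def pseudo_obs(x):
    """Normalised ranks in (0, 1)."""
    n = len(x)
    return (np.argsort(np.argsort(x)) + 1) / (n + 1.0)

x = rng.normal(size=5000)
y = x**2 + 0.3 * rng.normal(size=5000)   # 'convex' dependence, no monotone trend

u, v = pseudo_obs(x), pseudo_obs(y)
rank_corr   = np.mean(p1(u) * p1(v))   # ~ Spearman's rho: near zero here
rank_convex = np.mean(p2(u) * p1(v))   # high y goes with extreme x: large
arachnitude = np.mean(p2(u) * p2(v))   # extremes of x and y co-occur: positive
```

On this simulated example the ordinary rank correlation is essentially zero, while the second-order statistics pick up the dependence that it misses.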


Title:        What role for volcanoes in 21st Century climate?

Speaker:     Prof. Peter Thorne, Director, Irish Climate Analysis and Research Units, Department of Geography, Maynooth University

Date:        Thursday, 9th November 2017

Time:         3pm

Location:        Room 1.25, O’Brien Centre for Science (North)


Volcanoes are the wildcard of climate. We know they will occur and that they have potentially substantial impacts. But how do we plan for events that are inherently unpredictable, at least in the specifics? This talk will present a novel approach using a large multi-member ensemble and long ice-core records to attempt to provide a basis for decision makers to incorporate potential volcanic activity into adaptation planning.


Title:          Applied Statistics and Animal Breeding Programmes

Speaker:     Assoc. Professor Alan Fahey, School of Agriculture and Food Science, UCD

Date:    Thursday, 16th November 2017

Time:    3pm

Location:    Room 1.25, O’Brien Centre for Science (Nth)


Animal breeding, also known as quantitative or statistical genetics, is used by the agricultural livestock industries to exploit the genetic diversity of economically important traits. Animal breeding is based on the infinitesimal model, which assumes that traits are controlled by alleles at an infinite number of loci, each with an equal but small effect on a quantitative trait. Animal breeders therefore pay little attention to individual genes associated with traits. The main objective of an applied animal breeding selection programme is to cause genetic improvement in traits that move the population towards a predefined breeding goal. To do this, genetic evaluations must be conducted to determine the genetic merit of each animal in the population. The genetic merit of an animal is known as its estimated breeding value (EBV), and half of this value is transmitted to the animal's progeny, as each parent contributes 50% of its genes to its progeny. A successful animal breeding programme relies on a successful data collection programme at national and international level. Data are required on the animals' phenotypes, pedigree, DNA (if available) and farm management. The phenotype is a combination of the genotype, the environment and the interaction between the two. Animal breeders are primarily concerned with estimating the transmittable (additive) genotype and use statistical techniques such as best linear unbiased prediction (BLUP and GBLUP) to simultaneously estimate the genotypic and environmental effects. The variance of the additive genotype is then used to estimate the heritability of the trait, the proportion of the phenotypic variance that is due to the additive genetic variance. The heritability is in turn used in the estimation of breeding values, and the EBVs are used by farmers when making genetic selection decisions.
The profitability of livestock enterprises depends on more than one trait, and selection indices have therefore been used to aid multiple-trait selection. These indices recognise that traits are not of equal economic importance, and that traits in the index can have positive or negative genetic correlations with each other. Advances in genomic technologies and statistical methodologies have played an important role in the genetic improvement of livestock populations.
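
The variance-ratio definitions in the abstract can be made concrete with a tiny numerical sketch (all variance components and phenotype values below are invented for illustration):

```python
# All quantities below are invented for illustration.

V_A = 250.0          # additive genetic variance (e.g. kg^2 for a weight trait)
V_E = 750.0          # environmental variance
V_P = V_A + V_E      # phenotypic variance (ignoring G x E for simplicity)

h2 = V_A / V_P       # heritability: share of phenotypic variance that is additive
print(f"heritability h^2 = {h2:.2f}")                # 0.25

# With a single own-phenotype record, the simplest EBV prediction is the
# heritability times the animal's deviation from the population mean:
ebv_own = h2 * (620.0 - 580.0)                       # phenotype 620, mean 580
print(f"EBV from own record = {ebv_own:.1f}")        # 10.0

# Each parent transmits half of its breeding value, so the progeny's
# expected (parent-average) EBV is:
ebv_sire, ebv_dam = 12.0, 4.0
ebv_progeny = 0.5 * ebv_sire + 0.5 * ebv_dam
print(f"expected progeny EBV = {ebv_progeny:.1f}")   # 8.0
```

Full genetic evaluations replace these one-line predictions with BLUP solved over the whole pedigree, but the heritability and the halving of parental breeding values work exactly as above.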

Title:    Spectral Backtests of Forecast Distributions with Application to Risk Management

Speaker:     Prof. Alexander J. McNeil (University of York)

Date:        Thursday, 23rd November 2017

Time:     3pm

Location:        Room 1.25, O’Brien Centre for Science (North)

In this talk we study a class of backtests for forecast distributions in which the test statistic  is a spectral transformation that weights exceedance events by a function of the modelled probability level. The choice of the kernel function makes explicit the user's priorities for model performance.  
The class of spectral backtests includes tests of unconditional coverage and tests of conditional coverage. We show how the class embeds a wide variety of backtests in the existing literature, and propose novel variants as well.
We assess the size and power of the backtests in realistic sample sizes, and in particular demonstrate the tradeoff between power and specificity in validating quantile forecasts.
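
A hypothetical example of the idea: a spectral backtest with a discrete kernel that weights exceedance indicators at several probability levels. The kernel weights and levels below are arbitrary choices for illustration, not those studied in the paper:

```python
import numpy as np

rng = np.random.default_rng(7)

# Discrete spectral kernel: exceedance indicators at several VaR levels,
# combined with user-chosen weights that encode model-performance priorities.
levels  = np.array([0.95, 0.975, 0.99])   # probability levels of interest
weights = np.array([0.2, 0.3, 0.5])       # kernel: emphasise the far tail

def spectral_stat(u, levels, weights):
    """Standardised spectral backtest statistic for PIT values u.

    Under a correct forecast model the PITs are iid U(0,1) and the
    statistic is asymptotically standard normal."""
    n = len(u)
    W = (u[:, None] > levels[None, :]) @ weights     # W(u_t) per observation
    mu = weights @ (1.0 - levels)                    # null mean of W
    # null covariance of the exceedance indicators
    cov = np.minimum.outer(1 - levels, 1 - levels) - np.outer(1 - levels, 1 - levels)
    sigma = np.sqrt(weights @ cov @ weights)
    return (W.mean() - mu) / (sigma / np.sqrt(n))

u_good = rng.uniform(size=2000)             # well-calibrated forecasts
u_bad  = rng.uniform(size=2000) ** 0.6      # too many tail exceedances
z_good = spectral_stat(u_good, levels, weights)
z_bad  = spectral_stat(u_bad, levels, weights)
```

Putting all the weight on a single level recovers a standard unconditional-coverage VaR exceedance test as a special case.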

Title:        Modelling the Effect of Age on Human Performance

Speaker:      Richard De Veaux (Williams College, MA)

Date:        Thursday, 7th December 2017

Time:     3pm

Location:    Room 1.25, O’Brien Centre for Science (North)


The past fifty years have witnessed an incredible improvement in the sporting performances of "masters" athletes – those over 35 years old. Men and women as old as 105 have been pushing the envelope of what's possible for their age group every year in running and swimming events. Using hundreds of thousands of individual results from the US masters swimming organization and several running events, I will attempt to model the effect of age on the performance one can expect from age 35 to 105 in various events and distances in both running and swimming.
The Dipsea is a 100-year-old, 8-mile run that starts in Mill Valley, CA and ends at the Pacific Ocean near Stinson Beach. What makes the event unique is its handicap system. Each age and gender gets a different group designation. For example, the slowest group, the AAA group, comprises men 74 years old and older, boys 6 and under, women 66 and older, and girls 7 and under. Each group starts at a different time ahead of the "scratch group" – men 19-30 years old. So first to leave, at 8:30 AM, is the AAA group. Next is the AA group, and so on. Finally, 25 minutes later, the 19-30-year-old men get to go. The winner is the one who crosses the finish line first.
But, what's a fair age and sex handicap? I will compare my model estimates to the ones actually used by the Dipsea to see which is fairer.

Title: Sequences, dissimilarities and model-based approaches.

Speaker: Prof. Raffaella Piccarreta, Department of Decision Sciences, Bocconi University, Milan

Date: Thursday, 25th January 2018

Time: 3pm

Location: E2.18 Science Centre East

We consider the case where, for a sample of subjects, the activities (or states) experienced over a period of time are tracked, so that a trajectory, i.e. a finite sequence or ordered collection of states, is available for each subject. There are many applications where data of this type are of interest. For example, in sociology, one may be interested in studying the transition to adulthood of individuals with respect to union and family formation, or to employment. In health studies, the conditions of individuals are typically observed over time; in each period, one records whether the patient experiences a focal event such as remission, occurrence of a disease, various degrees of severity of a disease, complications, or death. Adopting a "holistic" approach, we focus on the evolution of the trajectory as a whole, rather than just on the timing or occurrence of specific events. We are interested in studying the relationships that may exist between the trajectories and a set of covariates, a problem that is of course relevant in many contexts.

There is increasing attention and interest in the literature on the use of model-based approaches for this purpose. Examples range from event history models, to latent class analysis, to hidden Markov models. The idea is that the complex problem of studying whole sequences can be efficiently simplified by focusing on the transitions across states.

Irrespective of the approach followed to study the trajectories’ evolution, the assessment of the results and, possibly, model selection are usually based upon criteria depending on the hypothesized data generating process. The different assumptions underlying different models typically make the comparison of their results difficult. A related, crucial issue is to evaluate the models’ performance with respect to the original object of interest, which is the set of the observed trajectories.
Here, we propose to use simulated trajectories to study and compare the in-sample and out-of-sample predictive power of competing models, that is, their ability to generate trajectories that are "similar" to the observed ones. Our aim is to introduce criteria to suitably compare collections of dissimilarities computed across observed and model-generated sequences.

We explore a few dissimilarity‐based methods to assess the relative merits of competing models, when applied to the same data.

Title:        Lateral trait transfer in phylogenetic inference

Speaker:      Dr. Luke Kelly (University of Oxford)

Date:        Thursday, 1st February 2018

Time:     3pm

Location:    Room 1.25, O’Brien Centre for Science, North

We are interested in inferring the phylogeny, or shared ancestry, of a set of taxa descended from a common ancestor. When traits pass vertically through ancestral relationships, the phylogeny is a tree and one can often compute the likelihood efficiently through recursions. Lateral trait transfer is a form of reticulate evolutionary activity whereby evolving species exchange traits outside of ancestral relationships. To address this frequent source of model misspecification, we propose a novel model for species diversification which explicitly controls for the effect of laterally transferred traits.

The parameters of our likelihood are the solution of a sequence of differential equations over a phylogeny and the computational cost of this calculation is exponential in the number of taxa. We exploit symmetries in the differential systems and techniques from numerical analysis to build an efficient approximation scheme to reduce the computational cost of inference by an order of magnitude while remaining exact in a pseudo-marginal MCMC sense. We illustrate our method on a data set of lexical traits in Eastern Polynesian languages and demonstrate a significantly improved fit over the corresponding method which ignores lateral transfer.

Title:      Strategies for efficient Bayesian inference for Gibbs random fields.

Speaker:      Prof. Nial Friel, School of Mathematics and Statistics, UCD

Date:        Thursday, 8th February 2018

Time:     3pm

Location:   JK Lab, First Floor Science Centre North


Gibbs random fields are a popular class of statistical models for network and spatial data. They have widespread use in many areas including the social sciences, physics and ecology. However, they present difficulties for statistical inference because the resulting likelihood is usually intractable. I will present a collection of strategies for overcoming this intractability, all of which rely on the fact that, although the likelihood cannot be evaluated, it is possible to simulate from it.
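
One strategy in this spirit is the exchange algorithm, which sidesteps the intractable normalising constant by simulating an auxiliary data set at each proposed parameter value, so that the constants cancel in the acceptance ratio. Below is a minimal sketch for a small Ising model (a simple Gibbs random field); note that the auxiliary draw should strictly be an exact sample, and the long Gibbs run used here is only a common approximation:

```python
import numpy as np

rng = np.random.default_rng(0)

def suff_stat(y):
    """Number of agreeing neighbour pairs (Ising sufficient statistic)."""
    return np.sum(y[:-1, :] == y[1:, :]) + np.sum(y[:, :-1] == y[:, 1:])

def gibbs_sample(theta, shape=(16, 16), sweeps=100):
    """Approximate draw from the Ising model via checkerboard Gibbs sweeps."""
    y = rng.integers(0, 2, size=shape)
    parity = np.indices(shape).sum(axis=0) % 2 == 0
    for _ in range(sweeps):
        for mask in (parity, ~parity):
            nb1 = np.zeros(shape)   # neighbours equal to 1
            nbt = np.zeros(shape)   # total number of neighbours
            nb1[1:, :] += y[:-1, :]; nbt[1:, :] += 1
            nb1[:-1, :] += y[1:, :]; nbt[:-1, :] += 1
            nb1[:, 1:] += y[:, :-1]; nbt[:, 1:] += 1
            nb1[:, :-1] += y[:, 1:]; nbt[:, :-1] += 1
            # full conditional P(y_ij = 1 | rest) under f(y) ~ exp(theta * S(y))
            p1 = 1.0 / (1.0 + np.exp(-theta * (2 * nb1 - nbt)))
            y[mask] = (rng.random(shape) < p1)[mask]
    return y

def exchange_mcmc(y_obs, iters=60, prop_sd=0.15):
    """Exchange algorithm: the intractable normalising constants cancel
    in the acceptance ratio thanks to the auxiliary simulation."""
    theta, s_obs, draws = 0.1, suff_stat(y_obs), []
    for _ in range(iters):
        theta_p = theta + prop_sd * rng.normal()
        if abs(theta_p) < 2.0:                        # uniform(-2, 2) prior
            y_aux = gibbs_sample(theta_p, y_obs.shape)
            log_alpha = (theta_p - theta) * (s_obs - suff_stat(y_aux))
            if np.log(rng.random()) < log_alpha:
                theta = theta_p
        draws.append(theta)
    return np.array(draws)

y_obs = gibbs_sample(0.4)    # synthetic data generated at theta = 0.4
draws = exchange_mcmc(y_obs)
```

The grid size, sweep counts and prior here are chosen purely to keep the sketch fast; a real analysis would use far longer runs and a perfect or carefully validated auxiliary sampler.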

Title: Recombination and the lack of it: finding genetic factors influencing autism, and finding bacterial chromosomally clustered genes

Speaker: Prof. Denis Shields, Clinical Bioinformatics, School of Medicine, Conway Institute, UCD

Date: Thursday, 15th February 2018

Time: 3pm

Location: JK Lab, Science Centre North


Recombination between genes in families is used in linkage mapping to work out which genetic markers are chromosomally close to a disease locus. Here, we develop a new way of looking at the recombination information, to see if the recombinations themselves actually disrupt gene function and cause disease. We surveyed Autism Spectrum Disorder families, inferred recombination events in chromosomes and looked for clustering in the ASD children compared to their siblings, finding some suggestive regions of interest. Recombination of genes in bacteria occurs quite differently from that in humans, with small regions of functionally related genes hopping from one species to another. This results in clustering of genes of related function, but existing statistical models of this use a simple distance threshold to define clusters. We fitted an exponential function to known functional groups and applied this to develop a score for protein domain clustering in bacteria, down-weighting more closely related species. This recovers known clusters and suggests some new, possibly functionally related, domains.

Title:        Adaptive Utility: Decision Making with Uncertain Preferences

Speaker:      Dr. Brett Houlding (TCD)

Date:        Thursday, 22nd February 2018

Time:         3pm

Location:    Room 1.25, O’Brien Centre for Science, North

Decision making with adaptive utility provides a generalisation of classical Bayesian decision theory, allowing the creation of a normative theory for decision selection when preferences are initially uncertain. In this talk I address some of the foundational issues of adaptive utility as seen from the perspective of a Bayesian statistician, and the implications that such a generalisation has for the traditional utility concepts of value of information and risk aversion. A new concept of trial aversion is introduced that is similar to risk aversion, but concerns a decision maker's aversion to selecting decisions with high uncertainty over the resulting utility. I conclude with some recent extensions for computationally efficient modelling before discussing current research concerning applications within recommender systems.

Title:        An Analysis of the Taxation Supports for Private Pension Provision in Ireland

Speaker:      Maeve Hally FIA, FSAI (UCC)

Date:        Thursday, 29th March 2018

Time:     3pm

Location:    Room 1.25, O’Brien Centre for Science, North

The size and distribution of the taxation supports for private pension provision have been a contentious issue. Research produced or commissioned by representative groups of the pensions industry in Ireland maintains that the tax supports are merely tax deferment, and that the effective tax relief is lower than the 'headline' relief on pension contributions. Research by the OECD, on the other hand, suggests that pension saving is essentially tax-free for the majority of pension savers. This paper estimates the value of the favourable tax treatment of private pension provision, expressed as a percentage of the original amount invested, and analyses how it varies with income level, gender, saving period, and other factors. The net effective tax relief on each euro invested in a private pension is estimated by comparing the increase in the present value of pension savings over the lifetime of the individual with that of other savings. We report that the net effective relief is higher than estimated by the widely cited industry research, and depends on the value of the pension fund at retirement. We identify three distinct groups of individuals in the current regime of incentivising pension savings: those on low incomes, who are offered no incentive; standard-rate taxpayers, for whom the net effective tax relief is about 25-30%; and higher-rate taxpayers, for whom the net effective relief is about 31-51%. We argue that the current regressive taxation supports for pension savings should be reformed, and reformed before the proposed imminent introduction of an auto-enrolment scheme.
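
The comparison underlying the "net effective relief" can be illustrated with a stylised two-route calculation. All rates below (growth, marginal tax, retirement tax, exit tax, tax-free lump sum) are illustrative assumptions, not the paper's estimates, and the simple terminal-value comparison stands in for the paper's full present-value analysis:

```python
# Stylised illustration only: compare one euro of gross income saved via a
# pension (contribution relief, tax-free growth, taxed drawdown with a
# tax-free lump sum) against ordinary saving (taxed income, exit-taxed gains).

def pension_route(gross=1.0, r=0.04, years=25,
                  retirement_rate=0.20, lump_sum_free=0.25):
    # The full gross euro enters the fund (contribution relief) and grows tax-free.
    fund = gross * (1 + r) ** years
    # 25% tax-free lump sum; the remainder is taxed as income at retirement.
    return fund * lump_sum_free + fund * (1 - lump_sum_free) * (1 - retirement_rate)

def ordinary_route(gross=1.0, r=0.04, years=25,
                   marginal_rate=0.40, exit_tax=0.41):
    # Income is taxed first; investment gains are taxed on exit.
    net = gross * (1 - marginal_rate)
    value = net * (1 + r) ** years
    return value - (value - net) * exit_tax

advantage = pension_route() - ordinary_route()   # per euro of gross income
print(f"net advantage per euro of gross income: {advantage:.2f}")
```

Varying the marginal rate, saving period and fund growth in such a calculation is what drives the differences across the three groups of savers identified in the paper.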

Title:        Overlapping mixture model for network data (manet) with covariates adjustment

Speaker:      Dr. Saverio Ranciati, Department of Statistical Sciences "Paolo Fortunati", University of Bologna

Date:     Thursday, 19th April 2018

Time:     3pm

Location:    Room 1.25, O’Brien Centre for Science, North

Network data often come in the shape of actor-event information, where two types of node comprise the very fabric of the network. Examples of such networks are: people voting in an election, users liking or disliking media content, or, more generally, individuals (actors) attending events. Interest lies in discovering communities among these actors, based on their patterns of attendance at the considered events. To achieve this goal, we propose an extension of the model introduced in Ranciati et al. (2017): our contribution injects covariates into the model, leveraging parsimony in the parameters, and gives insights into the influence of such characteristics on the attendances. We assess the performance of our approach in a simulated environment.

Title: A simple method to aggregate p-values without a priori grouping

Speaker: Prof. Wiuf, University of Copenhagen

Date: Wednesday, 9th May 2018

Time: 1-2pm

Location:  Insight@UCD (3rd floor, O'Brien Centre, UCD), Rooms E3.29 + E3.30


In many areas of science it is customary to perform many, potentially millions of, tests simultaneously. To gain statistical power it is common to group tests based on a priori criteria such as predefined spatial regions or sliding windows. However, it is not straightforward to choose appropriate grouping criteria, and the results might depend on the criteria chosen. Methods that summarize, or aggregate, test statistics or p-values without relying on a priori criteria are therefore desirable.

In this talk, a simple method to aggregate a sequence of stochastic variables, such as test statistics or p-values, into fewer variables without assuming a priori defined groups is presented. The method is inspired by local alignment algorithms in bioinformatics and makes use of random walk theory to assess probabilities. Different ways to evaluate the significance of the aggregated variables are provided, based on theoretical considerations and resampling techniques, and it is shown that under certain assumptions the FWER is controlled in the strong sense.
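
A sketch of the flavour of such a method (my reconstruction from the local-alignment analogy, not the paper's algorithm): convert p-values to scores with negative drift under the null, aggregate adjacent small p-values via the maximal scoring segment, and assess significance by resampling:

```python
import numpy as np

rng = np.random.default_rng(3)

def scores(p, c=1.5):
    # -log(p) has mean 1 when p ~ U(0,1), so any c > 1 gives negative drift,
    # as required for local-alignment-style maximal segment statistics.
    return -np.log(p) - c

def max_segment_sum(s):
    """Kadane's algorithm: largest sum over contiguous segments (>= 0)."""
    best = cur = 0.0
    for x in s:
        cur = max(0.0, cur + x)
        best = max(best, cur)
    return best

def aggregate_test(p, n_sim=500, c=1.5):
    """Observed maximal segment score and a resampling p-value under the
    global null that all p-values are iid U(0,1)."""
    obs = max_segment_sum(scores(p, c))
    null = [max_segment_sum(scores(rng.uniform(size=len(p)), c))
            for _ in range(n_sim)]
    return obs, float(np.mean([m >= obs for m in null]))

# Mostly-null p-values with one short run of small ones (positions 90-99):
# no grouping of positions is specified in advance.
p = rng.uniform(size=200)
p[90:100] = rng.uniform(1e-4, 1e-2, size=10)
obs, pval = aggregate_test(p)
```

The segment is found by the data rather than by a predefined window, which is the point of an agnostic aggregation method; the actual paper assesses significance via random walk theory as well as resampling.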

The method may be a useful supplement to standard procedures relying on evaluation of test statistics individually. Moreover, by being agnostic and not relying on predefined selected regions, it might be a practical alternative to conventionally used methods of aggregation of p-values over regions. The performance of the method is demonstrated using simulation and real data.