# Statistics Seminars 2018 - 2019

Statistics Seminars will be held in Room 1.25, O’Brien Centre for Science (North), on Thursdays at 3:00pm.

**Speaker:** Paul McNicholas - Canada Research Chair in Computational Statistics, Professor, Department of Mathematics and Statistics, McMaster University

**Title:** Clustering, classification and data science

**Date:** Thursday 11th October

**Abstract:** An overview of clustering and classification, and where they fit within data science and statistics, will be presented. A statistical, model-based, framework for clustering and classification will be discussed. Several real data examples will be used to illustrate different approaches and/or subtleties, including high-dimensional data, three-way data, and handling outliers. The talk concludes with a discussion about ongoing and future work.

**Title:** Forward-stagewise clustering: An algorithm for convex clustering

**Speaker:** Mimi Zhang (TCD)

**Date:** Thursday 22nd November 2018

**Time:** 3pm

**Location:** Room 1.25, O’Brien Centre for Science (North)

**Abstract:**

This talk presents an exceptionally simple algorithm, called forward-stagewise clustering, for convex clustering. Convex clustering has drawn recent attention because it addresses the instability of traditional non-convex clustering methods. While existing algorithms can solve convex clustering problems exactly, they are sophisticated and produce (agglomerative) clustering paths that contain splits. This motivates us to propose an algorithm that produces only no-split clustering paths. The approach undertaken here follows a line of research initiated in regression. Specifically, we apply the forward-stagewise technique to clustering problems and explain, both theoretically and practically, how the algorithm produces no-split clustering paths. We further suggest rules of thumb that make the algorithm applicable when clusters are non-convex. The performance of the proposed algorithm is evaluated through simulations and a real-data application.
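For readers unfamiliar with convex clustering, the standard objective it refers to is a sum-of-squares fit plus a pairwise fusion penalty; as λ grows, centroids fuse and trace out a clustering path. The sketch below evaluates that objective on toy data (the data, weights, and λ are illustrative assumptions, not taken from the talk):

```python
import numpy as np

def convex_clustering_objective(X, U, lam, W):
    """Standard convex clustering objective:
    0.5 * sum_i ||x_i - u_i||^2  +  lam * sum_{i<j} w_ij * ||u_i - u_j||_2.
    X: data (n, p); U: per-point centroids (n, p); W: fusion weights (n, n)."""
    fit = 0.5 * np.sum((X - U) ** 2)
    n = X.shape[0]
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            penalty += W[i, j] * np.linalg.norm(U[i] - U[j])
    return fit + lam * penalty

# Toy data: two well-separated groups in one dimension (illustrative only).
X = np.array([[0.0], [0.1], [5.0], [5.1]])
W = np.ones((4, 4))  # uniform fusion weights, an assumption for the sketch

# With U = X and lam = 0 there is no fit error and no active penalty;
# increasing lam makes the minimiser fuse centroids together.
print(convex_clustering_objective(X, X.copy(), lam=0.0, W=W))
```

Because the penalty is a convex norm of centroid differences, the whole objective is convex, which is the source of the stability the abstract mentions.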

Coffee and tea will be available in the School Common Room afterwards.

**Title:** Exploring fuzzy clustering of multivariate skew data

**Speaker:** Francesca Greselin (University of Milano Bicocca)

**Date:** Thursday 25th October 2018

**Time:** 3pm

**Location:** Room 1.25, O’Brien Centre for Science (North)

**Abstract:**

With the increasing availability of multivariate datasets, asymmetric structures in the data call for more realistic assumptions than the incredibly useful paradigm of the Gaussian distribution. Moreover, in maximum likelihood estimation, a few outliers in the data can distort the estimates and hence yield unreliable inference.

Challenged by these issues, more flexible and robust tools for modelling heterogeneous skew data are needed. Our fuzzy clustering method is based on mixtures of skew-Gaussian components, combined with impartial trimming and constrained estimation of the scatter matrices in a modified maximum likelihood approach. The algorithm generates a set of membership values that are used to fuzzily partition the data and to contribute to robust estimates of the mixture parameters.

The new methodology is shown to be resistant to different types of contamination by applying it to artificial data. We also briefly discuss the tuning parameters, together with some heuristic tools for choosing them. Finally, synthetic and real datasets are analysed to show how intermediate membership values are estimated for observations lying where clusters overlap, while cluster cores are composed of observations assigned to a cluster in a crisp way.

We will also show the advantages of the fuzzy approach over classical model-based clustering via finite mixtures. In the fuzzy approach we can set the level of fuzziness and/or the percentage of units to be crisply assigned to groups, as well as the relative entropy of the solution, whereas in the mixture approach these arise as an uncontrolled byproduct of the estimated model.

(Joint work with Agustin Mayo Iscar and Luis Angel Garcia Escudero)
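For intuition on the intermediate membership values described above, here is a minimal sketch using the standard fuzzy c-means membership formula. This is not the speakers' trimmed skew-Gaussian method, only a generic illustration of fuzzy memberships; the data and centres are made up:

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """Standard fuzzy c-means membership formula (illustrative only;
    the talk's method instead uses trimmed skew-Gaussian mixtures).
    u[i, k] is the degree to which point i belongs to cluster k:
        u[i, k] = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # guard against division by zero
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

X = np.array([[0.0], [1.0], [0.5]])   # the point 0.5 sits at the overlap
centers = np.array([[0.0], [1.0]])
U = fuzzy_memberships(X, centers)
print(np.round(U, 3))
```

A point at a cluster core gets a membership near 1 for that cluster, while the point midway between the two centres receives an intermediate membership of 0.5 for each, matching the behaviour the abstract describes.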

Coffee and tea will be available in the School Common Room afterwards.

**Title:** The Sparse Latent Position Model for nonnegative weighted networks

**Speaker:** Riccardo Rastelli (UCD)

**Date:** Thursday 18th October 2018

**Time:** 3pm

**Location:** Room 1.25, O’Brien Centre for Science (North)

**Abstract:**

The Latent Position Model (LPM) is one of the fundamental models used in statistical network analysis. The LPM postulates that the nodes of a graph are characterised by latent positions in a Euclidean space, and that edges are created according to the pairwise latent distances. The main difficulty in using these models is scalability, since estimation algorithms are generally characterised by a quadratic complexity in the number of nodes. In addition to the high computational requirements, model selection has not been addressed so far: in general, the number of latent dimensions is chosen arbitrarily. In this talk, I will introduce a new type of LPM that can be used to analyse bipartite and unipartite networks with nonnegative edge values. The proposed approach combines and adapts a number of ideas from the literature on latent variable network models. The resulting framework is able to capture important features that are generally exhibited by observed networks, such as sparsity and heavy-tailed degree distributions. A fast variational Bayesian algorithm is proposed to estimate the parameters of the model. An advantage of the proposed method is that the number of latent dimensions can be deduced automatically within a single algorithmic framework, hence addressing both main research questions. Finally, applications of the proposed methodology are illustrated on artificial and real datasets, and comparisons with other existing procedures are provided.
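As a rough illustration of the distance-based mechanism the LPM postulates, the sketch below simulates a basic binary-edge LPM with a logistic link. This is a textbook variant, not the sparse nonnegative-weight model of the talk, and all dimensions and parameters are assumed for the example:

```python
import numpy as np

def simulate_lpm(n, d, alpha, rng):
    """Simulate a basic distance Latent Position Model (textbook variant,
    not the talk's sparse model): each node gets a latent position z_i,
    and edges form independently with logit P(i~j) = alpha - ||z_i - z_j||."""
    Z = rng.normal(size=(n, d))                        # latent positions
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    P = 1.0 / (1.0 + np.exp(-(alpha - D)))             # logistic link
    A = (rng.random((n, n)) < P).astype(int)
    A = np.triu(A, 1)                                  # keep upper triangle
    A = A + A.T                                        # undirected, no loops
    return Z, A

rng = np.random.default_rng(0)
Z, A = simulate_lpm(n=50, d=2, alpha=1.0, rng=rng)
print(A.shape, A.sum() // 2)  # adjacency size and edge count
```

Nearby nodes connect with high probability, so clusters in the latent space appear as densely connected groups in the graph; fitting the model reverses this construction, and each likelihood evaluation touches all node pairs, which is the quadratic cost the abstract mentions.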

Coffee and tea will be available in the School Common Room afterwards.
