Growing Up in Ireland Anonymised Microdata File – Frequently Asked Questions

If you have any queries on the Growing Up in Ireland Anonymised Microdata File (AMF), please read through these frequently asked questions (FAQs), which cover the most common queries received. If your query is not answered here, please contact us at issda@ucd.ie. FAQs are grouped into four categories:

Accessing the data

Q. I am having trouble accessing data from the file you sent me. When I try to load the CD it is showing no files. I have downloaded the zip software as advised but there doesn’t appear to be any data file on the disk? Perhaps I am doing something wrong and would appreciate any guidance?

A. The disk containing wave 1 of both the child and infant cohorts for the GUI has been prepared and will be in the post to you shortly. Please note the following:

The data is encrypted and password protected - this is a security measure insisted upon by the CSO, who are the ultimate owners of the data. Two versions of this encrypted data are included on the disk:

  • Zip file: this file has been encrypted using Secure Zip, and is compatible with Windows and Linux. It requires the Zip Reader (a free download from http://www.pkware.com/download-software/free-unzip) to extract the data. You will not be able to extract this data using the Windows zip utility or Winzip - you must use the Zip Reader to access the data.
  • Sitx file: this uses Stuffit to encrypt the data, and is compatible with Mac OS X and Windows. You may need the Stuffit Expander to extract the data; this is available from http://www.stuffit.com/downloads/index.html.

Information contained in the data

Q. I have noticed that some of the questions from the questionnaires are not on the Anonymised Microdata File (AMF). Is this information available?

A. The Anonymised Microdata File (AMF) is the only data lodged with ISSDA. There is also a Researcher Microdata File (RMF), which is more detailed. Applications for access to the full RMF should be directed to the Central Statistics Office (CSO). For further information please see the following page: https://www.cso.ie/en/aboutus/lgdp/csodatapolicies/dataforresearchers/rmfapplicationprocedure/

Q. What kind of information is available on the Researcher Microdata File (RMF) which is not available on the Anonymised Microdata File (AMF)?

A. There are three main categories of information which are only available on the Researcher Microdata File (RMF):

  • Identifiable information, such as the school identifier and variables with categories which contain very few respondents (for example, detailed social welfare payments, or details of a chronic physical or mental health problem, illness or disability)
  • Sensitive information: variables which record information of a sensitive or personal nature, for example questions on drinking and drug-taking during pregnancy
  • Individual items from standardised scales; only total scores and subscale scores are provided on the AMF

Q. Is the data from the centre-based carer questionnaires available?

A. Only the information from the home-based surveys has been archived so far. There are no immediate plans to archive the centre-based carer questionnaires, but they may be archived in the future.

Using the data

Q. There is so much information in the Anonymised Microdata File - how do I get started using the data?

A. The first step is to look through the paper versions of the questionnaires, which are available to download from the Growing Up in Ireland website: http://www.growingup.ie/index.php?id=7. These are much easier to navigate than the datafile: they are divided into sections by topic and clearly show the routing of the questions.

Q. There are two versions of the file provided for the Infant Cohort, with different variable naming conventions – Convention A and Convention B – which one should I use?

A. Variable naming convention A was developed at Wave 1 and the variable names link back to the paper questionnaires. You can use this convention if you know you are only interested in doing analysis on one wave of data and do not intend to match the two waves.

If you do intend to match the two waves of data for cross-wave analysis, variable naming convention B is better suited. Under this convention, variable names are not linked to the questionnaires. Instead, any variable which is asked in both waves has the same core variable name, with wave one prefixed with ‘a’ and wave two prefixed with ‘b’.

Full details, including a longitudinal data dictionary listing all variables in both waves, their convention A and B variable names, and whether they are included in the AMF and/or the RMF, can be found on the ISSDA website: Data Dictionary
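As an illustration of the convention B prefix rule, the following Python sketch lists the variables that appear in both waves by stripping the wave prefix. The variable names are invented for the example and are not taken from the GUI files; consult the data dictionary for the real names.

```python
# Hypothetical convention B variable names; the real names are listed
# in the longitudinal data dictionary.
wave1_vars = ["azz_health", "azz_income", "azz_birthweight"]
wave2_vars = ["bzz_health", "bzz_income", "bzz_childcare"]


def core_names(variables, prefix):
    """Strip the single-letter wave prefix and return the set of core names."""
    return {name[len(prefix):] for name in variables if name.startswith(prefix)}


# Core names present in both waves, i.e. questions asked at wave one and wave two.
shared = sorted(core_names(wave1_vars, "a") & core_names(wave2_vars, "b"))

# Pair up the wave one and wave two names for each repeated question.
pairs = [("a" + core, "b" + core) for core in shared]
for wave1_name, wave2_name in pairs:
    print(wave1_name, "<->", wave2_name)
```

Variables asked in only one wave (here the invented `azz_birthweight` and `bzz_childcare`) drop out of the list because their core names have no counterpart in the other wave.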

Q. What are weighting and grossing factors?

A. Weighting and grossing factors are statistical adjustments which make the Growing Up in Ireland sample representative of the relevant population of Irish children. There was some differential response in the study (for example, families with lower levels of education and from lower social classes were less likely to participate), and these adjustment factors take account of this.

Both factors give the same structural (percentage) breakdown of the target population. The weighting factor weights to the total sample size, and the grossing factor grosses to the total population size. One or other should be used in all analysis on the data.

For most analysis you will use the weighting factor. You would use the grossing factor only if you want to report the number of children in the population, for example if you wanted to say that there are X nine-year-old children in Ireland who have a chronic illness. Do not use the grossing factor for any statistical tests: all your results will be statistically significant, because your statistics package will treat the population size as your sample size.

Details on how to use the weighting and grossing factors can be found on the Growing Up in Ireland website (Infant Cohort / Child Cohort).
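The relationship between the two factors can be sketched in Python. This is a toy illustration only: the four records, the factor values and the assumed population of 400 are all invented, and the tuple layout is not the structure of the GUI files.

```python
import math

# Hypothetical records: (has_chronic_illness, weighting_factor, grossing_factor).
# Four sampled children stand in for an assumed population of 400, so each
# grossing factor here is simply the weighting factor multiplied by 100.
records = [
    (1, 0.5, 50.0),
    (0, 1.5, 150.0),
    (0, 1.0, 100.0),
    (1, 1.0, 100.0),
]


def weighted_share(rows, factor_index):
    """Share of children with the illness, using the chosen adjustment factor."""
    total = sum(row[factor_index] for row in rows)
    flagged = sum(row[factor_index] for row in rows if row[0] == 1)
    return flagged / total


share_weighted = weighted_share(records, 1)  # using the weighting factor
share_grossed = weighted_share(records, 2)   # using the grossing factor

weighted_n = sum(row[1] for row in records)  # sums to the sample size
grossed_n = sum(row[2] for row in records)   # sums to the population size
grossed_ill = sum(row[2] for row in records if row[0] == 1)

print(f"Share with illness (weighting factor): {share_weighted:.1%}")
print(f"Share with illness (grossing factor):  {share_grossed:.1%}")
print(f"Weighted n: {weighted_n:g}, grossed n: {grossed_n:g}")
print(f"Estimated children in population with illness: {grossed_ill:g}")
```

Both factors give the same percentage, but only the grossing factor sums to the population size. That inflated total is why a statistics package given the grossing factor would treat the analysis as if every child in the population had been surveyed.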

Q. What are the ‘logit scores’ provided for the Drumcondra tests in Wave 1 of the Child Cohort at 9 years?

A. The logit scores should be used when analysing the data from the Drumcondra tests. They were calculated for GUI by the Educational Research Centre in St Patrick's College, Drumcondra, which developed the tests. The logit score is based on EAP (expected a posteriori) scoring of the GUI data, using two parameters (difficulty and discrimination) for each item. These parameters were derived from the test-wide scaling of the Spring standardisations of the maths and reading tests. Overlap of items between different forms (A and B) and levels (1st class through to 6th) means that the test-wide scaling gives item estimates for all pupils.

What this means is that the logit score accounts for the fact that the children in Growing Up in Ireland sat different levels of the test depending on which class they were in. The logit score allows you to compare across these different test levels and should be used in analysis.

Think of two children, one in 2nd class, one in 3rd class – they both get 50% in the reading test they sit (different test for 2nd and 3rd class). If you just look at the % correct, you might think that these children have the same achievement level, but because they sat different tests they actually have different achievement levels - 50% in the 3rd class test is a higher achievement than 50% in the 2nd class test.

As another example, think of two 3rd class children who both get 90% (36 out of 40 questions correct). The tests are designed so that some questions are harder than others, and the test developers know which questions are difficult and which are easy. One child gets 4 of the easy questions wrong; the other child gets 4 of the difficult questions wrong. They both get the same % correct, but the first child has a higher achievement level because they managed to answer the difficult questions correctly.
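The first example above (two children who each get 50% on tests of different levels) can be sketched with a simple EAP calculation in Python. This is a minimal illustration under stated assumptions, not the Educational Research Centre's procedure: it assumes a two-parameter logistic item model and a standard normal prior on ability, and the item parameters are invented rather than taken from the Drumcondra standardisations. Discrimination is held equal across items so that only difficulty differs between the two tests.

```python
import math


def p_correct(theta, discrimination, difficulty):
    """Two-parameter logistic model: probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def eap_score(responses, items, grid_min=-4.0, grid_max=4.0, n_points=81):
    """Posterior mean of ability on a grid, with a standard normal prior (assumed)."""
    step = (grid_max - grid_min) / (n_points - 1)
    numerator = 0.0
    denominator = 0.0
    for k in range(n_points):
        theta = grid_min + k * step
        prior = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) density
        likelihood = 1.0
        for answer, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            likelihood *= p if answer == 1 else 1.0 - p
        weight = prior * likelihood
        numerator += theta * weight
        denominator += weight
    return numerator / denominator


# Invented item parameters as (discrimination, difficulty) on one common scale.
# The 3rd class items are harder, i.e. they have higher difficulty values.
items_2nd_class = [(1.0, -1.5), (1.0, -1.0), (1.0, -0.5), (1.0, 0.0)]
items_3rd_class = [(1.0, -0.5), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0)]

# Both children answer two of their four items correctly (50%).
responses = [1, 1, 0, 0]

theta_2nd = eap_score(responses, items_2nd_class)
theta_3rd = eap_score(responses, items_3rd_class)

print(f"2nd class child, 50% correct: {theta_2nd:.2f} logits")
print(f"3rd class child, 50% correct: {theta_3rd:.2f} logits")
```

Because both sets of items sit on one common scale, the same percentage correct on the harder test produces a higher ability estimate. This is the sense in which logit scores can be compared across test levels.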

Study information

Q. My project also requires me to include details of previous ethics approvals. Does the ISSDA have these ethics approval details?

A. You may already be aware that GUI consists of two cohorts: a 'Child' cohort and an 'Infant' cohort. The first wave of data collection on the Child Cohort was carried out when the children were 9 years old and a second wave was carried out at age 13 years. The first wave of data collection on the Infant Cohort was carried out when the infants were 9 months old, a second wave was carried out at age 3 years and a third wave, at age 5 years, is currently ongoing.

Ethics approval in respect of the first wave of data collection on the Child Cohort was granted by the Health Research Board's Research Ethics Committee (HRB REC) in 2006 and 2007. The HRB REC initially gave ethical approval in respect of the pilot phase for Wave 1 (Child Cohort) on 17/11/06. This was followed up with ethical approval in respect of the school-based, main phase, on 3/4/07 and of the home-based, main phase, on 14/6/07.

An independent GUI Research Ethics Committee (REC) provided ethical approval for all waves of GUI until the end of 2022, with the exception of Wave 1, which was approved by the then Health Research Board REC in 2007.

As far as we are aware, ethical approval reference numbers were not used between the HRB Research Ethics Committee and the GUI Study Team. Reference was made only to the date of the approval. We can confirm that ethical approvals given to the GUI Study Team by the new Ethics Committee (convened by the Dept. of Health & Children following the resignation of the HRB REC) do not use decision numbers or references. Approvals from the new REC take the form of a letter, signed by the REC Chair, to the PI of the GUI Study Team.

Tools