Medical statistics and Data Science: Epidemiology

Directed acyclic graph (DAG)

     

News:

A new book, “An introduction to Directed Acyclic Graph (DAG) for health researchers”, was published on Amazon on 21 December 2024.

An introduction to Directed Acyclic Graph (DAG) for health researchers

The book description:

"Directed acyclic graphs (DAGs) are increasingly used in modern epidemiology, especially to guide researchers in implementing causal inference in observational studies. A causal DAG visually presents causal knowledge and assumptions about the relationships between variables. Once one has mastered the rules, a DAG facilitates many tasks: it makes it easier to understand concepts such as direct and indirect causal effects, mediation analysis, collider stratification bias, selection bias, and information bias. It also makes it easier to recognize and avoid mistakes in analytic decisions, for example by using the backdoor criterion to select the variables to adjust for."

"More advanced texts on DAGs are readily available in textbooks and in scientific papers, but a simple and comprehensive introduction to DAG is lacking."

"The book introduces DAGs thoroughly and from scratch, step by step, in plain and accessible language, explaining the concepts, terminology, rules, and potential applications. The book will pave the way for researchers using DAGs."

Introduction to DAG

Under development

Quantify potential associations or bias for a given DAG

Introduction

One drawback of DAGs is that they show the causal relationships among variables only qualitatively. In some cases, however, we may want to quantify those relationships. This drawback can to some extent be overcome by using simulation methods.

Download Stata commands

We have developed two Stata commands (ancestor, child) to carry out the simulation and quantification. Type and run "ssc describe dag" in the Stata command window. You should be able to view and download the commands.

The package was updated on 22 August 2019. The previous version simulated only binary variables; the current version can also simulate normally distributed continuous variables. Please re-install the package if you have a previous version. The downloaded commands are located in two folders: ado/plus/c and ado/plus/d

We are working on a Stata command for drawing high-quality DAGs.

Steps for a single simulation and quantification

  1. Identify ancestor-variables and child-variables
  2. Simulate a dataset with all ancestor-variables
  3. Generate the child-variables
  4. Quantify the associations
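
The four steps above can be sketched outside Stata as well. The following is a minimal Python sketch, not the `ancestor` and `child` commands themselves. It assumes a hypothetical three-variable DAG (C → E, C → O, with no arrow from E to O) with binary variables; the probabilities, the sample size and the seed are illustrative assumptions only.

```python
import random

random.seed(42)   # arbitrary seed, for reproducibility only
n = 200_000       # assumed sample size

# Step 1: C has no parents (ancestor-variable); E and O have parents (child-variables).
# Step 2: simulate the ancestor-variable.
c = [random.random() < 0.5 for _ in range(n)]

# Step 3: generate the child-variables from their parents (assumed probabilities).
e = [random.random() < (0.6 if ci else 0.2) for ci in c]
o = [random.random() < (0.4 if ci else 0.1) for ci in c]  # E does not affect O


def risk_difference(rows):
    """P(O=1 | E=1) - P(O=1 | E=0) among the given (e, o) pairs."""
    exposed = [oi for ei, oi in rows if ei]
    unexposed = [oi for ei, oi in rows if not ei]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


# Step 4: quantify the E-O association, crude and within strata of C.
crude = risk_difference(list(zip(e, o)))
within_c0 = risk_difference([(ei, oi) for ci, ei, oi in zip(c, e, o) if not ci])
within_c1 = risk_difference([(ei, oi) for ci, ei, oi in zip(c, e, o) if ci])

print(f"crude risk difference:        {crude:+.3f}")
print(f"risk difference within C = 0: {within_c0:+.3f}")
print(f"risk difference within C = 1: {within_c1:+.3f}")
```

Under these assumed probabilities the crude risk difference is clearly positive, although E has no effect on O, while the stratum-specific risk differences are close to zero. The crude value is therefore a rough measure of the bias that the common cause C induces.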

Example 1: a classical triangle

Example 2: an intermediate variable

Example 3: Collider stratification bias

Example 4: M structure or M bias

  • Interpretation of the DAG:

    The DAG is called M bias (or M structure) because its shape resembles the capital letter M. The M structure is very important because it may change how a confounder is defined. Traditionally, according to Modern Epidemiology (the second edition, by Kenneth J. Rothman, Timothy L. Lash, and Sander Greenland), a variable has to meet three criteria to be a confounder:

    • A confounder must be associated with the exposure under study in the source population.
    • A confounder must be a “risk factor” for the outcome (i.e., it must predict who will develop disease), though it need not actually cause the outcome.
    • The confounding factor must not be affected by the exposure or the outcome.
    Looking at the DAG, does the variable M meet the three criteria above? Yes, M meets all of them, because:
    • M is associated with E because they share the common cause U1
    • M is associated with O because they share the common cause U2
    • M is not an intermediate variable between E and O (M is not affected by E or O)

    Based on the traditional criteria, if we are interested in the causal association between E and O, we should adjust for the variable M.

    In contrast, according to the d-separation rules, M is a collider on the path between E and O, so the path (E ← U1 → M ← U2 → O) is blocked. Based on the backdoor criterion, if we are interested in the causal association between E and O, we should NOT adjust for the variable M. If we adjusted for M, we would open the path (E ← U1 → [M] ← U2 → O) that was blocked in the first place. Adjusting for the collider M is therefore over-adjustment and introduces bias, which is called collider stratification bias or collider bias.
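
The blocking argument can be checked mechanically. The sketch below is a minimal Python illustration of the d-separation rule for a single path, written for this page and not taken from any package; the function names `descendants` and `path_is_blocked` are hypothetical.

```python
def descendants(node, edges):
    """All nodes reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def path_is_blocked(path, edges, adjusted):
    """Apply the d-separation rule to one path, given a set of adjusted variables."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, node) in edges and (nxt, node) in edges
        if is_collider:
            # A collider blocks the path unless it, or one of its descendants, is adjusted for.
            if not (({node} | descendants(node, edges)) & adjusted):
                return True
        elif node in adjusted:
            # A non-collider blocks the path when it is adjusted for.
            return True
    return False


# The M structure: arrows are (cause, effect) pairs.
edges = {("U1", "E"), ("U1", "M"), ("U2", "M"), ("U2", "O")}
path = ["E", "U1", "M", "U2", "O"]

print(path_is_blocked(path, edges, set()))        # True: M is an unadjusted collider
print(path_is_blocked(path, edges, {"M"}))        # False: adjusting for M opens the path
print(path_is_blocked(path, edges, {"M", "U1"}))  # True: U1 blocks the path again
```

The function inspects one given path only; a full d-separation check would have to enumerate every path between E and O, which for this DAG is the single path shown.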

  • Curious questions:

    Q1: Given the DAG, what would the association between E and O be if none of the variables is adjusted for?

    Q2: Given the DAG, what would the association between E and O be if the variable M is adjusted for?

    Q3: Given the DAG, what would the association between E and O be if the variables M and U1 are adjusted for?

    Q4: Given the DAG, what would the association between E and O be if the variables M and U2 are adjusted for?

    Q5: Given the DAG, what would the association between E and O be if the variables M, U1, and U2 are adjusted for?
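
One way to explore these questions is a small simulation. The following Python sketch is an illustration only and is not the Stata workflow of this page. It assumes normally distributed variables, linear relationships with unit coefficients, and no causal effect of E on O; all of these are assumptions made for the example. The helper `ols` is a hypothetical function that solves the normal equations directly.

```python
import random


def ols(y, xs):
    """Least-squares coefficients of y on an intercept plus the columns in xs."""
    cols = [[1.0] * len(y)] + xs
    k = len(cols)
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [v - factor * w for v, w in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(2019)  # arbitrary seed, for reproducibility only
n = 50_000         # assumed sample size

# Ancestor-variables of the M structure.
u1 = [random.gauss(0, 1) for _ in range(n)]
u2 = [random.gauss(0, 1) for _ in range(n)]
# Child-variables (assumed unit coefficients; E has no effect on O).
e = [a + random.gauss(0, 1) for a in u1]
m = [a + b + random.gauss(0, 1) for a, b in zip(u1, u2)]
o = [b + random.gauss(0, 1) for b in u2]

adjustment_sets = {
    "none": [],               # Q1
    "M": [m],                 # Q2
    "M, U1": [m, u1],         # Q3
    "M, U2": [m, u2],         # Q4
    "M, U1, U2": [m, u1, u2], # Q5
}
# Index 1 is the coefficient of E (index 0 is the intercept).
estimates = {label: ols(o, [e] + covs)[1] for label, covs in adjustment_sets.items()}

for label, b in estimates.items():
    print(f"adjusted for {label:<10} coefficient of E = {b:+.3f}")
```

Under these assumed equations, the coefficient of E should be close to zero in every case except when M alone is adjusted for, where it is about -0.2. This agrees with the d-separation reasoning: adjusting for M opens the path through the collider, and additionally adjusting for U1 or U2 blocks it again.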

  • Simulation and quantification:

    Single Simulation and quantification

    We are working on this ......

    Multiple Simulations and quantification

    We are working on this ......

  • Answers to the questions:

A list of examples of DAGs applied in research

The articles are ordered by publication date. Please note that MS&DS does NOT necessarily endorse any of the articles. We would like to gradually collect as many articles as we can. If you would like an article to be added, please send an email to feedBack@medical-statistics.dk.

Epidemiology
