Concepts of causal inference in epidemiology have important ramifications for studies across bioinformatics and other fields of health research. In this workshop, we introduce basic concepts of epidemiology, study design, and causal inference for bioinformaticians. Emphasis is placed on addressing bias and confounding as common threats to assessing a causal pathway in a variety of study designs and in common forms of analysis such as GWAS and survival analysis. Workshop participants will have the opportunity to create their own structural causal models (DAGs) and use these models to determine how to estimate a causal effect. Examples using DESeq2, edgeR, and limma will show how multivariable models can be fitted depending on the hypothesized causal relationship.
Presented successfully at BioC2020 to more than 100 people, this workshop updates that material by adding practical examples, clarifications based on participant feedback, and substantive revisions of existing examples.
- Basic knowledge of R syntax
- Familiarity with regression
Students will have the opportunity to solve toy problems and execute example code in R.
R / Bioconductor packages used
- Bias and Confounding
- Making DAGs in R
- Example using cMD
Workshop goals and objectives
- Describe the differences in common experimental and observational study designs
- Apply concepts of study design to common analyses in bioinformatics such as GWAS and survival analysis
- Understand key concepts of epidemiology such as causal inference, confounders, colliders, mediators, counterfactuals, and study designs
- Develop a structural causal diagram/directed acyclic graph (DAG) of causal relationships and assess pathways of interest
- Analyze metagenomic data using principles of causal inference to properly adjust for potential confounders
- Assess a study design in terms of causal inference
- Learn about path blocking to prevent confounding
- Create a DAG in R using dagitty and ggdag
- Identify situations when multivariable adjustment for variables is inappropriate
- Specify a model based on a DAG and then fit that model to data using DESeq2, edgeR, and limma
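
Several of the objectives above (path blocking, inappropriate adjustment, and specifying a model from a DAG) can be illustrated with a small simulation. The following is a minimal sketch, not workshop material: the variable names `X` (exposure), `Y` (outcome), `C` (confounder), and `K` (collider), the effect sizes, and the toy DAG are all hypothetical, and base R `lm()` stands in for the design formulas one would pass to DESeq2, edgeR, or limma. The dagitty step runs only if that package is installed.

```r
# Hypothetical toy DAG: C -> X, C -> Y, X -> Y, X -> K, Y -> K
# C is a confounder (common cause); K is a collider (common effect).
set.seed(1)
n <- 5000
C <- rnorm(n)                        # confounder
X <- 0.8 * C + rnorm(n)              # exposure, partly caused by C
Y <- 0.5 * X + 0.8 * C + rnorm(n)    # outcome; true effect of X is 0.5 by construction
K <- X + Y + rnorm(n)                # collider, caused by both X and Y

# Crude model: the backdoor path X <- C -> Y is left open.
crude <- unname(coef(lm(Y ~ X))["X"])

# Adjusted model: conditioning on C blocks the backdoor path.
adjusted <- unname(coef(lm(Y ~ X + C))["X"])

# Over-adjusted model: conditioning on the collider K opens X -> K <- Y.
collided <- unname(coef(lm(Y ~ X + C + K))["X"])

print(round(c(crude = crude, adjusted = adjusted, collided = collided), 2))

# Optional: ask dagitty for the minimal adjustment set of the same DAG.
if (requireNamespace("dagitty", quietly = TRUE)) {
  g <- dagitty::dagitty("dag {
    C -> X
    C -> Y
    X -> Y
    X -> K
    Y -> K
  }")
  print(dagitty::adjustmentSets(g, exposure = "X", outcome = "Y"))
}
```

In this simulated setting, only the model that adjusts for `C` alone should recover an estimate close to the true effect of 0.5. The crude model is biased by confounding, and adding the collider `K` biases the estimate again, which is one situation where adjusting for more variables is inappropriate.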