Big Data Curriculum

Brief Summary

This project aims to design, implement, and evaluate a curriculum that improves knowledge about and attitudes towards big data research among medical students and professionals. The curriculum focuses on outlining big data and AI analysis principles, contrasting these with traditional statistical approaches, and illustrating their impact on the design of research questions, all within the context of triple negative breast cancer (TNBC) research. By embedding curriculum content in hands-on analyses and case discussions, the project seeks to enhance big data and AI competencies in medical education, addressing the growing need for quantitative literacy in the era of high-dimensional data.

Expanded Summary

Background and Objectives

The advent of cheaper and more diverse data generation technologies has led to an explosion of available health-related data. However, understanding these data and translating them into meaningful knowledge requires a shift in how scientific questions are framed and approached. This project recognizes that while statistics has become a core competency in medical education, big data concepts represent a new paradigm that needs to be integrated into medical curricula.

The primary objectives of this project are:

  1. Create a curriculum that outlines big data analysis principles and contrasts them with traditional statistical approaches.
  2. Embed this curriculum content in hands-on analyses and case discussions of triple negative breast cancer (TNBC) research.
  3. Evaluate the impact of this curriculum on knowledge of and attitudes towards big data among medical students with a background in statistics and epidemiology.

Curriculum Design

The curriculum consists of four 3-hour lessons, each highlighting an overarching theme of big data principles and methodology:

  1. Novel data sources, high dimensionality, and limitations of traditional statistical approaches
  2. Curse of dimensionality, data structure, dimensionality reduction, and feature selection
  3. Integration of data across organizational levels, interaction, and emergent properties
  4. Model selection algorithms, local vs. global optima, and bias-variance trade-off

Each lesson includes: