D207 - Exploratory Data Analysis
Exploratory Data Analysis covers statistical principles supporting the data analytics life cycle. Students in this course compute and interpret measures of central tendency, correlations, and variation. The course introduces hypothesis testing, focusing on application for parametric tests, and addresses communication skills and tools to explain an analyst’s findings to others within an organization.
Course Analysis
For this course, I ignored most of the class materials except for the DataCamp unit on Performing Experiments in Python, which was helpful in showing me how to execute various tests in Python.
The project used one of two datasets from a previous course, D206 - Data Cleaning, which were only marginally cleaner than before. I reused my previous cleaning code, which turned out to be unnecessary since the data I needed was unchanged. My research question and hypotheses from D206 were also reused.
After a deep dive into Dr. Sewell’s webinar videos, which were a bit disorganized, I found the guidance I needed on chi-square tests of independence in Python. This allowed me to correctly analyze the data, although I had to resubmit the project due to a citation oversight.
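For readers who have not run one before, a chi-square test of independence in Python can be sketched roughly as follows. This is a minimal illustration and not the code from my submission: the `group` and `outcome` columns, the toy counts, and the 0.05 significance level are all assumptions made up for the example. It uses `pandas.crosstab` to build the contingency table and `scipy.stats.chi2_contingency` to run the test.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data; column names and counts are illustrative only.
df = pd.DataFrame({
    "group": ["A"] * 50 + ["B"] * 50,
    "outcome": ["Yes"] * 20 + ["No"] * 30 + ["Yes"] * 22 + ["No"] * 28,
})

# Contingency table of observed counts for the two categorical variables.
observed = pd.crosstab(df["group"], df["outcome"])

# Returns the test statistic, the p-value, the degrees of freedom,
# and the expected counts under the assumption of independence.
chi2, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.05  # assumed significance level
if p_value < alpha:
    decision = "reject the null hypothesis of independence"
else:
    decision = "fail to reject the null hypothesis of independence"

print(observed)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
print(decision)
```

With two nearly identical groups like these, the p-value is large and the test fails to reject the null hypothesis, which mirrors the kind of result that leads to a recommendation of no action.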
Once past the chi-square test, the rest was straightforward, although the project’s structure felt a bit redundant, especially since my findings failed to reject the null hypothesis, leading to a recommendation of no action.
For the univariate and bivariate statistics required in the performance assessment, I created graphs for each variable and included the necessary descriptive statistics, despite the rubric not being clear on this requirement. I also added brief discussions for each section, even though it wasn’t explicitly required, to provide context for the graphs.
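As a rough idea of what the descriptive statistics behind those graphs can look like, here is a small pandas sketch. The column names (`amount`, `duration`, `category`) and the values are hypothetical and are not taken from the course datasets; plotting is left out to keep the example short.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "amount": [50.0, 72.5, 61.0, 88.0, 45.5, 93.0],
    "duration": [3, 24, 12, 48, 6, 60],
    "category": ["monthly", "annual", "monthly", "annual", "monthly", "annual"],
})

# Univariate: summary statistics for a continuous variable
# and frequency counts for a categorical variable.
summary = df["amount"].describe()
counts = df["category"].value_counts()

# Bivariate: Pearson correlation between two continuous variables
# and the mean of a continuous variable within each category.
r = df["amount"].corr(df["duration"])
group_means = df.groupby("category")["amount"].mean()

print(summary)
print(counts)
print(f"Pearson r = {r:.3f}")
print(group_means)
```

Each of these outputs pairs naturally with a graph: a histogram or boxplot for `summary`, a bar chart for `counts`, a scatter plot for the correlation, and grouped boxplots for `group_means`.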
After correcting my approach to the chi-square test and moving away from the unnecessary alternative tests, the project became relatively straightforward.
Final Thoughts
The structure of the project felt a bit disjointed, particularly with the insertion of variable exploration that seemed out of place. However, overall I thoroughly enjoyed this course and learned quite a bit about statistics and hypothesis testing for datasets.