class: center, middle, inverse, title-slide

# Introduction to DATA 606
## Statistics & Probability for Data Analytics
### Jason Bryer, Ph.D.
### Spring 2025

---
# Agenda

* About your instructor
* Syllabus
* Class meetups
* Course Schedule
* Assignments (how you will be graded)
    * Participation
    * Labs
    * Data Project
    * Exams
* Software
    * The `DATA606` R Package
    * Using R Markdown

---
# A little about me...

* Assistant Professor at CUNY in Data Science and Information Systems
* Principal Investigator for a Department of Education Grant to develop and test the Diagnostic Assessment and Achievement of College Skills ([www.DAACS.net](http://www.daacs.net))
* Authored over a dozen R packages including:
    * [likert](http://github.com/jbryer/likert)
    * [ShinyQDA](http://github.com/jbryer/ShinyQDA)
    * [DTedit](http://github.com/jbryer/DTedit)
    * [login](http://github.com/jbryer/login)
* Specialize in propensity score methods. Three new methods/R packages developed include:
    * [multilevelPSA](http://github.com/jbryer/multilevelPSA)
    * [TriMatch](http://github.com/jbryer/TriMatch)
    * [PSAboot](http://github.com/jbryer/PSAboot)

---
# Also a Father...

<img src="images/BoysFall2019.jpg" width="65%" style="display: block; margin: auto;" />

---
# Runner...

<table border='0' width='100%'><tr><td>
<center><img src='images/2025DisneyMarathon.jpeg' height='450'></center>
</td><td>
<center><img src='images/2019NYCMarathon.jpg' height='450'></center>
</td></tr></table>

---
# And photographer.

<img src="images/Sleeping_Empire.jpg" width="80%" style="display: block; margin: auto;" />

---
# Syllabus <img src="images/hex/rmarkdown.png" class="title-hex"><img src="images/hex/blogdown.png" class="title-hex">

Syllabus and course materials are here: [https://spring2025.data606.net](https://spring2025.data606.net)

The site is built using [Quarto](https://quarto.org) and hosted on [Github](https://github.com/jbryer/DATA606-2025-Spring).
Each page of the site has an "Edit this page" link at the bottom right; use that to start a pull request on Github.

We will use Brightspace primarily for submitting assignments. Please submit a PDF or a link to the built HTML (e.g. Rpubs, [Github](http://htmlpreview.github.io/)). PDFs are preferred for the homework as there is some LaTeX formatting in the R Markdown files. The `tinytex` R package helps with installing LaTeX, but you can also install LaTeX using [MiKTeX](http://miktex.org) (for Windows) and [BasicTeX](http://www.tug.org/mactex/morepackages.html) (for Mac). See this page for more information: https://spring2025.data606.net/course-overview/software/

---
class: font90
# Meetups

We will have meetups on Wednesday evenings at 8:00pm. Meetups will be recorded and made available the next day on the [course website](https://spring2024.data606.net/course-overview/meetups/). Though attending live is not strictly required, **I expect everyone to watch the lectures during the week.** I use the class meetups to convey important information and announcements. Very often I will cover some topics not in the textbook. Students who attend the meetups tend to do well on the assignments.

**One Minute Papers** - Complete the one minute paper after each Meetup (whether you attend live or watch the recording). It should take approximately one to two minutes to complete. This allows me to 1) verify you have attended or watched the meetup and 2) get feedback about what you learned and what may still be unclear to you.

.font60[
**Please note:** *Students who participate in this class with their camera on or use a profile image are agreeing to have their video or image recorded solely for the purpose of creating a record for students enrolled in the class to refer to, including those enrolled students who are unable to attend live. If you are unwilling to consent to have your profile or video image recorded, be sure to keep your camera off and do not use a profile image.
Likewise, students who un-mute during class and participate orally are agreeing to have their voices recorded. If you are not willing to consent to have your voice recorded during class, you will need to keep your mute button activated and communicate exclusively using the "chat" feature, which allows students to type questions and comments live.*

[Click here for CUNY's camera use policy](https://www.cuny.edu/wp-content/uploads/sites/4/page-assets/academics/faculty-affairs/Camera-Use-Guidance-for-Online-and-Hybrid-Courses_FINAL-JUNE-20-2024.pdf)
]

---
# Schedule

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Start </th>
   <th style="text-align:left;"> End </th>
   <th style="text-align:left;"> Topic </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Saturday, January 25, 2025 </td>
   <td style="text-align:left;"> Sunday, February 02, 2025 </td>
   <td style="text-align:left;"> Chapter 1 - Intro to Data, R, and RStudio </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Monday, February 03, 2025 </td>
   <td style="text-align:left;"> Sunday, February 16, 2025 </td>
   <td style="text-align:left;"> Chapter 2 - Summarizing Data </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Monday, February 17, 2025 </td>
   <td style="text-align:left;"> Sunday, February 23, 2025 </td>
   <td style="text-align:left;"> Chapter 3 - Probability </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Monday, February 24, 2025 </td>
   <td style="text-align:left;"> Sunday, March 02, 2025 </td>
   <td style="text-align:left;"> Chapter 4 - Distributions </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Monday, March 03, 2025 </td>
   <td style="text-align:left;"> Sunday, March 09, 2025 </td>
   <td style="text-align:left;"> Chapter 5 - Foundation for Inference </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Monday, March 10, 2025 </td>
   <td style="text-align:left;"> Sunday, March 16, 2025 </td>
   <td style="text-align:left;"> Chapter 6 - Inference for Categorical Data </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Wednesday,
March 12, 2025 </td>
   <td style="text-align:left;"> Sunday, March 16, 2025 </td>
   <td style="text-align:left;"> Midterm </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Monday, March 17, 2025 </td>
   <td style="text-align:left;"> Sunday, March 23, 2025 </td>
   <td style="text-align:left;"> Chapter 7 - Inference for Numerical Data </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Monday, March 24, 2025 </td>
   <td style="text-align:left;"> Sunday, April 06, 2025 </td>
   <td style="text-align:left;"> Chapter 8 - Linear Regression </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Monday, April 07, 2025 </td>
   <td style="text-align:left;"> Sunday, April 13, 2025 </td>
   <td style="text-align:left;"> Chapter 9 - Logistic Regression </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Monday, April 28, 2025 </td>
   <td style="text-align:left;"> Sunday, May 04, 2025 </td>
   <td style="text-align:left;"> Intro to Bayesian Analysis </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Wednesday, May 07, 2025 </td>
   <td style="text-align:left;"> Sunday, May 11, 2025 </td>
   <td style="text-align:left;"> Final Exam </td>
  </tr>
</tbody>
</table>

---
# Textbooks <img src="images/hex/openintro.png" class="title-hex">

.pull-left[
Diez, D.M., Barr, C.D., & Çetinkaya-Rundel, M. (2019). *OpenIntro Statistics (4th Ed)*.

.font70[
This will be our primary textbook for most of the semester. Our goal is to cover all the chapters.
]

.center[
<a href = "https://github.com/jbryer/DATA606spring2024/blob/master/Resources/Textbooks/os4.pdf"><img src = 'images/openintro.jpeg' alt = 'Open Intro Statistics' height = '375px' /></a>
]
]

.pull-right[
Navarro, D. (2018, version 0.6). *Learning Statistics with R*

.font70[
This textbook has a chapter on Bayesian analysis that we will use at the end of the semester.
]

.center[
<a href = "https://github.com/jbryer/DATA606spring2024/blob/master/Resources/Textbooks/lsr-0.6.pdf"><img src = 'images/lsr.png' alt = 'Learning Statistics with R' height = '375px' /></a>
]
]

---
# Assignments

* Participation (10%)
    * [DAACS](https://spring2025.data606.net/assignments/daacs)
    * [One Minute Papers](https://spring2025.data606.net/assignments/participation)
* [Labs](https://spring2025.data606.net/assignments/labs) (35%)
    * Labs are designed to introduce you to doing statistics with R.
    * Answer the questions in the main text as well as the "On Your Own" section.
* [Data Project](https://spring2025.data606.net/assignments/project) (30%)
    * This allows you to analyze a dataset of your choosing. Projects will be shared with the class. This provides an opportunity for everyone to see different approaches to analyzing different datasets.
* [Exams](https://spring2025.data606.net/assignments/exams/)
    * Midterm (10%)
    * Final exam (15%)

---
# Communication

* Slack Channel: https://data606spring2025.slack.com
    * [Click here to join the group](https://join.slack.com/t/data606spring2025/shared_invite/zt-2xcspkyh1-wBK~_M5ESx0SnlCuiZcIOQ)
* Email: [jason.bryer@cuny.edu](mailto:jason.bryer@cuny.edu)
* Phone/Zoom: Please email to schedule a time to meet.
* Office hours by appointment.

---
# Software <img src="images/hex/tinytex.png" class="title-hex"><img src="images/hex/RStudio.png" class="title-hex"><img src="images/hex/rmarkdown.png" class="title-hex">

This is an applied statistics course so we will make extensive use of the [R statistical programming language](https://www.r-project.org).

Install [R](https://cran.r-project.org) and [RStudio](https://rstudio.com) on your own computer. I encourage everyone to do this at some point by the end of the semester.
I have instructions on the course website here: https://spring2025.data606.net/course-overview/software/

You will also need to have [LaTeX](https://www.latex-project.org) installed in order to create PDFs. The [`tinytex`](https://yihui.org/tinytex/) R package helps with this process:

```
install.packages('tinytex')
tinytex::install_tinytex()
```

---
# DATA 606 Package <img src="images/hex/rmarkdown.png" class="title-hex"><img src="images/hex/devtools.png" class="title-hex">

The [`DATA606`](https://github.com/jbryer/DATA606) R package contains many data sets and functions we will use throughout the semester. It also has a `startLab` function that will copy each of the labs to your current working directory. Use the following commands to install the package (only necessary once per R installation):

```
install.packages('remotes') # provides install_github(); skip if already installed
remotes::install_github('jbryer/DATA606')
```

To start the first lab...

```
DATA606::startLab('Lab1')
```

This will copy the R Markdown file and any supporting files to your current working directory. Use the "Knit" button in RStudio to build a PDF of the document.

---
# Next steps... <img src="images/hex/DAACS.png" class="title-hex">

Before Wednesday (January 29th):

* Complete this Google form: https://forms.gle/qxsLZnfQb9wNFPeT6
* Go to https://cuny.daacs.net and complete the self-regulated learning assessment
* [Join the Slack channel](https://join.slack.com/t/data606spring2025/shared_invite/zt-2xcspkyh1-wBK~_M5ESx0SnlCuiZcIOQ)

Then:

* Start Lab 1 (due February 2nd)

---
class: inverse, right, middle, hide-logo

<!--img src="images/hex/DATA606.png" width="150px"/-->

# Good luck with the semester!
[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M440 6.5L24 246.4c-34.4 19.9-31.1 70.8 5.7 85.9L144 379.6V464c0 46.4 59.2 65.5 86.6 28.6l43.8-59.1 111.9 46.2c5.9 2.4 12.1 3.6 18.3 3.6 8.2 0 16.3-2.1 23.6-6.2 12.8-7.2 21.6-20 23.9-34.5l59.4-387.2c6.1-40.1-36.9-68.8-71.5-48.9zM192 464v-64.6l36.6 15.1L192 464zm212.6-28.7l-153.8-63.5L391 169.5c10.7-15.5-9.5-33.5-23.7-21.2L155.8 332.6 48 288 464 48l-59.4 387.3z"></path></svg> jason.bryer@cuny.edu](mailto:jason.bryer@cuny.edu) [<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M94.12 315.1c0 25.9-21.16 47.06-47.06 47.06S0 341 0 315.1c0-25.9 21.16-47.06 47.06-47.06h47.06v47.06zm23.72 0c0-25.9 21.16-47.06 47.06-47.06s47.06 21.16 47.06 47.06v117.84c0 25.9-21.16 47.06-47.06 47.06s-47.06-21.16-47.06-47.06V315.1zm47.06-188.98c-25.9 0-47.06-21.16-47.06-47.06S139 32 164.9 32s47.06 21.16 47.06 47.06v47.06H164.9zm0 23.72c25.9 0 47.06 21.16 47.06 47.06s-21.16 47.06-47.06 47.06H47.06C21.16 243.96 0 222.8 0 196.9s21.16-47.06 47.06-47.06H164.9zm188.98 47.06c0-25.9 21.16-47.06 47.06-47.06 25.9 0 47.06 21.16 47.06 47.06s-21.16 47.06-47.06 47.06h-47.06V196.9zm-23.72 0c0 25.9-21.16 47.06-47.06 47.06-25.9 0-47.06-21.16-47.06-47.06V79.06c0-25.9 21.16-47.06 47.06-47.06 25.9 0 47.06 21.16 47.06 47.06V196.9zM283.1 385.88c25.9 0 47.06 21.16 47.06 47.06 0 25.9-21.16 47.06-47.06 47.06-25.9 0-47.06-21.16-47.06-47.06v-47.06h47.06zm0-23.72c-25.9 0-47.06-21.16-47.06-47.06 0-25.9 21.16-47.06 47.06-47.06h117.84c25.9 0 47.06 21.16 47.06 47.06 0 25.9-21.16 47.06-47.06 47.06H283.1z"></path></svg> data606spring2025.slack.com](https://data606spring2025.slack.com) [<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 
3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> @jbryer](https://github.com/jbryer) [<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M433 179.11c0-97.2-63.71-125.7-63.71-125.7-62.52-28.7-228.56-28.4-290.48 0 0 0-63.72 28.5-63.72 125.7 0 115.7-6.6 259.4 105.63 289.1 40.51 10.7 75.32 13 103.33 11.4 50.81-2.8 79.32-18.1 79.32-18.1l-1.7-36.9s-36.31 11.4-77.12 10.1c-40.41-1.4-83-4.4-89.63-54a102.54 102.54 0 0 1-.9-13.9c85.63 20.9 158.65 9.1 178.75 6.7 56.12-6.7 105-41.3 111.23-72.9 9.8-49.8 9-121.5 9-121.5zm-75.12 125.2h-46.63v-114.2c0-49.7-64-51.6-64 6.9v62.5h-46.33V197c0-58.5-64-56.6-64-6.9v114.2H90.19c0-122.1-5.2-147.9 18.41-175 25.9-28.9 79.82-30.8 103.83 6.1l11.6 19.5 11.6-19.5c24.11-37.1 78.12-34.8 
103.83-6.1 23.71 27.3 18.4 53 18.4 175z"></path></svg> @jbryer@vis.social](https://vis.social/@jbryer) [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> spring2025.data606.net](https://spring2025.data606.net)