R Essentials
Duration
3 days
Description
This course gives professionals a working understanding of R programming for data analysis, visualization, and statistical modeling. Starting with the basics of R and progressing through data manipulation, students explore essential data types, control structures, and widely used R packages such as dplyr and ggplot2. Students learn how to handle large datasets, perform exploratory data analysis, and create custom visualizations. The course also covers inferential statistics, linear regression, multivariate analysis, and time series forecasting techniques, making it well suited to those looking to advance their analytical and programming skills using R.
Objectives
- Understand the fundamentals of R, including how it compares to other data tools such as SQL, Tableau, and Excel.
- Explore the R toolchain, RStudio, and the R standard library for coding and debugging.
- Master the basics of variables, data types, and control flow in R programming.
- Work with R's core data structures: vectors, matrices, arrays, data frames, and lists.
- Utilize essential R packages for reading data, data manipulation, and exploratory analysis.
- Create and customize visualizations using R’s built-in tools and the ggplot2 package.
- Perform statistical analysis, including inferential statistics, linear regression, and multivariate analysis.
- Apply advanced techniques like time series forecasting, pattern detection, and machine learning for predictions.
Prerequisites
No prior programming or data analysis experience is required, though some experience with both is highly recommended.
Training Materials
All students receive comprehensive courseware covering all topics in the course. Courseware is distributed via GitHub in the form of documentation and extensive code samples. Students practice the topics covered through challenging hands-on lab exercises.
Software Requirements
Students will need a free, personal GitHub account to access the courseware. Students will also need permission to install R and RStudio on their computers. If students are unable to configure a local environment, a cloud-based environment can be provided.
Outline
- Introduction
- What is R?
- What problems does R solve?
- R compared to SQL, Tableau, and Excel
- Getting Started
- R Toolchain
- Hello, R!
- Code and Debug with RStudio
- R Standard Library
- R Source Files
- Basics
- Variables
- Assignment
- Expressions
- Basic Data Types
- Numeric
- Integer
- Character
- Logical
- Data Structures
- Vectors
- Matrices
- Arrays
- Data Frames
- Lists
- Indexing
- Factors
- Special Data Types
- NULL
- NA
- NaN
- Inf / -Inf
- Control Flow
- If / If-Else
- For
- While
- Break, Next
- Functions
- Define a Function
- Call a Function
- Function Parameters
- Return Values
- Default Parameters
- Packages
- CRAN
- Browsing Packages
- Install/Uninstall Packages
- Reading in Data
- The readr package
- Key features of readr
- Read/write delimited files
- The DBI package
- Key features of DBI
- Read/write SQL data
- Data Frames
- The dplyr package
- Explore tibble data frames
- Managing columns
- Binning data
- Combining categorical values
- Transforming variables
- Handling missing data
- Merging and stacking datasets
- Continuous Data Exploratory Analysis
- Distributions
- Quantiles, Mean, Median
- Bi-modal distributions
- Histograms, Box-plots
- Categorical Data Exploratory Analysis
- Tables
- Barplots
- Built-in R Visualizations
- Scatterplots, histograms, bar charts, box-and-whisker plots, dot plots
- Customize charts: titles, labels, axes, legends
- Export to PNG, JPEG, PDF, etc.
- ggplot2 Package Visualizations
- Grammar of graphics
- Quick plots with qplot
- Building graphics step-by-step with ggplot
- Working with geometries (geoms)
- Mapping data variables to aesthetic properties
- Controlling legends and axes
- Exporting and saving graphics
- Inferential Statistics
- Bivariate correlation
- T-test and non-parametric equivalents
- Chi-squared test
- Linear Regression Models
- Understanding formulas
- Linear and logistic regression models
- Regression plots
- Handling confounding and interaction effects
- Evaluating residuals
- Predicting new data using regression models
- Useful plots for model interpretation
- Multivariate Analysis (explore selected examples)
- Correlation Analysis
- Multiple Linear Regression
- Logistic Regression
- Principal Component Analysis
- Clustering Analysis
- Multivariate Analysis of Variance
- Time Series Analysis
- Time Series Forecasting (explore selected examples)
- ARIMA
- Seasonal Decomposition
- Exponential Smoothing
- Machine Learning
- Prediction (explore selected examples)
- Logistic Regression
- Machine Learning
- Pattern and Trend Detection (explore selected examples)
- Moving Averages
- Clustering
- Linear Models
- Conclusion
- Questions & Answers
- Additional Resources
- Next Steps