Predicting Air Quality Index (AQI)
Using Purple Air Data
Project Information
Category: Graduate School Predictive Analytics Course Final Project
Project Date: May 4th, 2019
Project Files: Raw Data File, Slide Deck
Project Code: SAS Data Base SAS 7bdat SAS EM
​
Poor air quality has negative implications for our health, lifestyle, and safety. Despite being a picturesque, small, rural community, my hometown of Decorah, Iowa, has poor air quality. On many days, our air quality is worse than that of nearby major metropolitan areas like Minneapolis, Madison, or Chicago! Using predictive analytics techniques, I investigated whether the simple data readily available on any weather report (temperature, humidity, date, location) can be used to predict air quality. I found that temperature and humidity affect air quality: cold, dry air is more likely to be unsafe, while warm, humid air is more likely to be safe. Additionally, location-specific details such as proximity to major roadways or mining operations have a large impact on localized air quality.
​
Project Requirements:
- Find a publicly available data source
- Prepare, clean, and correct the data as needed
- Formulate questions to investigate
- Use base SAS programming and SAS Enterprise Miner
- Use predictive analytics techniques to understand the data and answer questions
- Present full analysis process and final results
​
Skills Required: Base SAS, SAS Enterprise Miner, Decision Tree Models, Linear Regression, Logistic Regression, Knowledge of Statistical Concepts such as Skew, Kurtosis, and Correlation
​
Techniques & Tasks:
- Downloaded Purple Air sensor data from the local sensors
- Prepared .csv files of raw data
- Added integer and character variables where needed for analysis
- Created the "Safe" binary variable, where 1 = safe and 0 = unsafe air quality
- Explored and familiarized myself with the data
- Corrected data problems such as incorrect observations, missing data, and bad data
- Transformed the data by fixing skew
- Modeled using logistic regression in Base SAS and SAS EM
- Modeled using decision trees in SAS EM
- Compared models and used statistical concepts to select the strongest model
- Evaluated sources of bias
- Created a final project report and presentation
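
To make a few of the steps above concrete, here is a minimal sketch of three of them: creating the "Safe" flag, checking and reducing skew, and fitting a logistic regression on temperature and humidity. It is written in plain Python for illustration only; the project itself was done in Base SAS and SAS Enterprise Miner, and this is not that code. The AQI cutoff of 100, the log transform, the gradient-descent fit, all function names, and the sample rows are assumptions made up for the example, not values or methods taken from the project.

```python
import math

# Assumed cutoff: the project's actual definition of "safe" is not stated here.
SAFE_AQI_CUTOFF = 100.0

# Invented toy rows (temperature in F, humidity in %, AQI). Not project data.
TOY_ROWS = [
    (10.0, 35.0, 140.0), (15.0, 40.0, 125.0), (25.0, 30.0, 110.0),
    (30.0, 45.0, 105.0), (45.0, 55.0, 95.0), (55.0, 60.0, 60.0),
    (65.0, 70.0, 45.0), (70.0, 75.0, 40.0), (80.0, 80.0, 30.0),
    (85.0, 85.0, 25.0),
]


def make_safe_flag(aqi, cutoff=SAFE_AQI_CUTOFF):
    """Return 1 when the AQI is at or below the cutoff (safe), else 0."""
    return 1 if aqi <= cutoff else 0


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """One common fix for right skew: log(1 + x) on non-negative values."""
    return [math.log1p(v) for v in values]


def standardize(column):
    """Rescale a column to mean 0 and standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Predicted probability of Safe = 1; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


def fit_logistic(features, labels, learning_rate=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * (len(features[0]) + 1)
    n = len(labels)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(features, labels):
            error = predict(weights, x) - y
            grads[0] += error
            for j, xi in enumerate(x):
                grads[j + 1] += error * xi
        weights = [w - learning_rate * g / n for w, g in zip(weights, grads)]
    return weights


# Build the model inputs from the toy rows.
temps = standardize([row[0] for row in TOY_ROWS])
humid = standardize([row[1] for row in TOY_ROWS])
features = list(zip(temps, humid))
labels = [make_safe_flag(row[2]) for row in TOY_ROWS]
weights = fit_logistic(features, labels)
```

With real sensor data, `predict(weights, x)` would give the estimated probability that a reading with standardized temperature and humidity `x` is safe. The toy rows are separable by construction, so the fitted weights here say nothing about the actual Decorah results.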