Predicting Wine Quality with Machine Learning

Project Overview

The goal of this project was to compare the performance of several machine learning classification models. The data are quality ratings (0 to 10) for red and white wines, each described by 11 physicochemical measurements. Red and white wines are modeled separately, using the same 11 features for each wine type. Ten supervised machine learning models were compared.

The datasets can be found here:

Red wine data.
White wine data.

The features used in the models:

fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol

Red Wine Heatmap

In the quality rating row of the red wine heatmap, two features stand out. Alcohol has the largest positive correlation with quality (0.48): ratings tend to increase as alcohol content increases. Volatile acidity has the largest negative correlation (-0.39): ratings tend to decrease as volatile acidity increases.

White Wine Heatmap

In the quality rating row of the white wine heatmap, two features stand out. Alcohol has the largest positive correlation with quality (0.44): ratings tend to increase as alcohol content increases. Density has the largest negative correlation (-0.31): ratings tend to decrease as density increases.
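The quality row of either heatmap can be reproduced as a ranked list of correlations. The sketch below is a minimal illustration, assuming the data sit in a pandas DataFrame with a numeric `quality` column; the helper name `quality_correlations` and the six-row sample are hypothetical and are not the real wine data.

```python
import pandas as pd


def quality_correlations(df: pd.DataFrame, target: str = "quality") -> pd.Series:
    """Return each feature's Pearson correlation with the target, sorted ascending."""
    corr = df.corr()  # full correlation matrix, i.e. what a heatmap displays
    return corr[target].drop(target).sort_values()


# Tiny made-up sample for illustration only (not the real wine data).
sample = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 12.8],
    "volatile acidity": [0.70, 0.88, 0.60, 0.45, 0.36, 0.28],
    "quality": [5, 5, 6, 6, 7, 8],
})

ranked = quality_correlations(sample)
print(ranked)  # most negative feature first, most positive last
```

Sorting the row makes the strongest negative and positive relationships easy to read off without scanning the full matrix.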

Machine Learning Classification Models:

Logistic Regression
Support Vector Machines
Decision Tree
Random Forest
Gradient Boosted Tree
Logistic Regression with Random Oversampling
Logistic Regression with Synthetic Minority Over-sampling Technique (SMOTE)
Logistic Regression with Cluster Centroid Undersampling
Balanced Random Forest Classifier
Easy Ensemble AdaBoost Classifier
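A comparison like the one listed above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the project's actual code: the data are synthetic (11 features, roughly 15% positives), balanced accuracy is an assumed metric, and only the first five models are shown. The resampling and balanced-ensemble variants are typically built with the separate imbalanced-learn package, which is left out here to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine data (assumption): 11 features, imbalanced classes.
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Scale-sensitive models get a StandardScaler; tree-based models do not need one.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Support Vector Machines": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosted Tree": GradientBoostingClassifier(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = balanced_accuracy_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Balanced accuracy is used here because plain accuracy can look high on imbalanced data even when the minority class is rarely predicted.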

Target: recoded quality score, where 0-6 = Not Good (0) and 7+ = Good (1)
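The recoding rule can be written as a small function. This is a minimal sketch; the function name `recode_quality` and the range check are illustrative assumptions rather than the project's code.

```python
def recode_quality(score: int) -> int:
    """Map a 0-10 quality rating to the binary target: 1 = Good (7+), 0 = Not Good (0-6)."""
    if not 0 <= score <= 10:
        raise ValueError(f"quality score out of range: {score}")
    return 1 if score >= 7 else 0


ratings = [3, 5, 6, 7, 8]
targets = [recode_quality(r) for r in ratings]
print(targets)  # [0, 0, 0, 1, 1]
```

With a pandas DataFrame, the same rule could be applied to a whole column, for example `(df["quality"] >= 7).astype(int)`, assuming the column is named `quality`.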

The dashboard is also available on Tableau Public: Wine Story