Authors: Zhicong and Tianxin (Jessica)

Email: zhicong@umich.edu & ltianxin@umich.edu

Baseline Model for High-Rating Recipe Prediction

Model Type

The baseline model is a DecisionTreeClassifier from scikit-learn.
A decision tree offers a fast, easily interpretable starting point for classification.
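To illustrate the interpretability point, here is a minimal sketch that fits a very small tree and prints its learned rules with scikit-learn's export_text. The four toy rows and their labels are made up for illustration and are not taken from the recipe data.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy rows: [minutes, is_healthy]; labels are 1 = high rating.
X_toy = [[10, 1], [20, 0], [60, 1], [90, 0]]
y_toy = [1, 1, 0, 0]

toy_tree = DecisionTreeClassifier(random_state=0)
toy_tree.fit(X_toy, y_toy)

# export_text renders the fitted splits as nested if/else style rules.
rules = export_text(toy_tree, feature_names=["minutes", "is_healthy"])
print(rules)
```

Each printed line corresponds to one threshold test on a feature, which is what makes a shallow tree easy to read and explain.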

Features Used

  • 4 Quantitative: minutes, sugar, sodium, calories
  • 1 Nominal: is_healthy

Feature Processing

  • Quantitative features were used as-is (no scaling needed for trees).
  • Nominal feature is_healthy was already Boolean (0/1), so no encoding step was required.

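    Note: scikit-learn's decision trees require numeric inputs, so a 0/1 feature can be split on directly, but a string-valued or multi-category nominal feature would still need encoding. Tree splits depend only on the ordering of values, which is why scaling is unnecessary here. Scale-sensitive models (e.g., Logistic Regression, KNN) typically need scaled quantitative features and one-hot encoded categorical inputs.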

  • All preprocessing and model fitting were wrapped in a single scikit-learn Pipeline (sklearn.pipeline.Pipeline).
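The setup described above can be sketched roughly as follows. This is not the authors' code: the data is a random stand-in, the target column name is_high_rating, the 80/20 split, and the random_state values are all assumptions made so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["minutes", "sugar", "sodium", "calories", "is_healthy"]

# Synthetic stand-in for the recipe data (assumption: the real data is not shown).
rng = np.random.default_rng(0)
n = 500
recipes = pd.DataFrame({
    "minutes": rng.integers(5, 180, size=n),
    "sugar": rng.uniform(0, 100, size=n),
    "sodium": rng.uniform(0, 100, size=n),
    "calories": rng.uniform(50, 1200, size=n),
    "is_healthy": rng.integers(0, 2, size=n),
})
# Hypothetical target name; random labels, so the score below is meaningless.
recipes["is_high_rating"] = rng.integers(0, 2, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    recipes[FEATURES], recipes["is_high_rating"], test_size=0.2, random_state=0
)

# All five features pass through unchanged: no scaling and no encoding.
baseline = Pipeline([
    ("select", ColumnTransformer([("keep", "passthrough", FEATURES)])),
    ("tree", DecisionTreeClassifier(random_state=0)),
])
baseline.fit(X_train, y_train)

test_f1 = f1_score(y_test, baseline.predict(X_test))
print(f"Test F1 on synthetic data: {test_f1:.3f}")
```

Keeping the pass-through step inside the pipeline means later preprocessing (for example, scaling or engineered features) can be swapped in without changing the fit and predict calls.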

Performance

Metric (test set)    Value
F1 score             0.848

The F1 score is the harmonic mean of precision and recall, so it is high only when both are high. A value of 0.848 is a solid baseline given the small feature set.
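As a reminder of how the metric is computed, here is a small sketch that derives F1 from confusion-matrix counts. The counts in the example call are hypothetical and do not come from this model's test set.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score given true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_f1 = f1_from_counts(tp=80, fp=20, fn=10)
print(f"F1 for the example counts: {example_f1:.3f}")
```

When reading an F1 value, one comparison worth keeping in mind is a trivial classifier that labels every recipe as high-rated: its recall is 1 and its precision equals the share p of high-rated recipes, so its F1 is 2p / (1 + p).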

Interpretation

An F1 score of 0.848 makes this baseline a reasonable starting point: it suggests the model identifies high-rated recipes with a fair balance of precision and recall. There is still substantial room for improvement. For instance, we have not yet conducted extensive data preprocessing or tuned the model's hyperparameters. The result therefore mainly suggests that the selected features carry predictive value for the rating class.