Model Card: Gradient Boosting Classifier

Purpose: Binary classification of breast cancer tumors (malignant vs. benign) using the Wisconsin Diagnostic Breast Cancer dataset.

Dataset Characteristics

Characteristic          Value
Total Features          30
Training Samples        398 (Class 0: 148, Class 1: 250)
Test Samples            86 (Class 0: 32, Class 1: 54)
Class Balance (Train)   62.8% Class 1 (250 of 398)
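A two-stage stratified split reproduces these counts exactly. The sketch below is an assumption, not the documented procedure. It presumes the data are scikit-learn's `load_breast_cancer` copy of the dataset and a 70/15/15 train/validation/test scheme (hold out 30%, then halve it). That scheme yields 398 training, 85 validation, and 86 test samples with the class counts listed above. Exact sample membership may still differ from the original run.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

SEED = 42  # random seed listed under Reproducibility

# Assumption: the card uses scikit-learn's bundled copy of the dataset
X, y = load_breast_cancer(return_X_y=True)

# Stage 1 (assumed): hold out 30% of the samples, stratified by class
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED
)

# Stage 2 (assumed): split the held-out samples evenly into validation and test
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, stratify=y_hold, random_state=SEED
)

for name, labels in [("train", y_train), ("validation", y_val), ("test", y_test)]:
    class_0, class_1 = np.bincount(labels)
    print(f"{name}: n={len(labels)}, Class 0={class_0}, Class 1={class_1}")
```

Stratification is what keeps Class 1 near 62 to 63% in every split. An unstratified split of a dataset this small can drift by several points.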

Feature Importance Analysis

Top 10 most influential features in the model's decision-making:

Rank  Feature                 Importance Score
1     worst radius            0.3663
2     worst concave points    0.3431
3     worst area              0.0950
4     worst perimeter         0.0936
5     concavity error         0.0150
6     worst texture           0.0134
7     mean concave points     0.0117
8     mean texture            0.0108
9     worst concavity         0.0103
10    texture error           0.0083
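Interpretation: The model relies most heavily on worst radius (36.6% of total importance), followed closely by worst concave points (34.3%). Together these two features account for about 71% of the total, and the top four account for about 90%. Worst radius, worst area, and worst perimeter all describe tumor size and are strongly correlated. The importance assigned to each should therefore be read as a shared size signal, not as evidence that radius alone is the most discriminative measurement.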
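Suppose the scores are normalized to sum to 1 across all 30 features, as scikit-learn's `feature_importances_` are. Then each score reads directly as a share of the total, and a running sum shows how concentrated the model is. The sketch below uses only the ten published scores. `cumulative_share` is a hypothetical helper.

```python
# Importance scores copied from the table above (top 10 of 30 features)
IMPORTANCES = {
    "worst radius": 0.3663,
    "worst concave points": 0.3431,
    "worst area": 0.0950,
    "worst perimeter": 0.0936,
    "concavity error": 0.0150,
    "worst texture": 0.0134,
    "mean concave points": 0.0117,
    "mean texture": 0.0108,
    "worst concavity": 0.0103,
    "texture error": 0.0083,
}


def cumulative_share(scores):
    """Return (feature, running total) pairs, highest score first."""
    running = 0.0
    result = []
    for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        running += score
        result.append((name, round(running, 4)))
    return result


for rank, (name, share) in enumerate(cumulative_share(IMPORTANCES), start=1):
    print(f"top {rank:>2} (through {name}): {share:.1%}")
```

The running total reaches about 71% after two features, about 90% after four, and 96.75% after all ten. Under the normalization assumption, that leaves roughly 3% for the other 20 features.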

Model Behavior Analysis

Confusion Matrix (Test Set)

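                     Predicted: Malignant   Predicted: Benign
Actual: Malignant    27                     5
Actual: Benign       3                      51
Rows and columns are in class-index order: Class 0 (32 test samples) is malignant and Class 1 (54 test samples) is benign. Benign is the majority class in the Wisconsin Diagnostic dataset (357 of 569 cases, vs. 212 malignant), which matches the class counts in the dataset table.
Error Analysis:
• False Negatives (malignant → benign): 5 cases (27 of 32 malignant tumors detected, sensitivity 84.4%)
• False Positives (benign → malignant): 3 cases (51 of 54 benign tumors correctly identified, specificity 94.4%)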

Note: In medical contexts, false negatives are typically more concerning as they represent missed cancer diagnoses.
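Summary rates follow directly from the four cells of the matrix. The sketch below uses a hypothetical helper, `rates_from_confusion`. It takes the matrix in class-index order (rows = actual Class 0/1, columns = predicted Class 0/1), which is the layout `sklearn.metrics.confusion_matrix` returns. Recall for the malignant class is the model's sensitivity, the rate that false negatives erode.

```python
# Test-set confusion matrix from this card, in class-index order:
# rows = actual Class 0, Class 1; columns = predicted Class 0, Class 1.
CONFUSION = [[27, 5],
             [3, 51]]


def rates_from_confusion(cm):
    """Compute accuracy plus per-class recall and precision from a square matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    rates = {"accuracy": sum(cm[k][k] for k in range(n)) / total}
    for k in range(n):
        actual_k = sum(cm[k])                          # row sum: samples truly in class k
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: samples predicted as k
        rates[f"recall_class_{k}"] = cm[k][k] / actual_k
        rates[f"precision_class_{k}"] = cm[k][k] / predicted_k
    return rates


for metric, value in rates_from_confusion(CONFUSION).items():
    print(f"{metric}: {value:.3f}")
```

For this matrix, accuracy is 78/86 (about 90.7%). Class 0 recall is 27/32, and Class 1 recall is 51/54.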

Model Architecture

Gradient Boosting Configuration:
• Ensemble Size: 200 trees
• Learning Rate: 0.05
• Tree Depth: 2 (shallow trees for regularization)
• Subsampling: 1.0 (uses all training data per iteration)
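In scikit-learn terms, this configuration maps onto the `GradientBoostingClassifier` parameters below. This sketch assumes the model is scikit-learn's implementation; the card mentions an sklearn version under Reproducibility but does not name the estimator class. Parameters the card does not list are left at their defaults, and `random_state` comes from the Reproducibility section.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Hyperparameters from the card; all other arguments keep sklearn's defaults.
model = GradientBoostingClassifier(
    n_estimators=200,     # Ensemble Size: number of boosting stages (trees)
    learning_rate=0.05,   # shrinks each tree's contribution to the ensemble
    max_depth=2,          # Tree Depth: each tree captures at most pairwise interactions
    subsample=1.0,        # every tree is fit on the full training set
    random_state=42,      # seed from the Reproducibility section
)

print(model.get_params()["n_estimators"], model.get_params()["max_depth"])
```

Pairing a small learning rate with many shallow trees trades more boosting stages for smoother, better-regularized fits. The staged validation curves show whether all 200 stages are actually needed.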

Intended Use Cases

Limitations & Caveats

Important Limitations:

Reproducibility

Random Seed: 42
Data Splitting: Stratified splits to preserve class distribution
Validation: Staged predictions used for validation curves (per-iteration metrics)
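Per-iteration validation metrics can be computed from a single fitted model, because `staged_predict_proba` yields the predicted probabilities after each boosting stage. The sketch makes three assumptions: scikit-learn, its bundled copy of the dataset, and a 70/15/15 stratified split that reproduces the counts in the dataset table. Using log loss as the curve's metric is also an assumption; the card does not say which per-iteration metrics were tracked.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

SEED = 42
X, y = load_breast_cancer(return_X_y=True)

# Assumed split: 70% train, then the held-out 30% halved into validation and test
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED
)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, stratify=y_hold, random_state=SEED
)

model = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.05, max_depth=2, subsample=1.0, random_state=SEED
)
model.fit(X_train, y_train)

# One validation log loss per boosting stage (stages 1 through 200)
val_curve = np.array(
    [log_loss(y_val, proba) for proba in model.staged_predict_proba(X_val)]
)
best_stage = int(np.argmin(val_curve)) + 1  # convert 0-based index to stage number
print(f"lowest validation log loss {val_curve.min():.4f} at stage {best_stage}")
```

This uses only the validation split, so the test set stays untouched for the final confusion matrix.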

Environment details (sklearn version, numpy version, Python version) should be captured by your experiment tracking backend or requirements.txt.
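If the tracking backend does not record these automatically, a few lines of standard-library Python can capture them with each run. `environment_snapshot` is a hypothetical helper. It reads each package's `__version__` attribute and reports `None` for any package that is not installed.

```python
import importlib
import json
import platform


def environment_snapshot(packages=("sklearn", "numpy")):
    """Return the Python version and the __version__ of each importable package."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            info[name] = None  # package not installed in this environment
        else:
            info[name] = getattr(module, "__version__", None)
    return info


# Serialize so the snapshot can be logged as a run artifact or tag
print(json.dumps(environment_snapshot(), indent=2))
```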

References & Context

For detailed metrics and training curves, refer to the experiment tracking dashboard. This model card focuses on interpretability, data characteristics, and deployment considerations that raw metrics do not capture.