Purpose: Binary classification of breast cancer tumors (malignant vs. benign)
using the Wisconsin Diagnostic Breast Cancer dataset.
Dataset Characteristics
Characteristic            Value
Total Features            30
Training Samples          398 (Class 0 / malignant: 148, Class 1 / benign: 250)
Test Samples              86 (Class 0 / malignant: 32, Class 1 / benign: 54)
Class Balance (Train)     62.8% positive class (Class 1, benign)
Feature Importance Analysis
Top 10 most influential features in the model's decision-making:
Rank  Feature               Importance Score
1     worst radius          0.3663
2     worst concave points  0.3431
3     worst area            0.0950
4     worst perimeter       0.0936
5     concavity error       0.0150
6     worst texture         0.0134
7     mean concave points   0.0117
8     mean texture          0.0108
9     worst concavity       0.0103
10    texture error         0.0083
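Interpretation: The model relies most heavily on worst radius, which accounts for
36.6% of total feature importance, followed closely by worst concave points (34.3%);
together these two features account for about 71%. This suggests that the size and
boundary concavity of the imaged cell nuclei are particularly discriminative for
distinguishing malignant from benign tumors. Because worst radius, worst perimeter,
and worst area are geometrically related, importance may be divided among them, so
their relative ranking should not be over-interpreted.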
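The shares quoted above can be checked directly against the table. The sketch below assumes the scores are normalized so that all 30 importances sum to 1 (as scikit-learn's impurity-based feature_importances_ are); only the top 10 are listed in this card, so the running total stops short of 1.

```python
# Importance scores copied from the table above (top 10 only; the other
# 20 features are not listed in this card).
TOP10 = [
    ("worst radius", 0.3663),
    ("worst concave points", 0.3431),
    ("worst area", 0.0950),
    ("worst perimeter", 0.0936),
    ("concavity error", 0.0150),
    ("worst texture", 0.0134),
    ("mean concave points", 0.0117),
    ("mean texture", 0.0108),
    ("worst concavity", 0.0103),
    ("texture error", 0.0083),
]


def cumulative_shares(scores):
    """Return (feature, score, running total) rows, sorted by score descending.

    Assumes scores are normalized so that all importances sum to 1.
    """
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    total = 0.0
    rows = []
    for name, score in ranked:
        total += score
        rows.append((name, score, total))
    return rows


if __name__ == "__main__":
    for name, score, running in cumulative_shares(TOP10):
        print(f"{name:<22} {score:6.1%}  cumulative {running:6.1%}")
```

Under the normalization assumption, the top two features carry about 71% of the importance and the top ten about 97%, leaving roughly 3% spread across the remaining 20 features.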
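Note: In medical contexts, the most concerning error is typically a missed cancer
diagnosis, i.e. a malignant tumor (Class 0) predicted as benign (Class 1). Because
Class 1 (benign) is the positive label in this encoding, metrics that default to
positive label 1 count these errors as false positives, not false negatives; report
sensitivity with malignant treated as the positive class.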
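To make the encoding issue concrete, here is a minimal sketch that counts errors with malignant treated as the clinical positive. It assumes 0 = malignant and 1 = benign, which is what the class counts above imply (benign is the majority class in WDBC). The function name error_counts is hypothetical.

```python
MALIGNANT, BENIGN = 0, 1  # assumed encoding, implied by the class counts


def error_counts(y_true, y_pred):
    """Count error types with malignant (label 0) as the clinical positive."""
    pairs = list(zip(y_true, y_pred))
    # Missed cancer: truly malignant, predicted benign.
    missed_cancers = sum(1 for t, p in pairs if t == MALIGNANT and p == BENIGN)
    # False alarm: truly benign, predicted malignant.
    false_alarms = sum(1 for t, p in pairs if t == BENIGN and p == MALIGNANT)
    n_malignant = sum(1 for t in y_true if t == MALIGNANT)
    # Sensitivity = fraction of malignant cases correctly flagged.
    sensitivity = (
        (n_malignant - missed_cancers) / n_malignant if n_malignant else float("nan")
    )
    return {
        "missed_cancers": missed_cancers,
        "false_alarms": false_alarms,
        "sensitivity": sensitivity,
    }


if __name__ == "__main__":
    print(error_counts([0, 0, 0, 1, 1], [0, 1, 0, 1, 0]))
```

With scikit-learn, the same sensitivity is available as recall_score(y_true, y_pred, pos_label=0); leaving pos_label at its default of 1 would instead report recall on benign cases.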
Model Architecture
Gradient Boosting Configuration:
• Ensemble Size: 200 trees
• Learning Rate: 0.05
• Tree Depth: 2 (shallow trees for regularization)
• Subsampling: 1.0 (uses all training data per iteration)
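A minimal sketch of this configuration, assuming the estimator is scikit-learn's GradientBoostingClassifier (the card mentions sklearn under Reproducibility but does not name the estimator class). The 15% stratified test split is a hypothetical choice for illustration; the resulting training set is not the 398-sample set described above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hyperparameters mirror the configuration listed above; mapping them onto
# GradientBoostingClassifier is an assumption of this sketch.
model = GradientBoostingClassifier(
    n_estimators=200,    # ensemble size: 200 trees
    learning_rate=0.05,  # shrinkage applied to each tree's contribution
    max_depth=2,         # shallow trees as a form of regularization
    subsample=1.0,       # every tree is fit on the full training set
    random_state=42,
)

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42  # hypothetical split
)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```

A low learning rate generally needs more trees to converge; the staged predictions mentioned under Reproducibility are one way to check whether 200 is enough.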
Intended Use Cases
Primary: Educational demonstration of ML experiment tracking
Research: Baseline model for breast cancer classification benchmarks
Prototyping: Template for clinical ML workflows
Limitations & Caveats
Important Limitations:
No Calibration: Probability outputs may not be well-calibrated.
Consider isotonic or Platt scaling for clinical use.
Fixed Threshold: Uses default 0.5 decision threshold without
cost-benefit analysis for medical context.
No Cross-Validation: A single train/test split yields one performance estimate
and does not capture its variance across different splits.
Dataset Bias: Wisconsin dataset may not generalize to different
populations, imaging equipment, or clinical protocols.
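Feature Scale Sensitivity: Tree splits do not depend on feature scaling during
training, but split thresholds are fixed in the training data's units. Production
data measured in different units, or with a different imaging or preprocessing
pipeline, will shift feature values relative to those thresholds.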
No Fairness Audit: No analysis of performance across demographic
subgroups (age, race, etc.).
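Several of these limitations can be addressed with standard scikit-learn tools. The sketch below, under stated assumptions, (1) reports stratified 5-fold cross-validated ROC AUC, (2) wraps the model in isotonic calibration via CalibratedClassifierCV, and (3) chooses a decision threshold by minimizing expected cost on out-of-fold probabilities. The 10:1 cost ratio, the threshold grid, and the split fraction are hypothetical placeholders, not values from this card; it also assumes class 0 = malignant.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import (
    StratifiedKFold, cross_val_predict, cross_val_score, train_test_split,
)

SEED = 42
# Hypothetical costs: a missed cancer is weighted 10x a false alarm.
COST_MISSED_CANCER = 10.0  # malignant (class 0) predicted benign
COST_FALSE_ALARM = 1.0     # benign (class 1) predicted malignant

X, y = load_breast_cancer(return_X_y=True)  # class 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=SEED
)
base = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.05, max_depth=2, random_state=SEED
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)

# 1. Cross-validation: report the spread across folds, not a single split.
auc = cross_val_score(base, X_train, y_train, cv=cv, scoring="roc_auc")
print(f"ROC AUC over 5 folds: {auc.mean():.3f} +/- {auc.std():.3f}")

# 2. Calibration: isotonic regression fitted on internal 3-fold splits.
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=3)

# Out-of-fold calibrated probabilities, so the threshold is not tuned on test data.
oof = cross_val_predict(calibrated, X_train, y_train, cv=cv, method="predict_proba")
p_malignant = oof[:, 0]  # column 0 is P(class 0) = P(malignant)


def expected_cost(probs, y_true, threshold):
    """Total cost when every sample with P(malignant) >= threshold is flagged."""
    flagged = probs >= threshold
    missed = np.sum(~flagged & (y_true == 0))
    false_alarms = np.sum(flagged & (y_true == 1))
    return COST_MISSED_CANCER * missed + COST_FALSE_ALARM * false_alarms


# 3. Threshold: pick the grid value with the lowest out-of-fold cost.
grid = np.linspace(0.05, 0.95, 19)
best_threshold = float(min(grid, key=lambda t: expected_cost(p_malignant, y_train, t)))

calibrated.fit(X_train, y_train)
p_test = calibrated.predict_proba(X_test)[:, 0]
print(f"threshold={best_threshold:.2f}, "
      f"test cost={expected_cost(p_test, y_test, best_threshold):.0f}")
```

With only a few hundred training samples, isotonic calibration can overfit; method="sigmoid" (Platt scaling) is the lower-variance alternative mentioned above.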
Reproducibility
• Random Seed: 42
• Data Splitting: Stratified splits to preserve class distribution
• Validation: Staged predictions used for validation curves (per-iteration metrics)
Environment details (sklearn version, numpy version, Python version) should be
captured by your experiment tracking backend or requirements.txt.
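A minimal sketch of these reproducibility settings, again assuming scikit-learn's GradientBoostingClassifier. The three-way split fractions are hypothetical. staged_predict_proba yields predicted probabilities after each boosting stage, which is one way to build per-iteration validation curves; the environment dictionary shows the version details worth logging alongside a run.

```python
import platform

import numpy as np
import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

SEED = 42
X, y = load_breast_cancer(return_X_y=True)

# Stratified train / validation / test split (fractions are hypothetical).
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=SEED
)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.15, stratify=y_tmp, random_state=SEED
)

model = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.05, max_depth=2, random_state=SEED
)
model.fit(X_train, y_train)

# One validation log loss per boosting stage (per-iteration metric).
val_curve = [log_loss(y_val, proba) for proba in model.staged_predict_proba(X_val)]
best_iter = int(np.argmin(val_curve)) + 1
print(f"lowest validation log loss at iteration {best_iter}: {min(val_curve):.4f}")

# Environment details to record with the experiment.
env = {
    "python": platform.python_version(),
    "numpy": np.__version__,
    "sklearn": sklearn.__version__,
}
print(env)
```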
References & Context
Dataset: Wisconsin Diagnostic Breast Cancer (WDBC) from UCI ML Repository
Features: Computed from digitized images of fine needle aspirate (FNA) of breast mass
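Feature Engineering: For each of 10 real-valued characteristics of the cell nuclei, the mean, standard error, and "worst" value (mean of the three largest values) are computed, giving 3 × 10 = 30 features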
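The 30 feature names used in this card follow a regular pattern: 10 characteristics, each summarized by three statistics. The sketch below rebuilds the names in the style and order that scikit-learn's load_breast_cancer uses (means, then "error" for standard error, then "worst") and parses a name back into its parts. The helper names feature_names and parse_feature are hypothetical.

```python
from collections import Counter

CHARACTERISTICS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry", "fractal dimension",
]


def feature_names():
    """Build 30 names: 10 means, then 10 standard errors, then 10 'worst' values."""
    return (
        [f"mean {c}" for c in CHARACTERISTICS]
        + [f"{c} error" for c in CHARACTERISTICS]
        + [f"worst {c}" for c in CHARACTERISTICS]
    )


def parse_feature(name):
    """Split a feature name into (statistic, characteristic)."""
    if name.startswith("mean "):
        return "mean", name[len("mean "):]
    if name.startswith("worst "):
        return "worst", name[len("worst "):]
    if name.endswith(" error"):
        return "standard error", name[: -len(" error")]
    raise ValueError(f"unrecognized feature name: {name!r}")


if __name__ == "__main__":
    top10 = [
        "worst radius", "worst concave points", "worst area", "worst perimeter",
        "concavity error", "worst texture", "mean concave points", "mean texture",
        "worst concavity", "texture error",
    ]
    # Which statistic dominates the top-ranked features?
    print(Counter(parse_feature(n)[0] for n in top10))
```

Applied to the importance table, six of the ten top-ranked features are "worst" statistics, two are means, and two are standard errors.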
For detailed metrics and training curves:
Refer to the experiment tracking dashboard. This model card focuses on
interpretability, data characteristics, and deployment considerations not
captured in raw metrics.