Test Scenario & Use Case
Automated Machine Learning (AutoML) and pipeline generation.
Create a customer dataset with demographic information and a churn flag. The 'income' and 'last_complaint_category' columns contain missing values.
    /* Build the sample in WORK; list input reads "." as a missing value. */
    DATA work.crm_churn_data;
        LENGTH customer_id $ 10 last_complaint_category $ 20;
        INPUT customer_id $ age income tenure plan_type $ churn last_complaint_category $;
        CARDS;
    CUST001 34 55000 24 Premium 0 Technical
    CUST002 45 . 60 Basic 1 Billing
    CUST003 28 48000 12 Basic 0 .
    CUST004 52 120000 120 Premium 0 Technical
    CUST005 21 . 6 Basic 1 .
    CUST006 65 85000 84 Premium 0 Billing
    CUST007 33 62000 30 Basic 0 Technical
    CUST008 41 . 48 Premium 1 .
    ;
    RUN;

    /* Load the WORK table into the 'casuser' caslib (requires an active CAS session). */
    PROC CASUTIL;
        load DATA=work.crm_churn_data outcaslib='casuser' casout='crm_churn_data' replace;
    QUIT;
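
Before running the action, it can help to confirm that the table actually landed in CAS. A minimal sketch, assuming an active CAS session and the same 'casuser' caslib and 'crm_churn_data' table name used in the load step:

```sas
PROC CASUTIL;
    /* List the in-memory tables of the caslib, then show column metadata. */
    list tables incaslib='casuser';
    contents casdata='crm_churn_data' incaslib='casuser';
QUIT;
```

The CONTENTS output should show 'income' as numeric and 'last_complaint_category' as a character column, which is what the nominals list in the action call relies on.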
    PROC CAS;
        dataSciencePilot.analyzeMissingPatterns /
            TABLE={name='crm_churn_data', caslib='casuser'},
            inputs={{name='age'}, {name='income'}, {name='tenure'}, {name='plan_type'}, {name='last_complaint_category'}},
            nominals={'plan_type', 'last_complaint_category'},
            target='churn',
            casOut={name='churn_missing_analysis', caslib='casuser', replace=true};
    RUN;
    QUIT;
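
To look at what the action wrote, the output table can be printed with the table.fetch action. This is only a sketch: it assumes the action call succeeded and that 'churn_missing_analysis' is still in memory in 'casuser'; the row limit of 20 is an arbitrary choice for this small example.

```sas
PROC CAS;
    /* Print up to 20 rows of the table written by analyzeMissingPatterns. */
    table.fetch /
        table={name='churn_missing_analysis', caslib='casuser'},
        to=20;
RUN;
QUIT;
```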
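The output table 'churn_missing_analysis' should contain results such as 'TargetCounts' and 'TargetMeans'. We expect a higher churn rate (mean of 'churn' closer to 1) for the pattern where both 'income' and 'last_complaint_category' are missing: in this sample, both such customers (CUST005 and CUST008) churned, while none of the four customers with complete profiles did. This suggests that incomplete customer profiles may be a risk factor for churn, although eight rows are far too few to support a firm conclusion.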
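
The expected pattern can also be cross-checked by hand, independently of the action. The sketch below is not part of the dataSciencePilot workflow: it assumes an active CAS session, and the libref `mycas`, the table `work.crm_patterns`, and the flag names `income_missing` and `category_missing` are hypothetical names introduced for illustration.

```sas
/* Hypothetical libref pointing at the 'casuser' caslib. */
LIBNAME mycas CAS CASLIB=casuser;

/* Copy the CAS table locally and flag each missing-value pattern (1 = missing). */
DATA work.crm_patterns;
    SET mycas.crm_churn_data;
    income_missing   = missing(income);
    category_missing = missing(last_complaint_category);
RUN;

/* Row count and churn rate for every combination of the two flags. */
PROC MEANS DATA=work.crm_patterns N MEAN MAXDEC=2;
    CLASS income_missing category_missing;
    VAR churn;
RUN;
```

Comparing the row where both flags equal 1 with the row where both equal 0 gives the same contrast that the action's target statistics are meant to surface.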