copula copulaFit

Insurance Claims Modeling with Missing Data and Uniform Marginals

Test Scenario & Use Case

Business Context

An actuarial team is modeling the dependence between 'Claim Cost' and 'Time to Report'. The raw data is messy and contains randomly placed missing values. The team also wants to test the workflow in which the data is transformed to uniform distributions before the action is called, which requires the marginals='UNIFORM' setting. The scenario therefore tests how robust the action is to data quality issues and to manually specified marginals.
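
The pre-transformation step mentioned above happens outside the action, so the scenario does not show it. One common way to do it is a rank-based transform, sketched below under stated assumptions: the table work.claims_raw, its simulated values, and the output names work.claims_unif, u_claim_cost and u_time_to_report are all hypothetical and are not part of the test scenario.

```sas
/* Hypothetical raw data, simulated only to make this sketch runnable */
data work.claims_raw;
   call streaminit(7);
   do i = 1 to 200;
      claim_cost     = exp(rand('Normal', 8, 1));   /* right-skewed cost */
      time_to_report = 30 * rand('Exponential');    /* delay in days */
      output;
   end;
   drop i;
run;

/* NPLUS1 divides each rank by n+1, so the results lie strictly inside (0,1) */
proc rank data=work.claims_raw out=work.claims_unif nplus1 ties=mean;
   var claim_cost time_to_report;
   ranks u_claim_cost u_time_to_report;
run;
```

Dividing by n+1 rather than n keeps the largest value below 1, which avoids boundary values when a copula density is evaluated.
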
Data Preparation

Create a dataset of 200 rows whose values are already in the [0,1] range, simulating pre-processed uniform marginals, and set claim_cost to missing on every 20th row.

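/* Assumes an active CAS session and a CAS libref named mycas */
DATA mycas.claims_messy;
   call streaminit(42);                 /* fixed seed for reproducibility */
   DO i = 1 to 200;
      claim_cost = rand('Uniform');     /* values already on the uniform scale */
      time_to_report = rand('Uniform');
      /* every 20th row (i = 20, 40, ..., 200) gets a missing claim_cost */
      IF mod(i, 20) = 0 THEN call missing(claim_cost);
      OUTPUT;
   END;
RUN;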
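
Before fitting, it can help to confirm that the table has the missing-value pattern the test relies on. The sketch below assumes the mycas libref and the claims_messy table from the step above. Because mod(i, 20) = 0 holds for 10 of the 200 rows, claim_cost should show N = 190 and NMISS = 10, and time_to_report should show N = 200 and NMISS = 0.

```sas
/* Count non-missing and missing values and check the value range per variable */
proc means data=mycas.claims_messy n nmiss min max;
   var claim_cost time_to_report;
run;
```

The MIN and MAX columns also confirm that every non-missing value lies between 0 and 1, as the 'UNIFORM' marginals setting expects.
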

Implementation Steps

1
Fit a Gumbel copula with marginals='UNIFORM' to declare that the inputs are already uniform, and request the pseudo-sample output table (outpseudo) to check how rows with missing values are handled.
PROC CAS;
   copula.copulaFit /
      TABLE={name='claims_messy'},
      var={'claim_cost', 'time_to_report'},
      copulatype='GUMBEL',
      marginals='UNIFORM',
      outpseudo={name='claims_pseudo', replace=true};
RUN;
QUIT;
2
Inspect the claims_pseudo output table and check its row count. If rows with a missing claim_cost are excluded, it should report 190 rows rather than 200.
PROC CAS;
   TABLE.tableInfo / TABLE='claims_pseudo';
RUN;
QUIT;
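
To get the row count alone rather than the full table metadata, the simple.numRows action is an option. This is a minimal sketch that assumes the simple action set is available in the session and that claims_pseudo is in the active caslib. The result variable name r is arbitrary.

```sas
proc cas;
   /* Return the number of rows in the pseudo-sample table */
   simple.numRows result=r / table={name='claims_pseudo'};
   print r;
run;
quit;
```

The reported count can then be compared against the 190 complete cases in claims_messy.
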

Expected Result


The action should complete without errors. It must automatically exclude the rows with a missing 'claim_cost' (rows 20, 40, ..., 200, which is 10 of the 200 rows). The 'claims_pseudo' table is created with only the 190 complete cases, and the fit statistics reflect the reduced sample size.