Insurance Claims Modeling with Missing Data and Uniform Marginals

Business Context

An actuarial team is modeling the dependence between 'Claim Cost' and 'Time to Report'. The raw data is messy and contains missing values. The team also wants to test a workflow in which the data are transformed to uniform marginals before being passed to the action, which requires the 'UNIFORM' marginals setting. The scenario therefore tests how robust the action is to data-quality issues and to a manually specified marginal setting.
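One common way to put raw data on the uniform scale before calling the action is the empirical probability integral transform: replace each value by its rank divided by n + 1, where n is the number of nonmissing values. The following is a minimal sketch using PROC RANK. The input table work.claims_raw and the variables cost and report_delay are hypothetical names and are not part of this scenario.

```sas
/* Sketch: rank-based transform of raw data to the (0,1) scale.
   work.claims_raw, cost, and report_delay are hypothetical names. */
proc rank data=work.claims_raw out=work.claims_uniform
          nplus1        /* rank / (n + 1), n = count of nonmissing values */
          ties=mean;    /* tied values share their average rank */
   var cost report_delay;
   ranks u_cost u_delay;  /* new variables holding the uniform scores */
run;
```

Missing input values remain missing in the ranked output, so a table prepared this way presents the action with the same kind of incomplete rows that this scenario simulates.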

Data Preparation

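Create a dataset of 200 rows whose values are drawn directly from Uniform(0, 1). This simulates marginals that have already been transformed to the uniform scale. Every 20th row has claim_cost set to missing, which gives 10 incomplete rows and 190 complete cases.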


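/* Assumes a CAS libref named mycas is assigned, e.g. libname mycas cas; */
DATA mycas.claims_messy;
   call streaminit(42);                 /* fixed seed for reproducibility */
   DO i = 1 to 200;
      claim_cost     = rand('Uniform'); /* already on the (0,1) scale */
      time_to_report = rand('Uniform');
      /* rows 20, 40, ..., 200 get a missing claim_cost: 10 rows in total */
      IF mod(i, 20) = 0 THEN call missing(claim_cost);
      OUTPUT;
   END;
RUN;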
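Before fitting, it can be worth confirming how many rows are incomplete. The sketch below uses PROC MEANS for this check; any summary procedure would do. Because mod(i, 20) = 0 holds only for i = 20, 40, ..., 200, claim_cost should show N = 190 and NMISS = 10, while time_to_report should show N = 200 and NMISS = 0.

```sas
/* Sketch: count nonmissing and missing values in the simulated input.
   Assumes the mycas libref used in the DATA step is still assigned. */
proc means data=mycas.claims_messy n nmiss min max;
   var claim_cost time_to_report;
run;
```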

Implementation Steps

Fit a Gumbel copula with marginals='UNIFORM' to declare that the inputs are already on the uniform scale, and request the pseudo-sample output table (outpseudo) so that the handling of complete versus incomplete rows can be checked.


PROC CAS;
   loadactionset 'copula';   /* make the copula action set available */
   copula.copulaFit /
      TABLE={name='claims_messy'},
      var={'claim_cost', 'time_to_report'},
      copulatype='GUMBEL',
      marginals='UNIFORM',    /* inputs are already on the uniform scale */
      outpseudo={name='claims_pseudo', replace=true};
RUN;
QUIT;
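To make the "runs without crashing" check explicit rather than relying on the log, the same call can capture the action's result and status in CASL variables. This is a sketch assuming the CASL RESULT= and STATUS= options on an action call, where a status severity of 0 indicates success; the variable names r and s are illustrative.

```sas
/* Sketch: capture the result and status of copulaFit in CASL
   variables (r and s are illustrative names) and report the outcome. */
proc cas;
   loadactionset 'copula';
   copula.copulaFit result=r status=s /
      table={name='claims_messy'},
      var={'claim_cost', 'time_to_report'},
      copulatype='GUMBEL',
      marginals='UNIFORM',
      outpseudo={name='claims_pseudo', replace=true};
   if s.severity = 0 then
      describe r;   /* list the result tables returned by the action */
   else
      print s;      /* show the full status: severity, reason, status code */
run;
quit;
```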

Check the output table information to verify the number of observations used. If the rows with missing values are excluded, claims_pseudo should contain 190 rows.


PROC CAS;
   TABLE.tableInfo / TABLE='claims_pseudo';
RUN;
QUIT;
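tableInfo reports the row count alongside other table attributes. For a more focused check, the sketch below uses other actions from the table action set to print the record count, the column layout of claims_pseudo, and its first five rows. The column names of the pseudo-sample table are not given in this scenario, so columnInfo is used to discover them rather than assuming them.

```sas
/* Sketch: inspect the pseudo-sample table written by copulaFit. */
proc cas;
   table.recordCount / table='claims_pseudo';   /* 190 expected if incomplete rows are dropped */
   table.columnInfo  / table='claims_pseudo';   /* discover the output column names */
   table.fetch       / table='claims_pseudo', to=5;   /* first five rows */
run;
quit;
```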

Expected Result

The action should complete without errors and automatically exclude the 10 rows with a missing claim_cost value. The claims_pseudo table should then contain only the 190 complete cases, and the fit statistics should reflect this reduced sample size.
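The expected result can also be checked programmatically by comparing the row count of claims_pseudo with the number of complete cases in the input. This is a sketch that assumes the mycas libref points to the caslib in which copulaFit wrote claims_pseudo; with the data above, both counts should be 190.

```sas
/* Sketch: compare the output row count with the number of complete
   input rows. Assumes mycas points to the caslib holding both tables. */
proc sql noprint;
   select count(*) into :n_pseudo trimmed
      from mycas.claims_pseudo;
   select count(*) into :n_complete trimmed
      from mycas.claims_messy
      where claim_cost is not missing and time_to_report is not missing;
quit;

data _null_;
   if &n_pseudo = &n_complete then
      putlog "NOTE: claims_pseudo has &n_pseudo rows, matching the complete cases.";
   else
      putlog "WARNING: claims_pseudo has &n_pseudo rows, but there are &n_complete complete cases.";
run;
```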

See the copulaFit technical documentation.