session batchresults

Performance Test on a High-Volume Genomic Data Aggregation

Test Scenario & Use Case

Business Context

A bioinformatics research institute is processing massive genomic datasets. A researcher initiates a heavy aggregation (`simple.summary`) on a table with hundreds of millions of rows. The process is expected to run for several hours. The goal is to ensure that `batchresults` can successfully detach such a long-running, resource-intensive job and that the job continues even if the original client session is terminated.

About the Set: session

Management of the CAS session state.

Data Preparation

Create a very large table 'genomic_variants' with 100 million rows to simulate a resource-intensive, long-running task.

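/* Assumes the libref casuser is already bound to the CASUSER caslib */
DATA casuser.genomic_variants;
    length chromosome $5;   /* longest value is 'chr22' */
    call streaminit(456);   /* fixed seed for reproducible data */
    DO i = 1 to 100000000;
        /* CATS avoids the leading blank that PUT(..., 2.) leaves on one-digit values */
        chromosome = cats('chr', ceil(rand('UNIFORM')*22));
        position = ceil(rand('UNIFORM')*10000000);
        quality_score = rand('NORMAL', 50, 5);
        IF (mod(i, 1000000) = 0) THEN put i=; /* Log progress */
        OUTPUT;
    END;
RUN;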
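
Before launching a multi-hour aggregation, it can be worth confirming that the table actually holds the expected number of rows. The following is a minimal sketch, assuming the table lives in the `casuser` caslib and is visible to the session that runs the check. The variable name `row_check` is illustrative and not part of the original scenario.

```sas
PROC CAS;
    /* Assumption: the active session can see casuser.genomic_variants */
    ACTION SIMPLE.numRows RESULT=row_check /
        TABLE={caslib='casuser', name='genomic_variants'};
    /* The count should match the DO loop bound if the DATA step finished */
    PRINT row_check;
RUN;
```

If the count is lower than the loop bound, the DATA step was probably interrupted and should be rerun before starting the test.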

Implementation Steps

1. Start a named session 'research_session', with the data preparation code already run. Then start a 'simple.summary' action asynchronously and print the session UUID, which step 2 needs.
cas research_session;
PROC CAS;
    SESSION research_session;
    /* Assume data_prep code has been run */
    ACTION SIMPLE.summary RESULT=summary_job /
        TABLE='genomic_variants',
        async='genomic_summary_job';
    /* Report the session name and UUID; step 2 needs the UUID */
    ACTION SESSION.sessionId RESULT=session_info;
    PRINT session_info;
RUN;
2. From a 'supervisor_session', immediately detach the long-running job using its session UUID.
cas supervisor_session;
PROC CAS;
    SESSION supervisor_session;
    /* Replace 'uuid-from-research-session' with the actual UUID */
    ACTION SESSION.batchresults / uuid='uuid-from-research-session';
RUN;
3. To test robustness, terminate the original 'research_session'. The server-side job should continue running.
PROC CAS;
    SESSION research_session;
    ACTION SESSION.endSession;
RUN;
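
One way to confirm that the session is really gone before moving on is to list the remaining sessions from the supervisor session. This is a sketch only, assuming 'supervisor_session' from step 2 is still connected. The variable name `remaining` is illustrative.

```sas
PROC CAS;
    SESSION supervisor_session;
    /* 'research_session' should no longer appear in this listing */
    ACTION SESSION.listSessions RESULT=remaining;
    PRINT remaining;
RUN;
```
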
4. Much later, from a new session, check the job status and fetch the results once completed.
cas results_session;
PROC CAS;
    SESSION results_session;
    /* Check status periodically */
    ACTION SESSION.actionstatus / name='genomic_summary_job';
    /* Fetch when complete */
    ACTION SESSION.fetchresult / name='genomic_summary_job';
RUN;

Expected Result


The `batchresults` action should execute quickly, detaching the 'genomic_summary_job'. The `endSession` action on 'research_session' should succeed without affecting the server-side job. After a significant amount of time, `actionstatus` will show the job as 'completed', and `fetchresult` will successfully retrieve the summary statistics for the 100 million row table, proving the job ran to completion independently of the client session.