SAS9

MISSOVER vs. TRUNCOVER: The Final Showdown

Simon

When importing raw text files (DATA step with INFILE), the lines often do not all have the same length. By default, SAS© has a dangerous behavior: if the current line ends before a variable has been read completely, it jumps to the next line to continue reading (the FLOWOVER option), so values from two different lines can end up in the same observation.
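The default behavior can be sketched as follows. This is a minimal illustration, not a definitive test: the fileref flowdemo and the data set flow are hypothetical names, and it assumes a platform (Windows or Unix) where PUT writes variable-length lines without padding.

```sas
/* Hypothetical temporary file used only for this illustration */
FILENAME flowdemo TEMP;

/* Write three lines; the second one is shorter than the 5-character field */
DATA _NULL_;
  FILE flowdemo;
  PUT 'ABCDE';
  PUT 'ABC';
  PUT 'FGHIJ';
RUN;

/* Neither MISSOVER nor TRUNCOVER is given, so the default FLOWOVER applies */
DATA flow;
  INFILE flowdemo;
  INPUT @1 nom $5.;
RUN;

PROC PRINT DATA=flow;
RUN;
```

Under these assumptions, the step should produce only two observations (ABCDE and FGHIJ): the short line ABC is abandoned and the value is read from the following line instead. The log should also carry the note "SAS went to a new line when INPUT statement reached past the end of a line."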

To prevent this, two options exist: MISSOVER and TRUNCOVER. Many people think they are interchangeable. However, there is a critical difference that can skew your data.

1. The Crash Scenario

Imagine asking SAS© to read 5 characters starting from position 18:

INPUT @18 code $5.;

But your data line is shorter than expected: it stops at position 20.

  • SAS© needs positions: 18, 19, 20, 21, 22.

  • The file contains: 18, 19, 20 (end of line).

So, 2 characters are missing to satisfy the request. This is where the choice of option changes everything.

2. MISSOVER: "All or Nothing"

The MISSOVER (Missing Over) option is strict. If SAS© does not find all of the characters required by the informat (here, the 5 characters of $5.), it considers the read impossible for that variable.

  • Behavior: "You promised me 5 characters, but I only see 3. I'm taking nothing."

  • Result: The variable is set to missing.

  • Risk: You lose the partial information present at the end of the line (characters 18, 19, and 20 are ignored).
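The scenario from section 1 can be reproduced with MISSOVER along these lines. It is a sketch under assumptions: the fileref shortln and the data set miss are hypothetical names, and the 17 filler digits exist only so that "ABC" lands in columns 18 to 20.

```sas
/* Hypothetical temporary file used only for this illustration */
FILENAME shortln TEMP;

DATA _NULL_;
  FILE shortln;
  /* 20 characters in total: "ABC" occupies columns 18 to 20 */
  PUT '12345678901234567ABC';
RUN;

DATA miss;
  INFILE shortln MISSOVER;
  INPUT @18 code $5.;   /* asks for columns 18 to 22 */
  PUT code=;            /* writes the value read to the log */
RUN;
```

With a file written this way, code should come back blank (missing), even though three usable characters were present on the line.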

3. TRUNCOVER: "Take What's There"

The TRUNCOVER (Truncated Over) option is flexible. If the line ends before the full informat width has been read, SAS© retrieves whatever it finds up to the end of the line.

  • Behavior: "I wanted 5 characters, but you only have 3? No problem, give me the 3."

  • Result: The variable contains the partial value (the characters at positions 18-20).

  • Advantage: No data is lost.
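The same 20-character line can be read with TRUNCOVER as sketched below. The fileref truncln, the data set trunc and the variable reclen are hypothetical names; the LENGTH= option on INFILE is used here only to expose the length of the current line.

```sas
/* Hypothetical temporary file used only for this illustration */
FILENAME truncln TEMP;

DATA _NULL_;
  FILE truncln;
  /* 20 characters in total: "ABC" occupies columns 18 to 20 */
  PUT '12345678901234567ABC';
RUN;

DATA trunc;
  /* LENGTH= names a temporary variable that receives the length of the current line */
  INFILE truncln TRUNCOVER LENGTH=linelen;
  INPUT @18 code $5.;   /* asks for columns 18 to 22 */
  reclen = linelen;     /* copy it: the LENGTH= variable itself is not saved */
  PUT code= reclen=;    /* writes both values to the log */
RUN;
```

Under these assumptions, code should contain "ABC" and reclen should be 20, which makes it easy to see that the line was shorter than the 22 columns the INPUT statement asked for.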

4. Comparative Example

SAS© Code:

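FILENAME demo TEMP;

/* Write two unpadded lines to a temporary file. In-stream DATALINES
   records are padded with blanks to 80 columns, which would hide
   the difference between the two options. */
DATA _NULL_;
  FILE demo;
  PUT 'ABCDE';
  PUT 'ABC';
RUN;

DATA test;
  INFILE demo MISSOVER; /* Then test with TRUNCOVER */
  INPUT @1 nom $5.;
RUN;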

Raw Data (Line)   Actual Length   MISSOVER Option   TRUNCOVER Option
ABCDE             5               "ABCDE"           "ABCDE"
ABC               3               (Blank)           "ABC"

Expert's Analysis: TRUNCOVER handles "partial data". MISSOVER gives up if the length is insufficient. If the line were even shorter (e.g., stopping at position 16 when reading starts at 18), both options would return a missing value.
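The last case in the analysis above, a line that ends before the starting column, can be checked with a sketch like this one. The fileref tooshort and the two data set names are hypothetical and chosen only for illustration.

```sas
/* Hypothetical temporary file used only for this illustration */
FILENAME tooshort TEMP;

DATA _NULL_;
  FILE tooshort;
  /* 16 characters: the line ends before column 18 */
  PUT '1234567890123456';
RUN;

DATA with_missover;
  INFILE tooshort MISSOVER;
  INPUT @18 code $5.;
RUN;

DATA with_truncover;
  INFILE tooshort TRUNCOVER;
  INPUT @18 code $5.;
RUN;

PROC PRINT DATA=with_missover;
RUN;

PROC PRINT DATA=with_truncover;
RUN;
```

Both data sets should contain a single observation with a blank code, since there is nothing at all to read from column 18 onward.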

5. Why the Historical Confusion?

  • On older mainframes, files often had fixed, rigid record lengths. Incomplete lines were rare.

  • On Windows and Unix, files are variable-length: each line simply ends at its line terminator (CRLF on Windows, LF on Unix). It is very common for one line to be shorter than the others (e.g., an empty address at the end of the line).

The modern recommendation is clear: Almost always use TRUNCOVER.

It is the safest option for reading variable-length files (complex CSVs, positional flat files) without risking the loss of data located at the very end of a line. MISSOVER should only be used if you have a specific reason to reject partial data.