Never use KEEP and DROP in the same DATA step. It is redundant and confusing, since a variable named in both lists ends up dropped. Additionally, if you are reading from a massive dataset but only need a few columns, use the KEEP= option on the SET statement. This keeps unneeded variables out of the program data vector (PDV) entirely, so SAS reads and processes less data per observation.
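To make the difference concrete, here is a minimal sketch contrasting the KEEP= data set option on SET with the KEEP statement. It assumes the SASHELP.CLASS sample table is available; the output names `class_ratio_in` and `class_ratio_out` and the `ratio` variable are made up for illustration.

```sas
/* Sketch: restricting variables on input vs. on output.
   SASHELP.CLASS is used only as a convenient sample table. */

/* KEEP= on SET: only name, height, and weight are read into the PDV */
DATA class_ratio_in;          /* hypothetical output name */
    SET sashelp.class(KEEP=name height weight);
    ratio = weight / height;  /* illustrative calculation */
RUN;

/* KEEP statement: every variable is read, but only the listed ones are written */
DATA class_ratio_out;         /* hypothetical output name */
    SET sashelp.class;
    ratio = weight / height;
    KEEP name ratio;
RUN;
```

Note that any variable used in a calculation must appear in the KEEP= list on SET. Otherwise it never reaches the PDV, and the expression evaluates to missing.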
Examples use generated data (datalines) or SASHELP.
```sas
/* Build a small employee table; fields are comma-delimited because
   names and addresses contain spaces */
DATA employees;
    INFILE DATALINES DLM=',' TRUNCOVER;
    INPUT name :$20. address :$30. city :$20. state :$2. zip :$5. phone :$8.;
    DATALINES;
John Doe,123 Main St,Anytown,CA,90210,555-1234
Jane Smith,456 Oak Ave,Othercity,NY,10001,555-5678
;
RUN;

/* KEEP lists the variables written to the output data set */
DATA employees_subset;
    SET employees;
    KEEP name address city state zip phone;
RUN;
```
```sas
/* Twenty test scores per student */
DATA scores;
    INPUT name $ score1-score20;
    DATALINES;
Alice 85 90 78 92 88 76 95 89 80 82 77 91 85 93 86 79 90 84 87 94
Bob 70 65 72 75 68 80 73 78 71 76 69 81 74 79 70 82 75 77 71 80
;
RUN;

/* Compute the mean score, then keep only the name and the average */
DATA average;
    SET scores;
    avg = MEAN(OF score1-score20);
    KEEP name avg;  /* score1-score20 are read but not written out */
RUN;
```