Another way to eliminate wasted space and thereby to reduce the amount
of data storage space that you need is to reduce the length of numeric
variables.
In addition to conserving data storage space, reduced-length numeric variables
use less I/O, both when data is written and when it is
read. For a file that is read frequently, these savings can be significant.
However, in order to safely reduce the length of numeric variables, you
need to understand how SAS stores numeric data.
How SAS Stores Numeric Variables
To store numbers of large magnitude and to perform computations that require many digits of precision to the right of the decimal point, SAS stores all numeric values using double-precision floating-point representation. SAS stores the value of a numeric variable as multiple digits per byte. A SAS numeric variable can be from 2 to 8 bytes or 3 to 8 bytes in length, depending on your operating environment. The default length for a numeric variable is 8 bytes.
The figures below show how SAS stores a numeric value in 8 bytes. For mainframe environments, the first bit stores the sign, the next seven bits store the exponent of the value, and the remaining 56 bits store the mantissa. For non-mainframe environments, the first bit stores the sign, the next eleven bits store the exponent of the value, and the remaining 52 bits store the mantissa.
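One way to look at the stored bytes yourself is the HEX16. format, which writes the floating-point representation of a numeric value as 16 hexadecimal digits. The following is a minimal sketch; the variable name x is only illustrative.

```sas
/* Write the 8-byte floating-point representation of 1 to the log. */
data _null_;
   x = 1;
   put x= hex16.;   /* 16 hex digits = 64 bits: sign, exponent, mantissa */
run;
```

On a non-mainframe (IEEE) environment this should write x=3FF0000000000000; the leading three hex digits hold the sign bit and the 11-bit exponent, and the remaining 13 hold the 52-bit mantissa. A mainframe environment writes a different bit pattern for the same value.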
Assigning Lengths to Numeric Variables
You can use a LENGTH statement to assign a length from 2 to 8 bytes to numeric variables. Remember, the minimum length of numeric variables depends on the operating environment. Also, keep in mind that the LENGTH statement affects the length of a numeric variable only in the output data set. Numeric variables always have a length of 8 bytes in the program data vector and during processing.
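General form, LENGTH statement for numeric variables:
LENGTH variable(s) length <DEFAULT=n>;
where
variable(s) names one or more numeric variables, separated by spaces
length is an integer that specifies the length, in bytes, of the listed variable(s)
DEFAULT=n optionally changes the default number of bytes that SAS uses to store newly created numeric variables.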
DEFAULT= applies only to numeric variables that are added to the program data vector after the LENGTH statement is compiled. You would list specific variables in the LENGTH statement along with the DEFAULT= argument only if you wanted those variables to have a length other than the value for DEFAULT=. If you list individual variables in the LENGTH statement, you must list an integer length for each of them.
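As a sketch of how a listed variable and DEFAULT= can be combined in one LENGTH statement, the following DATA step uses hypothetical names (work.lengths_demo, Qty, Price, Total) and assumes an environment where lengths of 4 and 6 are valid.

```sas
data work.lengths_demo;
   length Qty 4 default=6;   /* Qty is listed, so it gets its own length of 4 */
   Qty   = 12;
   Price = 100;              /* not listed: stored with the DEFAULT= length of 6 */
   Total = Qty * Price;      /* not listed: stored with the DEFAULT= length of 6 */
run;
```

Listing Qty is only worthwhile here because its length differs from the DEFAULT= value; Price and Total need no entry of their own.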
Example
The following program assigns a length of 4 to the new variable Sale_Percent in the data set ReducedSales.
The LENGTH statement in this DATA step does not apply to the variables
that are read in from the Sales data set; those variables
will maintain whatever length they had in Sales when
they are read into ReducedSales.
data reducedsales;
   length default=4;
   set sales;
   Sale_Percent=15;
run;
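To confirm which variables were affected, you can inspect the stored lengths with PROC CONTENTS. The sketch below is self-contained only because it builds a one-variable stand-in for Sales; the variable Amount is an assumption, not part of the original data set.

```sas
/* Hypothetical stand-in for the Sales data set */
data sales;
   Amount = 250;          /* created with the default length of 8 */
run;

data reducedsales;
   length default=4;
   set sales;
   Sale_Percent = 15;
run;

/* The Len column should show 8 for Amount and 4 for Sale_Percent */
proc contents data=reducedsales;
run;
```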
Maintaining Precision in Reduced-Length Numeric Variables
There is a limit to the values that you can precisely store in a reduced-length numeric variable. You have learned that reducing the number of bytes that are used for storing a numeric variable does not affect how the numbers are stored in the program data vector. Instead, specifying a value of less than 8 in the LENGTH statement causes the number to be truncated to the specified length when the value is written to the SAS data set.
You should never use the LENGTH statement to reduce the length of your numeric variables if the values are not integers. Fractional numbers lose precision if truncated. Even if the values are integers, you should keep in mind that reducing the length of a numeric variable limits the integer values that can accurately be stored as a value.
The following table lists the possible storage length for integer values on UNIX or Windows operating environments.
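The integer limits for UNIX and Windows can be derived from the bit layout described earlier. The sketch below assumes the non-mainframe layout (1 sign bit, 11 exponent bits) plus the implied leading bit of the IEEE format, so that a length of n bytes represents every integer up to 2**(8*n - 11) exactly. The data set and variable names are hypothetical.

```sas
/* Largest integer that each length stores exactly, assuming the
   IEEE layout: n bytes keep 8*n - 12 mantissa bits plus one implied bit. */
data work.max_exact;
   do Bytes = 3 to 8;
      Max_Exact_Integer = 2**(8*Bytes - 11);
      output;
   end;
   format Max_Exact_Integer comma25.;
run;

proc print data=work.max_exact noobs;
run;

/* 8193 is one more than the limit for a length of 3. */
data work.trunc_demo;
   length Short 3;
   Full  = 8193;      /* stored in 8 bytes */
   Short = 8193;      /* truncated to 3 bytes when the observation is written */
run;

data _null_;
   set work.trunc_demo;
   put Full= Short=;  /* compare the values as they are read back */
run;
```

Under that assumption the limits run from 8,192 for a length of 3 to 9,007,199,254,740,992 for a length of 8, and Short should read back as 8192 rather than 8193. Mainframe environments use a different layout, so the limits there differ.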
If you decide to reduce the length of your numeric variables, you might want to verify that you have not lost any precision in your values. Let's look at one way to do this.
Using PROC COMPARE
You can use PROC COMPARE to gauge the precision of the values that are stored in a shortened numeric variable by comparing the original variable with the shortened variable. The COMPARE procedure compares the contents of two SAS data sets, selected variables in different data sets, or variables within the same data set.
General form, PROC COMPARE step to compare two data sets:
PROC COMPARE BASE=SAS-data-set-one COMPARE=SAS-data-set-two;
RUN;
where SAS-data-set-one and SAS-data-set-two specify the two SAS data sets that you want to compare.
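The general form above compares two data sets. For the case of two variables within the same data set, a minimal sketch is to omit COMPARE= and pair the variables with the VAR and WITH statements. The data set work.two_lengths and the variables Full and Short are hypothetical, and a minimum length of 4 or less is assumed.

```sas
/* Hypothetical data: the same values stored at two lengths */
data work.two_lengths;
   length Short 4;                    /* stored in 4 bytes */
   do Full = 0.1, 0.25, 1/3, 100;     /* Full keeps the default length of 8 */
      Short = Full;
      output;
   end;
run;

/* Without COMPARE=, the base data set is compared with itself */
proc compare base=work.two_lengths;
   var Full;
   with Short;
run;
```

Values such as 0.25 and 100 fit in the shorter length, whereas 0.1 and 1/3 do not, so the report should flag only the latter two observations as unequal.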
PROC COMPARE is a good technique to use for gauging the loss of precision in shortened numeric variables because it shows you whether there are differences in the stored numeric values even if these differences do not show up once the numeric variables have been formatted. PROC COMPARE looks at the two data sets and compares their data set attributes, variables, observations, and the values of matching variables.
Output from the COMPARE procedure includes a data set summary, a variables summary, an observation summary, and a values comparison summary.
Example
The data set Company.Discount contains data about sale dates and discounts for certain retail products. There are 35 observations in Company.Discount, which is described below.
If you were to print these two data sets (Company.Discount and Company.Discount_Short), the values might appear to be identical. However, there are differences in the values as they are stored that are not apparent in the formatted output. In the partial output below, you can see that shortening the length of
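The same effect can be reproduced on a small scale. The sketch below does not use Company.Discount itself; work.discount, its variables, and its three observations are hypothetical stand-ins, and the shortened copy is created by naming the variable in a LENGTH statement.

```sas
/* Hypothetical stand-in for a data set of fractional discounts */
data work.discount;
   input Product_ID Discount;
   format Discount percent8.;
   datalines;
101 0.35
102 0.40
103 0.70
;
run;

/* Shortened copy: naming Discount explicitly changes its stored length */
data work.discount_short;
   length Discount 4;
   set work.discount;
run;

/* The formatted values look the same in both listings */
proc print data=work.discount;
run;
proc print data=work.discount_short;
run;

/* PROC COMPARE works on the stored values, so it can report differences */
proc compare base=work.discount compare=work.discount_short;
run;
```

Because none of these fractions has an exact binary representation, the 4-byte copies should differ slightly from the 8-byte originals, and the comparison should report unequal values for Discount even though the printed percentages match.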
Comparative Example: Creating a SAS Data Set That Contains Reduced-Length Numeric Variables
Suppose you want to create a SAS data set in which to store retail data about a group of orders. Suppose that the data you want to include in your data set is all numeric data and that it is currently stored in a raw data file. You can create the data set using either default-length or reduced-length numeric variables. The following sample programs show each of these techniques. You can use these samples as models for creating benchmark programs in your own environment. Your results might vary depending on the structure of your data, your operating environment, and the resources that are available at your site. You can also view general recommendations for creating reduced-length numeric variables.
Programming Techniques