Four occupancy variables are compared in a data-driven, multi-variable regression analysis of building cooling data, using the results of a detailed occupancy survey as a reference. The results of regression models built with each occupancy variable are compared with synthetic data from two simulations of a large building, one with a variable air volume (VAV) system and one with a constant air volume (CAV) system. The results suggest that a simple linear transformation performs comparably to the detailed and more demanding occupancy survey. This transformation treats occupancy level as linearly proportional to the difference between the lighting and equipment consumption and the minimum value of that consumption. Representing occupancy by a value of 1 during weekday business hours, 0.33 during the same hours on weekends, and 0 outside business hours gave somewhat poorer results. Two further approaches gave results much poorer than either the occupancy survey or the two approaches above: an occupancy value obtained by dividing all lighting and equipment consumption values by their absolute maximum, and a value of 1 for weekdays and 0 for weekends.
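To make the four occupancy variables concrete, the following is a minimal Python sketch of how each might be computed from hourly timestamps and lighting-and-equipment (L&E) consumption values. The function names, the 08:00 to 18:00 business hours, the choice to scale the linear transformation by the consumption range so that it falls in [0, 1], and applying the weekday/weekend flag at all hours are assumptions for illustration; the abstract does not specify them.

```python
from datetime import datetime

# Hypothetical business hours (start inclusive, end exclusive); not given in the abstract.
BUSINESS_START, BUSINESS_END = 8, 18


def occ_linear(le_loads):
    """Occupancy proportional to (L&E - min L&E).

    Dividing by the range (max - min) is an assumed scaling that maps values to [0, 1].
    """
    lo, hi = min(le_loads), max(le_loads)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in le_loads]
    return [(x - lo) / span for x in le_loads]


def occ_max_ratio(le_loads):
    """Occupancy = L&E divided by the absolute maximum L&E value."""
    peak = max(le_loads)
    if peak == 0:
        return [0.0 for _ in le_loads]
    return [x / peak for x in le_loads]


def occ_schedule(timestamps, weekend_value=0.33):
    """1 in weekday business hours, weekend_value in weekend business hours, 0 otherwise."""
    out = []
    for t in timestamps:
        if not (BUSINESS_START <= t.hour < BUSINESS_END):
            out.append(0.0)
        elif t.weekday() < 5:  # Monday=0 ... Friday=4
            out.append(1.0)
        else:
            out.append(weekend_value)
    return out


def occ_weekday_flag(timestamps):
    """1 on weekdays, 0 on weekends (assumed to apply at every hour of the day)."""
    return [1.0 if t.weekday() < 5 else 0.0 for t in timestamps]
```

Any of these lists could then serve as the occupancy regressor alongside weather variables in the cooling regression model; the sketch only shows how the variables differ in construction, not the regression itself.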