Energy efficiency for large-scale commercial buildings has received considerable attention from both industry and academia. In particular, the control and operation strategies of heating, ventilation, and air-conditioning (HVAC) systems are critical to improving energy efficiency and maintaining thermal comfort in buildings. With the recent and rapid development of deep learning methods, deep reinforcement learning (DRL) is emerging as a data-driven control strategy for building HVAC control that does not require a dynamic model of the system and instead searches for the optimal control policy directly from data. While some previous works have adopted various DRL methods for building HVAC control, most focus only on instantaneous or short-term energy consumption without considering time-varying electricity price profiles. Additionally, the opportunity cost of obtaining enough data for DRL schemes to achieve acceptable performance has not been sufficiently addressed. Thus, this paper aims to develop DRL methods that minimize energy cost (rather than just consumption) while maintaining thermal comfort requirements, using only a limited amount of data. Specifically, we study the model-free deep Q-learning method with a tailored reward function and customized training procedures that enable the agent to quickly learn feasible policies for a simulated building environment, subject to time-varying temperature constraints and electricity prices. The model-free scheme is validated by comparison with a baseline heuristic method commonly used in real buildings. The simulated test performance demonstrates that the approach can outperform the baseline method given a limited amount of training data. We find that the definition of the system's state and action spaces has a significant effect on performance.
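
To make the cost-versus-comfort trade-off concrete, the sketch below shows one plausible shape for such a reward function: a penalty combining time-varying electricity cost with violations of a temperature comfort band. All names, weights, and the specific functional form here are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical reward for a DQN-based HVAC agent: penalize energy cost
# under a time-varying price, plus violations of a comfort band.
# Function name, weights, and time step are illustrative assumptions.

def hvac_reward(power_kw, price_per_kwh, temp_c, t_low, t_high,
                dt_hours=0.25, comfort_weight=10.0):
    """Return negative (energy cost + weighted comfort violation)."""
    energy_cost = power_kw * dt_hours * price_per_kwh
    # Degrees C outside the allowed comfort band; zero when inside it.
    violation = max(t_low - temp_c, 0.0) + max(temp_c - t_high, 0.0)
    return -(energy_cost + comfort_weight * violation)

# Off-peak price with the zone temperature inside the band:
r_ok = hvac_reward(power_kw=5.0, price_per_kwh=0.08, temp_c=22.0,
                   t_low=20.0, t_high=24.0)
# Same power at a peak price with a 2 degC comfort violation:
r_bad = hvac_reward(power_kw=5.0, price_per_kwh=0.30, temp_c=26.0,
                    t_low=20.0, t_high=24.0)
```

A time-varying price profile enters simply through the `price_per_kwh` argument at each control step, so the same reward naturally discourages running the HVAC system hard during peak-price hours.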