Energy efficiency for large-scale commercial buildings has received considerable attention from both industry and academia. In particular, the control and operation strategies of heating, ventilation, and air-conditioning (HVAC) systems are critical to improving energy efficiency and maintaining thermal comfort in buildings. With the recent and rapid development of deep learning methods, deep reinforcement learning (DRL) is emerging as a data-driven control strategy for building HVAC control that does not require a dynamic model of the system and instead searches for the optimal control policy directly from data. While some previous works have adopted various DRL methods for building HVAC control, most focus only on instantaneous or short-term energy consumption without considering time-varying electricity price profiles. Additionally, the opportunity cost of obtaining enough data for DRL schemes to achieve acceptable performance has not been sufficiently addressed. Thus, this paper aims to develop DRL methods that minimize energy cost (rather than just consumption) while maintaining thermal comfort requirements, using only a limited amount of data. Specifically, we study the model-free deep Q-learning method with a tailored reward function and customized training procedures that enable the agent to quickly learn feasible policies for a simulated building environment, subject to time-varying temperature constraints and electricity prices. The model-free scheme is validated by comparison with a baseline heuristic method commonly used in real buildings. The simulated test performance demonstrates that the approach can outperform the baseline method given a limited amount of training data. We find that the definition of the system's state and action spaces has a significant effect on performance.
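
To make the cost-versus-comfort trade-off concrete, the sketch below shows one plausible shape for such a reward function: a penalty combining time-varying electricity cost with violations of a temperature comfort band. All names, weights, and the specific functional form here are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical reward for a DQN-based HVAC agent: penalize energy cost
# under a time-varying price, plus violations of a comfort band.
# Function name, weights, and time step are illustrative assumptions.

def hvac_reward(power_kw, price_per_kwh, temp_c, t_low, t_high,
                dt_hours=0.25, comfort_weight=10.0):
    """Return negative (energy cost + weighted comfort violation)."""
    energy_cost = power_kw * dt_hours * price_per_kwh
    # Degrees C outside the allowed comfort band; zero when inside it.
    violation = max(t_low - temp_c, 0.0) + max(temp_c - t_high, 0.0)
    return -(energy_cost + comfort_weight * violation)

# Off-peak price with the zone temperature inside the band:
r_ok = hvac_reward(power_kw=5.0, price_per_kwh=0.08, temp_c=22.0,
                   t_low=20.0, t_high=24.0)
# Same power at a peak price with a 2 degC comfort violation:
r_bad = hvac_reward(power_kw=5.0, price_per_kwh=0.30, temp_c=26.0,
                    t_low=20.0, t_high=24.0)
```

A time-varying price profile enters simply through the `price_per_kwh` argument at each control step, so the same reward naturally discourages running the HVAC system hard during peak-price hours.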