Dynamic multi-objective optimisation using deep reinforcement learning: benchmark, algorithm and an application to identify vulnerable zones based on water quality

Hasan, Md Mahmudul; Lwin, Khin; Imani, Maryam; Shabut, Antesar M.; Bittencourt, Luiz F.; Hossain, Mohammed Alamgir

journal contribution

posted on 2023-08-30, 16:33 authored by Md Mahmudul Hasan, Khin Lwin, Maryam Imani, Antesar M. Shabut, Luiz F. Bittencourt, Mohammed Alamgir Hossain

Dynamic multi-objective optimisation problems (DMOPs) pose a great challenge to reinforcement learning (RL) research because of their dynamic nature: objective functions, constraints and problem parameters may all change over time. This study aims to identify the shortcomings of existing multi-objective optimisation benchmarks for dynamic environments in RL settings. To that end, a dynamic multi-objective testbed has been created as a modified version of the conventional deep-sea treasure (DST) hunt testbed. The modified testbed captures the changing aspects of a dynamic environment, with changes occurring as a function of time. To the authors’ knowledge, this is the first dynamic multi-objective testbed for RL research, especially for deep reinforcement learning. In addition, a generic algorithm is proposed to solve multi-objective optimisation problems in a dynamic constrained environment; it maintains equilibrium by mapping different objectives simultaneously to provide the best compromise solution, one close to the true Pareto front (PF). As a proof of concept, the developed algorithm has been used to build an expert system for a real-world scenario, using a Markov decision process to identify vulnerable zones based on water quality resilience in São Paulo, Brazil. The outcome of the implementation reveals that the proposed parity-Q deep Q network (PQDQN) algorithm is an efficient way to optimise decisions in a dynamic environment. Moreover, the results show that the PQDQN algorithm performs better than other state-of-the-art solutions in both the simulated and the real-world scenarios.
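To make the notion of a Pareto front that shifts over time more concrete, the following is a minimal sketch in Python. It is not the authors' testbed or the PQDQN algorithm: the function names (`dominates`, `pareto_front`, `candidates_at`), the outcome values and the drift rule are all hypothetical, chosen only to illustrate how the set of non-dominated (treasure value, time penalty) outcomes in a deep-sea-treasure-style problem can change when one objective drifts with time.

```python
from typing import List, Tuple

# Each outcome is (treasure_value, time_penalty); both objectives are maximised,
# so the time penalty is stored as a negative number of steps.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the outcomes that no other outcome dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def candidates_at(t: int) -> List[Point]:
    """Hypothetical outcomes at time step t (illustrative values, not from the paper)."""
    base = [(1.0, -1.0), (5.0, -3.0), (4.0, -5.0), (20.0, -9.0)]
    # Assumed drift rule: the second treasure loses one unit of value per time step.
    return [(v - t if i == 1 else v, c) for i, (v, c) in enumerate(base)]


if __name__ == "__main__":
    for t in (0, 2):
        print(t, pareto_front(candidates_at(t)))
```

Under these assumed values, the outcome `(4.0, -5.0)` is dominated at `t = 0` but joins the front at `t = 2`, once the nearer treasure has lost value. An agent that learned the front at one time step would therefore hold a stale policy at a later one, which is the difficulty the abstract describes.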

History

Refereed

Yes

Volume

86

Page range

107-135

Publication title

Engineering Applications of Artificial Intelligence

ISSN

0952-1976

External DOI

https://doi.org/10.1016/j.engappai.2019.08.014

Publisher

Elsevier

File version

Accepted version

Language

eng

Official URL

https://doi.org/10.1016/j.engappai.2019.08.014

Legacy posted date

2019-09-11

Legacy creation date

2019-09-11

Legacy Faculty/School/Department

Faculty of Science & Engineering

Keywords

Dynamic environment; reinforcement learning; deep Q network; water quality resilience; meta-policy selection; artificial intelligence

Licence

CC BY-NC-ND 4.0
