Page 11 - Penn State Civil and Environmental Engineering Magazine

The astronomical number of potential states and actions, called the “curse of dimensionality,” is a barrier for infrastructural decision-making, explained Charalampos Andriotis, a postdoctoral researcher in civil engineering at Penn State.
“If this immense number of states and actions underlies simple infrastructure settings, how can we tame the complexities of real-world networks?” Andriotis said.
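The scale of the problem is easy to see with back-of-the-envelope arithmetic. The sketch below uses hypothetical numbers chosen to match the scale of the small test scenario described later in the article (10 components, four damage levels); the three actions per component are an assumption for illustration.

```python
# Illustrative count behind the "curse of dimensionality."
# The numbers here are hypothetical: 10 components, 4 damage levels,
# and an assumed 3 maintenance actions per component.

n_components = 10
damage_levels = 4          # e.g., "no damage" ... "failure"
actions_per_component = 3  # e.g., do nothing, maintain, replace (assumed)

n_states = damage_levels ** n_components           # joint system states
n_actions = actions_per_component ** n_components  # joint actions per step

print(n_states)   # 1048576 joint states for just 10 components
print(n_actions)  # 59049 joint action combinations at every decision step
```

Even this toy setting yields over a million joint states, and real networks have far more components, which is why tabulating every state-action pair quickly becomes infeasible.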
According to Papakonstantinou and Andriotis, recent advances in computing brought forth a novel solution. By combining deep learning—a set of algorithms able to discover patterns in vast amounts of data—with existing reinforcement learning techniques, researchers have produced a powerful framework for sequential decision-making problems. Deep reinforcement learning has shown the ability to surpass even human intelligence, outwitting experts in games that have been played for hundreds of years.
Inspired by these advances, the researchers developed their own deep reinforcement learning model, a deep centralized multi-agent actor-critic algorithm, capable of identifying optimal maintenance and inspection policies for large systems and overcoming the “curse of dimensionality.”
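The actor-critic idea at the heart of such methods can be illustrated with a toy example. The sketch below is not the authors' deep centralized multi-agent algorithm; it is a minimal tabular actor-critic on an invented two-state deterioration problem, with all costs and transition probabilities made up for illustration.

```python
import math
import random

# Minimal tabular actor-critic sketch on a toy two-state problem.
# NOT the authors' algorithm: every number below is invented.
# States: 0 = good, 1 = damaged. Actions: 0 = do nothing, 1 = repair.

random.seed(0)
N_STATES, N_ACTIONS = 2, 2
theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor (policy) params
V = [0.0] * N_STATES                                  # critic (value) estimates
alpha, beta, gamma = 0.05, 0.1, 0.95

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def step(state, action):
    # action 1 = repair: costs 1, restores the component to good
    if action == 1:
        return 0, -1.0
    # action 0 = do nothing: free, but a good component may degrade
    if state == 0:
        return (1, 0.0) if random.random() < 0.3 else (0, 0.0)
    return 1, -3.0  # remaining damaged incurs a recurring penalty

state = 0
for _ in range(20000):
    probs = softmax(theta[state])
    action = random.choices(range(N_ACTIONS), weights=probs)[0]
    nxt, reward = step(state, action)
    td_error = reward + gamma * V[nxt] - V[state]  # critic's TD error
    V[state] += beta * td_error                    # critic update
    for a in range(N_ACTIONS):                     # actor (policy-gradient) update
        grad = (1.0 if a == action else 0.0) - probs[a]
        theta[state][a] += alpha * td_error * grad
    state = nxt

# The learned policy should repair damaged components but rarely good ones.
print(softmax(theta[0]), softmax(theta[1]))
```

The critic estimates how good each state is, and the actor shifts probability toward actions the critic scores above expectation; deep variants replace these tables with neural networks so the same loop scales to enormous state spaces.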
“The more complex the problem, the more intelligent the solution needs to be,” Andriotis said. “Emerging challenges like climate change, population demands, and resource limitations further epitomize such complexities for the resilience and sustainability of future communities.”
Papakonstantinou and Andriotis tested their multi-agent algorithm against a variety of baseline policies and deterioration factors, all representative of standard infrastructure management environments.
In one scenario, the algorithm was applied to a deteriorating, non-stationary network with 10 components and a 50-year life span. Each component could exhibit one of four possible damage levels, ranging from “no damage” to “failure,” and its deterioration rate was controlled by actions taken by the “agent.” After a few thousand training episodes, the algorithm “learned” its optimal policy, outperforming all of the baselines by up to 50%.
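The way an agent's actions control deterioration can be sketched with a small simulation. The four damage levels mirror the scenario above, but the action names and all transition probabilities below are invented for illustration.

```python
import random

# Hypothetical four-level deterioration model: each component moves
# through damage levels 0 ("no damage") to 3 ("failure"), and the
# chosen action sets the transition probabilities that apply.
# All probabilities are invented for this example.

random.seed(1)

# P_DEGRADE[action][level] = probability of degrading one level this year
P_DEGRADE = {
    "do-nothing": [0.20, 0.30, 0.40, 0.0],  # unchecked deterioration
    "maintain":   [0.05, 0.10, 0.15, 0.0],  # maintenance slows the rate
}

def simulate(action, years=50, trials=2000):
    """Average damage level after `years`, starting from 'no damage'."""
    total = 0
    for _ in range(trials):
        level = 0
        for _ in range(years):
            if level < 3 and random.random() < P_DEGRADE[action][level]:
                level += 1
        total += level
    return total / trials

print(simulate("do-nothing"))  # close to 3: most components fail in 50 years
print(simulate("maintain"))    # noticeably lower average damage
```

A learning agent's real task is harder than this comparison suggests: it must decide when the cost of maintaining each component is justified by the deterioration it prevents, over the whole life span at once.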
“It performed better with partial state information than any traditional policy tested, even when those policies were optimized based on exact knowledge of the system and, hence, the best information possible,” Papakonstantinou said. “We didn’t expect this outcome.”
In another scenario, the algorithm was applied to a truss bridge structural system with multiple components subject to corrosion and a 70-year life span. To navigate this system with 1.2 × 10⁸¹ possible states, it took the “agent” 18,000 training episodes to surpass the best baseline. It eventually exceeded all other policies by up to 20%, a margin that looms even larger when set against the $3 billion Pennsylvania spent on highway and bridge improvements last year.
According to the researchers, algorithm improvements continue. They are currently working on the ability to handle resource scarcity and other constraints necessary for practical applications while also studying the autonomous cooperation of multiple “agents” under these operating conditions.
“It’s a matter of computational power, but maybe in 10 years, entire infrastructure networks at the state and national levels could be managed with the aid of AI,” Papakonstantinou said.
This research was supported by a National Science Foundation Faculty Early Career Development (CAREER) award and the U.S. Department of Transportation 2018 Region 3 Center for Integrated Asset Management for Multimodal Transportation Infrastructure Systems (CIAMTIS).
This is a “2D embedding of the belief space” of a truss bridge structure subject to corrosion based on the last “350-D hidden layer of the critic network.” Each point represents a different belief the system may reach throughout its operational life of 70 years. The decision-maker obtains a complete, optimized plan for all structural members—and for any possible future system state distribution—along with the estimated total life-cycle cost from that step until the end of the planning horizon.
One policy realization of a deteriorating system with 10 components. Detailed decentralized policies are learned for each system component while optimizing for a centralized, overarching objective.
