Swimming in data: will you sink, tread water or swim?
04 April 2019
Data-driven decision making is fast becoming the status quo for the development of infrastructure in our cities. How we scan, categorise and assess data will determine whether we sink, tread water frantically or swim in the waves of information overload.
“While technology has opened the floodgates to data, so to speak, there will need to be a filter through which we process the information pouring in,” explains Rafid Morshedi, Data Analytics and Automation Engineer. “For example, when planning new rail lines, one of the most difficult tasks we face is staying informed about Development Applications (DAs) in the surrounding environment, as new residential or commercial developments can impact a proposed alignment.
“In New South Wales, this information is in the public domain but scattered across various local and state government websites, and the only way to access it is through manual searches. This is time-consuming, costly and open to human error.
“As a result, there are three options to consider:
- Ignore the DAs and risk reworking the alignment later (sink)
- Undertake regular searches, which rely on people to be proactive (causing us to tread water to keep up)
- Develop technology that removes human error by automating the process of data analysis and highlighting information that is relevant and important to our project (swim).
“This third option is one of the unique approaches we used on a large NSW rail project recently,” explains Mr Morshedi. “It began with having the right people involved who were familiar with both the planning process and the data that drove it.
“As with any research, you need to understand the research domain and what you are looking for. This is where the human element is important – asking the right questions and thus defining the right data elements to extract. In this case, the question was simple: is our alignment going to be impacted by a third-party development?
“The next step involved gathering the data. To extract the relevant information and locate the data needed, several publicly available spatial datasets were integrated, such as the Digital Cadastral Database and the Geocoded National Address File. These datasets were critical to the whole process: open data released by government agencies was invaluable in developing the system, and security and ethics were considered throughout the development cycle.
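Integrating a geocoded address file with a cadastral database typically comes down to locating each address point inside a parcel polygon. The sketch below illustrates that step with a ray-casting point-in-polygon test; the record layouts, field names and lot identifiers are illustrative assumptions, not the actual schemas of the datasets named above.

```python
# Sketch: linking a geocoded address point (G-NAF style) to a cadastral
# parcel polygon (DCDB style). Field names and lot IDs are hypothetical.

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where a horizontal ray from the point crosses this edge
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def parcel_for_address(address, parcels):
    """Return the lot identifier of the parcel containing the address point."""
    for parcel in parcels:
        if point_in_polygon(address["lon"], address["lat"], parcel["boundary"]):
            return parcel["lot_id"]
    return None

# Toy data: one square parcel and a DA address geocoded inside it.
parcels = [{"lot_id": "1/DP12345",
            "boundary": [(151.20, -33.87), (151.21, -33.87),
                         (151.21, -33.86), (151.20, -33.86)]}]
da_address = {"lon": 151.205, "lat": -33.865}
print(parcel_for_address(da_address, parcels))  # → 1/DP12345
```

In practice this lookup would use a spatial index rather than a linear scan, but the join logic is the same.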
“The process of collecting DA data from public databases was automated, and the resulting flood of data was automatically risk-rated using a machine learning algorithm trained on past high-risk DAs. It is important to note that the human element is still a vital part of the process – we need to be the ones to do the final check to ensure that the right information is being picked up.
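Because the same DA can be published on both a council website and a state planning portal, an automated collector needs a deduplication step before anything is risk-rated. A minimal sketch of that merge step, with hypothetical record fields and feed names:

```python
# Sketch of the automated collection step: merge DA records gathered from
# several public sources and deduplicate by DA number. The feeds and
# field names below are illustrative assumptions.

def collect_das(*sources):
    """Merge DA records from multiple feeds, deduplicating by DA number."""
    seen = set()
    merged = []
    for source in sources:
        for record in source:
            key = record["da_number"]
            if key not in seen:       # the same DA can appear on both a
                seen.add(key)         # council site and a state portal
                merged.append(record)
    return merged

council_feed = [{"da_number": "DA-2019/101", "description": "12-storey mixed use"}]
state_feed   = [{"da_number": "DA-2019/101", "description": "12-storey mixed use"},
                {"da_number": "DA-2019/102", "description": "dual occupancy"}]
print([r["da_number"] for r in collect_das(council_feed, state_feed)])
# → ['DA-2019/101', 'DA-2019/102']
```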
“We tested several machine learning techniques to find an appropriate algorithm and set of hyperparameters that met our recall requirements. We needed to ensure a high true positive rate. A boosted tree algorithm was used to assign a preliminary risk rating to DAs.
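Meeting a recall requirement in a screening task like this usually means tuning the decision threshold on the classifier's risk scores so that few (or no) truly high-risk DAs are missed, accepting more false positives for humans to review. A sketch of that threshold selection, using made-up scores and labels rather than output from any actual model:

```python
# Sketch: pick the strictest score threshold whose recall (true positive
# rate) on a validation set meets the target. Scores and labels are
# illustrative; in practice they would come from the trained classifier.

def threshold_for_recall(scores, labels, target_recall):
    """Return the highest threshold achieving recall >= target_recall."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return None
    # Sweep candidate thresholds from strictest to loosest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s in positives if s >= t)
        if tp / len(positives) >= target_recall:
            return t
    return min(scores)

scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]  # model risk scores
labels = [1,    1,    0,    1,    0,    0]      # 1 = truly high-risk DA
print(threshold_for_recall(scores, labels, target_recall=1.0))  # → 0.4
```

At a target recall of 1.0 the threshold drops to 0.4 so that every high-risk DA is flagged; the cost is that some low-risk DAs (score 0.6 here) are also sent to the human check described above.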
“This new approach to infrastructure planning allows for rapid appraisals of new design options. It saves time, money and resources while at the same time optimising design outcomes.”
With the availability of data increasing, it is critical to convert it into information that can inform decision making. The flood of data in transport infrastructure is here – are you sinking, treading water or swimming?
--ENDS--
Source: WSP - www.wsp-pb.com