Challenge 4: Role of data

Artificial Intelligence (AI) techniques and tools benefit today from the enormous amount of personal and environmental data recorded daily by IT systems. The quality and interoperability of this data largely determine whether new technologies can be applied.

One of the main AI techniques that can be used to process such data is supervised learning. In this case, the data must be “annotated” by humans, who teach the machines how to interpret it. This operation is very onerous, since it requires a large amount of complex human work. In addition to the long time needed for annotation, the discretion of the annotators can produce inconsistent datasets (i.e. similar data annotated in different ways), weakening the operation of machines and propagating errors and biases [1].
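One common way to quantify how consistently two annotators label the same items is Cohen's kappa, which corrects the raw agreement rate for agreement expected by chance. The following is a minimal sketch, not a method prescribed by this document: the function name, the labels and the sample data are hypothetical and serve only as an illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled at random,
    # following their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of six citizen messages (illustrative data only).
annotator_1 = ["complaint", "request", "complaint", "request", "complaint", "request"]
annotator_2 = ["complaint", "request", "request", "request", "complaint", "complaint"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

A value close to 1 indicates consistent annotation, while a value close to 0 indicates agreement no better than chance, which suggests that the annotation guidelines may need to be revised before the dataset is used for training.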

The challenge associated with the role of data is therefore the creation of conditions, including organisational conditions, which allow Artificial Intelligence to use correctly created databases, where consistency, quality and intelligibility are guaranteed.

In the Internet of Things field, one of the main challenges is that the data collected by interconnected devices and sensors differs from the data that the community of data scientists has had to deal with in the past. In fact, the greatest successes achieved in the AI field concern applications such as image processing, autonomous driving and web search, which were made possible by the availability of large and relatively structured datasets that could therefore be used to train machine learning algorithms.

By contrast, data coming from a multitude of connected devices can be fragmented, heterogeneous and distributed irregularly in space and time: a challenge of rare complexity for anyone who wants to analyse data in a structured manner.
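As an illustration of what irregular distribution in time means in practice, the sketch below averages irregularly timed sensor readings into fixed-width time buckets and keeps empty buckets visible as gaps. It is only one possible preprocessing step, under the assumption that timestamps are expressed in seconds; the function name and the sample readings are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def resample_to_grid(readings, step_seconds):
    """Average irregular (timestamp, value) readings into fixed-width time buckets.

    Buckets that contain no readings are reported as None so that gaps
    in the data remain visible instead of being silently filled.
    """
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    if not readings:
        return []
    buckets = defaultdict(list)
    for timestamp, value in readings:
        # Assign each reading to the bucket that contains its timestamp.
        buckets[int(timestamp // step_seconds)].append(value)
    first, last = min(buckets), max(buckets)
    return [
        (index * step_seconds, mean(buckets[index]) if index in buckets else None)
        for index in range(first, last + 1)
    ]


# Hypothetical temperature readings: (seconds since start, degrees Celsius).
readings = [(3, 20.0), (7, 22.0), (12, 21.0), (41, 19.0)]
grid = resample_to_grid(readings, step_seconds=10)
print(grid)
```

Keeping missing buckets explicit, rather than interpolating them, leaves the decision on how to treat gaps to the later analysis step.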

A second area of discussion is the management and search of data published on the web in the form of linked open data [2].

This data, which may concern both the institutional task of a public body (e.g. land registry or administrative data) and its operation (e.g. internal data), is made accessible and usable through open formats. While it represents a mine of information, the data needs adequate tools to be exploited to its full potential. In particular, information retrieval [3] and filtering models and methods based on semantic technologies and shared ontologies are needed.

This work, already envisaged by the Digital Administration Code (DAC) and launched within the scope of the activities of the Digital Team, will be part of the broader perspective of conceptual governance of public information assets.

Regarding the huge data assets of the Public Administration, the challenge that AI technologies make it possible to address is that of transforming such data into widespread and shared knowledge, so as to make the Public Administration transparent to citizens and, above all, to itself. This would guarantee citizens and administrators not only semantic access to information and interoperability of processes, but also a better understanding of the relationship between state and citizen.

Once the conditions for the proper functioning of the Artificial Intelligence methodologies have been created, one of the tasks of Public Administration will be to aggregate the data necessary to support process improvement. This could be achieved through the creation of an open platform for the collection, generation and management of certain types of data, directly related to Public Administration [4]. The decentralised use of public datasets, essential for the development of active participation practices (civic activism), in turn requires specific capabilities of governance of the socio-technical system of Public Administration. It is in fact essential that data quality is ensured at source, through the generalised adoption of guidelines and appropriate content standards.

To achieve these ambitious objectives, there are many issues to be addressed, including some that have been appearing in the e-government plans of developed countries for many years. These include:

  • truthfulness and completeness of data;
  • data distribution and access methods;
  • design and definition of shared ontologies;
  • supervision of public dataset quality;
  • estimate of the economic value attributable to the data;
  • tools that allow citizens to monitor data production;
  • management and promotion of data access [5];
  • regulation of data usage [6].

The last three items in the list above introduce a further issue for the Public Administration: ensuring that anyone who wants to develop Artificial Intelligence solutions useful for citizens has equal and non-discriminatory access to the necessary data.


[1] See the “Ethical Challenge”.
[3] Information Retrieval: the set of techniques used for the targeted retrieval of information in electronic format.
[5] For example, “grand challenges” can be launched. Well-known examples are those organised by NIST on Speech Recognition and Machine Translation, by DARPA on Autonomous Vehicles, and by ImageNet on Vision.