Successful AI is all down to data management

Dave Smith

Artificial intelligence (AI) is everywhere these days, whether in reality or just as a hyped-up label for simple rules-based decision-making, and this has led to some interesting problems, says David Smith, head of GDPR Technology, SAS UK & Ireland.

The first of these is mistrust, as noted by the incoming president of the British Science Association, Professor Jim Al-Khalili: “There’s a real danger of a public backlash against AI, potentially similar to the one we had with GM [genetic modification] back in the early days of the millennium.” Al-Khalili highlights that for AI to reach its full potential, more transparency and public engagement are required.

The second potential issue is control: if models are left to run without monitoring, there is a risk of poor decisions. One example is the “Flash Crash” of 2010, when the US stock market dropped about 9% for 36 minutes. Although the regulators blamed a single trader spoofing the market, algorithmic trading systems were at least partly to blame for the depth of the crash.

Harnessing AI for good

That said, AI has huge potential for good, whether providing better cancer diagnoses through more efficient screening of tumour images or protecting endangered species by interpreting images of animal footprints in the wild. The challenge is to ensure that these benefits are realised, and this is where the FATE (Fairness, Accountability, Transparency and Explainability) framework comes in. It is designed to ensure that AI is used appropriately. I will focus on the transparency aspects, where data management has the greatest impact.

AI can only ever be as good as the data that feeds it, and building and using an AI application requires a number of data-specific phases:

  • Data quality cleansing to ensure that modelling is not performed on data which contains irrelevant or incorrect items
  • Transforming, joining and enhancing data before the modelling process begins
  • Deployment, which takes the model and applies it to the organisation’s data to drive decision making

Each of these phases adds value but can also alter the results of the AI process. For example, removing outliers during data quality processing can have very different effects. If the removal is appropriate, the result will be a model that reflects the majority of the data very well. On the other hand, the model might ignore a rare but critical circumstance and miss the opportunity to bring real benefit.

This was shown in Dame Jocelyn Bell Burnell’s discovery of pulsars, a type of rotating neutron star. She was examining miles of printout data from a radio telescope and noticed a small signal in one in every 100,000 data points. Despite her supervisor telling her it was man-made interference, she persisted and proved that pulsars exist by finding similar signals elsewhere. Had the outliers been removed, she would not have made the discovery.
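One way to keep rare signals visible is to flag outliers for review instead of silently deleting them. The sketch below is illustrative only: the function names, the sample readings and the choice of the modified z-score (based on the median and the median absolute deviation, with the commonly used cut-off of 3.5) are assumptions made for this example, not a method described in the article.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Pair each value with True/False depending on its modified z-score."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No usable spread estimate, so nothing is flagged.
        return [(v, False) for v in values]
    return [(v, abs(0.6745 * (v - med) / mad) > threshold) for v in values]


def split_for_review(values, threshold=3.5):
    """Separate values into those kept for modelling and those set aside."""
    flagged = flag_outliers(values, threshold)
    kept = [v for v, is_outlier in flagged if not is_outlier]
    review = [v for v, is_outlier in flagged if is_outlier]
    return kept, review


# Hypothetical sensor readings with one unusual value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2]
kept, review = split_for_review(readings)
print("kept for modelling:", kept)
print("set aside for human review:", review)
```

The unusual reading is excluded from modelling but retained in a review list, so a person can still decide whether it is interference or a discovery.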

The data journey

Data quality should also be applied to prevent embarrassing decisions. If Bank of America had checked the validity of its name data, it might not have sent a credit card offer to “Lisa Is A Slut McXxxxxx” (her name is redacted. Ed.) in 2014. The bank had acquired the data from Golden Key International Honour Society, which recognises academic achievement. An unknown individual had edited her name in the register of members.

The process then continues with transformations to prepare the data for modelling. Source systems are typically highly normalised, with information stored in multiple tables, whereas data scientists prefer a single flat table to analyse. They will often need to add derived variables to help their analysis. These are usually defined initially by the data scientist in an ad-hoc data preparation environment but will need to be moved to a more controlled environment for production purposes.
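As a minimal sketch of this transformation step, the example below joins two normalised tables into one row per customer and adds derived variables. The table contents, column names and the function `build_analysis_table` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Two hypothetical normalised source tables.
customers = [
    {"customer_id": 1, "region": "North"},
    {"customer_id": 2, "region": "South"},
]
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 50.0},
]


def build_analysis_table(customers, transactions):
    """Join the tables into one flat row per customer with derived variables."""
    amounts = defaultdict(list)
    for t in transactions:
        amounts[t["customer_id"]].append(t["amount"])

    table = []
    for c in customers:
        spent = amounts.get(c["customer_id"], [])
        total = sum(spent)
        table.append({
            "customer_id": c["customer_id"],
            "region": c["region"],
            # Derived variables, not present in either source table.
            "n_transactions": len(spent),
            "total_spend": total,
            "avg_spend": total / len(spent) if spent else 0.0,
        })
    return table


analysis_table = build_analysis_table(customers, transactions)
for row in analysis_table:
    print(row)
```

Keeping the join and the derived variables in one named function makes it easier to move the same logic from an ad-hoc environment into a controlled one and to review it for implementation errors.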

The impact of this data transformation stage can be huge. Firstly, it is important to understand which data sources are being used in the analysis. This may be in relation to regulatory concerns such as whether personal data is being used, or simply to ensure that the correct data source is being accessed. Secondly, it is important to understand whether the transformation has been appropriate and correctly implemented; errors in implementation can be just as damaging as poor-quality data.

The last data process that directly affects AI is deployment: ensuring that the correct data is fed into the model and using the results to make decisions that directly affect the organisation’s performance. Models have a definite shelf life during which they accurately predict the real world, so if deployment into production takes too long they will not deliver their full value.

An organised deployment process is also a necessary component of meeting the requirements of GDPR Article 22. This article restricts decisions based solely on automated processing of personal data, including profiling, unless strict conditions are met (for example explicit consent). Controlled deployment provides an overview of which data has been used in the AI process and which analytical models have been applied to the data at any one time. This is critical to determining whether the regulation has been breached.
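The kind of overview described here can be approximated with a simple record of each scoring run. The sketch below is an assumption-laden illustration, not a compliance tool: the class names `ScoringRecord` and `DeploymentLog`, the model names and the data source names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ScoringRecord:
    """One model run: which model, which data, and when."""
    model_name: str
    model_version: str
    data_sources: tuple
    uses_personal_data: bool
    scored_at: datetime


class DeploymentLog:
    """In-memory log of scoring runs (a real system would persist this)."""

    def __init__(self):
        self._records = []

    def record(self, model_name, model_version, data_sources,
               uses_personal_data, scored_at=None):
        rec = ScoringRecord(
            model_name=model_name,
            model_version=model_version,
            data_sources=tuple(data_sources),
            uses_personal_data=uses_personal_data,
            scored_at=scored_at or datetime.now(timezone.utc),
        )
        self._records.append(rec)
        return rec

    def personal_data_runs(self):
        """Runs that touched personal data and may need a closer review."""
        return [r for r in self._records if r.uses_personal_data]


log = DeploymentLog()
log.record("churn_model", "1.2", ["crm.customers", "billing.invoices"], True)
log.record("stock_forecast", "3.0", ["warehouse.inventory"], False)

for run in log.personal_data_runs():
    print(run.model_name, run.model_version, run.data_sources)
```

With such a log in place, the question of which models were applied to personal data at a given time becomes a query instead of an investigation.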

Overall, data management is fundamental to AI being able to reach its true potential. Being able to understand how data processing is achieved is a crucial part of upholding transparency, one of the main pillars of fair, trusted and effective AI.

The author of this blog is David Smith, head of GDPR Technology, SAS UK & Ireland.


