Artificial Intelligence and Data
Even though AI technologies have existed for several decades, it is the explosion of data that has allowed them to advance at incredible speed. The billions of searches performed on Google every day provide a sizable real-time data set from which Google learns about our typos and search preferences. Siri and Cortana would have only a rudimentary understanding of our requests without the billions of hours of spoken word now digitally available to help them learn our language.
Each year, the amount of data we produce doubles, and it is predicted that within the next decade there will be 150 billion networked sensors, more than 20 times the number of people on Earth. This data is instrumental in teaching AI devices how humans think and feel; it accelerates their learning curve and enables the automation of data analysis. The more information there is to process and the more data a system is given, the more it learns and, ultimately, the more accurate it becomes.
In the past, AI’s growth was stunted by limited data sets: representative samples rather than real-time, real-life data, and no ability to analyze massive amounts of data in seconds. Today, real-time data is always available, along with tools that enable rapid analysis. This has propelled AI and machine learning forward and allowed the transition to a data-first approach.
Quality Data for AI Systems
Having worked with large data sets for nearly two decades, I have found that the obvious challenge in every single case is to effectively query heterogeneous data sources and then extract and transform the data. The non-obvious challenge is the early identification of data issues, which in most cases are unknown even to the data owners.
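Early identification of data issues can start with something as simple as profiling incoming records against an expected schema before they are transformed and loaded. The sketch below is a minimal, hypothetical illustration of that idea; the field names and schema are assumptions for the example, not from any particular system.

```python
from typing import Any

def profile_records(records: list[dict[str, Any]],
                    schema: dict[str, type]) -> dict[str, dict[str, int]]:
    """Count missing and mistyped values per expected field.

    A simple profiling pass like this can surface issues that
    the data owners themselves are not aware of.
    """
    report = {field: {"missing": 0, "wrong_type": 0} for field in schema}
    for record in records:
        for field, expected_type in schema.items():
            value = record.get(field)
            if value is None:
                report[field]["missing"] += 1
            elif not isinstance(value, expected_type):
                report[field]["wrong_type"] += 1
    return report

# Illustrative batch: two of the three records carry quality issues.
records = [
    {"id": 1, "temperature": 21.5},
    {"id": 2, "temperature": None},
    {"id": 3, "temperature": "n/a"},
]
report = profile_records(records, {"id": int, "temperature": float})
print(report["temperature"])  # {'missing': 1, 'wrong_type': 1}
```

Running such a pass at extraction time, rather than after loading, is what makes the identification "early": the problem is caught before it contaminates downstream analysis.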
AI systems need to become aware of data quality: they must instantly identify potential issues and avoid exposing dirty, inaccurate, or incomplete data. Even if a sudden problem produces poor-quality entries, the AI should handle the issue and proactively notify the right users; depending on how critical the issues are, it might deny serving the data altogether, or serve it while flagging the potential problems.
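The behaviour described above, denying service on critical issues, serving with flags on minor ones, and notifying users in both cases, can be sketched as a small quality gate. This is a hypothetical design, not a specific system's API; the `notify` callback stands in for a real alerting channel.

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    WARNING = 1   # serve the data, but flag the issue
    CRITICAL = 2  # deny serving the data entirely

@dataclass
class QualityIssue:
    message: str
    severity: Severity

def serve(data, issues, notify):
    """Gate a data response on the severity of known quality issues."""
    critical = [i for i in issues if i.severity is Severity.CRITICAL]
    if critical:
        # Critical problems: notify the right users and refuse to serve.
        for issue in critical:
            notify(issue)
        return {"status": "denied", "data": None,
                "flags": [i.message for i in critical]}
    # Only warnings: serve the data, flag the issues, still notify.
    for issue in issues:
        notify(issue)
    return {"status": "served", "data": data,
            "flags": [i.message for i in issues]}

# Illustrative use: a warning-level issue lets data through, flagged.
alerts = []
result = serve([1, 2, 3],
               [QualityIssue("3% null rate in sensor feed", Severity.WARNING)],
               alerts.append)
print(result["status"], result["flags"])
```

The key design choice is that the gate never silently swallows an issue: every path through `serve` either flags it in the response, raises an alert, or both.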
AI systems of the future should be designed on the assumption that, at some point, there will be problematic data feeds and unexpected quality issues.