Building a Solid Data Foundation: The Key to Successful AI and Machine Learning Initiatives.

The biggest customer of your data is your business. This article draws a parallel between the evolution of the UK's road network and today's data infrastructure challenges, underscoring the need for a solid data foundation for successful AI implementation. It offers a comprehensive guide to improving data quality, integration and governance, which are crucial for navigating the complexities of AI and ML. With practical advice on identifying and rectifying shortcomings in your data environment, and insights into the emerging regulatory framework, this piece is a valuable roadmap for any organisation looking to use AI and ML technologies effectively and ethically.


In case you missed it, 2023 was the year artificial intelligence and machine learning went mainstream. ChatGPT was released on 30th November 2022, and many organisations began 2023 with a rallying call to discover new products, techniques and services that might capitalise on the wave of innovation and change that followed.

The resulting explosion of ideas has created new businesses, products, roles, responsibilities and opportunities. From the world’s first Prompt Engineer through to the Chief Artificial Intelligence Officer, businesses are eager to grow talent and implement at scale, quickly, and are keen to make sure the right people are in place.

There’s a famous quote, often attributed to Mark Twain: “History doesn’t repeat itself, but it often rhymes.” It’s perhaps no surprise, then, that technologies have evolved in similar ways throughout history, and artificial intelligence and machine learning look to be following the same motto. One example that springs to mind is the evolution of the UK road network. In AD 43 the Romans laid the foundations of our road network. Roads have evolved since then, but the rapid uptake of the motor vehicle during the 1950s triggered a huge national investment in motorway construction, which in turn drove further innovation. Improvements to roads forced vehicles to be better: in the first year after the M1 opened, 13,500 calls were made to the AA, with frequent reports of overheating engines and melted pistons; manufacturers improved, and the cycle continued. It wasn’t only the roads and vehicles that needed innovation. Increasing speeds led to wider safety issues, which led to better governance and the introduction of regulation, which in turn led to more innovation. Everything from safety regulation and the resulting improvements in vehicles, through to vehicle licensing, registration, national speed limits, compulsory driving tests, cat’s eyes and pedestrian crossings, came from having a solid road!

Artificial intelligence and machine learning have their own version of a road: the foundation on which their innovation will develop. Without a good underlying foundation of data infrastructure, AI and machine learning can feel like driving a vehicle on unpredictable, untested roads; the outcome might be unexpected, and you might not arrive anywhere at all!

As the use cases for AI/ML evolve, it is incumbent on organisations to ensure that their data environments are fit for purpose and secure, and that they are confident they understand the risks. Regulation is fast becoming part of this. In the same way that roads are regulated for safety, AI and ML regulation will help ensure that we’re configured to get the best from the technology, that systems are in place to monitor for and identify drift or bias, and that data is used ethically. And, just as with roads, regulation will spawn another explosion of innovation.

Is my organisation ready?

A few indicators can help show whether the data environments in your organisation are configured to capitalise on the wave of new data-led technology. Perfect is not a requirement! I’ve yet to find an organisation that has it all, but some indicators (or “smells”) might suggest there’s some work to do. Finding them is usually just a case of going looking, and recognising them is the first step to transforming your data environment. Each one likely requires a targeted approach that may involve both technological and cultural change.

Check the business strategy.

Data is the lifeblood of a business, or at least your business is one of the biggest customers of your data. Broadly, there are three areas, or use cases:

Providing analytics and performance analysis that inform business decisions.
Delivering customer value through the organisation’s products and services.
Researching and developing new products.

Each of these areas should be treated as an individual customer, and each might have its own strategy describing where the business expects functions and services to grow. Ask the question: what data will need to flow to support this business once it has achieved all the numbers in its five-year plan?

Spending some time understanding the targets will give you a good place to start thinking about the non-functional requirements. How fast do those reports need to be? What compliance regime will we be operating under? How big will the storage requirements be? What are our desired recovery time objective (RTO) and recovery point objective (RPO)? Armed with the answers to these questions (and others like them), you can start to talk to your teams about the measures of a system that meets these objectives and how it might be architected to achieve them.
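As a worked illustration of turning targets like these into something measurable: the RPO bounds how much data, measured in time, you can afford to lose, and the RTO bounds how long recovery may take. Under a simple periodic-backup model (an assumption; real platforms often use continuous replication), the worst case is a failure just before the next backup, so potential data loss is roughly one backup interval plus any lag in copying the backup off-site. The function name and figures below are hypothetical.

```python
from datetime import timedelta


def meets_objectives(backup_interval: timedelta,
                     replication_lag: timedelta,
                     restore_time: timedelta,
                     rpo: timedelta,
                     rto: timedelta) -> dict:
    """Check a simplified periodic-backup setup against RPO/RTO targets.

    Assumption: worst-case data loss is one full backup interval
    plus the lag in copying that backup somewhere safe.
    """
    worst_case_loss = backup_interval + replication_lag
    return {
        "worst_case_loss": worst_case_loss,
        "meets_rpo": worst_case_loss <= rpo,
        "meets_rto": restore_time <= rto,
    }


# Hypothetical figures: nightly backups, 30-minute copy lag, 3-hour restore,
# against a 4-hour RPO and a 4-hour RTO.
result = meets_objectives(
    backup_interval=timedelta(hours=24),
    replication_lag=timedelta(minutes=30),
    restore_time=timedelta(hours=3),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=4),
)
print(result["meets_rpo"], result["meets_rto"])  # False True
```

Under this model, meeting a four-hour RPO with a 30-minute copy lag would mean backing up at least every three and a half hours, which is exactly the kind of non-functional requirement worth surfacing early.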

Ask your team.

If you’re lucky enough to already have a team in place, start there. It sounds simple, but your team probably already knows where the ship is creaking; they will likely have ideas on how to improve things and might be interested in helping to architect the solution. Start with a retro or a workshop and find out what is frustrating them. Talk to your customers too: what’s their strategy for data, and how do they feel about your products and services? Remember:

“The best architectures, requirements, and designs emerge from self-organizing teams.” - Principles behind the Agile Manifesto

Outside your team, you can look for where performance is slow, failure rates are high, or your customers are least happy. Try to favour data over gut feel, although sometimes it proves valuable to follow a hunch.

I’ve listed below a few signals that might sound familiar:

1. Poor Data Quality

The old phrase ‘garbage in, garbage out’ (or, put more positively, ‘quality in, quality out’) resonates with me. Data quality is at the heart of good algorithms, and there are quite a few ways to reduce it!

Inaccuracies: This includes incorrect, outdated or mis-entered information. For instance, customer records with wrong addresses or sales data with incorrect figures.

Inconsistencies: Variations in data format or structure across different systems, such as varying date formats or product naming conventions, can lead to significant analysis challenges.

Incompleteness: Missing values or incomplete records can skew AI/ML model training, leading to inaccurate predictions or classifications.
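These three problems can often be caught automatically with simple rules. The sketch below assumes customer records arrive as Python dictionaries; the field names, the required-field list, the "ISO 8601 dates only" rule and the "no negative totals" plausibility check are all hypothetical choices for illustration, not a standard.

```python
from datetime import date

# Hypothetical schema: fields every customer record is expected to carry.
REQUIRED_FIELDS = ("customer_id", "postcode", "signup_date", "order_total")


def check_record(record: dict) -> list[str]:
    """Return a list of data quality issues found in one record."""
    issues = []
    # Incompleteness: required fields that are missing or empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"incomplete: {field} missing")
    # Inconsistency: dates must follow one agreed format (ISO 8601 here).
    signup = record.get("signup_date")
    if signup:
        try:
            date.fromisoformat(signup)
        except ValueError:
            issues.append(f"inconsistent: signup_date {signup!r} is not an ISO 8601 date")
    # Inaccuracy (a simple plausibility rule): totals cannot be negative.
    total = record.get("order_total")
    if isinstance(total, (int, float)) and total < 0:
        issues.append(f"inaccurate: order_total {total} is negative")
    return issues


records = [
    {"customer_id": 1, "postcode": "AB1 2CD", "signup_date": "2023-01-15", "order_total": 42.0},
    {"customer_id": 2, "postcode": "", "signup_date": "15/01/2023", "order_total": -5},
]
for r in records:
    print(r["customer_id"], check_record(r))
```

Rules like these will not catch every inaccuracy (a plausible but wrong address still passes), but running them continuously gives an early, measurable signal of where quality is slipping.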

2. Lack of Data Integration

Data silos: When data is isolated in department-specific systems, it prevents a comprehensive analysis. For example, if customer data is stored separately from sales data, getting insights into purchasing behaviours becomes difficult.

Integration complexity: The presence of legacy systems that don’t easily integrate with newer technologies can be a major roadblock in creating a unified data ecosystem.
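To illustrate the customer and sales example, here is a minimal sketch assuming the two siloed systems happen to share a `customer_id` key (an assumption; in practice, agreeing on such a key is often the hard part). Joining them gives a simple view of spend per customer segment and also surfaces orders that cannot be matched to any customer, a common symptom of poor integration. All data and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts from two siloed systems.
crm_customers = [
    {"customer_id": "C1", "segment": "retail"},
    {"customer_id": "C2", "segment": "business"},
]
sales_orders = [
    {"order_id": "O1", "customer_id": "C1", "amount": 120.0},
    {"order_id": "O2", "customer_id": "C1", "amount": 80.0},
    {"order_id": "O3", "customer_id": "C2", "amount": 300.0},
    {"order_id": "O4", "customer_id": "C9", "amount": 50.0},  # no matching customer
]


def spend_by_segment(customers, orders):
    """Join orders to customers and total spend per segment.

    Returns (totals_by_segment, unmatched_order_ids).
    """
    segment_of = {c["customer_id"]: c["segment"] for c in customers}
    totals = defaultdict(float)
    unmatched = []
    for order in orders:
        segment = segment_of.get(order["customer_id"])
        if segment is None:
            unmatched.append(order["order_id"])  # integration gap worth investigating
        else:
            totals[segment] += order["amount"]
    return dict(totals), unmatched


totals, unmatched = spend_by_segment(crm_customers, sales_orders)
print(totals)     # {'retail': 200.0, 'business': 300.0}
print(unmatched)  # ['O4']
```

The unmatched list matters as much as the totals: every order that cannot be joined is an insight the business currently cannot see.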

3. Inadequate Data Governance

Undefined data ownership: Unclear responsibility for data accuracy, maintenance, and security can lead to neglected data quality.

Non-compliance risk: Without proper governance, data handling might not comply with regulations such as the GDPR, the UK’s Data Protection Act (DPA) or, in the US, HIPAA, leading to legal and financial repercussions.

Lack of standardisation: Absence of standard definitions and formats for data across the organisation can create confusion and errors in data handling.
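Standard formats are easiest to enforce where data enters a shared platform. As a minimal sketch of standardising one common troublemaker, dates, the function below normalises several input formats into ISO 8601 (YYYY-MM-DD). The list of accepted formats is hypothetical, and reading slash-separated dates as day-first is an assumed convention; an ambiguous value such as 01/02/2023 can only be resolved by a documented standard like this.

```python
from datetime import datetime
from typing import Optional

# Hypothetical input formats seen across source systems.
# Assumed convention: slash-separated dates are day-first (UK style).
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y", "%Y%m%d")


def to_iso_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    cleaned = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    # Leave unparseable values for manual review rather than guessing.
    return None


for raw in ["2023-01-15", "15/01/2023", " 3 Feb 2023", "20230301", "Jan 15th"]:
    print(repr(raw), "->", to_iso_date(raw))
```

Returning `None` instead of a best guess is a deliberate choice: silently "fixing" a value you cannot interpret just turns an inconsistency into an inaccuracy.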

4. Scalability Issues

Volume handling: Difficulty managing large volumes of data, which robust AI/ML models typically need for training.

Performance bottlenecks: Systems that are not designed to scale efficiently may experience slowdowns or crashes as data volumes grow.

Future growth limitations: An inability to accommodate future data sources and types can severely limit the potential of AI/ML.

5. Limited Data Access

Technical barriers: Complex or outdated systems that make data retrieval difficult for users, limiting the ability to leverage data for AI/ML.

Access control issues: Overly restrictive or poorly managed data access controls can impede the availability of data to those who need it for analysis.

Lack of data democratisation: When data is not made accessible across the organisation, it hinders the ability to foster a data-driven culture, which is crucial for AI/ML success.
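One common way to keep access controls neither overly restrictive nor poorly managed is role-based access control (RBAC): permissions are attached to roles rather than to individuals, so granting or revoking access becomes a single, auditable change. The sketch below is a minimal illustration; the role names, users and datasets are hypothetical, and in practice this mapping would live in your data platform’s access management rather than in application code.

```python
# Hypothetical role-to-permission mapping: (dataset, action) pairs per role.
ROLE_PERMISSIONS = {
    "analyst": {("sales", "read"), ("customers_pseudonymised", "read")},
    "data_engineer": {("sales", "read"), ("sales", "write"), ("customers", "read")},
    "governance_lead": {("customers", "read"), ("audit_log", "read")},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"data_engineer"},
}


def can_access(user: str, dataset: str, action: str) -> bool:
    """Return True if any of the user's roles grants the action on the dataset."""
    return any(
        (dataset, action) in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(can_access("alice", "sales", "read"))      # True
print(can_access("alice", "customers", "read"))  # False
print(can_access("carol", "sales", "read"))      # False (unknown user)
```

Giving analysts a pseudonymised view of customer data, rather than no access at all, is one example of widening access without weakening governance.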

6. Insufficient Data Infrastructure

Outdated technology: Legacy systems that are not compatible with modern AI/ML tools can hamper effective data processing and analysis.

Lack of real-time processing: An inability to process data in real time can be a significant drawback, especially for AI/ML applications that rely on timely data, such as fraud detection systems.

7. Limited Analytical Capabilities

Lack of advanced analytic tools: Without proper tools to analyse and visualise data, organisations can miss critical insights that could inform AI/ML projects.

Skills gaps: A workforce that lacks data literacy or analytical skills can be a major hindrance to the effective use of AI/ML technologies.

It is far too easy to be critical, and most organisations likely show at least some of these signals. That’s OK: in a rapidly evolving industry, perfection (if it could ever be achieved) lasts only a moment! Once you’ve found the issues, there are a few things you can do to help the organisation start to improve.

Conduct a data audit: Start by assessing the current state of your data. Find data quality issues, integration gaps and any governance or compliance shortcomings.

Implement a data governance framework: Establish clear data governance policies to ensure quality, privacy and compliance. The framework should also define ownership, access controls and data standards.

Invest in data integration tools: Use modern data tools to break down silos and create a unified view of your data landscape. This integration is crucial for feeding diverse data into AI/ML models.

Focus on data quality improvements: Cleanse and standardise your data. Implement processes to continuously monitor and maintain high data quality.

Ensure scalability and flexibility: Upgrade your data infrastructure to handle increased data loads and support various data types and sources. Cloud-based solutions often offer the required scalability and flexibility.

Enhance data accessibility: Make data easy to access for authorised personnel. This involves not just technical solutions, but also training and cultural change to enable data-driven decision-making.

Build a skilled team: Ensure you have the right team in place. This team should not only understand data management but also the nuances of AI and ML. A good consultancy can help you get started with this.

Start small and scale: Begin with small, manageable AI/ML projects to test and learn from your improved data platform. Gradually scale up as your confidence grows.
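The first of these steps, the data audit, can start very simply. The sketch below assumes a CSV extract with a header row (the column names and sample rows are hypothetical) and profiles each column’s completeness (the share of non-empty values) and its number of distinct values, which is often enough to spot where to dig deeper.

```python
import csv
import io


def profile_columns(csv_text: str) -> dict:
    """Return completeness (share of non-empty values) and distinct count per column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    profile = {}
    if not rows:
        return profile
    for column in rows[0].keys():
        # Missing trailing fields come back as None, so treat them as empty.
        values = [(row.get(column) or "").strip() for row in rows]
        filled = [v for v in values if v]
        profile[column] = {
            "completeness": len(filled) / len(rows),
            "distinct": len(set(filled)),
        }
    return profile


# Hypothetical extract from a customer system.
sample = """customer_id,country,signup_date
1,UK,2023-01-15
2,United Kingdom,2023-02-01
3,,2023-02-03
4,UK,
"""

for column, stats in profile_columns(sample).items():
    print(column, stats)
```

Here `country` is only 75% complete, and “UK” and “United Kingdom” count as two distinct values, a hint of exactly the kind of inconsistency worth standardising.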

Conclusion

Recognising whether investing in AI/ML will bring value to your business is essential. The technology will present opportunities ranging from improving employee productivity to cultivating exciting, world-changing new products. The speed, ease and extent to which you can capitalise on those opportunities will be dictated by the quality of your ‘roads’, which need to support a steady flow of high-quality data. In AI/ML, now more than ever, your insights are only as good as the data you feed your models.