Cloud data platform
Imec’s road to rolling out 30+ use cases per year with a modern and future-proof data platform
Business context
Imec, a leading international research hub for nano- and digital technology and home to more than 5500 people, decided in 2019 to move its analytical workloads from on-premises to Microsoft Azure.
Imec’s initial solution supported a single use case, covering the analytical reporting on the maintenance management of the machinery and equipment in their cleanroom: "which machines are up for maintenance?", "what is the mean time to repair (MTTR)?", "how often are machines out of use?", etc.
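To make the MTTR question concrete, here is a minimal sketch of how the metric could be derived from a repair log: total repair time divided by the number of repairs, per machine. The machine names, timestamps and record layout are hypothetical and only serve as an illustration; they do not reflect Imec's actual data model.

```python
from datetime import datetime

# Hypothetical repair log: (machine, repair start, repair end).
repairs = [
    ("etcher-01", datetime(2023, 1, 3, 8, 0), datetime(2023, 1, 3, 12, 0)),
    ("etcher-01", datetime(2023, 2, 10, 9, 0), datetime(2023, 2, 10, 11, 0)),
    ("coater-07", datetime(2023, 1, 20, 14, 0), datetime(2023, 1, 20, 20, 0)),
]


def mean_time_to_repair(records):
    """Return MTTR in hours per machine: total repair hours / number of repairs."""
    totals = {}  # machine -> (total repair hours, number of repairs)
    for machine, start, end in records:
        hours = (end - start).total_seconds() / 3600
        total, count = totals.get(machine, (0.0, 0))
        totals[machine] = (total + hours, count + 1)
    return {machine: total / count for machine, (total, count) in totals.items()}


mttr = mean_time_to_repair(repairs)
print(mttr)
```

In this toy log, etcher-01 was repaired twice (4 hours and 2 hours) and coater-07 once (6 hours), so the per-machine averages follow directly from those durations.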
Given imec’s ambition to roll out a broad range of data & analytics capabilities company-wide, the company was concerned about the scalability and sustainability of its traditional SAS-based data warehouse. Imec selected Dataminded to perform a study to architect a modern and future-proof data platform, and to get ready to scale from a handful of data products to structural, data-driven insights and decisions across all business lines.
Scope & objectives
Imec asked Dataminded to identify points for improvement in their way of working and to come up with guiding principles, an architecture design and best practices for building and running data products.
Dataminded mobilized an experienced architect and two data engineers for a period of three months. Our starting point was an assessment of imec’s notebook-based platform and way of working. Dataminded clarified the pros and cons of the as-is situation and identified the capabilities required of a state-of-the-art, robust data platform. These included more sophisticated scheduling and orchestration, container-based data processing, proper security and data access control, logging & monitoring, and more. Keeping costs under control at scale was also a very important driver for Imec.
As guidance, Dataminded introduced data mesh principles: organize use cases per domain (HR, Marketing, Sales, R&D…), treat data as a product owned by such a domain, and leverage a self-service platform for the development and federated governance of data products.
Imec was well aware of the pains, but was looking for guidance in navigating potential solutions. We provided this by making things as practical and tangible as possible via hands-on, technical demos. We showcased the benefits of centralized governance, development and unit testing templates in PySpark and dbt, infrastructure automation with Terraform, Continuous Integration and Continuous Deployment (CI/CD), observability, and more.
Key results
Based on the study outcomes, Imec decided to accelerate its efforts and go ahead with installing the newly designed platform, proving its value by deploying its main use case. At that time, we also developed a roadmap of prioritized use cases to roll out. Ever since, Imec and Dataminded have partnered to develop data use cases and grow the data platform in line with emerging business needs.
As the cornerstone of its data platform, Imec chose Conveyor, a Dataminded product that guides data scientists and engineers through all stages of the data lifecycle, from experimentation to industrialisation and operations. It took us one month to get the data platform operational.
Next to Conveyor, Imec's data platform is built on top of a few foundational Azure services such as Azure Data Lake Storage (ADLS), Azure Key Vault, and Azure App Service. The data lake is secured by Azure's Role-Based Access Control (RBAC) and integrates smoothly with the workload identity management capabilities of Conveyor. In addition to Conveyor's Spark runtime for streaming and batch processing, Azure Synapse is used for ad-hoc querying of the data lake and for exposing the data to Power BI for reporting and visualization. Long-lived services and web applications are deployed via Azure App Service and made available via the Azure API Management platform.
Within a year of the installation of Conveyor and the development of the central data platform, more than 30 use cases across 10 different domains have seen the light of day, going from ideation to production. Starting a new use case is as easy as adding a single line to a configuration file; our infrastructure-as-code (IaC) tool then automatically creates the infrastructure needed to support the use case, allowing developers to focus mainly on writing business logic. Deployments are automated and can be triggered via the version control system.
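The "one line per use case" idea can be illustrated with a small sketch. This is not Imec's actual configuration or IaC tooling: the `domain/name` line format, the resource naming scheme and the `plan_infrastructure` helper are all assumptions made up for this example. It only shows how a single configuration line could be expanded into the set of resources a use case needs.

```python
# Hypothetical configuration: one line per use case, written as "domain/name".
USE_CASES = """
# domain/use-case
hr/attrition-report
rnd/wafer-yield
"""


def plan_infrastructure(config_text):
    """Expand each configuration line into a plan of resources to create."""
    plans = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        domain, name = line.split("/", 1)
        plans.append({
            "domain": domain,
            "use_case": name,
            # Illustrative resource names; a real setup would hand these
            # to an IaC tool instead of returning plain dictionaries.
            "storage_path": f"datalake/{domain}/{name}",
            "workload_identity": f"id-{domain}-{name}",
            "secret_scope": f"kv-{domain}-{name}",
        })
    return plans


plans = plan_infrastructure(USE_CASES)
for plan in plans:
    print(plan["use_case"], "->", plan["storage_path"])
```

Adding a third line to the configuration text yields a third plan without any other change, which is the property that lets developers focus on business logic rather than infrastructure.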
Moving forward, Dataminded and Imec continue their collaboration to build a self-service data retrieval tool for Imec's R&D department. In time, this tool will empower over 1000 engineers to efficiently retrieve and explore data from their experiments, freeing up their time to focus on Imec's core business.
Impact
Dataminded has successfully deployed more than 70 data products for Imec concerning their wafer production process, R&D activities, marketing activities, and more. Costs have been reduced significantly.
We have onboarded several new engineers to the platform and have enabled self-service capabilities for rapidly experimenting and deploying new projects.
The growing number of data products, and the synergies created by combining insights from different domains, present Imec with opportunities for expanding their analytics efforts. This rapid proliferation of data products inevitably comes with its own set of challenges, mainly related to governance: who has access to data? Who controls access? How are new data products made available for and discoverable by the rest of the organization? The next step in our collaboration is therefore focused on the data governance aspect of the data platform, for example by exposing products via a data catalog and automating data access governance as much as possible, while keeping control firmly in the hands of data owners.
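As a rough illustration of what "automated access governance with the owner in control" could look like, the sketch below models a data product whose access requests are recorded automatically but can only be approved by its owner. The `DataProduct` class, its fields and the user names are hypothetical; they are not part of Conveyor or of Imec's platform.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    owner: str
    readers: set = field(default_factory=set)
    pending: set = field(default_factory=set)

    def request_access(self, user):
        """Anyone can file a request; it is queued for the owner."""
        if not self.can_read(user):
            self.pending.add(user)

    def approve(self, approver, user):
        """Only the owner of the data product may grant access."""
        if approver != self.owner:
            raise PermissionError(f"{approver} does not own {self.name}")
        if user not in self.pending:
            raise ValueError(f"no pending request from {user}")
        self.pending.discard(user)
        self.readers.add(user)

    def can_read(self, user):
        return user == self.owner or user in self.readers


product = DataProduct(name="wafer-yield", owner="alice")
product.request_access("bob")
product.approve("alice", "bob")
print(product.can_read("bob"))
```

The design choice worth noting is that the request step is self-service while the approval step is tied to ownership, so automation speeds up the workflow without taking the decision away from the data owner.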