Portable by design: Rethinking data platforms in the age of digital sovereignty

Build a portable, EU-compliant data platform and avoid vendor lock-in—discover our cloud-neutral stack in this deep-dive blog.

Cloud Independence: Testing a European Cloud Provider Against the Giants

Can a European cloud provider like Ionos replace AWS or Azure? We test it—and find surprising advantages in cost, control, and independence.

Stop loading bad quality data

Ingesting all data without quality checks leads to recurring issues. Prioritize data quality upfront to prevent downstream problems.

A 5-step approach to improve data platform experience

Boost data platform UX with a 5-step process: gather feedback, map user journeys, reduce friction, and continuously improve through iteration.

The Data Engineer’s guide to optimizing Kubernetes

Boost Kubernetes batch workload efficiency with smarter scheduling, autoscaling tweaks & spot instance handling.

Are your AKS logging costs too high? Here’s how to reduce them

Cut Azure logging costs: reduce log volume, use Basic tables via the new ingestion API, and try a custom Fluentbit plugin with Go.

Data Modelling In A Data Product World

Central DWHs hit scaling limits. Data products offer a modular, federated solution—flexible, reusable, and closer to business reality.

SAP CDC with Azure Data Factory

SAP CDC can be built in Azure Data Factory with SAP views, but integration runtime (IR) costs run high; Kafka with Confluent offers a cheaper, scalable alternative.

From Good AI to Good Data Engineering. Or how Responsible AI interplays with High Data Quality

Responsible AI depends on high-quality data engineering to ensure ethical, fair, and transparent AI systems.

A glimpse into the life of a data leader

Data leaders face pressure to balance AI hype with data landscape organization. Here’s how they stay focused, pragmatic, and strategic.

Beyond Medallion: How to Structure Data for Self-Service Data Teams

Medallion architecture limits self-service. Shift to data product thinking with input, private, and output data for agile, governed scaling.

How To Conquer The Complexity Of The Modern Data Stack

The more people on a team, the more communication lines. The same goes for tools in your data stack: complexity scales fast.

The Data Product Portal Integrates With Your Preferred Data Platform

Data Product Portal integrates with AWS to manage data products, access, and tooling—enabling scalable, self-service data platforms.

How To Reduce Pressure On Your Data Teams

Data demand grows, pressuring small teams. Shift to focused data product teams and use portals to stay efficient and avoid data silos.

Data Product Portal Integrations 2: Helm

Data Product Portal links governance, access & tools for self-service data on AWS. Supports Terraform & API integration for automation.

Data Stability with Python: How to Catch Even the Smallest Changes

Detect data changes efficiently by sorting and hashing DataFrames with Python—avoid re-running pipelines and reduce infrastructure costs.
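The idea behind the post above can be sketched in a few lines: sort the rows on a stable key, serialize canonically, and hash the result, so two datasets with the same content (in any row order) produce the same digest. The post works with DataFrames; this is a minimal stdlib sketch using plain dict records, and `snapshot_hash` is a hypothetical helper name.

```python
import hashlib
import json

def snapshot_hash(records, key):
    """Hash a dataset deterministically: sort rows by a stable key,
    serialize canonically, and digest. Equal hashes mean no change."""
    ordered = sorted(records, key=lambda r: r[key])
    payload = json.dumps(ordered, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

old = [{"id": 2, "qty": 5}, {"id": 1, "qty": 3}]
new_same = [{"id": 1, "qty": 3}, {"id": 2, "qty": 5}]     # same data, reordered
new_changed = [{"id": 1, "qty": 4}, {"id": 2, "qty": 5}]  # one value changed

assert snapshot_hash(old, "id") == snapshot_hash(new_same, "id")
assert snapshot_hash(old, "id") != snapshot_hash(new_changed, "id")
```

Comparing one digest per snapshot is what lets a pipeline skip reprocessing when nothing changed.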

Why You Should Build A User Interface To Your Data Platform

Don’t give users a bag of tools—build a UI for your data platform to reduce complexity, boost adoption, and enable true self-service.

Data Product Portal Integrations 1: OIDC

Integrate OIDC with the Data Product Portal for secure, user-specific access via SSO. Easy setup with AWS Cognito, Docker, or Helm.

The State of Data Products in 2024

Data Products are rising fast in 2024, focusing on user experience, collaboration, and governance—set to reach maturity within 2–3 years.

Clear signals: Enhancing communication within a data team

Clear team communication boosts data project success. Focus on root problems, structured discussions, and effective feedback to align better.

Demystifying Device Flow

Implement OAuth 2.0 Device Flow with AWS Cognito & FastAPI to enable secure logins for headless devices like CLIs and smart TVs.

Introducing Data Product Portal: An open source tool for scaling your data products

The Data Product Portal is an open-source tool to build, manage & govern data products at scale, enabling clear access, lineage & self-service.

Short feedback cycles on AWS Lambda

Speed up AWS Lambda development with a Makefile: build, deploy, test, and stream logs in one loop, boosting feedback cycles to ~15 seconds.

The Missing Piece to Data Democratization is More Actionable Than a Catalog

The Data Product Portal is the missing link for scaling data democratization: beyond catalogs, it unifies access, governance & tooling.

Prompt Engineering for Better SQL Code Generation With LLMs

Boost SQL generation with LLMs using prompt engineering, schema context, user feedback & RAG for accurate, business-aware queries.

Age of DataFrames 2: Polars Edition

In this post, I showcase some Polars tricks and features.

Quack, Quack, Ka-Ching: Cut Costs by Querying Snowflake from DuckDB

How to leverage Snowflake’s support for interoperable open lakehouse technology — Iceberg — to save money.

The building blocks of successful Data Teams

5 key traits of successful data teams: ownership, business focus, software best practices, self-service, and company-wide strategy.

Querying Hierarchical Data with Postgres

Query hierarchical data in Postgres using recursive CTEs. Navigate up/down trees, track depth, and aggregate—great for parent-child data.
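The technique in the teaser above can be shown compactly: a recursive CTE seeds on the root rows, then repeatedly joins children onto results found so far, tracking depth. Postgres uses exactly this `WITH RECURSIVE` syntax; the sketch below runs against SQLite (Python stdlib) so it is self-contained, and the `org` table with its names is purely illustrative.

```python
import sqlite3

# Illustrative parent-child table; Postgres accepts the same WITH RECURSIVE query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    INSERT INTO org VALUES
        (1, 'CEO', NULL), (2, 'CTO', 1), (3, 'Data Lead', 2), (4, 'Engineer', 3);
""")

# Walk down the tree from the root, tracking depth at each level.
rows = conn.execute("""
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, 0 FROM org WHERE parent_id IS NULL
        UNION ALL
        SELECT o.id, o.name, t.depth + 1
        FROM org o JOIN tree t ON o.parent_id = t.id
    )
    SELECT name, depth FROM tree ORDER BY depth
""").fetchall()

print(rows)  # [('CEO', 0), ('CTO', 1), ('Data Lead', 2), ('Engineer', 3)]
```

Swapping the join condition (`t.parent_id = o.id`) walks the tree upward instead, from a leaf to its ancestors.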

Securely use Snowflake from VS Code in the browser

Secure Snowflake SSO in browser-based VS Code using custom OAuth, CLI/API auth flow, and a dbt adapter for seamless cloud IDE integration.

The benefits of a data platform team

Build a dedicated data-platform team to manage ingest, storage & tools, freeing business data teams to focus on creating value from insights.

How to organize a data team to get the most value out of data

Data teams succeed by shifting from tech-only focus to value delivery—combine product thinking, business goals & cross-functional roles.

Why not to build your own data platform

A round-table discussion summary on imec’s approach to their data platform

Becoming Clout* certified

Hot takes about my experience with cloud certifications

You can use a supercomputer to send an email but should you?

Discover the next evolution in data processing with DuckDB and Polars

Two Lifecycle Policies Every S3 Bucket Should Have

Abandoned multipart uploads and expired delete markers: what they are, and why bad AWS defaults mean you must care about them.
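The two lifecycle rules the post argues for can be expressed as a plain configuration in the shape S3's `PutBucketLifecycleConfiguration` API expects: one rule aborting incomplete multipart uploads, one cleaning up expired object delete markers. The 7-day window and rule IDs below are illustrative choices, not values from the post.

```python
# Lifecycle configuration covering the whole bucket (empty Prefix filter).
lifecycle = {
    "Rules": [
        {
            "ID": "abort-incomplete-multipart-uploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            # Reclaim storage from uploads never completed (7 days is illustrative).
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        },
        {
            "ID": "expire-delete-markers",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            # Remove delete markers whose object versions are all gone.
            "Expiration": {"ExpiredObjectDeleteMarker": True},
        },
    ]
}

# Applied with boto3 (assuming credentials and a hypothetical bucket name):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```

Neither rule is enabled by default on a new bucket, which is why the post frames them as something every bucket should have.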

How we used GenAI to make sense of the government

We built a RAG chatbot with AWS Bedrock and GPT-4 to answer questions about the Flemish government.

My key takeaways after building a data engineering platform

Building a data platform taught me: deleting code is vital, poor design has long-term costs, and dependency updates are never-ending.

Leveraging Pydantic as a validation layer.

Ensuring clean and reliable input is crucial for building robust services.

7 Lessons Learned migrating dbt code from Snowflake to Trino

Snowflake to Trino dbt migration: watch out for type casting, SQL functions, NULL order, and window function quirks.

Everyone to the data dance floor: a story of trust

Data democratization is coming, but trust and governance are key. Start with pipeline observability: track runs, versions, and authors.

Quacking Queries in the Azure Cloud with DuckDB

DuckDB on Azure: fsspec works for now, but the native Azure extension is faster, especially with many small files. Full support is on the way.

How we reduced our docker build times by 40%

This post describes two ways to speed up building your Docker images: caching build info remotely and using the --link option when copying files.

Cross-DAG Dependencies in Apache Airflow: A Comprehensive Guide

Exploring four methods to effectively manage and scale your data workflow dependencies with Apache Airflow.

Upserting Data using Spark and Iceberg

Use Spark and Iceberg’s MERGE INTO syntax to efficiently store daily, incremental snapshots of a mutable source table.

Leave your email address to subscribe to the Dataminded newsletter

Belgium

Vismarkt 17, 3000 Leuven - HQ
Borsbeeksebrug 34, 2600 Antwerpen

VAT BE 0667.976.246

Germany

Spaces Tower One, Brüsseler Strasse 1-3, Frankfurt 60327, Germany

© 2025 Dataminded. All rights reserved.