Modern Data Products

Ingestion and ETL Tools

Cloud-native services such as Amazon Athena and Azure Data Factory are usually the first port of call for anyone starting out with data engineering on their preferred cloud. For data teams that have been on the journey for a while, dbt, Talend, Fivetran and Matillion lead the league table for UI-based ETL processing. These tools offer agility and make it easier for newer members of the team to manage ETL through a UI without going through the complexity of a CLI.
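To make concrete what these tools automate, here is a minimal sketch of an extract, transform and load pipeline using only the Python standard library. It is not how any of the products above work internally; the sample data, the `orders` table and the function names are all hypothetical and chosen purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; a real pipeline would read from a file, API or queue.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, normalise names, cast types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn):
    """Run the three stages in order and return what landed in the target."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

A UI-based ETL tool typically lets you configure each of these three stages visually instead of writing and maintaining the functions by hand.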




Key open-source products: Apache NiFi, Talend Open Studio, Pentaho Kettle, CloverETL, Apache Beam and Apache Flink

Key ISVs: dbt, Talend, Fivetran, Matillion

[Figure: Modern ETL tools. Commercial software used by modern data engineers and architects for data ingestion and ETL.]

Cloud Storage

Azure Data Lake Storage Gen2, Amazon Redshift Spectrum and GCP BigLake allow you to store and query external data, and can be used with the Delta format, bringing the ease of a lakehouse to querying structured, semi-structured and unstructured data.

Databricks, the company behind Spark and Delta, offers the Delta format as its default storage, and anything stored on Databricks can be queried using Spark, Python, Java, Scala and SQL.




Key ISVs: AWS, Microsoft, GCP and Databricks

[Figure: Data storage on the cloud. Commercial software used by modern data engineers and architects for data storage on the cloud.]

Data Processing

During ETL, data teams have to process large amounts of data, which can arrive as streams or as batches. Several players on the market help enterprises with this data processing.

Amazon Redshift and Snowflake help with traditional batch data warehousing loads, whereas Amazon Kinesis and Confluent help with streaming loads. These products lead the league tables.

Databricks stands out from the crowd as it can handle both workloads.
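The difference between the two workloads can be sketched in plain Python. This is only an illustration of the batch versus streaming idea, not the API of Spark, Kafka or any product named here; the event shape, the function names and the `window` parameter are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Event = Tuple[str, float]  # (sensor_id, reading): a hypothetical event shape


def aggregate(
    events: Iterable[Event], totals: Optional[Dict[str, float]] = None
) -> Dict[str, float]:
    """Add each reading to a running total per sensor and return a new dict."""
    result = defaultdict(float, totals or {})
    for sensor_id, reading in events:
        result[sensor_id] += reading
    return dict(result)


def process_batch(events: List[Event]) -> Dict[str, float]:
    """Batch: the whole data set is available up front and processed once."""
    return aggregate(events)


def process_stream(
    source: Iterator[Event], window: int = 2
) -> Iterator[Dict[str, float]]:
    """Streaming: consume events as they arrive in small micro-batches,
    emitting updated totals after each one."""
    totals: Dict[str, float] = {}
    buffer: List[Event] = []
    for event in source:
        buffer.append(event)
        if len(buffer) == window:
            totals = aggregate(buffer, totals)
            buffer = []
            yield totals
    if buffer:  # flush the final partial micro-batch
        yield aggregate(buffer, totals)


if __name__ == "__main__":
    events = [("a", 1.0), ("b", 2.0), ("a", 3.0)]
    print(process_batch(events))
    for snapshot in process_stream(iter(events)):
        print(snapshot)
```

Both paths reuse the same `aggregate` logic, which is the appeal of a platform that handles both workloads: the transformation is written once, and only the way data arrives changes.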

Key open-source products: Kafka, Spark

Key ISVs: Amazon Redshift, Snowflake, Confluent, Amazon Kinesis and Databricks

[Figure: Data processing on the cloud. Commercial software used by modern data engineers and architects for data processing.]

Analytics and Visualisation

There is a plethora of tools in this area, and every platform offers its own visualisation capabilities. However, the leaders in this category are Power BI, Tableau, Looker and Qlik. They have held the biggest market share over the years and have a loyal customer base and a readily available talent pool.



Key open-source products: Matplotlib, Metabase, Redash

Key ISVs: Microsoft Power BI, Tableau, Looker, Qlik, Plotly, ThoughtSpot

[Figure: Data analytics and visualisation on the cloud. Commercial software used by modern data engineers and architects for data analytics and visualisation.]

Data Quality and Data Governance

Alongside legacy players such as Informatica, Oracle, SAP and IBM, there are new players in the market.

Talend is a provider of open-source data integration and data management software. It offers a range of solutions that help organizations manage their data effectively, including data integration, data quality and data governance.

Collibra is a provider of data governance software. Its solutions cover data governance, data lineage and data cataloging.

Alation is a provider of data cataloging and data governance software. Its solutions provide a centralized catalog of data assets and enable data governance workflows.
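At its simplest, a data quality check is a set of named rules evaluated against every record. The sketch below illustrates that idea in plain Python; it does not represent the API of any product listed here, and the rule names, fields and thresholds are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[Record], bool]

# Hypothetical rules: each returns True when the record passes.
RULES: Dict[str, Rule] = {
    "email_present": lambda r: bool(r.get("email")),
    "age_in_range": lambda r: isinstance(r.get("age"), int)
    and 0 <= r["age"] <= 120,
}


def check_quality(records: List[Record]) -> Dict[str, List[int]]:
    """Return, for each rule, the indexes of the records that fail it."""
    failures: Dict[str, List[int]] = {name: [] for name in RULES}
    for index, record in enumerate(records):
        for name, rule in RULES.items():
            if not rule(record):
                failures[name].append(index)
    return failures


if __name__ == "__main__":
    sample = [
        {"email": "a@example.com", "age": 30},
        {"email": "", "age": 200},
        {"age": 25},
    ]
    print(check_quality(sample))
```

Commercial data quality and governance platforms build on this basic pattern with features such as rule catalogs, lineage tracking and approval workflows.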

Key open-source products: Talend

Key ISVs: Informatica, Oracle, SAP, IBM, Talend, Collibra, Alation and Immuta

[Figure: Data quality and data governance on the cloud. Commercial software used by modern data engineers and architects for data quality and data governance.]

This is by no means a comprehensive list of the products available on the market; it has been collated for knowledge sharing. It is indicative only: we do not endorse any of these products, nor are we paid by any of these companies.