While Amazon Athena and Azure Data Factory are the first port of call for anyone doing data engineering on their preferred cloud, the likes of dbt, Talend, Fivetran and Matillion lead the league table for UI-based ETL. They suit data teams that are further along the journey, need agility, and want newer team members to manage ETL through a UI without the complexity of a CLI.
Key open-source products: Apache NiFi, Talend Open Studio, Pentaho Kettle, CloverETL, Apache Beam and Apache Flink
Azure Data Lake Storage Gen2, Amazon Redshift Spectrum and Google Cloud BigLake let you store and query external data, including data in the Delta format. This brings the ease of a lakehouse to querying structured, semi-structured and unstructured data.
As the creators of Apache Spark and Delta Lake, Databricks uses Delta as its default storage format, and anything stored on Databricks can be queried with Spark using Python, Java, Scala or SQL.
Key ISVs: AWS, Microsoft, GCP and Databricks
During ETL, data teams have to process large volumes of data arriving either as streams or in batches, and several vendors help enterprises with this processing.
Amazon Redshift and Snowflake lead for traditional batch data-warehouse loads, whereas Amazon Kinesis and Confluent lead for streaming loads.
Databricks stands out from the crowd because it handles both batch and streaming workloads.
Key open-source products: Apache Kafka, Apache Spark
Key ISVs: Amazon Redshift, Snowflake, Confluent, Amazon Kinesis and Databricks
There is a plethora of tools in this area, and every platform offers its own visualisation capabilities. However, the leaders in this category are Power BI, Tableau, Looker and Qlik. They have held the largest market share over the years and benefit from a loyal customer base and a large available talent pool.
Key open-source products: Matplotlib, Metabase, Redash
Key ISVs: Microsoft Power BI, Tableau, Looker, Qlik, Plotly, ThoughtSpot
Alongside legacy players such as Informatica, Oracle, SAP and IBM, newer players have entered the market. Talend provides open-source data integration and data management software, with solutions spanning data integration, data quality and data governance. Collibra provides data governance software covering governance, data lineage and data cataloging. Alation provides data cataloging and data governance software, offering a centralized catalog of data assets and support for data governance workflows.
Key open-source products: Talend Open Studio
Key ISVs: Informatica, Oracle, SAP, IBM, Talend, Collibra, Alation and Immuta
This is in no way a comprehensive list of the products available on the market, and it has been collated for knowledge sharing. It is indicative only: we do not endorse any of these products, nor are we paid by any of these companies.