Azure Data Factory (ADF) is a service available in the Microsoft Azure ecosystem that orchestrates data loads and transfers between Azure services. An Azure data factory is composed of the following components:
Linked services: Connectors to the various storage and compute services. For example, a pipeline might use the following artifacts:
HDInsight cluster on demand: Access to the HDInsight compute service to run a Hive script that uses HDFS external storage
Azure Blob storage/SQL Azure: As the Hive job runs, it retrieves the data from Azure Blob storage and copies it to an Azure SQL database
Datasets: These are layers over the data used in pipelines. A dataset uses a linked service.
Pipeline: The pipeline is the link between all the datasets. It contains activities that initiate data movements and transformations. It is the engine of the factory; without pipelines, nothing moves in the factory.
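The relationships between these components can be sketched conceptually in Python. Note that the class and attribute names below are illustrative only, not the real Azure SDK; this is a minimal model of how linked services, datasets, activities, and pipelines fit together:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model of ADF concepts -- not the Azure SDK.

@dataclass
class LinkedService:
    """Connection information for a storage or compute service."""
    name: str

@dataclass
class Dataset:
    """A named layer over data, bound to a linked service."""
    name: str
    linked_service: LinkedService

@dataclass
class Activity:
    """A unit of work, e.g. a copy operation or a Hive script run."""
    name: str
    inputs: List[Dataset]
    outputs: List[Dataset]

@dataclass
class Pipeline:
    """Links datasets together via an ordered list of activities."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def run(self) -> List[str]:
        # Execute activities in order; here we only report what would move.
        log = []
        for act in self.activities:
            src = ", ".join(d.name for d in act.inputs)
            dst = ", ".join(d.name for d in act.outputs)
            log.append(f"{act.name}: {src} -> {dst}")
        return log

# Example mirroring the text: output of a Hive job over Blob storage
# is copied to an Azure SQL database.
blob = LinkedService("AzureBlobStorage")
sql = LinkedService("AzureSqlDatabase")
raw = Dataset("RawLogs", blob)
table = Dataset("SalesTable", sql)
copy = Activity("CopyHiveOutput", inputs=[raw], outputs=[table])
factory_pipeline = Pipeline("DailyLoad", [copy])
print(factory_pipeline.run())
```

The sketch makes the layering explicit: datasets never connect to storage directly, they go through a linked service, and only the pipeline's activities actually move data.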
Microsoft came up with Azure Data Lake, which is, in a nutshell, a cloud offering for big data that integrates with other Azure services such as SQL Database, SQL Server, SQL Data Warehouse, Machine Learning, Power BI, and Cortana. It also allows us to import and export data from almost any data source. Its main goals are ease of use and cost-effectiveness. The service has two main components:
Data Lake Store (static)
Data Lake Analytics component (paid on demand)
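This two-part split, durable storage plus compute that is billed per submitted job rather than for an always-on cluster, can be illustrated with a minimal Python sketch (all names are hypothetical; this is a conceptual model, not the Azure SDK):

```python
from typing import Callable, Dict

class DataLakeStore:
    """Durable storage: files persist here independently of any compute."""
    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def write(self, path: str, content: str) -> None:
        self._files[path] = content

    def read(self, path: str) -> str:
        return self._files[path]

class DataLakeAnalytics:
    """On-demand compute: each submitted job is billed individually,
    so there is no standing cluster to pay for between jobs."""
    def __init__(self, store: DataLakeStore) -> None:
        self.store = store
        self.jobs_billed = 0

    def submit_job(self, in_path: str, out_path: str,
                   transform: Callable[[str], str]) -> None:
        # Read from the store, transform, write back; bill one job.
        result = transform(self.store.read(in_path))
        self.store.write(out_path, result)
        self.jobs_billed += 1

store = DataLakeStore()
store.write("/raw/names.txt", "alice\nbob")
analytics = DataLakeAnalytics(store)
analytics.submit_job("/raw/names.txt", "/out/upper.txt", str.upper)
print(store.read("/out/upper.txt"))  # ALICE / BOB
print(analytics.jobs_billed)         # 1
```

The point of the separation is cost-effectiveness: data sits cheaply in the store at all times, while analytics charges accrue only when a job is actually submitted.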