
Migrate data with Azure Data Factory

Anoop Sharma
22 Jul 2021
 

Microsoft Azure Certification covers almost all aspects of Azure. Azure Data Factory is a fully managed, serverless data integration service, used primarily for ingesting, preparing, and transforming data at scale. In simple terms, it is a cloud-based integration service that lets you create data-driven workflows in the cloud to orchestrate and automate data movement and data transformation.

One important thing to note is that Azure Data Factory does not store any data itself. Data moves between supported data stores, and it is processed by compute services in other regions or in an on-premises environment. You can monitor and manage workflows both through the UI and programmatically. The Azure certification path may include an overview of Azure Data Factory. Azure Training In Hyderabad helps you unlock your potential and obtain a relevant job.

Before looking at how to migrate data, let’s go through the features and components of Azure Data Factory.


Features:

  • Azure Data Factory lets you ingest all your on-premises and software as a service (SaaS) data with 90+ built-in connectors, and orchestrate and monitor it at scale.

  • It supports rehosting SQL Server Integration Services (SSIS) in just a few clicks.

  • You can build ETL and ELT pipelines without writing code, with built-in Git and CI/CD support.

  • This fully managed, serverless cloud service is cost-effective: you pay only for what you use.

  • It lets you use autonomous ETL to unlock operational efficiencies and enable citizen integrators.

Components of Azure Data Factory:


If you intend to learn Azure step by step, it is worth learning the components of Azure Data Factory along the way. To stay competitive, an Azure developer should know the basics of Azure Data Factory.

Pipelines: 

A pipeline is a logical grouping of activities that together carry out a specific unit of work. A single pipeline can perform various actions, such as querying a SQL database, ingesting data from Blob storage, and more.

Activities: 

In a pipeline, an activity represents a single unit of work. An activity can be an action such as transforming JSON data in a Storage Blob into SQL table records, or copying Storage Blob data into a Storage Table.
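To make the relationship between a pipeline and its activities concrete, here is a minimal sketch that builds the JSON definition of a pipeline containing one Copy activity. The pipeline and activity names are the ones used in this article's walkthrough; the two dataset names are hypothetical placeholders, and the source and sink types assume a delimited text file being copied into Azure SQL Database.

```python
import json

# One activity: a single unit of work inside the pipeline.
# "CustomerCsvDataset" and "CustomerDetailsTable" are hypothetical dataset names.
copy_activity = {
    "name": "Copy_data_from_Blob",
    "type": "Copy",
    "inputs": [{"referenceName": "CustomerCsvDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "CustomerDetailsTable", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},  # assumes a CSV source
        "sink": {"type": "AzureSqlSink"},           # assumes an Azure SQL sink
    },
}

# The pipeline: a logical grouping of one or more activities.
pipeline = {
    "name": "Migrate_Customer_Details",
    "properties": {"activities": [copy_activity]},
}

print(json.dumps(pipeline, indent=2))
```

A real definition usually carries more settings (retry policy, parameters, and so on); this sketch only shows how the pieces nest.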

Triggers: 

Triggers determine when a pipeline run must begin. Presently, Azure Data Factory supports three kinds of triggers:

  • Schedule Trigger

  • Tumbling window trigger

  • Event-based trigger
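Of the three, the tumbling window trigger is probably the least self-explanatory: it fires over a series of fixed-size, contiguous, non-overlapping time intervals starting from a given start time. The sketch below only illustrates that windowing idea in plain Python; it is not how Azure Data Factory implements the trigger, and the function name and the dates are assumptions made for the example.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Yield (window_start, window_end) pairs that tile [start, end).

    Windows have a fixed size, touch end to start, and never overlap.
    Only complete windows are produced.
    """
    if size <= timedelta(0):
        raise ValueError("size must be positive")
    current = start
    while current + size <= end:
        yield current, current + size
        current += size


# Hypothetical example: two-hour windows over the first six hours of a day.
windows = list(
    tumbling_windows(
        datetime(2021, 7, 22, 0, 0),
        datetime(2021, 7, 22, 6, 0),
        timedelta(hours=2),
    )
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each window would correspond to one pipeline run that processes the data belonging to that time slice.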

Datasets: 

Datasets represent data structures within the data stores. They point to the data that activities use as inputs or outputs.
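As a rough illustration, the two datasets needed for a CSV-to-SQL copy could be described as follows. This is a sketch of the JSON shape only: the helper functions, the linked service names, the container, and the file name are hypothetical, while CustomerDetails is the table name used later in this article.

```python
import json


def delimited_text_dataset(name, linked_service, container, file_name, delimiter=","):
    """Describe a delimited text (CSV) file stored in Azure Blob storage."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "fileName": file_name,
                },
                "columnDelimiter": delimiter,
                "firstRowAsHeader": True,
            },
        },
    }


def azure_sql_table_dataset(name, linked_service, schema, table):
    """Describe a table in Azure SQL Database."""
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlTable",
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"schema": schema, "table": table},
        },
    }


# All names except CustomerDetails are placeholders for illustration.
source_dataset = delimited_text_dataset(
    "CustomerCsvDataset", "BlobStorageLinkedService", "crm-exports", "customers.csv"
)
sink_dataset = azure_sql_table_dataset(
    "CustomerDetailsTable", "AzureSqlLinkedService", "dbo", "CustomerDetails"
)
print(json.dumps([source_dataset, sink_dataset], indent=2))
```

Note that a dataset does not hold data or credentials; it only points at the data through a linked service.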

Integration Runtime: 

The Integration Runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide its data integration capabilities: Data Flow, Data Movement, Activity dispatch, and SSIS package execution. There are three types of Integration Runtime:

  • Azure

  • Self-hosted

  • Azure-SSIS

When you prepare for Azure interview questions and answers, a general understanding of the components of Azure Data Factory helps you answer some of them.

Migrate data in real-time with Azure data factory:


Let’s take an example in which a developer must design a system to migrate a CSV file produced by a CRM application into a central repository. The central repository is an Azure SQL Database used for analytics and automation.

The CSV file contains 1000+ customer records, with fields separated by a delimiter. These records must be migrated efficiently to the Azure SQL Database. This is where Azure Data Factory comes into play. It lets you create a pipeline that copies the customer detail records directly from the CSV file to the CustomerDetails table in the Azure SQL Database. Follow these steps to migrate data from CSV to the Azure SQL Database:

Step-1: Create an Azure Data Factory and then open the ‘Azure Data Factory Editor’.

Step-2: On the ‘Editor page’, click the ‘+’ button to create an Azure Data Factory pipeline.

Step-3: Give the pipeline a name, for example ‘Migrate_Customer_Details’.


Steps to set up the Source of the Activity:

Step-1: Expand the ‘Move & Transform’ node in the left navigation and drag the Copy Data activity into the designer.

Step-2: Give the activity a name, for example ‘Copy_data_from_Blob’.


Step-3: Select the ‘Source’ tab and then click ‘+New’. This opens a blade that lets you choose a data source. Select ‘Azure Storage Blob’ as the data source.

Step-4: Choose ‘CSV File’ in the Format Type blade, provide the file path, and then click ‘OK’ to save the data source.

Steps to set up the Destination of the Activity:

Step-1: Select the ‘Sink’ tab and then click ‘+New’. This opens a blade for selecting the destination. Select ‘Azure SQL Database’ as the destination.

Step-2: Click ‘+ New’ in the Linked Services list and then provide the Azure SQL Database connection info. Click ‘OK’ to save the destination.

Step-3: Give a name to the Table name and click


Steps to map CSV Properties to Table Properties:

Step-1: Click the ‘Mapping’ tab and then press the ‘Import Schemas’ button. This reads the structure of the CSV file and automatically maps the CSV properties to the table column properties.

Step-2: If any mapping is incorrect, you can correct it manually.
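Behind the Mapping tab, the column mapping is stored on the Copy activity as a translator definition. The sketch below builds one from a list of column pairs; the helper name and all column names are hypothetical, since the article does not list the actual CSV headers or table columns.

```python
import json


def build_translator(column_pairs):
    """Build a tabular translator from (csv_column, table_column) pairs."""
    return {
        "type": "TabularTranslator",
        "mappings": [
            {"source": {"name": csv_column}, "sink": {"name": table_column}}
            for csv_column, table_column in column_pairs
        ],
    }


# Hypothetical CSV headers on the left, hypothetical table columns on the right.
translator = build_translator(
    [
        ("customer_id", "CustomerId"),
        ("full_name", "Name"),
        ("city", "City"),
    ]
)
print(json.dumps(translator, indent=2))
```

Correcting a mapping manually in the UI amounts to changing one of these source and sink pairs.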


After the mapping is complete, click ‘Debug’ to start the pipeline run, which migrates the CSV data to the table. Once the pipeline run completes successfully, check the SQL Database table to make sure the records have been copied correctly.
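One simple way to check the result is to compare the number of records in the source file with the number of rows in the destination table. The sketch below demonstrates that check locally, using Python's csv module and an in-memory SQLite database as a stand-in for Azure SQL Database. It does not call Azure Data Factory at all, and the sample records, the ‘|’ delimiter, and the column names are made up for illustration.

```python
import csv
import io
import sqlite3

# Made-up sample of a delimited CRM export (three records, '|' as delimiter).
csv_text = "CustomerId|Name|City\n1|Asha|Pune\n2|Ravi|Delhi\n3|Meera|Chennai\n"
rows = list(csv.DictReader(io.StringIO(csv_text), delimiter="|"))

# In-memory SQLite table standing in for the CustomerDetails table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerDetails (CustomerId INTEGER PRIMARY KEY, Name TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO CustomerDetails (CustomerId, Name, City) "
    "VALUES (:CustomerId, :Name, :City)",
    rows,
)
conn.commit()

# Verification: the table must hold exactly as many rows as the file had records.
copied = conn.execute("SELECT COUNT(*) FROM CustomerDetails").fetchone()[0]
print(f"source rows: {len(rows)}, table rows: {copied}")
assert copied == len(rows), "row counts differ; some records were not migrated"
```

Against the real database, the same idea reduces to running a COUNT(*) query on the CustomerDetails table and comparing it with the record count of the CSV file.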

Various data migration activities with Azure Data Factory:

When you earn the Azure fundamentals, Azure administrator, or Azure architect certification, you will have at least a general understanding of data migration activities with Azure Data Factory.

With Azure Data Factory, data can be migrated between two cloud data stores as well as between a cloud data store and an on-premises data store.

One of the most widely used data migration activities is the Copy Activity, which copies data from a source data store to a sink data store. Azure supports many data stores as sources or sinks, such as Azure Cosmos DB (DocumentDB API), Azure Blob storage, Azure Data Lake Store, Cassandra, Oracle, etc.

Azure Data Factory also supports various transformation activities, such as MapReduce, Hive, and Spark, which can be added to pipelines either individually or chained with other activities.

If you wish to migrate data to or from a data store that the Copy Activity does not support, you can use a .NET custom activity in Data Factory, in which you implement your own logic for copying or moving the data.

You can refer to the ‘Azure documentation: Transform data in Azure Data Factory’ to get in-depth details on data stores that support Data Factory for data transformation activities. Moreover, you can refer to the ‘Azure documentation for Data movement activities’ to get in-depth details on data stores that support Data Factory for data movement activities.
