
Azure Data Factory Interview Questions

Girdhar Gopal Singh
31 Aug 2021
 

Azure Data Factory is a cloud-based Microsoft service that gathers raw business data and converts it into usable information. Essentially, it is a data integration and ETL (extract, transform, and load) service that automates the movement and transformation of raw data.

Anyone learning Azure also needs to know Azure Data Factory. The interview questions below cover its core concepts and should clear up common points of confusion.

Let's look at some Azure Data Factory interview questions and answers that can help you prepare for Azure job interviews.

1. Briefly explain the different components of Azure Data Factory.

Pipeline: A logical container for activities.

Dataset: A pointer to the data used in the pipeline activities.

Mapping Data Flow: Data transformation logic designed in a visual UI.

Activity: An execution step in a Data Factory pipeline, used for data ingestion and transformation.

Trigger: Specifies when a pipeline runs.

Linked Service: Holds the connection information, much like a connection string, for the data sources used in the pipeline activities.

Control flow: Controls the execution flow of the pipeline activities.

2. What is the need for Azure Data Factory?

Data comes from many different sources, so it can arrive in any form. Each source may transfer its data in a different way and in a different format. When we bring this data to the cloud or to a particular store, we must make sure it is managed efficiently, which means transforming it and removing the unnecessary parts.

Where data transfer is concerned, the data must be collected from the various sources, brought to a common place, stored there, and transformed if needed. A conventional data warehouse can do this too, but it comes with some limitations. Sometimes we are forced to build custom applications that handle each of these processes separately, which is time-consuming, and integrating all of those processes is troublesome. We therefore need a way to automate the process or to design appropriate workflows. Azure Data Factory helps you orchestrate this entire process more conveniently.

3. Is there a limit on the number of integration runtimes?

No, there is no limit on the number of integration runtime instances you can have in an Azure data factory. However, there is a limit on the number of VM cores that the integration runtime can use per subscription for SSIS package execution.

4. What is the Data Factory Integration Runtime?

The Integration Runtime is the secure compute infrastructure that Data Factory uses to provide data integration capabilities across different network environments. It also ensures that activities run in the region closest to the data store.

5. What is meant by Blob storage in Azure?

Azure Blob Storage is a service for storing massive amounts of unstructured object data, such as binary data or text. You can use Blob Storage to expose data publicly or to store application data privately. Typical uses of Blob Storage include:

  1. Directly serving images or documents to a browser
  2. Storage of files for distributed access
  3. Streaming audio and video
  4. Storing data for backup and restore, disaster recovery, and archiving
  5. Storing data for analysis by an on-premises or Azure-hosted service

6. What are the steps for creating an ETL process in Azure Data Factory?

Suppose we want to retrieve some data from an Azure SQL Server database, process whatever needs processing, and save the result in the Data Lake Store. Here are the steps for creating the ETL process:

  • First, create a Linked Service for the source data store, i.e. the SQL Server database
  • Assume that the source holds a cars dataset
  • Next, create a Linked Service for the destination data store, i.e. Azure Data Lake Store
  • After that, create a dataset for saving the data
  • Set up the pipeline and add a Copy activity
  • Finally, schedule the pipeline by adding a trigger
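
The order of these steps matters because each resource refers to one created earlier. The sketch below models that with plain Python dictionaries. It is only an illustration: the resource names (SqlServerSource, DataLakeSink, CarsTable, CarsOutput, CopyCars, DailyRun), the `ref` helper, and the field layout are assumptions loosely modeled on Data Factory's JSON definitions, not a verified schema.

```python
def ref(name, kind):
    """Point from one resource to another by name."""
    return {"referenceName": name, "type": kind}


def build_etl_resources():
    """Return the resources in the order the steps above create them."""
    return [
        # Linked services for the source and destination stores.
        {"kind": "LinkedService", "name": "SqlServerSource"},
        {"kind": "LinkedService", "name": "DataLakeSink"},
        # Datasets point at data inside a linked service.
        {"kind": "Dataset", "name": "CarsTable",
         "linkedService": ref("SqlServerSource", "LinkedServiceReference")},
        {"kind": "Dataset", "name": "CarsOutput",
         "linkedService": ref("DataLakeSink", "LinkedServiceReference")},
        # The pipeline holds a Copy activity that reads one dataset
        # and writes the other.
        {"kind": "Pipeline", "name": "CopyCars",
         "activities": [{
             "name": "CopyCarsToLake",
             "type": "Copy",
             "inputs": [ref("CarsTable", "DatasetReference")],
             "outputs": [ref("CarsOutput", "DatasetReference")],
         }]},
        # The trigger schedules the pipeline.
        {"kind": "Trigger", "name": "DailyRun",
         "pipelines": [ref("CopyCars", "PipelineReference")]},
    ]


def unresolved_references(resources):
    """List referenced names that no earlier resource in the list defines."""
    defined, missing = set(), []

    def walk(node):
        if isinstance(node, dict):
            name = node.get("referenceName")
            if name is not None and name not in defined:
                missing.append(name)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)

    for resource in resources:
        walk(resource)
        defined.add(resource["name"])
    return missing
```

Calling `unresolved_references(build_etl_resources())` returns an empty list, while passing the same resources in reverse order reports the names that would not exist yet at creation time.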

7. Which three types of triggers does Azure Data Factory support?

  1. The Schedule trigger runs the ADF pipeline on a wall-clock schedule.
  2. The Tumbling window trigger runs the ADF pipeline at periodic intervals. It retains the pipeline state.
  3. The Event-based trigger responds to a blob-related event, such as a blob being added to or deleted from your Azure storage account.
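
To make the tumbling window idea concrete, the following sketch computes the fixed-size, contiguous, non-overlapping intervals that such a trigger would cover. It uses only the Python standard library and does not call Data Factory; the function name `tumbling_windows` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Yield (window_start, window_end) pairs covering [start, end).

    Windows are contiguous and never overlap: each one begins exactly
    where the previous one ended. A trailing partial window is skipped.
    """
    if size <= timedelta(0):
        raise ValueError("size must be positive")
    window_start = start
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


# Example: hourly windows between midnight and 03:00 on 31 Aug 2021.
windows = list(tumbling_windows(
    datetime(2021, 8, 31, 0, 0),
    datetime(2021, 8, 31, 3, 0),
    timedelta(hours=1),
))
```

Each pair can then be handed to a pipeline run as its own slice of time, which is what distinguishes this trigger from a plain schedule.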

8. How to create Azure Functions?

Azure Functions is a solution for running small pieces of code, or functions, in the cloud. You can write these functions in the programming language you prefer, and you pay only for the time the code runs, that is, per usage. It supports a wide range of programming languages, including C#, F#, Node.js, Java, and Python. It also supports continuous integration and continuous deployment. With Azure Functions, it is possible to develop serverless applications.

9. How do you access data using the other 80 dataset types in Data Factory?

Currently, the Mapping Data Flow feature natively supports Azure SQL Data Warehouse, Azure SQL Database, delimited text files from Azure Blob storage or Azure Data Lake Storage Gen2, and Parquet files from Blob storage or Data Lake Storage Gen2 as source and sink.

For any of the other connectors, use the Copy activity to stage the data first. Then run a Data Flow activity to transform the data once it has been staged.
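
The stage-then-transform pattern is a dependency between two activities. The sketch below models it in plain Python and works out the run order. The activity names are hypothetical, and the `dependsOn` layout and the `Copy` / `ExecuteDataFlow` type strings are assumptions modeled on Data Factory's pipeline JSON rather than a checked schema.

```python
# Hypothetical activities; deliberately listed out of order.
activities = [
    {
        "name": "TransformStaged",
        "type": "ExecuteDataFlow",
        # Runs only after the staging copy has succeeded.
        "dependsOn": [{"activity": "StageFromConnector",
                       "dependencyConditions": ["Succeeded"]}],
    },
    {"name": "StageFromConnector", "type": "Copy"},
]


def run_order(activities):
    """Order activity names so each runs after everything it depends on."""
    remaining = {
        a["name"]: {d["activity"] for d in a.get("dependsOn", [])}
        for a in activities
    }
    ordered = []
    while remaining:
        done = set(ordered)
        # An activity is ready once all of its dependencies have run.
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("circular or unknown dependency")
        for name in ready:
            ordered.append(name)
            del remaining[name]
    return ordered
```

Here `run_order(activities)` places the Copy activity before the Data Flow activity regardless of the order in which they are listed.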

10. What do you need for executing an SSIS package in Data Factory? 

You need to create an SSIS integration runtime (IR) and an SSISDB catalog hosted in Azure SQL Database or Azure SQL Managed Instance.

11. What are Datasets in ADF?

A dataset refers to the data that you use in your pipeline activities as inputs and outputs. Datasets generally identify the structure of data within linked data stores, such as documents, files, and folders. For instance, an Azure Blob dataset specifies the container and folder in Blob storage from which a pipeline activity should read its input data.

12. What is the use of the ADF Service? 

ADF is primarily used to orchestrate data copying between relational and non-relational data sources, whether they are hosted on-premises in your datacenters or in the cloud. The ADF service can also transform the ingested data to meet your business requirements. In most Big Data solutions, ADF serves as the ETL or ELT tool for data ingestion.

13. How do the Mapping data flow and Wrangling data flow transformation activities differ in Data Factory?

A Mapping data flow activity is a visually designed data transformation activity. It lets you design graphical data transformation logic without needing an expert developer. It runs as an activity inside the ADF pipeline on a fully managed, scaled-out ADF Spark cluster.
A Wrangling data flow activity, on the other hand, is a code-free data preparation activity. It integrates with Power Query Online to make the Power Query M functions available for data wrangling through Spark execution.

14. What is Azure Databricks?

Azure Databricks is an easy, fast, and collaborative Apache Spark-based analytics platform that is optimized for Azure. It was designed in collaboration with the founders of Apache Spark. Azure Databricks combines the best of Databricks and Azure to help customers speed up innovation through a quick setup. Its streamlined workflows and interactive workspace make collaboration easier between data engineers, data scientists, and business analysts.

15. What is Azure SQL Data Warehouse?

It is a large store of data collected from a broad range of sources within a company and used to support management decisions. Such warehouses let you gather data from diverse databases that exist as either remote or distributed systems.
An Azure SQL Data Warehouse is built by integrating data from multiple sources, and it can then be used for decision making, analytical reporting, and so on. In other words, it is a cloud-based enterprise application that uses parallel processing to quickly run complex queries over massive volumes of data. It also serves as a solution for Big Data workloads.

16. What is Azure Data Lake?

Azure Data Lake streamlines data storage and processing tasks for analysts, developers, and data scientists. It supports these tasks across multiple platforms and languages.
It removes the barriers associated with data storage and makes it simpler to carry out stream, batch, and interactive analytics. Features in Azure Data Lake address the challenges of productivity and scalability and help meet growing business requirements.

17. Explain the data source in Azure Data Factory.

The data source is the source or destination system that holds the data to be used or processed. The data can be of any type: binary, text, CSV files, JSON files, image, video, or audio files, or a proper database.

Examples of data sources include Azure Data Lake Storage, Azure Blob Storage, and databases such as MySQL, Azure SQL Database, and PostgreSQL.

18. Why is it beneficial to use the Auto Resolve Integration Runtime?

AutoResolveIntegrationRuntime automatically tries to run activities in the same region as the sink data store, or as close to it as possible. This can improve performance.

19. How is the Lookup activity useful in Azure Data Factory?

In an ADF pipeline, the Lookup activity is commonly used to look up configuration. It is given a source dataset, retrieves data from that dataset, and returns it as the output of the activity. The output of the Lookup activity is generally used later in the pipeline to make decisions or to supply configuration values.
In simple terms, the Lookup activity fetches data in an ADF pipeline. How you use it depends entirely on your pipeline logic. You can retrieve only the first row or all rows, depending on your dataset or query.
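
The difference between the first-row and all-rows modes is easiest to see in the shape of the result. The following sketch imitates it in plain Python. The function `lookup`, its `first_row_only` flag, and the output keys (`firstRow`, `count`, `value`) are assumptions modeled on how the activity's output is commonly described; the sample rows are invented for illustration.

```python
def lookup(rows, first_row_only=True):
    """Imitate a lookup: return either the first row or every row.

    Returning None for an empty result in first-row mode is a choice
    made for this sketch, not documented service behavior.
    """
    if first_row_only:
        return {"firstRow": rows[0] if rows else None}
    return {"count": len(rows), "value": list(rows)}


# Invented configuration rows, as a config table might hold them.
config_rows = [
    {"table": "cars", "enabled": True},
    {"table": "owners", "enabled": False},
]

single = lookup(config_rows)                       # first row only
everything = lookup(config_rows, first_row_only=False)

# A later step can branch on the fetched configuration.
tables_to_copy = [r["table"] for r in everything["value"] if r["enabled"]]
```

First-row mode suits a single configuration record, while the all-rows form is the natural input for a loop over many items.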

20. What are the types of variables in Azure Data Factory?

Variables in an ADF pipeline hold values temporarily. They are used just like variables in a programming language. Two activities assign and manipulate variable values: Set Variable and Append Variable.

The two types of variables in Azure Data Factory are:

i. System variables: Fixed variables provided by the Azure pipeline. Examples include the pipeline ID, pipeline name, and trigger name.

ii. User variables: Variables that you declare manually, depending on the logic of the pipeline.
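
As a rough mental model of the two variable activities, the sketch below keeps user variables in a small Python class. The class `PipelineVariables`, its method names, and the variable names are hypothetical; the rule that appending only works on list-valued variables is an assumption of this sketch.

```python
class PipelineVariables:
    """Hold user variables for a single pipeline run."""

    def __init__(self, declared):
        # Variables must be declared up front with a default value.
        self._values = dict(declared)

    def _check(self, name):
        if name not in self._values:
            raise KeyError(f"variable {name!r} was not declared")

    def set_variable(self, name, value):
        """Overwrite the current value (the Set Variable idea)."""
        self._check(name)
        self._values[name] = value

    def append_variable(self, name, value):
        """Add one item to a list variable (the Append Variable idea)."""
        self._check(name)
        if not isinstance(self._values[name], list):
            raise TypeError(f"variable {name!r} is not a list")
        self._values[name].append(value)

    def get(self, name):
        self._check(name)
        return self._values[name]


variables = PipelineVariables({"status": "pending", "processedFiles": []})
variables.set_variable("status", "running")
variables.append_variable("processedFiles", "cars.csv")
variables.append_variable("processedFiles", "owners.csv")
```

After these calls, `status` holds `"running"` and `processedFiles` holds both file names in the order they were appended.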

21. Explain the linked service in Azure Data Factory.

In Azure Data Factory, a linked service represents the connection mechanism used to connect to an external source. It works like a connection string and holds the authentication information.
Two ways to create a linked service are:
  1. Using an ARM template
  2. Using the Azure Portal

22. What is meant by a breakpoint in the ADF pipeline?

A breakpoint marks the point up to which the pipeline is debugged. If you want to test the pipeline only up to a specific activity, you can do so with a breakpoint.
For example, suppose your pipeline has 3 activities and you want to debug only up to the second one. You can do this by placing a breakpoint on the second activity. To add a breakpoint, click the circle at the top of the activity.
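
The three-activity example can be expressed as a tiny Python sketch. The function `debug_until` and the activity names are hypothetical, and the sketch assumes that the activity carrying the breakpoint itself still runs.

```python
def debug_until(activities, breakpoint_name):
    """Return the activities that run when debugging stops at a breakpoint.

    Assumes activities run in list order and that the activity holding
    the breakpoint is included in the run.
    """
    if breakpoint_name not in activities:
        raise ValueError(f"no activity named {breakpoint_name!r}")
    stop = activities.index(breakpoint_name)
    return activities[: stop + 1]


pipeline = ["LookupConfig", "CopyCars", "NotifyTeam"]
executed = debug_until(pipeline, "CopyCars")
```

With the breakpoint on the second activity, `executed` contains the first two activities and the third is skipped.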
