When I started my journey with the Azure Big Data stack, my first impression of Azure Data Factory was that it was a cloud equivalent of SQL Server Integration Services, and I was not the only one to think so. Back then, that impression would have been wrong.
With ADF V2 the situation has completely changed, because now we can host and run SSIS packages in the cloud! Additionally, Azure Data Factory introduced another object called Triggers which lets us manage and schedule pipeline executions.
Welcome back to the last post in my series about Azure Data Factory V2. To close the series, I would like to cover two final topics:
- Integration Runtime (IR) which lets you host and run SSIS packages, among other functionalities
- Triggers – what they are, their types, and how to create and schedule them.
Of course, we will use ADF Visual Tools for all these activities. Let’s go!
Integration Runtime, your compute infrastructure
In ADF V2, Integration Runtime (IR) is the computing infrastructure that delivers three data integration functionalities:
- Data movement – the most commonly performed activity is copying data between sources and destinations. Integration Runtime takes on the whole burden related to this task
- Activity dispatch – when you run activities which are delegated to another server or service, e.g. running a stored procedure on Azure SQL Data Warehouse, IR is responsible for coordinating and monitoring such activities
- SSIS package execution – the last capability is the native execution of SQL Server Integration Services packages in Azure environment.
To better explain what IR is, I quote from the official ADF V2 documentation:
In Data Factory, an activity defines the action to be performed. A linked service defines a target data store or a compute service. An integration runtime provides the bridge between the activity and Linked Services. It is referenced by the linked service and provides the compute environment where the activity either runs on or gets dispatched from. This way, the activity can be performed in the region closest possible to the target data store or compute service in the most performant way while meeting security and compliance needs.
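To make that bridge concrete, here is a sketch of a linked service definition that references an IR through the connectVia property (the server, database and IR names below are placeholders, not values from this walkthrough):

```json
{
    "name": "MyAzureSqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=tcp:myserver.database.windows.net,1433;Database=mydb;User ID=sqladmin;Password=***"
            }
        },
        "connectVia": {
            "referenceName": "MyIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        }
    }
}
```

When connectVia is omitted, the activity falls back to the default Azure Integration Runtime.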
In addition to the aforementioned core capabilities, it is also important to mention the Integration Runtime types, which are:
- Azure – a fully managed runtime for data movement and activity dispatch between cloud data stores
- Self-hosted – installed on-premises or on a VM to reach data stores inside a private network
- Azure-SSIS – a managed cluster dedicated to running SSIS packages.
Within this article, I will focus on the SSIS package execution capability and the Azure-SSIS IR type. For more details about Integration Runtime types and how to decide which one to use in a particular situation, I encourage you to read the official documentation.
Azure-SSIS Integration Runtime
Let’s begin with going over the most important facts about SSIS IR:
- Azure-SSIS IR is a dedicated ADF V2 compute infrastructure to host and run SSIS packages
- You can provision it both in a public or a private network
- In case you need to run a workload against on-premises sources, you have to join the IR to a Virtual Network (VNet) that connects your cloud infrastructure with your on-premises network
- Under the hood, the SSIS IR has dedicated Azure Virtual Machines
- You can scale your SSIS environment by simply selecting the node size and the number of nodes within the cluster
- Azure-SSIS IR requires an Azure SQL Database or Azure SQL Managed Instance server to host SSISDB
- Packages are developed using SQL Server Data Tools (SSDT), and SSIS IR can be managed by SQL Server Management Studio (SSMS).
Now that we’ve covered the theory, it is time to do some practice. Let’s create our Azure-SSIS Integration Runtime!
Azure-SSIS IR setup
Since my last post, Visual Tools have changed; ADF V2 is still in the preview stage, so you may notice that some elements look different or that new ones have appeared. OK, let's move forward!
- Open Visual Tools.
- Switch to Connections.
- On the Connections tab switch to Integration Runtimes. Notice that by default each ADF instance has a defaultIntegrationRuntime assignment.
- In order to create SSIS IR, click the New button.
- The Integration Runtime Setup tab will appear. Select Lift-and-shift existing SSIS packages to execute in Azure and click the Next button.
- In the next window, specify the Name, Location (e.g. West Europe), Node Size (for demo purposes I suggest selecting a D1 v2 VM to save on cost) and Node Number (set it to 1), then click the Next button.
Selecting the appropriate Node Location is crucial. You can find more details about IR location setup in the official documentation.
- In the next step, you will need to set up SQL settings. As I mentioned, SSISDB is hosted in Azure, so you need an Azure SQL server instance to host it. Select the Subscription and the Catalog Database Server Endpoint, then provide the Admin Username and Admin Password. Finally, select the Catalog Database Service Tier.
- Click the Test connection button at the bottom. If everything is in order, a Connection successful message should appear.
- Click the Next button.
- On the last screen, you can set Maximum Parallel Executions Per Node; set the value to 1 and click Finish.
Maximum Parallel Executions Per Node
Here you define whether a single package should run using more than one core, or whether one or more packages should share a single core.
The VNet checkbox is required when your package logic involves on-premises data sources or destinations.
- All Azure-SSIS IRs you created should appear on the Connections / Integration Runtimes tab.
- You can monitor your IR environment by clicking the Monitor button.
- On the Monitor page, you can see basic information about the Integration Runtime, as well as details about its Nodes.
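The wizard choices above map to the IR's underlying JSON definition. As a sketch, using the demo values from this walkthrough and placeholder server and credential names:

```json
{
    "name": "MySsisIR",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "West Europe",
                "nodeSize": "Standard_D1_v2",
                "numberOfNodes": 1,
                "maxParallelExecutionsPerNode": 1
            },
            "ssisProperties": {
                "catalogInfo": {
                    "catalogServerEndpoint": "myserver.database.windows.net",
                    "catalogAdminUserName": "ssisadmin",
                    "catalogAdminPassword": {
                        "type": "SecureString",
                        "value": "***"
                    },
                    "catalogPricingTier": "Basic"
                }
            }
        }
    }
}
```

Note that the type is Managed: Azure-SSIS IR is a managed flavour of the Azure Integration Runtime, which is why the compute settings (node size, node count, parallelism) sit under computeProperties.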
SSIS Package deployment to Azure-SSIS
In this section, I will give you a very short explanation of how to deploy an SSIS package to the Azure-SSIS Integration Runtime.
- First of all, develop your SSIS package using SQL Server Data Tools. Your package may, for example, execute a stored procedure in an Azure SQL Database or copy data between SQL Servers hosted on Azure VMs.
- Next, open SQL Server Management Studio and connect to the Azure SQL server instance where SSISDB was deployed. You need to connect to the SSISDB database directly; otherwise, Object Explorer will not display Integration Services Catalogs and you will not be able to deploy the SSIS package. To do so, click the Options button, switch to the Connection Properties tab and type SSISDB as the database to connect to.
- In Object Explorer you can notice the Integration Services Catalogs folder.
- The next step is to create a folder within the SSISDB catalog. This is the place where we will deploy SSIS packages. To do it, right-click on SSISDB and select Create Folder. Enter the folder name (e.g. MySSIS) and click the OK button.
- In order to deploy an SSIS package on Azure-SSIS expand the created folder, which in my case is MySSIS, right-click on the Projects folder and select Import Packages.
- You will see the Integration Services Deployment Wizard. At this point, we can finish our guide. If you need a detailed description on how to deploy a package using the wizard, you can find it here.
- After successful package deployment, you need to run it. You will find details on how to do it here.
Congratulations! You've deployed and run an SSIS package in the ADF V2 Integration Runtime.
Triggers, your pipeline schedulers
In ADF V2 you can run your pipeline in two ways, i.e. on-demand or with the use of Triggers.
We can distinguish two types of triggers:
- Schedule – it runs the pipeline on a wall-clock schedule and provides advanced recurrence options with granularity down to one minute. A pipeline attached to a Schedule trigger runs starting from the first execution time in the future; past occurrences are discarded.
It has a many-to-many relationship with the pipeline. This means that one trigger can be related to many pipelines, and many triggers can be related to one pipeline. A very detailed description of the Schedule trigger can be found here.
- Tumbling window – execution windows are fixed-size and do not overlap. It operates at periodic intervals and keeps its state. Additionally, tumbling windows cannot be shorter than 15 minutes.
It has a one-to-one relationship with the pipeline. This means that it can only relate to a single pipeline. Furthermore, it contains properties well-known from ADF V1 like Delay, Max concurrency, Retry Policy Count and Intervals in seconds. More information can be found in the official documentation.
Basically, if you need to process time-related data, especially containing tumbling windows in the past, then use Tumbling Window trigger. The mechanism is similar to current data processing (i.e. data slices) in ADF V1.
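As a sketch, a Tumbling window trigger definition could look like the JSON below (the trigger and pipeline names are placeholders). Note the singular pipeline property, reflecting the one-to-one relationship, and the window boundaries passed to the pipeline as parameters:

```json
{
    "name": "MyTumblingWindowTrigger",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 24,
            "startTime": "2018-06-01T00:00:00Z",
            "delay": "00:10:00",
            "maxConcurrency": 10,
            "retryPolicy": {
                "count": 2,
                "intervalInSeconds": 30
            }
        },
        "pipeline": {
            "pipelineReference": {
                "type": "PipelineReference",
                "referenceName": "MyPipeline"
            },
            "parameters": {
                "windowStart": "@trigger().outputs.windowStartTime",
                "windowEnd": "@trigger().outputs.windowEndTime"
            }
        }
    }
}
```

With frequency set to Hour and interval 24, the trigger produces consecutive 24-hour windows starting from startTime, including windows in the past.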
Create your own trigger
- Open Visual Tools and go to Triggers, then click the New button.
- New Trigger tab will appear. In this scenario, we will create a Schedule trigger.
- Select Type: Schedule checkbox.
- Set Start Date, e.g. 1st June 2018 at 10:00 AM.
- Set Recurrence options, i.e. days, hours and minutes at which the pipeline should run.
- Specify an End date when the trigger should stop (or No End, if it shouldn't).
- Select the Activated checkbox. It means that after publication, the trigger will start scheduling the pipeline’s executions. Otherwise, nothing will happen.
- Click the Finish button.
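Under the hood, the trigger created above is stored as a JSON definition. A sketch matching these settings (daily at 10:00 AM from 1st June 2018, no end date; the trigger name is a placeholder):

```json
{
    "name": "MyScheduleTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "runtimeState": "Started",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2018-06-01T10:00:00Z",
                "timeZone": "UTC",
                "schedule": {
                    "hours": [10],
                    "minutes": [0]
                }
            }
        }
    }
}
```

The runtimeState of Started corresponds to the Activated checkbox; a deactivated trigger would show Stopped.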
Attach Trigger to pipeline
The last step is to schedule pipeline execution. You have to publish the created trigger and attach it to the specific pipeline.
- Open the pipeline you want to schedule in Visual Tools.
- Click the Trigger button and then New/Edit.
- From Add Triggers tab select the trigger and click the Next button.
- In the next step you can adjust the execution schedule; if everything is fine then click Next.
- Finally, you can provide pipeline parameters. When done, click the Finish button.
- Now publish your pipeline. Done!
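For reference, attaching a Schedule trigger to a pipeline adds a pipelines section to the trigger's JSON definition, which is also where the pipeline parameters you provided are stored (the pipeline name and parameter below are placeholders):

```json
{
    "pipelines": [
        {
            "pipelineReference": {
                "type": "PipelineReference",
                "referenceName": "MyPipeline"
            },
            "parameters": {
                "inputPath": "input/2018/06"
            }
        }
    ]
}
```

Because this is an array, one Schedule trigger can reference several pipelines, which is the many-to-many relationship mentioned earlier.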
That is all. Now your pipeline will be executed according to the specified trigger schedule. You can use the Monitor feature to track the pipeline's executions; to do this, select Trigger Runs from the drop-down menu.
In this last article in the Azure Data Factory V2 series, I covered two features: Integration Runtime (specifically Azure-SSIS) and Triggers. You now have basic knowledge of how to set up an SSIS infrastructure in Azure and use it to deploy and run SSIS packages. Additionally, with Triggers, you can schedule pipeline executions at specific dates and intervals.
We reached the end of series related to ADF V2. I encourage you to read my previous posts if you haven’t had a chance yet. I have provided an overview of new features in the preview, posted a guide to Visual Tools and talked about conditional execution functions and parameters.
The Azure Data Factory topic is much wider, and my intention was to highlight the most interesting features which appeared in V2 preview. For an in-depth description, I recommend official Microsoft documentation where you will find answers to many questions. You can also contact me with any queries!