Our Talking About R… series continues! Today we are focusing on integrating R with Azure Machine Learning. See how you can use R programming in Azure Machine Learning to tackle a variety of tasks, such as data modelling, sales analysis, text analysis, and forecasting.
Azure Machine Learning is a fully cloud-based service for extracting insight from your data. The human eye cannot pick out non-obvious patterns, especially when a table has more rows than anyone could inspect in a reasonable time. Welcome to the big data world, where R programming is an inseparable element of data analysis.
Let’s try it – it’s free! 🙂
Possible ways to work with R in Azure Machine Learning
R can be used in Azure Machine Learning in three ways:
- To create pre-processing scripts
- To build your own models with training and evaluation
- To work from an IDE using the AzureML library
I will explain each of these approaches briefly. Knowing them matters when you build an analytical solution in Azure Machine Learning, because R can extend the capabilities of the service. Azure Machine Learning ships with a fixed set of features and a strictly limited list of models, and that's why R is needed!
When forecasting time series data or building your own recommender system, an R script is crucial for implementing an appropriate machine learning model.
Keep in mind:
Azure ML has no built-in models for forecasting time series or for building recommendations based on association rules, so you have to write an R script and build your own model.
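To make the forecasting case concrete, here is a minimal sketch of the kind of model an R script could fit. It is only an illustration: it uses the Holt-Winters method from base R's stats package and the AirPassengers sample series that ships with R, both of which are my choices for the example and not something prescribed by Azure ML.

```r
# Sketch: time series forecasting with base R only (stats package).
# AirPassengers is a sample dataset bundled with R; in Azure ML the
# series would come from the data connected to the R script instead.
sales <- AirPassengers

# Fit a Holt-Winters model with trend and multiplicative seasonality.
fit <- HoltWinters(sales, seasonal = "multiplicative")

# Forecast the next 12 months.
forecast_values <- predict(fit, n.ahead = 12)

# Collect the forecast in a data frame so it can be passed on as a table.
forecast_df <- data.frame(
  step = seq_along(forecast_values),
  forecast = as.numeric(forecast_values)
)
print(forecast_df)
```

The same pattern (fit, predict, wrap the result in a data frame) applies to other models from packages that are available in your environment.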
The most commonly used R component in Azure Machine Learning is probably R Script.
With R Script we can use a wide range of the data processing capabilities available in R. However, keep in mind that only a limited number of libraries are available to R Script.
R Script in Azure ML is mainly used for data cleaning where more complex processing logic is required. Moreover, you can generate statistics or data visualizations while running an experiment, because R Script can also return a graph, which we can access from the R Device output.
We can therefore use it for:
- Pre-processing or post-processing of data
- Visualization of data during the experiment
- Cleaning up your data
- Generating statistics
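The uses listed above can be sketched in a single script. This is a hedged example and not a ready-made module: clean_data, the column names and the sample data frame are all hypothetical, and the maml.mapInputPort call is shown only as a comment (maml.mapOutputPort is the function used later in this article), so the sketch also runs outside Azure ML.

```r
# Hypothetical helper: drop incomplete rows and add a standardized column.
clean_data <- function(df) {
  df <- df[complete.cases(df), , drop = FALSE]
  df$amount_scaled <- as.numeric(scale(df$amount))
  df
}

# Inside Azure ML the data frame would come from the first input port:
#   dataset <- maml.mapInputPort(1)
# Here a small made-up data frame stands in for it.
dataset <- data.frame(
  region = c("N", "S", "N", NA, "S", "N"),
  amount = c(10, 20, NA, 40, 25, 30),
  stringsAsFactors = FALSE
)

# Cleaning and pre-processing.
cleaned <- clean_data(dataset)

# Generating statistics.
print(summary(cleaned$amount))

# Visualization: in Azure ML the plot is shown on the R Device output.
hist(cleaned$amount, main = "Amount after cleaning", xlab = "amount")

# Inside Azure ML the result would be sent to the output port:
#   maml.mapOutputPort("cleaned")
print(cleaned)
```

Keeping the logic in a plain function makes it easy to test the script locally before pasting it into the experiment.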
It is possible to create additional models in Azure Machine Learning using the Create R Model component. This strengthens R's position relative to Python, because there is only one Python script component in Azure Machine Learning.
The Create R Model component contains two elements that the user fills in with R code:
- Trainer R Script – this is the window where we paste the R code that defines and trains the model. It receives the input dataset on which the model will learn.
- Scorer R Script – this part of the code is responsible for testing the model and checking its effectiveness. It reads the second input so that it can operate on data the model has not seen before, which allows you to evaluate and test the model's accuracy.
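The split between the two scripts can be sketched as follows. Treat the variable names as an assumption: to my understanding the component hands the trainer a data frame called dataset and expects a model object, and the scorer receives model plus a new dataset and returns a scores data frame. The simulated data and the logistic regression from base R are my own choices for the illustration.

```r
# Stand-in data; inside Azure ML the component supplies `dataset` itself.
set.seed(42)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
all_data <- data.frame(
  x1 = x1,
  x2 = x2,
  label = as.integer(x1 + 0.5 * x2 + rnorm(n, sd = 0.5) > 0)
)
train_idx <- seq_len(150)

# --- Trainer R Script: reads `dataset`, leaves a fitted `model` ---
dataset <- all_data[train_idx, ]
model <- glm(label ~ x1 + x2, data = dataset, family = binomial())

# --- Scorer R Script: reads `model` and unseen `dataset`, leaves `scores` ---
dataset <- all_data[-train_idx, ]
probabilities <- predict(model, newdata = dataset, type = "response")
classes <- as.integer(probabilities >= 0.5)
scores <- data.frame(classes = classes, probabilities = probabilities)

# Quick check of accuracy on the held-out rows (local illustration only).
accuracy <- mean(scores$classes == dataset$label)
print(accuracy)
```

Because the scorer only sees the model and the new data, anything it needs (packages, factor levels, thresholds) has to be loaded or recomputed inside the scorer code itself.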
There are some restrictions with using R:
- Major constraints arise with unsupervised methods – a simple example is training an Apriori model for finding association rules
- Pay attention to the distribution of R used by the Azure ML environment – not all R packages are supported by the classic CRAN distribution. Many packages require the use of Microsoft R Open.
You can easily check the list of supported packages by calling the following command:
data.set <- data.frame(installed.packages())
maml.mapOutputPort("data.set")
These packages are pre-installed, so to use one you only need to call library() with the package name in the R Script.
On the MSDN website dedicated to the service, we can find the following information about the limitations of the packages:
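If you want to check for specific packages before relying on them, a small sketch like the one below may help. The package list is only an example, and "notARealPackage123" is a deliberately made-up name that shows what a missing package looks like.

```r
# Packages we would like to use; the last name is intentionally fictitious.
wanted <- c("stats", "utils", "notARealPackage123")

# Names of all packages installed in the current R environment.
installed <- rownames(installed.packages())

# One row per requested package, with a flag saying whether it is installed.
availability <- data.frame(
  package = wanted,
  installed = wanted %in% installed,
  stringsAsFactors = FALSE
)
print(availability)

# Load only the packages that are actually present.
for (pkg in availability$package[availability$installed]) {
  library(pkg, character.only = TRUE)
}
```

Note that a package being installed does not guarantee it can be called, as the limitations quoted in this article explain.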
“A number of packages are included in the Azure Machine Learning environment but cannot be called from R code because of the following issues:
- The package has a Java dependency.
- The package binaries are not compatible with the sandboxed Azure environment.
- The package requires direct Internet access, or network access.”
The list of packages supported by Azure Machine Learning is available on the official website under this link.
AzureML package in R
This package was created specifically for operationalizing the Azure Machine Learning service. It provides many useful functions, such as:
- Listing, downloading, uploading and deleting datasets in Azure ML workspace
- Reading intermediate data directly from experiments
- Publishing and consuming web services hosted on Azure in a concise way.
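A rough sketch of how these functions fit together is shown below. It is wrapped in a function that is never called here, because it needs the AzureML package, a real workspace ID and an authorization token. The function name run_azureml_demo, the dataset name "iris_demo" and the service name "add_demo" are hypothetical, and the calls follow the package documentation as I understand it, so check them against the version you have installed.

```r
# Sketch only: requires the AzureML package and valid workspace credentials.
run_azureml_demo <- function(workspace_id, auth_token) {
  library(AzureML)

  # Connect to the workspace.
  ws <- workspace(id = workspace_id, auth = auth_token)

  # List the datasets in the workspace.
  print(head(datasets(ws)))

  # Upload a data frame, download it again, then delete it.
  upload.dataset(iris, ws, name = "iris_demo")
  iris_copy <- download.datasets(ws, name = "iris_demo")
  delete.datasets(ws, name = "iris_demo")

  # Publish a trivial R function as a web service and call it.
  add <- function(x, y) x + y
  endpoint <- publishWebService(
    ws,
    fun = add,
    name = "add_demo",
    inputSchema = list(x = "numeric", y = "numeric"),
    outputSchema = list(ans = "numeric")
  )
  consume(endpoint, list(x = 1, y = 2))
}

# Example call (not run): run_azureml_demo("<workspace id>", "<auth token>")
```

Keeping the credentials as function arguments avoids hard-coding them in a script that may end up in version control.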
Before you start working with the AzureML package, make sure that your environment is ready and that you have your own account on the Azure Machine Learning service. Second, make sure that a zip utility is installed on your system.
If you encounter the error "Requires external zip utility. Please install zip, ensure it's on your path and try again", do the following:
- Install RTools for Windows.
- Add the install directory to the system path. For example, if it’s installed in C:\Tools, you should add C:\Tools\bin to your system path and then restart R.
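After restarting R, you can confirm that the zip utility is visible with a quick check like this one. It is a small sketch using base R's Sys.which; the variable names are only illustrative.

```r
# Look for a zip executable on the system path.
# Sys.which returns an empty string when nothing is found.
zip_path <- unname(Sys.which("zip"))
zip_available <- nzchar(zip_path)

if (zip_available) {
  message("zip found at: ", zip_path)
} else {
  message("zip not found; check that its directory is on the system path")
}
```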
A very good description of the package, with examples, can be found in the CRAN repository – the document is written in R Markdown and is very user-friendly.
For more information, check out this link. If you need more explanation about AzureML functions or configurations, let me know!
As we learnt from the Microsoft Ignite conference sessions, many improvements and add-ons are on the way that will make this tool even more useful.
“The Azure Machine Learning Experimentation service allows developers and data scientists to increase their rate of experimentation. With every project backed by a Git repository, and with a simple command line tool for managing experimentation and training runs, every execution can track the code, configuration, and data that’s used for the run. More importantly, the outputs of that experiment, from model files, log output, and key metrics are tracked, giving you a powerful repository with the history of how your model evolves over time. (…)
Model management service provides deployment, hosting, versioning, management, and monitoring for models in Azure, on-premises, and to IOT Edge devices. (…)
Azure Machine Learning Workbench with AI powered data wrangling is a client application that runs on Windows and Macs. It has an easy set-up and installation and will install a configured Python environment, complete with conda, Jupyter, and more, along with connectivity to all of the backend services in Azure.”
“Visual Studio Code Tools for AI is an extension to build, test, and deploy Deep Learning / AI solutions. It seamlessly integrates with Azure Machine Learning for robust experimentation capabilities, including but not limited to submitting data preparation and model training jobs transparently to different compute targets. Additionally, it provides support for custom metrics and run history tracking, enabling data science reproducibility and auditing. Enterprise ready collaboration, allow to securely work on project with other people.” (Source)
All of the above capabilities are detailed in this article.
The key things you need to know are:
- Azure Machine Learning has two components where you can use R – R Script and R Model
- The training model component is focused on supervised ML methods
- If you need to operationalize the Azure Machine Learning service from an external IDE, you can use the AzureML package
- Azure Machine Learning has a limited number of packages – that is something you should consider before moving your R code to Azure Machine Learning.
Azure ML offers a user-friendly interface for conducting Machine Learning experiments. It is an extensible environment that you can build on with R or Python scripts. Don't waste any time: try it out!
I look forward to your feedback! 🙂