Introduction
The best way to retain knowledge is to practice what you’re learning. You can spend only so much time reading documentation and watching videos; at some point, you need to put that knowledge into practice.
In this post I will guide you through what I would set up for myself if I had to start again from the beginning. It requires only minimal knowledge of Microsoft Azure and its components, so the guide is suitable for readers of all levels.
Disclaimer
Azure Databricks is a paid service that is not included in the free tier of Microsoft Azure. If you are no longer interested in using Azure Databricks, read the Cleanup section.
Microsoft Azure Account
To have an Azure Databricks workspace you need an Azure account. Microsoft Azure offers new users a free account. The account includes more than 55 always-free services plus a number of premium services that are free for the first 12 months.
Unfortunately, Azure Databricks is not one of them, but not all is lost.
Your free account also includes a generous $200 credit that you can spend in the first 30 days.
Here is the link to Microsoft Azure’s website where you can sign up for your free account: https://azure.microsoft.com/en-us/free/search/. The process is completely painless and will take you only a few minutes.
My advice: use the Microsoft Azure free account wisely and learn as much as possible.
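To make "wisely" a bit more concrete, here is a rough back-of-the-envelope sketch in Python. The $200 credit and the 30-day window come from the free account described above; the hourly rate is a made-up placeholder and not an actual Azure Databricks price, so replace it with the figure you find for your region.

```python
# Rough budget arithmetic for the free-account credit.
CREDIT_USD = 200.0        # credit included in the free account
CREDIT_WINDOW_DAYS = 30   # the credit must be spent within this window

# ASSUMPTION: hypothetical all-in hourly cost of a small cluster.
# This is NOT a real price; look up the current rate for your region.
ASSUMED_COST_PER_HOUR_USD = 0.50


def daily_budget(credit: float, days: int) -> float:
    """Average amount you can spend per day without running out early."""
    return credit / days


def affordable_hours(credit: float, cost_per_hour: float) -> float:
    """Total cluster hours the credit covers at the given hourly cost."""
    return credit / cost_per_hour


print(f"Daily budget: ${daily_budget(CREDIT_USD, CREDIT_WINDOW_DAYS):.2f}")
print(f"Cluster hours: {affordable_hours(CREDIT_USD, ASSUMED_COST_PER_HOUR_USD):.0f}")
```

Whatever the real rate turns out to be, the takeaway is the same: terminate clusters you are not using, because the credit is consumed by running time.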
Create the resource group
rg-dbr-d-we-001
where:
rg: resource group
dbr: Databricks
d: development
we: West Europe
001: instance
If everything checks out, press Create. Microsoft Azure will provision a resource group with the settings of your choice.
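If you plan to create more than one resource, it can help to generate names from their parts rather than typing them by hand. The following is a minimal sketch of the naming scheme used for the resource group, assuming all five segments are always present; the class and function names are hypothetical and are not part of any Azure tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceName:
    """One resource name split into the segments of the naming scheme."""
    resource_type: str  # e.g. "rg" for resource group
    workload: str       # e.g. "dbr" for Databricks
    environment: str    # e.g. "d" for development
    region: str         # e.g. "we" for West Europe
    instance: int       # e.g. 1, rendered as "001"

    def __str__(self) -> str:
        # The instance number is zero-padded to three digits.
        segments = [self.resource_type, self.workload, self.environment,
                    self.region, f"{self.instance:03d}"]
        return "-".join(segments)


def parse_resource_name(name: str) -> ResourceName:
    """Split a name such as 'rg-dbr-d-we-001' back into its segments."""
    parts = name.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 segments, got {len(parts)}: {name!r}")
    resource_type, workload, environment, region, instance = parts
    return ResourceName(resource_type, workload, environment, region, int(instance))


print(ResourceName("rg", "dbr", "d", "we", 1))
print(parse_resource_name("rg-dbr-d-we-001"))
```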
You can access your resource groups from the left menu. Once you have found yours, click on its name to see its components.
Azure Databricks
To add an Azure Databricks workspace to the resource group, we first need to hit the Create button to add a new resource.
lab-dbr-we-001
If everything checks out, click the Create button, which starts the deployment of the resource in your chosen resource group.
Be aware that the deployment of Azure Databricks may take a few minutes. Once it succeeds, you can open the workspace.
When prompted to choose your current data project, take your pick. For the purpose of this guide I have chosen Exploring Data (Python, R).
Click on Finish and your Azure Databricks workspace is finally ready to be used.
Create a cluster
In Azure Databricks, a cluster is a set of computation resources and configurations on which you run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning[2]. You can run these workloads as a set of commands in a notebook or as an automated job[2].
A cluster can be thought of as a set of virtual machines specifically configured to run Spark applications. When you create a cluster, you can specify the number and type of virtual machines that should be used, as well as other configuration options such as the version of Spark to use, the amount of memory and CPU to allocate to each node, and the types of storage to use.
As this guide is meant only to get the reader started with Azure Databricks, I will not go through all the possible cluster configurations. More information can be found in the Microsoft Learn documentation[2].
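To give a feel for the options mentioned above, here is a small Python sketch that writes a cluster definition as a plain dictionary. The field names follow the Databricks Clusters REST API as I understand it, so check them against the documentation before relying on them. The runtime version, the VM size and the helper name `build_cluster_spec` are illustrative assumptions, and the auto-termination setting is an extra I added because it helps to keep costs down.

```python
import json


def build_cluster_spec(name: str, num_workers: int = 1,
                       autotermination_minutes: int = 30) -> dict:
    """Return a small cluster definition as a plain dictionary."""
    if num_workers < 0:
        raise ValueError("num_workers cannot be negative")
    return {
        "cluster_name": name,
        # ASSUMPTION: example runtime version string; pick one offered in your workspace.
        "spark_version": "13.3.x-scala2.12",
        # ASSUMPTION: example Azure VM size; it determines memory and CPU per node.
        "node_type_id": "Standard_DS3_v2",
        # Number of worker virtual machines in addition to the driver.
        "num_workers": num_workers,
        # Shut the cluster down after this many idle minutes.
        "autotermination_minutes": autotermination_minutes,
    }


spec = build_cluster_spec("getting-started")
print(json.dumps(spec, indent=2))
```

The same settings appear as form fields when you create a cluster in the workspace, so the dictionary is mostly a compact way to see them side by side.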
Create a notebook and run code: Hello, World!
print('Hello, World!')
Press shift+enter to execute a cell.
If for some reason the cluster is detached or terminated, you will be prompted to attach the notebook to a compute resource when you execute the cell.
Congratulations, you just ran your first line of code on Azure Databricks.
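If you want to try a second cell, here is a small sketch that uses only the Python standard library, so it should behave the same way in a notebook as on your own machine. The function name `describe_runtime` is just an illustrative choice.

```python
import platform


def describe_runtime() -> str:
    """Build a greeting that includes the Python version running the cell."""
    return f"Hello from Python {platform.python_version()}!"


print(describe_runtime())
```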
Cleanup
Keeping Microsoft Azure resources, even in a free account, will incur costs at some point. If you are no longer interested in using Azure Databricks, follow the steps below to delete the workspace.
Open the resource group from the menu in the Azure Portal, then:
1. Select the Azure Databricks Workspace by ticking the checkbox
2. Click on Delete
3. Enter delete in the text box and press Delete
This starts the deletion of the resource, after which it will not generate any further costs.
References