{"id":451,"date":"2022-11-03T13:00:42","date_gmt":"2022-11-03T13:00:42","guid":{"rendered":"https:\/\/pc-keeper.tech\/index.php\/2022\/11\/03\/azure-databricks-architecture-intro-ieee-computer-society\/"},"modified":"2022-11-03T13:00:42","modified_gmt":"2022-11-03T13:00:42","slug":"azure-databricks-architecture-intro-ieee-computer-society","status":"publish","type":"post","link":"https:\/\/pc-keeper.tech\/index.php\/2022\/11\/03\/azure-databricks-architecture-intro-ieee-computer-society\/","title":{"rendered":"Azure Databricks&#8217; Architecture Intro | IEEE Computer Society"},"content":{"rendered":"<div>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-309423 img-responsive alignright\" src=\"https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191142\/Intro-to-Azure-Databricks.jpg\" alt=\"Intro to Azure Databricks\" width=\"250\" height=\"250\" srcset=\"https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191142\/Intro-to-Azure-Databricks.jpg 250w, https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191142\/Intro-to-Azure-Databricks-150x150.jpg 150w, https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191142\/Intro-to-Azure-Databricks-100x100.jpg 100w\" sizes=\"auto, (max-width: 250px) 100vw, 250px\"\/>Big data has become the main driver of insight across many industries. All that data isn\u2019t much use without a way to analyze it, though. This has led to the development of frameworks like Apache Spark to handle the load.<\/p>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Those frameworks need to be managed and made accessible to data analysts. As such, management platforms like Databricks have emerged for industry use. 
Essentially, Databricks allows data specialists to work with multiple instances of Spark across cloud services like Azure. It might sound complex if you have never come across Databricks or Spark. This article will cover what Azure Databricks does and how you can use it for your big data needs.<\/p>\n<p>\u00a0<\/p>\n<h2 style=\"color: #002855; font-size: 24px; font-family: Montserrat; font-weight: 500; line-height: 29px;\">What Is Apache Spark?<\/h2>\n<hr style=\"text-align: left; width: 30%; height: 3px; color: #ffa300; background-color: #ffa300; border: none;\"\/>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">First, we need to talk about Apache Spark. It\u2019s this framework that underpins Databricks\u2019 primary functions. Spark is an open-source cluster computing solution. This means it uses networks of computers to process large datasets in parallel.<\/p>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Spark does this largely \u201cin-memory,\u201d meaning it keeps working data in the RAM of the networked machines rather than repeatedly reading from and writing to disk. This framework is a highly efficient big data processing solution, but it needs a management layer for ease of use. That\u2019s where Databricks comes in.<\/p>\n<p>\u00a0<\/p>\n<h2 style=\"color: #002855; font-size: 24px; font-family: Montserrat; font-weight: 500; line-height: 29px;\">What Is Azure Databricks?<\/h2>\n<hr style=\"text-align: left; width: 30%; height: 3px; color: #ffa300; background-color: #ffa300; border: none;\"\/>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Databricks is a data analytics platform that acts as a management layer for Spark. Azure Databricks is optimized for use with Microsoft\u2019s Azure cloud platform. In short, Azure Databricks uses cluster computing to unify data functions across the Azure platform.<\/p>\n<figure id=\"attachment_309424\" aria-describedby=\"caption-attachment-309424\" style=\"width: 300px\" class=\"wp-caption alignright\"><img decoding=\"async\" loading=\"lazy\" class=\"size-medium wp-image-309424 img-responsive\" src=\"https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191720\/Introduction-to-Azure-Databricks-Architecture-1-300x161.png\" alt=\"Azure Databricks integrated management\" width=\"300\" height=\"161\" srcset=\"https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191720\/Introduction-to-Azure-Databricks-Architecture-1-300x161.png 300w, https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191720\/Introduction-to-Azure-Databricks-Architecture-1.png 512w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\"\/><figcaption id=\"caption-attachment-309424\" class=\"wp-caption-text\">Source<\/figcaption><\/figure>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">The Azure version of Databricks runs optimized Spark APIs. It uses the computing power of the Azure cloud network for cluster processing. 
On top of this, it integrates functions from across the Azure platform, such as Data Lake Storage, Power BI, and Azure Machine Learning.<\/p>\n<p>\u00a0<\/p>\n<h2 style=\"color: #002855; font-size: 24px; font-family: Montserrat; font-weight: 500; line-height: 29px;\">Why Use Azure Databricks?<\/h2>\n<hr style=\"text-align: left; width: 30%; height: 3px; color: #ffa300; background-color: #ffa300; border: none;\"\/>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">The integration of Azure services and support for multiple programming languages have made Azure Databricks a popular choice. It\u2019s a highly versatile solution that supports Scala, R, SQL, and Python.<\/p>\n<p>\u00a0<\/p>\n<h3 style=\"color: #002855; font-size: 20px; font-family: Montserrat; font-weight: 500; line-height: 24px;\">Collaborative Platform<\/h3>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">The Databricks Workspace allows data specialists to work with shared dashboards and notebooks. Teams can quickly share insights and analysis models to improve data workflows, build new ideas, and optimize data analyst training.<\/p>\n<p>\u00a0<\/p>\n<h3 style=\"color: #002855; font-size: 20px; font-family: Montserrat; font-weight: 500; line-height: 24px;\">Optimized Runtime Applications<\/h3>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">As well as the optimized Spark APIs, Databricks Runtime also includes performance and security optimizations for all components. These are regularly updated with new versions. 
The dashboard lets you auto-scale processing tasks, among other quality-of-life functions.<\/p>\n<p>\u00a0<\/p>\n<h3 style=\"color: #002855; font-size: 20px; font-family: Montserrat; font-weight: 500; line-height: 24px;\">Integrations<\/h3>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">The integrated functions of Azure Databricks make it an all-in-one solution for data analytics and machine learning. Your data lake can be managed and expanded with Azure Blob Storage, Azure Data Factory, and other Azure services. Your analytics can be fed into Power BI and machine learning pipelines.<\/p>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Insights can be easily pushed to the management layer. Integrated security features handle directory management and sign-on. The end-to-end applications make Azure and Databricks an ideal business solution.<\/p>\n<figure id=\"attachment_309425\" aria-describedby=\"caption-attachment-309425\" style=\"width: 300px\" class=\"wp-caption alignright\"><img decoding=\"async\" loading=\"lazy\" class=\"size-medium wp-image-309425 img-responsive\" src=\"https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191813\/Introduction-to-Azure-Databricks-Architecture-2-300x200.png\" alt=\"\" width=\"300\" height=\"200\" srcset=\"https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191813\/Introduction-to-Azure-Databricks-Architecture-2-300x200.png 300w, https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191813\/Introduction-to-Azure-Databricks-Architecture-2.png 512w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\"\/><figcaption id=\"caption-attachment-309425\" class=\"wp-caption-text\">Source<\/figcaption><\/figure>\n<p>\u00a0<\/p>\n<h2 style=\"color: #002855; font-size: 24px; font-family: Montserrat; font-weight: 500; line-height: 29px;\">Databricks Components Explained<\/h2>\n<hr style=\"text-align: left; width: 30%; height: 3px; color: 
#ffa300; background-color: #ffa300; border: none;\"\/>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">These are the core components that make up the Databricks platform.<\/p>\n<p>\u00a0<\/p>\n<h3 style=\"color: #002855; font-size: 20px; font-family: Montserrat; font-weight: 500; line-height: 24px;\">Managed Clusters<\/h3>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Managed clusters are the compute resources that power your processing. A cluster shares the workload across its machines to complete processing tasks quickly. With Azure Databricks, you can set up a cluster with a few clicks.<\/p>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Clusters also allow for on-demand processing. You can set up automated jobs that create a cluster for specific tasks. These job clusters start up and shut down automatically, so processing costs are kept to a minimum.<\/p>\n<p>\u00a0<\/p>\n<h3 style=\"color: #002855; font-size: 20px; font-family: Montserrat; font-weight: 500; line-height: 24px;\">Spark &amp; Delta<\/h3>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">As mentioned above, Spark is the engine that processes your data in memory. Delta, short for Delta Lake, is an open-source storage format designed to address the limitations of traditional data lake file formats, for example by adding transactional guarantees.<\/p>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Working together, these two open-source components optimize data sorting and processing. 
This gives Databricks the processing speed required for big data workflows.<\/p>\n<p>\u00a0<\/p>\n<h3 style=\"color: #002855; font-size: 20px; font-family: Montserrat; font-weight: 500; line-height: 24px;\">MLflow<\/h3>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">MLflow, an open-source platform for managing the machine learning lifecycle, is the backend of Databricks\u2019 ML workflow. MLflow itself is made up of the components you can see in the flowchart below.<\/p>\n<figure id=\"attachment_309426\" aria-describedby=\"caption-attachment-309426\" style=\"width: 300px\" class=\"wp-caption alignright\"><img decoding=\"async\" loading=\"lazy\" class=\"size-medium wp-image-309426 img-responsive\" src=\"https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191852\/Introduction-to-Azure-Databricks-Architecture-3-300x168.png\" alt=\"MLflow components\" width=\"300\" height=\"168\" srcset=\"https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191852\/Introduction-to-Azure-Databricks-Architecture-3-300x168.png 300w, https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/10\/31191852\/Introduction-to-Azure-Databricks-Architecture-3.png 512w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\"\/><figcaption id=\"caption-attachment-309426\" class=\"wp-caption-text\">Source<\/figcaption><\/figure>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Using the collaborative workspace in Databricks, ML developers can track and run projects. They can execute ML runs as jobs in Databricks and run in engine tests as seamless dashboard functions.<\/p>\n<p>\u00a0<\/p>\n<h3 style=\"color: #002855; font-size: 20px; font-family: Montserrat; font-weight: 500; line-height: 24px;\">SQL Endpoints<\/h3>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">SQL analytics in Databricks is powered by SQL endpoints. 
These are Spark clusters optimized for SQL processing. SQL analysts can access an SQL dashboard by switching views in the main Databricks UI.<\/p>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">It lets SQL specialists run queries against your data lake and share work on SQL dashboards. The integration of business intelligence tools allows you to access these endpoints through Power BI, Tableau, and others.<\/p>\n<p>\u00a0<\/p>\n<h2 style=\"color: #002855; font-size: 24px; font-family: Montserrat; font-weight: 500; line-height: 29px;\">Use Cases for Azure Databricks<\/h2>\n<hr style=\"text-align: left; width: 30%; height: 3px; color: #ffa300; background-color: #ffa300; border: none;\"\/>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Databricks isn\u2019t a catch-all solution for every business scenario. These are the best use cases for Azure Databricks. If your business fits into one of these descriptions, then it might be the solution for you.<\/p>\n<p>\u00a0<\/p>\n<h3 style=\"color: #002855; font-size: 20px; font-family: Montserrat; font-weight: 500; line-height: 24px;\">Database &amp; Mainframe Modernization<\/h3>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Data storage, collection, and processing are incredibly important in modern business. 
If you\u2019re looking to modernize your data lakes or exploring mainframe modernization, Azure Databricks has the integrations you need.<\/p>\n<p>\u00a0<\/p>\n<h3 style=\"color: #002855; font-size: 20px; font-family: Montserrat; font-weight: 500; line-height: 24px;\">Machine Learning Production Pipeline<\/h3>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Built on MLflow, Databricks is a good choice if you need to get machine learning applications into production. Getting data science out of development and into production is a common problem, and Databricks can help streamline that workflow.<\/p>\n<p>\u00a0<\/p>\n<h3 style=\"color: #002855; font-size: 20px; font-family: Montserrat; font-weight: 500; line-height: 24px;\">Big Data Processing<\/h3>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Azure Databricks is a cost-effective option for big data processing, offering a strong balance of performance and cost. If your business needs high performance for on-demand data processing, then Databricks is likely to be a strong candidate.<\/p>\n<p>\u00a0<\/p>\n<h3 style=\"color: #002855; font-size: 20px; font-family: Montserrat; font-weight: 500; line-height: 24px;\">Business Intelligence Integration<\/h3>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Integrating business intelligence tools means you can open your data lake to analysts and engineers more easily. There\u2019s no need to create new pipelines when analysts need access to new data.<\/p>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">The data can be shared through SQL analytics, Power BI, and Tableau. 
If data access is a bottleneck for your business, then Databricks can help your business intelligence teams.<\/p>\n<p>\u00a0<\/p>\n<h2 style=\"color: #002855; font-size: 24px; font-family: Montserrat; font-weight: 500; line-height: 29px;\">Final Thoughts<\/h2>\n<hr style=\"text-align: left; width: 30%; height: 3px; color: #ffa300; background-color: #ffa300; border: none;\"\/>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Data science and data technology advance quickly. While some businesses are still struggling with questions like \u201cwhat is IVR?\u201d, others are using cloud computing and big data analysis to optimize their operations.<\/p>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\">Modernization can be an intimidating process for businesses with established infrastructure. Yet platforms like Azure Databricks are making it easier to modernize legacy systems. We hope this guide helps you decide whether Databricks is the right choice for your modernization.<\/p>\n<p>\u00a0<\/p>\n<h2 style=\"color: #002855; font-size: 24px; font-family: Montserrat; font-weight: 500; line-height: 29px;\">About the Writer<\/h2>\n<hr style=\"text-align: left; width: 30%; height: 3px; color: #ffa300; background-color: #ffa300; border: none;\"\/>\n<p style=\"color: #454545; font-size: 18px; font-family: Open Sans; font-weight: 400; line-height: 1.7em;\"><img decoding=\"async\" loading=\"lazy\" class=\"img-responsive alignleft wp-image-283798 size-thumbnail\" src=\"https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/06\/22000948\/pohan-lin-headshot-150x150.jpg\" alt=\"Pohan Lin\" width=\"150\" height=\"150\" srcset=\"https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/06\/22000948\/pohan-lin-headshot-150x150.jpg 150w, https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/06\/22000948\/pohan-lin-headshot-300x300.jpg 300w, 
https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/06\/22000948\/pohan-lin-headshot-100x100.jpg 100w, https:\/\/ieeecs-media.computer.org\/wp-media\/2022\/06\/22000948\/pohan-lin-headshot.jpg 400w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\"\/>Pohan Lin is the Senior Web Marketing and Localizations Manager at Databricks, a global Data and AI provider connecting the features of data warehouses and data lakes to create lakehouse architecture. With over 18 years of experience in analytics, machine learning, web marketing, online SaaS business, and ecommerce growth, Pohan is passionate about innovation and is dedicated to communicating the significant impact data has in marketing. Pohan Lin has also published articles for domains such as PingPlotter.<\/p>\n<\/p><\/div>\n<p><a href=\"https:\/\/www.computer.org\/publications\/tech-news\/trends\/azure-databricks-architecture-intro\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Big data has become the main driver of insight across many industries. 
All that data isn\u2019t much use without&hellip;<\/p>\n","protected":false},"author":1,"featured_media":452,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[211,392,66,393,2],"tags":[],"class_list":["post-451","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-apache-spark","category-azure","category-cloud","category-databricks","category-tech-news-post"],"_links":{"self":[{"href":"https:\/\/pc-keeper.tech\/index.php\/wp-json\/wp\/v2\/posts\/451","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pc-keeper.tech\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pc-keeper.tech\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pc-keeper.tech\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pc-keeper.tech\/index.php\/wp-json\/wp\/v2\/comments?post=451"}],"version-history":[{"count":0,"href":"https:\/\/pc-keeper.tech\/index.php\/wp-json\/wp\/v2\/posts\/451\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/pc-keeper.tech\/index.php\/wp-json\/wp\/v2\/media\/452"}],"wp:attachment":[{"href":"https:\/\/pc-keeper.tech\/index.php\/wp-json\/wp\/v2\/media?parent=451"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pc-keeper.tech\/index.php\/wp-json\/wp\/v2\/categories?post=451"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pc-keeper.tech\/index.php\/wp-json\/wp\/v2\/tags?post=451"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}