Azure Monitor Design Patterns

Tiago Dias Generoso
Published in Dev Genius
8 min read · Mar 26, 2024


Introduction:

In this article, our primary objective is to illuminate the crucial details essential for crafting a robust monitoring design that maximizes the capabilities of Azure Monitor while adhering to best practices. Whether you’re a seasoned IT professional or a novice, this guide aims to provide actionable insights to bolster your monitoring endeavors.

Foundational Concepts:

Before delving into the depths of monitoring design, let’s first establish some foundational concepts. Just like any monitoring system, Azure Monitor thrives on data, chiefly metrics and logs, collected from various sources.

These data streams serve as the lifeblood, fueling insights, reports, dashboards, detailed analyses, alert responses, and seamless integration with external sources or other Azure components.

Some important components:

Azure Monitor is a comprehensive monitoring solution for Azure resources. It provides insights into the performance, availability, and health of your applications and infrastructure.

Metrics and Logs: Azure Monitor collects metrics and logs from various Azure services, virtual machines, and applications.

Alerts and Dashboards: You can set up alerts based on thresholds, create custom dashboards, and visualize data.

Application Insights: Part of Azure Monitor, it focuses on application performance monitoring (APM) for web applications.

Container Insights: A feature of Azure Monitor that collects and analyzes logs and performance metrics from Azure Kubernetes Service (AKS) clusters or Azure Arc-enabled Kubernetes clusters and their components.

VM Insights: A feature within Azure Monitor that specifically monitors Azure virtual machines (VMs) and provides performance and health insights.

The Azure Monitor Agent (AMA) is a unified agent that replaces the legacy Log Analytics agent. AMA works with Data Collection Rules (DCRs) to collect data from various sources (e.g., VMs, containers, and more).

A Log Analytics Workspace is a central repository for log data collected from various sources. It stores logs, metrics, and other telemetry data. You can use KQL (Kusto Query Language) to query and analyze data within the workspace.

A Data Collection Rule defines how data is collected and processed by Azure Monitor. DCRs specify which data sources to collect from, how often, and where to store the data. You can create custom DCRs to tailor data collection to your specific needs.
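To make the idea more concrete, here is a minimal sketch of what a DCR for Windows performance counters might look like, written as a Python dictionary. The overall shape (data sources, destinations, data flows) follows my understanding of the DCR JSON format; the resource ID, the names `cpu-and-memory` and `central-workspace`, the region, and the helper `undeclared_destinations` are all illustrative assumptions, so check the official schema before reusing any of it.

```python
# Hypothetical workspace resource ID; the placeholders must be replaced.
WORKSPACE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<rg>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace>"
)

# Sketch of a DCR: WHAT to collect, HOW OFTEN, and WHERE to send it.
dcr_windows_perf = {
    "location": "westeurope",  # assumed region
    "properties": {
        "dataSources": {
            "performanceCounters": [
                {
                    "name": "cpu-and-memory",
                    "streams": ["Microsoft-Perf"],
                    "samplingFrequencyInSeconds": 60,
                    "counterSpecifiers": [
                        "\\Processor Information(_Total)\\% Processor Time",
                        "\\Memory\\Available Bytes",
                    ],
                }
            ]
        },
        "destinations": {
            "logAnalytics": [
                {"name": "central-workspace", "workspaceResourceId": WORKSPACE_ID}
            ]
        },
        "dataFlows": [
            {"streams": ["Microsoft-Perf"], "destinations": ["central-workspace"]}
        ],
    },
}


def undeclared_destinations(dcr: dict) -> set[str]:
    """Return destination names used in dataFlows but never declared.

    Assumes every destination type holds a list of objects with a "name" key,
    which is true for this sketch but may not cover every destination type.
    """
    props = dcr["properties"]
    declared = {
        dest["name"]
        for dests in props["destinations"].values()
        for dest in dests
    }
    used = {name for flow in props["dataFlows"] for name in flow["destinations"]}
    return used - declared
```

A quick sanity check like `undeclared_destinations(dcr_windows_perf)` returning an empty set is a cheap way to catch a data flow that points at a destination you forgot to declare.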

Configuration Tracker refers to monitoring and tracking changes to resource configurations. It helps track modifications to settings, access controls, and configurations.

Architecture design:

When embarking on the journey of architecture design, it’s pivotal to concentrate on three core components that form the backbone of a robust and scalable system. Let’s delve into these components, keeping the Azure Well-Architected Framework in mind to ensure our design is not only resilient but also cost-effective and operationally efficient.

1 — Log Analytics Workspace
2 — Data Collection Rules
3 — Alert Rule Scope

Log Analytics Workspace

Building on our previous discussion, a Log Analytics workspace acts as a data repository. You can use a single central workspace or create multiple workspaces based on your specific needs. Careful design is crucial to keeping costs under control. This means striking a balance between two pillars of the Well-Architected Framework: cost optimization and operational excellence.

  • Single Workspace: Beginning with a singular workspace simplifies management and querying. There are no performance constraints tied to data volume within a workspace; however, latency may occur if regions are distant from one another. Additionally, compliance considerations play a role; certain companies mandate local data storage, making a centralized workspace unsuitable for their needs.
  • Multiple Workspaces: Separate workspaces should be considered if there is a requirement to segregate operational and security data, adhere to regional regulations, or optimize costs based on usage patterns. However, this approach demands more operational effort. With data dispersed across multiple workspaces, volume-based discounts such as commitment tiers become harder to reach, because each workspace ingests less data. Furthermore, maintaining multiple workspaces means duplicating Data Collection Rules (DCRs), since each DCR names the Log Analytics workspace it sends data to.

Factors that can influence the decision:

  • Dev / Prod Environment: You may want to keep the data for Dev and Prod environments separate.
  • Latency: Transmission of logs across distant regions can introduce latency due to network communication.
  • Data Residency: The workspace is inherently associated with a specific Azure region. It’s imperative to ensure alignment between the workspace’s region and your data residency requirements.
  • Compliance: Different regions may adhere to distinct compliance regulations. It’s essential to understand the legal and regulatory implications of collecting data from diverse geographies.

Consider using multiple workspaces if:

  • You need to segregate data for operational and security purposes.
  • Your company is subject to regional regulations that require local data storage.
  • You want to optimize costs based on varying data usage patterns across regions.

Opt for a single workspace if:

  • Simplicity in management and querying is a priority.
  • Data latency across regions is not a major concern.
  • Compliance allows for centralized data storage.
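The two checklists above can be condensed into a small decision helper. This is only a sketch of the reasoning in this article, not an official sizing tool; the `Requirements` fields and the function name are names I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical flags mirroring the 'multiple workspaces' checklist."""
    segregate_security_data: bool = False
    local_storage_required: bool = False
    usage_varies_by_region: bool = False


def recommend_workspace_layout(req: Requirements) -> tuple[str, list[str]]:
    """Return ("single" | "multiple", reasons that pushed toward multiple)."""
    reasons = []
    if req.segregate_security_data:
        reasons.append("operational and security data must be segregated")
    if req.local_storage_required:
        reasons.append("regional regulations require local data storage")
    if req.usage_varies_by_region:
        reasons.append("costs should follow regional usage patterns")
    # With no reason to split, the simpler single workspace wins.
    return ("multiple" if reasons else "single"), reasons
```

For example, `recommend_workspace_layout(Requirements(local_storage_required=True))` yields `"multiple"` together with the reason, while the default requirements yield `"single"`. Real decisions will weigh these factors rather than treat them as hard switches.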

Log Analytics Workspace RBAC Design:

Azure logs and metrics offer a straightforward method for controlling access to data. We can leverage the existing permissions assigned to resources, meaning that if someone already has access to a particular resource group or virtual machine, they automatically gain access to the metrics and logs within the workspace. Additionally, permissions can be granularly managed at the Log Analytics Workspace level.

In designing your Role-Based Access Control (RBAC) strategy, it’s advisable to combine these two options: resource-context access and workspace-context access. This approach streamlines the design process and addresses all requirements. However, caution must be exercised with workspace-context access to prevent granting permissions to individuals who lack authorization to view the data.
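To see how the two access modes combine, here is a simplified model in Python. It does not call any Azure API; it just mimics the rule "a user can read a record if they have workspace-level read access, or if the record belongs to a resource they can already read". The function names and the scope-matching logic are my own simplification.

```python
def in_scope(resource_id: str, scope: str) -> bool:
    """True if resource_id is the scope itself or sits underneath it.

    Azure resource IDs are case-insensitive, so both sides are lowercased.
    """
    rid = resource_id.lower()
    sc = scope.lower().rstrip("/")
    return rid == sc or rid.startswith(sc + "/")


def can_read_record(
    has_workspace_read: bool,
    readable_scopes: set[str],
    record_resource_id: str,
) -> bool:
    """Model of combined workspace-context and resource-context access."""
    if has_workspace_read:
        # Workspace-context: sees every record in the workspace.
        return True
    # Resource-context: sees only records of resources the user can read.
    return any(in_scope(record_resource_id, s) for s in readable_scopes)
```

The first branch is the reason for the caution above: a single workspace-level grant exposes every record, regardless of which resource produced it.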

Data Collection Rules

While Data Collection Rules (DCRs) facilitate data gathering from resources distributed across various geographical locations, it’s crucial to consider the following:

  • Centralized vs. Distributed DCRs: DCRs are associated with Log Analytics workspaces. If you have a central workspace, your DCRs will also be centralized. However, you can have multiple workspaces and DCRs for a more distributed approach.
  • Multiple DCRs per VM: A single virtual machine (VM) can use multiple DCRs. These DCRs collect data and send it to different tables within your Log Analytics workspace, as shown in the following diagram.
  • DCR Best Practices: Microsoft recommends creating separate DCRs for different purposes, such as monitoring performance, tracking configuration changes, and collecting insights from specific platforms (e.g., Windows, Linux). This approach helps organize your data and simplify management.

The included diagram illustrates how DCRs collect data and store it in specific tables within your Log Analytics workspace.

The image showcases four common DCR types:

  • Windows Performance
  • Linux Performance
  • Configuration Change Tracking
  • VM Insights (applicable to both Windows and Linux)
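One way to picture a VM associated with several DCRs is a pair of lookup tables. In this sketch the DCR names, the VM names, and the DCR-to-table mapping are my reading of the four DCR types listed above, not values taken from a real environment.

```python
# Assumed mapping: which workspace table each DCR type writes to.
DCR_TABLES = {
    "dcr-windows-perf": {"Perf"},
    "dcr-linux-perf": {"Perf"},
    "dcr-change-tracking": {"ConfigurationChange"},
    "dcr-vm-insights": {"InsightsMetrics"},
}

# Hypothetical associations: one VM can be linked to several DCRs.
VM_ASSOCIATIONS = {
    "win-vm-01": ["dcr-windows-perf", "dcr-change-tracking", "dcr-vm-insights"],
    "lnx-vm-01": ["dcr-linux-perf", "dcr-vm-insights"],
}


def tables_for_vm(vm_name: str) -> set[str]:
    """Return every table that receives data from the given VM."""
    tables: set[str] = set()
    for dcr_name in VM_ASSOCIATIONS.get(vm_name, []):
        tables |= DCR_TABLES[dcr_name]
    return tables
```

Keeping one DCR per purpose means a change to, say, the change-tracking settings touches a single rule instead of every VM's monolithic configuration.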

Each DCR populates its corresponding table in the workspace. We’ll delve into these tables in more detail:

  • Perf (Performance Metrics): The Perf table within Azure Monitor Logs stores performance metrics data. These metrics originate from diverse sources, including virtual machines, applications, and services.
  • InsightsMetrics: This table is closely tied to VM Insights (formerly known as Azure Monitor for VMs). It provides detailed metrics specific to virtual machines.
  • Event: The Event table stores Windows event log records collected from monitored machines.
  • ConfigurationChange: The ConfigurationChange table records the changes detected by change tracking on monitored machines, such as changes to software, services, and files.
  • Logs: The Logs table is a fundamental component of Azure Monitor. It serves as a repository for various types of log data generated by Azure resources, applications, and services. This table captures essential information related to system events, diagnostics, and operations.
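Once data lands in these tables, you query it with KQL. As a small illustration, the Python helper below builds a KQL query that averages CPU usage per computer from the Perf table. The function name and the default time windows are assumptions; the column names (`TimeGenerated`, `CounterName`, `CounterValue`, `Computer`) are the standard Perf columns as far as I know.

```python
import re

# Accept only simple timespan literals such as 30m, 1h or 7d.
_TIMESPAN = re.compile(r"\d+[smhd]")


def build_cpu_query(lookback: str = "1h", bin_size: str = "5m") -> str:
    """Build a KQL query for average CPU per computer over time bins."""
    for value in (lookback, bin_size):
        if not _TIMESPAN.fullmatch(value):
            raise ValueError(f"not a simple KQL timespan literal: {value!r}")
    return (
        "Perf\n"
        f"| where TimeGenerated > ago({lookback})\n"
        '| where CounterName == "% Processor Time"\n'
        "| summarize AvgCpu = avg(CounterValue) "
        f"by Computer, bin(TimeGenerated, {bin_size})"
    )
```

Calling `print(build_cpu_query("30m", "1m"))` gives you a query you can paste into the Logs blade of the workspace. The validation step is there because the values are interpolated straight into the query text.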

Cost details

The associated costs for Azure Monitor can be found at this link: https://azure.microsoft.com/en-gb/pricing/details/monitor/

  1. There is no cost to create a new Log Analytics workspace.
  2. Data ingestion costs
  3. Retention period
  4. Region can be important due to egress costs.
  5. Alert frequency affects alerting costs.
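To get a feel for how ingestion and retention interact, here is a rough estimator. Every price in it is a parameter you must fill in from the pricing page; the function name, the parameter names, and the steady-state assumption (a constant daily volume) are mine, and egress and alerting costs are deliberately left out.

```python
def estimate_monthly_cost(
    gb_per_day: float,
    price_per_gb: float,                  # ingestion price, from the pricing page
    retention_days: int,
    included_retention_days: int,         # retention already covered by ingestion
    retention_price_per_gb_month: float,  # price for retention beyond that
    days_in_month: int = 30,
) -> dict[str, float]:
    """Rough steady-state estimate; ignores egress and alerting costs."""
    ingestion = gb_per_day * days_in_month * price_per_gb
    # Only data older than the included period is billed for retention.
    extra_days = max(0, retention_days - included_retention_days)
    retained_gb = gb_per_day * extra_days
    retention = retained_gb * retention_price_per_gb_month
    return {
        "ingestion": ingestion,
        "retention": retention,
        "total": ingestion + retention,
    }
```

With placeholder values such as `estimate_monthly_cost(10, 2.0, 90, 30, 0.5)` (not real prices), ingestion clearly dominates the bill, which is why filtering data in the DCR before it reaches the workspace usually pays off more than shortening retention.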

Alert Rule Scope Design

A critical aspect of implementing alert rules is choosing the appropriate scope. This decision impacts factors like the number of rules you need, operational overhead, and configuration flexibility.

*Due to an Azure limitation, some alert rule types cannot be scoped at a higher level (resource group or subscription) and work only at the resource level. The custom log alert rule type does allow the higher-level scopes.

  • Higher Levels: Scoping alerts at higher levels (subscription or resource group) reduces costs, resource consumption, and configuration complexity. However, it also decreases flexibility. Modifying an alert for a single VM or cluster becomes more challenging.
  • Resource Level: Scoping alerts at the resource level offers the most granular control. You can tailor alerts to specific VMs or clusters. However, this approach can lead to a significant number of rules, potentially increasing management overhead.
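The trade-off becomes obvious once you count rules. The sketch below assumes one rule per monitored signal per target at the chosen scope, which is a simplification of my own; the function name and the example numbers are illustrative.

```python
def rule_count(
    scope: str,
    signals: int,
    subscriptions: int,
    resource_groups: int,
    resources: int,
) -> int:
    """Number of alert rules needed if every rule lives at one scope level."""
    targets = {
        "subscription": subscriptions,
        "resource_group": resource_groups,
        "resource": resources,
    }
    if scope not in targets:
        raise ValueError(f"unknown scope: {scope!r}")
    # One rule per signal for each target at the chosen scope.
    return signals * targets[scope]
```

For an estate with 2 subscriptions, 10 resource groups, 200 VMs, and 5 signals to watch, this gives 10 rules at subscription scope, 50 at resource group scope, and 1000 at resource scope.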

We will explore three common scoping levels in more detail:

  • Resource Level
  • Resource Group Level
  • Subscription Level

Resource Level:
Pros:
- Granular control: Alerts trigger only for the specific resource’s metrics or health.
- Easier troubleshooting: Identify issues quickly by pinpointing the exact resource causing the alert.

Cons:
- Can become cumbersome: Managing a large number of individual resources with separate rules can be time-consuming.
- Redundancy: Similar rules for multiple resources lead to duplicated configurations.

Resource Group Level:
Pros:
- Balanced approach: Groups related resources together, reducing the number of individual rules.
- Easier management: Simplifies monitoring for a group of resources with similar behavior.

Cons:
- Less granular: Alerts might not pinpoint the exact resource causing the issue within the group.
- Potential for missed alerts: If a single resource’s issue doesn’t affect the entire group, the alert might not trigger.

Subscription Level:
Pros:
- Centralized management: All resources within the subscription are covered by a single rule.
- Efficient for high-level monitoring: Useful for general health checks across all resources.

Cons:
- Least granular: Offers no identification of the specific resource causing a problem.
- Potential noise: Alerts might be triggered for irrelevant resources, causing information overload.

A good recommendation is to place alert rules at different levels. Put the standard rules at the higher levels: these are the rules that do not depend on the particularities of individual components, in other words, the ones you don’t want customized. As you move down toward the resource level, place the rules that may require more customization.
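This layering can be expressed as a simple placement rule: put each alert at the highest scope where it needs no customization. The `AlertSpec` class, its fields, and the example alert names below are hypothetical and only meant to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertSpec:
    """Hypothetical description of an alert and how much tuning it needs."""
    name: str
    customized_per_resource: bool = False
    customized_per_group: bool = False


def choose_scope(spec: AlertSpec) -> str:
    """Pick the highest scope at which the alert needs no customization."""
    if spec.customized_per_resource:
        return "resource"
    if spec.customized_per_group:
        return "resource_group"
    # Standard rule with no customization: define it once, high up.
    return "subscription"


alerts = [
    AlertSpec("vm-heartbeat-missing"),
    AlertSpec("cpu-high", customized_per_group=True),
    AlertSpec("db-disk-latency", customized_per_resource=True),
]
placement = {a.name: choose_scope(a) for a in alerts}
```

Here the heartbeat alert ends up at subscription scope, the CPU alert at resource group scope, and the database latency alert at resource scope. Keep in mind the limitation noted earlier: not every alert rule type supports the higher scopes.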

Conclusion

In conclusion, crafting a robust monitoring design within Azure necessitates a meticulous blend of technical acumen and strategic foresight. By adhering to best practices and leveraging the myriad capabilities of Azure Monitor, organizations can navigate the complexities of modern IT landscapes with confidence.

Tiago Dias Generoso is a Distinguished IT Architect | Senior SRE | Master Inventor based in Pocos de Caldas, Brazil. The above article is personal and does not necessarily represent the employer’s positions, strategies or opinions.


Distinguished IT Architect | Senior SRE specialized in Observability with 20+ years of experience helping organizations strategize complex IT solutions. Kyndryl