Dynatrace Solution Design for complex IT environments
Designing an observability and monitoring solution is a complex (and exciting) journey because it has to involve a culture change. An Observability Plan can revolutionize IT Operations and, at the same time, improve application reliability and reduce cost and support effort.
Many mistakes can be made during the discovery phase, when a company needs to find the monitoring tool that will support its Observability Strategy, and these mistakes generate frustration. This generally happens because the company did not put enough effort and time into a good Proof of Concept (POC) to determine the appropriate architectural design.
The goal of this article is to share some tips on the architectural design of a well-known monitoring product, Dynatrace, a Gartner Magic Quadrant Leader. These tips should help you avoid basic mistakes and highlight the important architecture decisions to consider in your design.
1 — Requirements Gathering
First of all, cover the functional requirements so you can determine whether the tool supports everything you need to monitor, such as operating systems, platforms, middleware and languages. You also need to identify which types of monitoring the tool must provide to satisfy those requirements, for example infrastructure monitoring, application performance management (APM), real user monitoring (browser monitoring), Synthetics, etc.
Secondly, identify the non-functional requirements: whether the customer requires any special security controls, the availability SLA, high availability requirements, disaster recovery and so on.
Lastly, and most importantly, understand the customer objectives, the customer strategy and the operational model the customer needs to implement: how the company is organized, whether it uses Agile, self-sufficient squads or an SRE methodology, whether it is on Cloud or planning to move to Cloud, and whether it has a single infrastructure team or multiple infrastructure teams.
2 — Solution Design
With a deep understanding of the customer requirements, strategy and operational model, you will be able to make good decisions. Here are some tips for a few different scenarios.
2.1 — SaaS or On-premise
With the SaaS version you don’t need to worry about the main Dynatrace cluster, backups or disaster recovery. You only need to distribute the correct number of ActiveGates throughout the company’s network zones.
With the On-Premise solution you also need to manage the Dynatrace Cluster, upgrades, operating system maintenance and so on. Typically the company doesn’t have additional resources to support this extra infrastructure.
The recommendation is to use the SaaS version as much as possible and to go on-premise only if the company has security policies (e.g. ITAR, the International Traffic in Arms Regulations) that do not allow the monitoring solution to run as SaaS.
2.2 — Full stack or Infra Only
Dynatrace offers two monitoring modes through the Dynatrace OneAgent, and the significant factor in choosing one or the other is cost. Generally, Infra Only costs only about 30% of the price of full-stack monitoring. Keep in mind that you can select either mode for each agent through the Dynatrace UI.
Infra only will provide infrastructure monitoring and will not cover the application performance management (APM) and, therefore, will be unable to generate the Smartscape Topology that can help to correlate infrastructure and application problems.
Full-Stack monitoring will cover end-to-end monitoring, from the infrastructure to the application. It will correlate infrastructure and application monitoring, generating the service mapping where you can see the transactions end to end and understand that a problem on an application is related to an infrastructure problem, significantly improving the root cause analysis.
As mentioned, you can decide agent by agent. The only reason to choose the infra-only agent is that, for some specific servers, cost is more critical than end-to-end monitoring.
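To make the trade-off concrete, here is a minimal sketch of the cost arithmetic. It assumes the rough 30% ratio mentioned above; the per-host price and the function name are hypothetical and should be replaced with the figures from your own contract.

```python
# Assumption taken from the text: an infra-only host costs roughly 30%
# of a full-stack host. The real ratio depends on your licensing terms.
INFRA_ONLY_RATIO = 0.30


def estimate_monitoring_cost(full_stack_hosts: int,
                             infra_only_hosts: int,
                             full_stack_unit_price: float) -> float:
    """Estimate the total cost for a mix of full-stack and infra-only hosts.

    full_stack_unit_price is a hypothetical per-host price.
    """
    if full_stack_hosts < 0 or infra_only_hosts < 0:
        raise ValueError("host counts must be non-negative")
    full_stack_cost = full_stack_hosts * full_stack_unit_price
    infra_only_cost = infra_only_hosts * full_stack_unit_price * INFRA_ONLY_RATIO
    return full_stack_cost + infra_only_cost


# Compare 100 hosts all on full-stack with a 60/40 split.
all_full_stack = estimate_monitoring_cost(100, 0, 100.0)
mixed = estimate_monitoring_cost(60, 40, 100.0)
print(f"all full-stack: {all_full_stack:.2f}")
print(f"60 full-stack + 40 infra-only: {mixed:.2f}")
```

The saving only makes sense for servers where you can live without application-level visibility, so a calculation like this is best done per server group rather than for the whole estate.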
2.3 — Network distribution
Big companies usually have complex network infrastructure, and the monitoring solution needs to reach the whole environment. Dynatrace has a component called ActiveGate that helps provide monitoring for these complex environments.
Dynatrace offers several variations of the ActiveGate (AG):
Routing ActiveGates — Used to centralize the OneAgent connections. It is recommended to have one in each sub-net behind a firewall and in networks with high latency. All OneAgents will try to connect to all available routing AGs; to avoid this, you can implement Dynatrace Network Zones, which limit the OneAgents to connecting only to specific AGs in their own network environment.
Synthetic ActiveGates — Used to perform synthetic monitoring (artificial transactions). The AG simulates a user from the locations of your choice. For example, if the company has users in different countries and you need to see how an application performs from each country, you need one ActiveGate in each country where you want to run the tests.
zRemote ActiveGate — Used to monitor IBM Z systems, for example CICS and IMS applications, mapping all the transactions end to end. The number and size of these zRemotes depend on the number of MIPS consumed by each IBM Z LPAR, but generally, on average, you cannot have more than 10 LPARs connected to each zRemote.
Plugin ActiveGate — Used to cover specific monitoring requirements, for example external access or technologies that the OneAgent doesn’t support natively.
It is really important to define an ActiveGate strategy up front to avoid high hosting costs when implementing the solution.
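As a starting point for that strategy, the rules of thumb above can be turned into a first-pass count. This is only a sketch: the inventory fields and function name are hypothetical, it assumes one routing AG per firewalled or high-latency sub-net, one synthetic AG per country and at most 10 LPARs per zRemote, and it ignores sizing by MIPS, which you still need to check separately.

```python
import math
from dataclasses import dataclass
from typing import Dict

# Rule of thumb from the text: about 10 LPARs per zRemote at most.
MAX_LPARS_PER_ZREMOTE = 10


@dataclass
class NetworkInventory:
    """Hypothetical inventory collected during requirements gathering."""
    firewalled_subnets: int   # sub-nets behind a firewall or with high latency
    synthetic_countries: int  # countries to run synthetic tests from
    ibm_z_lpars: int          # IBM Z LPARs to monitor


def plan_activegates(inv: NetworkInventory) -> Dict[str, int]:
    """Return a first-pass ActiveGate count per type."""
    return {
        "routing": inv.firewalled_subnets,
        "synthetic": inv.synthetic_countries,
        # Round up: 23 LPARs need 3 zRemotes, not 2.
        "zremote": math.ceil(inv.ibm_z_lpars / MAX_LPARS_PER_ZREMOTE),
    }


plan = plan_activegates(NetworkInventory(firewalled_subnets=4,
                                         synthetic_countries=3,
                                         ibm_z_lpars=23))
print(plan)
```

Plugin ActiveGates are left out on purpose, because their number depends on which extensions you need rather than on the network layout.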
2.4 — Multi-tenant vs Single-tenant
Dynatrace provides several ways to manage access to the tool, so that the right people access the right data.
Environment Component — This component provides the multi-tenancy. Each ‘Environment’ is a totally isolated instance with its own configurations, integrations, standards, users and so on. Keep in mind that using multiple ‘Environments’ does not allow end-to-end correlation between components that belong to different ‘Environments’. So, if you split components that share the same infrastructure across several ‘Environments’, you will break the PurePath (end-to-end correlation and Smartscape Topology).
Management Zones — A Dynatrace component that enables fine-grained access control. You can design the management zones, for example, per application or per team, so that users can access specific components, and use tags to improve component identification. Management zones do not provide a totally separate tenant, so some permissions cannot be granted at the management zone level. That is why it is so important to have a good plan before implementing the management zones.
Tags — An important Dynatrace component used to identify and classify everything that is monitored by Dynatrace, allowing RBAC to provide fine-grained access control.
Host Groups — A Dynatrace component that groups multiple hosts, which can then be used to define the members of the management zones.
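The way tags, host groups and management zones fit together can be sketched with a small conceptual model. This is not the Dynatrace API: the classes, the matching rule (a host belongs to a zone if its host group or any of its tags match) and all names are assumptions made for illustration, and real management zone rules are richer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Host:
    name: str
    host_group: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class ManagementZone:
    name: str
    host_groups: Set[str] = field(default_factory=set)  # members by host group
    tags: Set[str] = field(default_factory=set)         # members by tag

    def contains(self, host: Host) -> bool:
        """A host is in the zone if its group or any of its tags match."""
        return host.host_group in self.host_groups or bool(host.tags & self.tags)


def visible_hosts(user_zones: List[str],
                  zones: Dict[str, ManagementZone],
                  hosts: List[Host]) -> List[str]:
    """Return the names of the hosts a user can see through their zones."""
    return [h.name for h in hosts
            if any(zones[z].contains(h) for z in user_zones)]


hosts = [
    Host("web01", "payments", {"app:checkout"}),
    Host("db01", "shared-db", {"team:dba"}),
    Host("web02", "portal", {"app:portal"}),
]
zones = {
    "payments-zone": ManagementZone("payments-zone", host_groups={"payments"}),
    "dba-zone": ManagementZone("dba-zone", tags={"team:dba"}),
}
print(visible_hosts(["payments-zone"], zones, hosts))
print(visible_hosts(["payments-zone", "dba-zone"], zones, hosts))
```

Writing the intended zones down in a form like this before touching the tool makes it easier to spot hosts that no team can see, or that too many teams can see.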
With these four components explained, you need to decide how to implement them. I will walk through two different scenarios:
2.4.1 — Scenario 1 — Centralized monitoring team that will manage the whole environment and multiple squads that can manage and visualize some monitoring details.
For this kind of scenario, the best idea is to have a single ‘Environment’ (tenant) for the whole organization and use management zones and tags to organize the IT components and give the right access to the right teams. The management zone layer has some limitations on administrative tasks: for example, a team scoped to a management zone cannot configure integrations, extensions/plugins or the Kubernetes API, and needs the ‘Environment’ administrators to do it.
2.4.2 — Scenario 2 — Customer with self-sufficient teams that will cover end-to-end support from the infrastructure (including monitoring) through the application.
For this kind of scenario you need to configure multiple ‘Environments’, each with its own independent administrators, all on a single Dynatrace cluster. In this multi-tenant scenario, keep in mind that you will not have end-to-end tracing if components are spread across different ‘Environments’. For example, a shared DB that is monitored in one ‘Environment’ will not have end-to-end visibility from applications monitored in another ‘Environment’.
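Before committing to a multi-tenant layout, it can help to list which dependencies would lose end-to-end tracing. The sketch below is a planning aid only: the component names, the Environment names and the function are hypothetical, and the dependency list is something you would have to collect yourself (for example from a CMDB).

```python
from typing import Dict, List, Tuple


def broken_traces(component_env: Dict[str, str],
                  calls: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the caller/callee pairs that live in different Environments."""
    return [(caller, callee) for caller, callee in calls
            if component_env[caller] != component_env[callee]]


# Planned assignment of components to Environments (hypothetical).
component_env = {
    "checkout-app": "env-payments",
    "portal-app": "env-portal",
    "shared-db": "env-payments",
}
# Known dependencies as (caller, callee) pairs.
calls = [
    ("checkout-app", "shared-db"),
    ("portal-app", "shared-db"),
]
print(broken_traces(component_env, calls))
```

In this example the portal application and the shared DB end up in different Environments, so that call would not be traced end to end.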
Another important factor in this decision is the number of ActiveGates you need: Plugin and zRemote ActiveGates each work within a single ‘Environment’, so more ‘Environments’ means more ActiveGates.
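The effect on the ActiveGate count is simple arithmetic, sketched below. The per-Environment numbers are made up for illustration, and the function name is hypothetical.

```python
from typing import Dict


def count_environment_scoped_ags(needs_per_env: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum the Plugin and zRemote ActiveGates needed across Environments."""
    totals = {"plugin": 0, "zremote": 0}
    for needs in needs_per_env.values():
        totals["plugin"] += needs.get("plugin", 0)
        totals["zremote"] += needs.get("zremote", 0)
    return totals


# One shared Environment versus three independent ones (hypothetical needs).
single = {"shared": {"plugin": 1, "zremote": 1}}
multi = {
    "env-payments": {"plugin": 1, "zremote": 1},
    "env-portal": {"plugin": 1},
    "env-mainframe": {"plugin": 1, "zremote": 1},
}
print(count_environment_scoped_ags(single))
print(count_environment_scoped_ags(multi))
```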
We have been using Dynatrace for over a year in our CIO Organization. It has proven effective in providing insight and eliminating blind spots in our varied use cases. We have been impressed with its reliability, ease of use, and visual dashboarding capabilities. We have also been pleasantly surprised by its Service Mapping functionality, which allows us to keep a more current CMDB, a use case that we didn’t consider when procuring our first set of licenses. Our journey is still expanding, and I hope the tips above will be useful in yours.