Design Azure BC and DR capabilities

When planning a BC / DR strategy it is important to understand:

  • RPO (Recovery Point Objective) i.e. the maximum time in minutes for which data loss is acceptable when recovering from a disaster
  • RTO (Recovery Time Objective) i.e. the maximum time in minutes it takes to recover service in the event of a disaster
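  • Synchronous vs. asynchronous replication i.e. whether a write is acknowledged only once it has also been committed to the replica (synchronous) or acknowledged immediately and queued for replication afterwards (asynchronous)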
  • SLA (Service Level Agreement) for the underlying services
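
As a rough illustration of how RPO and RTO are applied, the sketch below checks a DR design against its objectives. The function name and all of the figures are hypothetical and chosen only for illustration. It assumes asynchronous replication, where the worst-case data loss is roughly one replication interval.

```python
def meets_objectives(replication_interval_min, failover_time_min, rpo_min, rto_min):
    """Return (rpo_ok, rto_ok) for a hypothetical DR design.

    Assumption: with asynchronous replication the worst-case data loss is
    about one replication interval, and the worst-case outage is the time
    needed to fail over.
    """
    rpo_ok = replication_interval_min <= rpo_min
    rto_ok = failover_time_min <= rto_min
    return rpo_ok, rto_ok


# Hypothetical design: replicate every 5 minutes, fail over in 45 minutes,
# measured against an RPO of 15 minutes and an RTO of 30 minutes.
print(meets_objectives(5, 45, rpo_min=15, rto_min=30))  # (True, False)
```

Here the design satisfies the RPO but not the RTO, so the failover process rather than the replication frequency would need attention.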

When designing high availability into Azure services consider:

  • Use of Availability Sets and load balancing for Virtual Machines
  • SQL Server AlwaysOn (=> 3 node WSFC – Primary Replica, Secondary Replica, FSW)
  • SQL Mirroring

Hyper-V Replica provides asynchronous replication of VMs without a shared storage requirement, however shared storage can be leveraged with supported SANs for additional functionality. Azure Site Recovery also works with Hyper-V Replica.

System Center can provide orchestration for Site Recovery failovers.

Deploy websites

Several methods exist for deploying Azure websites.

Azure site extensions can be deployed to add custom administrator functionality to your website. Site Control Manager can be set up to manage sites and extensions.

Web deployment packages can be created in Visual Studio and allow quick deployment of websites:

  • .zip file contains all files for deployment including:
    • .cmd file to customise IIS installation
    • .xml files to specify site parameters

An Azure App Service Plan provides a mechanism to group web apps and other app services together so they can be managed and scaled as a unit. Deployment slots allow code to be staged, easily promoted from development to production and backed out again. Database connection strings can be made sticky per slot to ensure that when new code is promoted into production it will use the production database and vice versa.
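
The effect of sticky settings during a slot swap can be sketched with plain dictionaries. This is only a simulation of the idea, not an Azure API: `swap_slots`, the setting names and the values are all hypothetical.

```python
def swap_slots(staging, production, sticky_keys):
    """Swap two slots' settings, leaving sticky keys with their slot.

    Illustrative only: slots are modelled as plain dicts of settings.
    """
    # Everything swaps by default...
    new_staging, new_production = dict(production), dict(staging)
    for key in sticky_keys:
        # ...except sticky settings, which stay where they were.
        if key in staging:
            new_staging[key] = staging[key]
        if key in production:
            new_production[key] = production[key]
    return new_staging, new_production


staging = {"code": "v2", "db": "staging-db"}
production = {"code": "v1", "db": "prod-db"}

staging, production = swap_slots(staging, production, sticky_keys={"db"})
print(production)  # {'code': 'v2', 'db': 'prod-db'}
print(staging)     # {'code': 'v1', 'db': 'staging-db'}
```

The new code (v2) reaches production while each slot keeps its own database setting. Swapping again backs the change out.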

Web Apps can use a local Git repository on Azure or an existing source control system. Dropbox can also be used with Azure as an external deployment source, with the rollback feature enabled to revert to previous code versions.



Design web apps for scalability and performance

Websites can be scaled globally by serving content to clients using the CDN service and Azure Traffic Manager for performance load balancing.

Create websites using Visual Studio and the Azure SDK. Publish web applications using:

  • Azure PowerShell script
  • Publish from Visual Studio using Web Deploy
  • Publish using FTP

Debug published websites using:

  • Monitoring metrics through Azure portal
  • Azure Application Insights
  • Visual Studio for remote debugging
  • Site Control Manager (Project Kudu)

Azure provides support for developing applications and websites in a number of languages:

  • .NET (C#, Visual Basic)
  • Java (Tomcat or Jetty)
  • Node.js (server-side version of JavaScript)
  • PHP
  • Python

It is possible to run web applications on Virtual Machines, Cloud Services or Web Apps and each provides benefits. A VM allows greater flexibility by providing full control of the Operating System and installed applications, while a Web App can be a more cost effective option with less management overhead and is easier to scale.



Integrate Azure services in a solution

There are a significant number of Azure services available to design your application. Each service offers its own SLAs.
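
Because each service has its own SLA, the availability of a whole solution has to be estimated by combining them. A minimal sketch of the usual arithmetic follows, assuming failures are independent. The function names and SLA figures are made up for illustration and are not published Azure SLAs.

```python
import math


def composite_sla(*slas):
    """Availability of a chain of services that must ALL be up.

    Assumes independent failures, so the availabilities multiply.
    """
    return math.prod(slas)


def redundant_sla(sla, copies):
    """Availability when ANY one of `copies` independent instances is enough."""
    return 1 - (1 - sla) ** copies


# Hypothetical figures: a web tier at 99.95% in front of a database at 99.99%.
print(f"{composite_sla(0.9995, 0.9999):.4%}")
# The same hypothetical 99% service deployed to two independent regions.
print(f"{redundant_sla(0.99, 2):.4%}")
```

Chaining services lowers the overall figure below that of the weakest service, whereas adding redundancy raises it.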

With the growth of semi-structured and unstructured data sets, alternatives to relational databases such as SQL have become important. Some examples of requirements and the Azure services which can be deployed to address them are as follows:

  • Search and query options – DocumentDB, Azure Search
  • Caching – Azure Redis Cache, CDN
  • Recommendation – Azure Search, Azure Machine Learning

Analysing large sets of data in motion has led to the release of many Open Source solutions including Hadoop, Kafka and Storm. Azure offers a range of Big Data and IoT services.

Components of a Big Data solution are likely to include:



Enterprise mobile applications address the growth of mobile and BYOD. Ingredients of a mobile application are likely to include:

  • Authentication & Identity (Azure AD & Azure AD Authentication Library (ADAL))
  • Access to on-premises data & services (Azure Application Proxy, Service Bus Relay or Azure App Service BizTalk API Apps Hybrid Connections)
  • Push notifications (Notification Hub)
  • Azure Mobile Services
  • Security & Compliance (Workplace Join, Azure Rights Management, Key Vault)



Related services can be managed using Azure resource groups defining a lifecycle boundary in a JSON-based template and applying RBAC policies to secure said resources.

Select the appropriate storage option

Read / write patterns often drive complexity in balancing consistency with performance.

  • Immediate consistency or strong consistency uses locks to ensure all observers will see (or not see) updates at the same time, however locking places a demand on processing cycles.
  • Eventual consistency allows more immediate access to data however the results may not be current. Read heavy applications may favour this method for performance.
  • Optimistic concurrency allows data writes without locks. The revision number is recorded before the write and, if it has changed by the time the data is written, the write operation will fail.
  • Pessimistic concurrency locks data before it is changed. It is better suited to write-heavy applications, where performance is likely to be better as a result of avoiding so many failed write operations.
  • Last-write wins is a simpler method whereby, as the name implies, the last write operation will be successful.
  • Sequential access efficiently reads data in a continuous range.
  • Random access directly addresses data points by path or a hash code of the data.
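
The optimistic concurrency check described in the list above can be sketched in a few lines. The `Store` class and `VersionConflict` exception are hypothetical names for illustration. A real data store would perform the compare-and-write atomically, which this single-threaded sketch does not attempt.

```python
class VersionConflict(Exception):
    """Raised when the stored revision no longer matches the one that was read."""


class Store:
    """Toy in-memory store keeping a revision number per key."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            # Someone else wrote since we read: fail instead of overwriting.
            raise VersionConflict(f"{key}: expected {expected_version}, found {current_version}")
        self._data[key] = (current_version + 1, value)
        return current_version + 1


store = Store()
version_a, _ = store.read("order-1")   # client A reads revision 0
version_b, _ = store.read("order-1")   # client B reads revision 0
store.write("order-1", "paid", version_a)           # A succeeds, revision becomes 1
try:
    store.write("order-1", "cancelled", version_b)  # B is stale and fails
except VersionConflict as err:
    print("write rejected:", err)
```

The failed writer would normally re-read the data and retry, which is why this approach suits read-heavy workloads where conflicts are rare.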

Data queries may be simple (using a key to address data) or more complex involving correlations, filters, etc.

  • Static schema relies on an explicit or fixed schema so all parties know how data is structured. A static schema supports complex queries and automation well. When querying large sets of data indexing can help, however maintaining indexes can drive down write performance.
  • Dynamic schema databases, also known as schema-less or NoSQL databases, do not have a fixed schema and instead save data as key-value pairs. A dynamic schema can support greater flexibility by enabling fields to be added, changed and removed without worrying about schema mismatches. However dynamic schemas can struggle with more complex queries.

Repetitive queries can be addressed by using a data cache such as Azure Redis Cache with a Time-To-Live (TTL) specified to quickly return results from memory.
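
The caching behaviour described above can be sketched with an in-memory dictionary standing in for Azure Redis Cache. `TTLCache` and `get_or_load` are hypothetical names, and the injectable `clock` parameter is an assumption added only to make the expiry easy to demonstrate.

```python
import time


class TTLCache:
    """Tiny cache-aside helper: entries expire ttl_seconds after being loaded."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: answer from memory
        value = loader(key)  # missing or expired: run the real query
        self._items[key] = (now + self._ttl, value)
        return value


def slow_query(key):
    # Stand-in for an expensive database query.
    return f"result for {key}"


cache = TTLCache(ttl_seconds=60)
print(cache.get_or_load("top-sellers", slow_query))  # runs the query
print(cache.get_or_load("top-sellers", slow_query))  # served from the cache
```

A shorter TTL keeps results fresher at the cost of more queries reaching the underlying data store.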

When selecting an appropriate data storage solution consider:

  • Combination of data stores – SQL may be best suited for transactional data, Blob storage for large binary files, DocumentDB for loosely structured data and Azure Search for indexing free-text files
  • Keep data close to compute
  • Cost drivers – performance vs. cost (hot vs. cold, standard vs. premium, etc.)

When evaluating data storage qualities consider:

  • Reliability – LRS keeps 3 local copies, GRS adds a further 3 copies in a separate region and Azure SQL uses multiple active secondaries. Consider reliability in your own solution as well.
  • Scalability – Data sharding is a common practice for scaling data stores and providing multi-tenancy. Azure SQL Database Elastic Scale supports data sharding.
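
As a generic illustration of sharding, the sketch below maps a tenant to one of a fixed number of shards using a hash of the tenant ID. This is not the Elastic Scale API (which manages its own shard maps). `shard_for` and the tenant names are hypothetical.

```python
import hashlib


def shard_for(tenant_id, shard_count):
    """Pick a shard index for a tenant with a stable hash.

    hashlib is used rather than the built-in hash() so the mapping stays the
    same across processes and restarts.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


for tenant in ("contoso", "fabrikam", "northwind"):
    print(tenant, "-> shard", shard_for(tenant, shard_count=4))
```

Simple modulo hashing moves most keys when the shard count changes, which is one reason a lookup-based shard map is often preferred in practice.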


Create long-running applications

In order to create long-running applications both reliability and availability must be considered. Simply moving an application to the cloud does not guarantee the service levels required. Cloud native applications, i.e. those developed to run in Azure, are inherently more suited to the cloud. SLAs associated with Azure services should be understood when designing a service.

Multi-instancing is often a requirement to achieve the desired availability. Two approaches to multi-instancing are homogeneous instances and primary-secondary instances.

Considerations when designing for system-level availability:

  • Avoiding Single Points of Failure (SPOF)
  • Alternative Services
  • Cross-Region Deployments

Considerations when designing for system reliability:

  • Distribute service across Azure Update Domains and Fault Domains
  • Handle transient errors with retries (Azure SDK supports error handling)
  • Loose coupling of service components avoids system wide failures due to a single component failure. Architectures which address this include SOA, Microservices and Message-Based
  • Health Monitoring (with Azure Application Insights or 3rd party solutions including New Relic and AppDynamics)
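
Retrying transient errors, as mentioned in the list above, is commonly done with exponential backoff. The following is a generic sketch rather than the Azure SDK's own retry policy. `TransientError`, `retry` and the default values are assumptions made for illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, throttling)."""


def retry(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            delay = base_delay * (2 ** attempt)
            # Random jitter stops many clients retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


calls = {"count": 0}


def flaky_call():
    # Fails twice, then succeeds, to imitate a brief outage.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("service busy")
    return "ok"


print(retry(flaky_call, base_delay=0.01))  # ok
```

Only errors known to be transient should be retried. Retrying a permanent failure just delays the error and adds load.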

Considerations when designing for scale:

  • Scale up (limited) vs. Scale out
  • Scheduled scaling – for predictable workload changes
  • Reactive scaling – for unpredictable or unexpected workload changes
  • Container technologies can support sub-second deployment of instances
  • Workload partitioning is an alternative to load balancing for distribution of workloads based on dynamic or static partitioning (where distribution is predetermined)

The Azure Autoscale service supports scheduled and reactive scaling of VMs, Cloud Services and Web Apps.
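
A reactive scaling rule can be thought of as a small decision function evaluated against a metric. The sketch below is a simplification and not how Autoscale is configured. The function name, thresholds and instance limits are hypothetical.

```python
def desired_instances(current, cpu_percent, minimum=2, maximum=10,
                      scale_out_above=70, scale_in_below=30):
    """Return the instance count a simple CPU-based rule would ask for."""
    if cpu_percent > scale_out_above:
        current += 1  # under pressure: add an instance
    elif cpu_percent < scale_in_below:
        current -= 1  # mostly idle: remove an instance
    # Never leave the configured range.
    return max(minimum, min(maximum, current))


print(desired_instances(3, cpu_percent=85))  # 4
print(desired_instances(3, cpu_percent=10))  # 2
print(desired_instances(2, cpu_percent=10))  # 2 (already at the minimum)
```

The gap between the two thresholds keeps the rule from repeatedly scaling out and back in when the load hovers around a single value.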

Cloud Services is an Azure PaaS service for building scalable n-Tiered applications. Cloud Services are made up of one or more roles of the following types:

  • Worker Role – A long running process, e.g. listen to a job queue the Web Role sends requests to
  • Web Role – ASP.NET project which delivers the application presentation layer

Cloud Services expose services via Endpoints:

  • Input Endpoint – External access to the service through the Azure load balancer
  • Internal Endpoint – Communication between role instances
  • Instance Input Endpoint – Access specific instances on a different port

Create compute-intensive applications

Big Compute enables a lot of computational tasks to be run in a distributed yet coordinated fashion. Use Cases include:

  • Media transcoding
  • Image analysis & processing
  • Engineering stress analysis
  • Rendering
  • Test Execution

Azure A8, A9, A10 and A11 VMs are tailored specifically for HPC with high CPU and fast 10 Gbps network connections. A8 and A9 VMs have an additional 32 Gbps RDMA capable backend (Mellanox QDR InfiniBand) for instance communication. Azure also supports the Intel MPI Library which boosts performance for workloads running on the Intel architecture.

An HPC cluster is made up of a head node (cluster management) and a number of compute nodes. Microsoft HPC Pack can be used to create, manage and run HPC applications on Azure. A script based deployment approach is highly recommended for configuration of HPC clusters and compute nodes.


Schedule and manage parallel workloads at scale with Azure Batch. With the Batch Apps API the Azure service will handle task scheduling, execution, partitioning, etc. for you, whereas with the lower level Batch API you are responsible for these yourself.

To work with Batch the following are required:

  • Batch account (and associated security key) for service request authentication
  • Task Virtual Machine (TVM) for running tasks
  • Work items describe how an application runs on a TVM pool
  • Jobs are scheduled work items which contain a number of Tasks
  • Input data for processing Tasks is uploaded to Azure storage as a File

Azure Batch Apps is a feature of Azure Batch which allows you to manage, run and monitor batch jobs. A repeatable job is known as a Batch Application, which is created by submitting two packages to Batch:

  • Application Image – a zip file that contains application executables and support files
  • Cloud Assembly – a zip file containing methods to break a job into tasks and invoke application executables


Competing Consumers is a design pattern where task creators send tasks to a common task queue and multiple task processors compete to take tasks from it. Benefits include:

  • Scaling out as needed – add more task processors at any time
  • Failover – tasks are locked while in progress however locks are released on failure
  • Hybrid Compute – run task processors on premises and in the cloud
  • Dynamic Load Balancing – to allocate tasks according to load
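
The Competing Consumers pattern can be sketched in a single process, with Python's `queue.Queue` standing in for a cloud queue and threads standing in for task processors. `run_consumers` is a hypothetical helper. The locking and release-on-failure behaviour of a real queue service is not modelled here.

```python
import queue
import threading


def run_consumers(tasks, worker_count, handler):
    """Process tasks with several workers pulling from one shared queue."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                # Each task is handed to exactly one worker.
                task = task_queue.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            result = handler(task)
            with results_lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


squares = run_consumers(range(10), worker_count=3, handler=lambda n: n * n)
print(sorted(squares))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Faster workers naturally pick up more tasks, which gives the load balancing effect without a central dispatcher. Results arrive in no fixed order, hence the sort.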