Big Compute enables large numbers of computational tasks to run in a distributed yet coordinated fashion. Use cases include:
- Media transcoding
- Image analysis & processing
- Engineering stress analysis
- Test execution
Azure A8, A9, A10 and A11 VMs are tailored specifically for HPC, with high CPU performance and fast 10 Gbps network connections. A8 and A9 VMs have an additional 32 Gbps RDMA-capable backend (Mellanox QDR InfiniBand) for communication between instances. Azure also supports the Intel MPI Library, which boosts performance for workloads running on Intel architecture.
An HPC cluster is made up of a head node (cluster management) and a number of compute nodes. Microsoft HPC Pack can be used to create, manage and run HPC applications on Azure. A script-based deployment approach is highly recommended for configuring HPC clusters and compute nodes.
Azure Batch schedules and manages parallel workloads at scale. With the Batch Apps API, the Azure service handles task scheduling, execution, partitioning and so on for you, whereas with the lower-level Batch API you are responsible for these yourself.
To work with Batch, the following are required:
- Batch account (and associated security key) for authenticating service requests
- Task Virtual Machines (TVMs) for running tasks
- Work items, which describe how an application runs on a TVM pool
- Jobs, which are scheduled work items containing a number of tasks
- Input data for tasks, uploaded to Azure Storage as files
Azure Batch Apps is a feature of Azure Batch that allows you to manage, run and monitor batch jobs. A repeatable job is known as a Batch Application, which is created by submitting two packages to Batch:
- Application Image – a zip file that contains application executables and support files
- Cloud Assembly – a zip file containing the methods that break a job into tasks and invoke the application executables
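The job-splitting role of the Cloud Assembly can be illustrated with a small sketch. This is not the Batch Apps API: the `Task` class, the `split_job` function, the frame-range scheme and the `render.exe` executable are all hypothetical names chosen only to show how one job might be partitioned into independent tasks, each with its own command line.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    """One independent unit of work (hypothetical, not a Batch type)."""
    task_id: int
    first_frame: int
    last_frame: int  # inclusive

    def command_line(self, executable: str = "render.exe") -> List[str]:
        # "render.exe" and its flags are placeholders for the application
        # executable shipped in the Application Image.
        return [executable,
                "--start", str(self.first_frame),
                "--end", str(self.last_frame)]


def split_job(total_frames: int, frames_per_task: int) -> List[Task]:
    """Partition a job of `total_frames` frames into fixed-size tasks."""
    if total_frames < 0 or frames_per_task <= 0:
        raise ValueError("total_frames must be >= 0 and frames_per_task > 0")
    tasks = []
    for task_id, first in enumerate(range(0, total_frames, frames_per_task)):
        # The last task may be shorter than the others.
        last = min(first + frames_per_task, total_frames) - 1
        tasks.append(Task(task_id, first, last))
    return tasks


if __name__ == "__main__":
    for task in split_job(total_frames=10, frames_per_task=4):
        print(task.task_id, " ".join(task.command_line()))
```

Because each task covers a disjoint frame range, the tasks can run on different TVMs in any order.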
Competing Consumers is a design pattern in which task creators add tasks to a common task queue and multiple task processors take tasks from that queue and process them. Its benefits include:
- Scaling out as needed – add more task processors at any time
- Failover – tasks are locked while in progress, but locks are released on failure so that another processor can pick the task up
- Hybrid Compute – run task processors on-premises and in the cloud
- Dynamic Load Balancing – tasks are allocated according to load
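The pattern can be sketched in a single process using Python's standard `queue` and `threading` modules. This is a minimal illustration under simplifying assumptions: an in-memory queue stands in for a durable cloud queue, squaring a number stands in for real work, and the function name `run_competing_consumers` is made up for the example. Locking and failover are not modelled.

```python
import queue
import threading


def run_competing_consumers(tasks, num_consumers=3):
    """Process `tasks` with several consumers reading one shared queue.

    Returns (results, handled_by): the outcome per task and the name of
    the consumer that happened to process it.
    """
    task_queue = queue.Queue()
    results = {}
    handled_by = {}
    lock = threading.Lock()  # protects the two result dictionaries

    def consumer(name):
        while True:
            task = task_queue.get()
            if task is None:  # sentinel: no more work for this consumer
                task_queue.task_done()
                break
            outcome = task * task  # stand-in for real processing
            with lock:
                results[task] = outcome
                handled_by[task] = name
            task_queue.task_done()

    workers = [
        threading.Thread(target=consumer, args=(f"consumer-{i}",))
        for i in range(num_consumers)
    ]
    for worker in workers:
        worker.start()

    # The task creator only knows about the queue, not about the consumers.
    for task in tasks:
        task_queue.put(task)
    for _ in workers:
        task_queue.put(None)  # one sentinel per consumer

    task_queue.join()
    for worker in workers:
        worker.join()
    return results, handled_by


if __name__ == "__main__":
    results, handled_by = run_competing_consumers(range(10))
    for task in sorted(results):
        print(task, results[task], handled_by[task])
```

Which consumer handles which task varies from run to run. Each consumer takes the next task only when it is free, which is where the load balancing comes from, and raising `num_consumers` is the in-process analogue of scaling out.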