Read/write patterns often drive the complexity of balancing consistency against performance.
- Strong (immediate) consistency uses locks so that all observers see the same state: an update is visible to all of them or to none. However, acquiring and holding locks costs processing cycles and limits throughput.
- Eventual consistency allows faster access to data; however, the results may not be current because updates take time to propagate. Read-heavy applications may favour this model for performance.
- Optimistic concurrency allows writes without locks: the data's revision number is recorded when the data is read, and the write fails if the revision number has changed by the time the data is written.
- Pessimistic concurrency locks the data before updating it, so competing writers wait rather than fail. It is better suited to write-heavy applications. Under contention it is likely to perform better because it avoids the many failed and retried writes that optimistic concurrency would produce.
- Last-write-wins is a simpler approach: as the name implies, the last write operation succeeds and overwrites any earlier concurrent writes.
- Sequential access efficiently reads data in a contiguous range (e.g. all records between two keys).
- Random access directly addresses individual data points by path or by a hash code derived from the data (typically its key).
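The three concurrency approaches above can be contrasted in a small in-memory sketch. All names here are hypothetical and illustrative, not an Azure API: `OptimisticStore`, `PessimisticStore`, `LwwRegister` and `ConcurrencyError`. Real services usually perform the revision check server-side; Azure Storage, for example, exposes the revision as an ETag.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass
from typing import Callable, Optional


class ConcurrencyError(Exception):
    """Raised when an optimistic write finds that the revision has changed."""


@dataclass
class Record:
    value: str
    revision: int = 0


class OptimisticStore:
    """Hypothetical store: callers hold no lock while they read, work and write."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}
        # Only makes each compare-and-set atomic; it is never held across the
        # caller's read-modify-write, which is what keeps the scheme optimistic.
        self._cas = threading.Lock()

    def put(self, key: str, value: str) -> None:
        with self._cas:
            self._records[key] = Record(value)

    def read(self, key: str) -> Record:
        with self._cas:
            rec = self._records[key]
            return Record(rec.value, rec.revision)  # a copy carrying the revision seen

    def write(self, key: str, value: str, expected_revision: int) -> int:
        with self._cas:
            current = self._records[key]
            if current.revision != expected_revision:
                raise ConcurrencyError(
                    f"{key}: expected revision {expected_revision}, "
                    f"found {current.revision}"
                )
            current.value = value
            current.revision += 1
            return current.revision


class PessimisticStore:
    """Hypothetical store: a writer locks the key for its whole read-modify-write."""

    def __init__(self) -> None:
        self._values: dict[str, int] = {}
        self._locks: dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def update(self, key: str, change: Callable[[int], int]) -> int:
        # Competing writers wait here instead of failing and retrying.
        with self._lock_for(key):
            new_value = change(self._values.get(key, 0))
            self._values[key] = new_value
            return new_value


@dataclass
class LwwRegister:
    """Last-write-wins: keep whichever write carries the latest timestamp."""

    value: Optional[str] = None
    timestamp: float = float("-inf")

    def write(self, value: str, timestamp: float) -> None:
        # A strictly older write that arrives late is silently discarded.
        if timestamp >= self.timestamp:
            self.value, self.timestamp = value, timestamp
```

Suppose two clients both read revision 0 from `OptimisticStore`. The first write moves the record to revision 1, and the second raises `ConcurrencyError`, so that client must re-read and retry. Under heavy write contention those retries are what the pessimistic approach avoids, at the cost of writers queueing behind the lock.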
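Sequential and random access can be sketched with two hypothetical in-memory structures. `SortedStore` keeps keys in order, so after one binary search a range is read in a single contiguous pass. `HashStore` places each key in a bucket chosen by Python's built-in `hash()`, so a single key is found directly but key order is lost.

```python
from __future__ import annotations

import bisect
from typing import Optional


class SortedStore:
    """Keys kept in sorted order so a contiguous range can be read in one pass."""

    def __init__(self, items: dict[str, str]) -> None:
        self._keys = sorted(items)
        self._values = [items[k] for k in self._keys]

    def scan(self, start: str, end: str) -> list[tuple[str, str]]:
        # Binary-search to the first key >= start, then read sequentially
        # up to (but not including) the first key >= end.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


class HashStore:
    """Keys hashed into buckets: a single key is found directly, order is lost."""

    def __init__(self, items: dict[str, str], buckets: int = 8) -> None:
        self._buckets: list[list[tuple[str, str]]] = [[] for _ in range(buckets)]
        for key, value in items.items():
            self._bucket(key).append((key, value))

    def _bucket(self, key: str) -> list[tuple[str, str]]:
        return self._buckets[hash(key) % len(self._buckets)]

    def get(self, key: str) -> Optional[str]:
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None
```

Python randomises string hashes per process, so bucket placement differs between runs. Lookups stay consistent within a run, which is all this sketch relies on.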
Data queries may be simple (using a key to address data) or more complex, involving correlations, filters, etc.
- A static schema relies on an explicit, fixed structure so that all parties know how the data is organised. It supports complex queries and automation well. When querying large data sets, indexes can speed up reads. However, every index must also be updated on each write, which can drive down write performance.
- A dynamic schema, used by schema-less or NoSQL databases, has no fixed structure; instead, data is saved as key-value pairs. This gives greater flexibility, because fields can be added, changed and removed without worrying about schema mismatches. However, dynamic schemas can struggle with more complex queries.
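The schema contrast and the indexing trade-off can be illustrated in Python; all names here (`Customer`, `CustomerTable`, `docs_in_city`) are hypothetical. On the static side, a dataclass acts as the fixed schema, and an optional in-memory secondary index on `city` turns `by_city` into a lookup instead of a full scan, at the cost of extra work on every insert. On the dynamic side, records are free-form dicts.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Customer:
    """Static schema: every row has exactly these typed fields."""

    id: int
    name: str
    city: str


class CustomerTable:
    """Rows with a fixed schema plus an optional secondary index on city."""

    def __init__(self, index_city: bool = True) -> None:
        self.rows: list[Customer] = []
        self.city_index: Optional[dict[str, list[Customer]]] = (
            defaultdict(list) if index_city else None
        )

    def insert(self, row: Customer) -> None:
        self.rows.append(row)
        if self.city_index is not None:
            # Extra work on every write: the index must be kept up to date.
            self.city_index[row.city].append(row)

    def by_city(self, city: str) -> list[Customer]:
        if self.city_index is not None:
            return list(self.city_index.get(city, []))  # indexed lookup
        return [r for r in self.rows if r.city == city]  # full scan


# Dynamic schema: each document may carry different fields, so queries
# must cope with fields that are missing on some records.
documents = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Lin", "tags": ["vip"]},  # no "city", extra "tags"
]


def docs_in_city(docs: list[dict], city: str) -> list[dict]:
    return [d for d in docs if d.get("city") == city]
```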
Repetitive queries can be served from a data cache such as Azure Redis Cache, which returns results quickly from memory. Specify a Time-To-Live (TTL) so that cached entries expire and are refreshed instead of being served stale indefinitely.
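A cache-aside pattern with a TTL can be sketched in-process. The hypothetical `TtlCache` class stands in for Redis here, and its clock is injectable so expiry can be tested. With Redis itself, the TTL is attached to each key (for example via the `SET` command's `EX` option) and the server handles expiry.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Hashable


class TtlCache:
    """Cache-aside helper: serve from memory until an entry's TTL expires."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: Hashable, load: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: no query against the data store
        value = load()  # miss or expired: run the (expensive) query
        self._entries[key] = (now + self.ttl, value)
        return value
```

In this sketch expired entries are only replaced when they are next requested. A longer TTL means fewer queries but a longer window in which results may be stale.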
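When selecting an appropriate data storage solution, consider:
- Combination of data stores – a single solution may use several stores, each suited to its data: SQL may be best suited for transactional data, Blob storage for large binary files, DocumentDB for loosely structured data and Azure Search for indexing free-text files.
- Data locality – keep data close to the compute that uses it (e.g. in the same region) to reduce latency and cross-region data transfer costs.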
- Cost drivers – performance vs. cost (hot vs. cold, standard vs. premium, etc.)
When evaluating data storage qualities, consider:
- Reliability – Azure Storage offers locally redundant storage (LRS, three copies within a single datacenter) and geo-redundant storage (GRS, three additional copies in a secondary region); Azure SQL Database can use multiple active secondaries. Also consider reliability in your own solution.
- Scalability – data sharding (partitioning data horizontally across multiple databases) is a common practice for scaling data stores and providing multi-tenancy. Azure SQL Database Elastic Scale supports data sharding.
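Routing tenants to shards can be sketched with a hypothetical `ShardMap` class. This is not the Elastic Scale API: it hashes a tenant ID with SHA-256 and takes the result modulo the number of shards. The shard names below are also illustrative.

```python
from __future__ import annotations

import hashlib


class ShardMap:
    """Routes each tenant (the sharding key) to one of a fixed set of databases."""

    def __init__(self, shards: list[str]) -> None:
        if not shards:
            raise ValueError("at least one shard is required")
        self.shards = list(shards)

    def shard_for(self, tenant_id: str) -> str:
        # A stable hash (unlike Python's per-process hash()) keeps a tenant
        # on the same shard across processes and restarts.
        digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]
```

Modulo placement is the simplest choice, but adding a shard moves most tenants to a different database. Tools built around an explicit shard map, which records each key or key range's shard as metadata, avoid that by changing only the mappings that need to move.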