At the most basic level, Snowflake has three important components: the cloud services layer, the centralised storage layer and the compute layer.
Cloud services – Snowflake calls this the “brains” of the platform. This is where infrastructure management takes place, and where the cost-based query optimiser, metadata management and security (authentication and access control) are handled.
Storage layer – Snowflake organises its data within the relevant cloud storage provider in a compressed columnar format – petabytes are not a problem.
Snowflake tables use a concept called micro-partitioning. An old colleague of mine calls this the “magic sauce”. Micro-partitions are contiguous units of storage, each containing between 50 MB and 500 MB of uncompressed data – and they are immutable! The cloud services layer holds metadata about every micro-partition, including the min/max values and the number of distinct values for each column, which allows queries to skip partitions that cannot contain matching rows.
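To make the min/max metadata idea concrete, here is a minimal sketch in Python. This is not Snowflake's implementation – the partition metadata and the `prune` function are hypothetical – it just illustrates how per-partition min/max statistics let the engine skip entire micro-partitions without reading them.

```python
# Conceptual sketch only: NOT Snowflake's actual code. Each dict stands in
# for the min/max metadata the cloud services layer keeps per micro-partition
# (here, for a single hypothetical date-like column stored as YYYYMMDD ints).
partitions = [
    {"id": 1, "min": 20240101, "max": 20240131},
    {"id": 2, "min": 20240201, "max": 20240228},
    {"id": 3, "min": 20240301, "max": 20240331},
]

def prune(parts, lo, hi):
    """Keep only partitions whose [min, max] range overlaps the query
    predicate [lo, hi]; everything else is skipped, never scanned."""
    return [p for p in parts if p["max"] >= lo and p["min"] <= hi]

# A query filtering on a February range only needs to scan partition 2;
# partitions 1 and 3 are pruned purely from metadata.
survivors = prune(partitions, 20240201, 20240215)
print([p["id"] for p in survivors])  # [2]
```

The immutability mentioned above matters here: because micro-partitions never change in place, their min/max statistics stay valid and pruning decisions are cheap and safe.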
Compute – Warehouses – I will cover this in more detail later, but this is where you apply a set of resources to execute your queries. Warehouses vary in size, can be multi-clustered, and you can set parameters around their scaling and shutdown policies. From experience, I have run many warehouses with different query profiles against the same database with zero issues – this is a massive advantage of Snowflake.
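To illustrate the multi-cluster scaling idea, here is a toy sketch. Snowflake's real scaling policy is internal, and the function, parameter names and concurrency figure below are all assumptions for illustration; the point is simply that a warehouse configured with a min/max cluster range adds clusters when queries queue and scales back when demand drops.

```python
# Illustrative sketch only: a hypothetical auto-scaling heuristic, NOT
# Snowflake's actual policy. It mimics a multi-cluster warehouse that is
# allowed to run between min_clusters and max_clusters clusters.

def clusters_needed(running, queued, min_clusters=1, max_clusters=3,
                    per_cluster_concurrency=8):
    """Scale out while queries queue, scale in when demand drops,
    always staying inside the configured cluster range."""
    demand = running + queued
    wanted = -(-demand // per_cluster_concurrency)  # ceiling division
    return max(min_clusters, min(max_clusters, wanted))

print(clusters_needed(running=8, queued=1))    # 2 (queueing spills to a 2nd cluster)
print(clusters_needed(running=40, queued=10))  # 3 (capped at max_clusters)
print(clusters_needed(running=0, queued=0))    # 1 (idles back to min_clusters)
```

The min/max bounds are what make this safe to leave unattended: heavy concurrency is absorbed up to a cost ceiling, and idle clusters shut back down rather than billing forever.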
Combined, it looks like this:
This is the foundation of Snowflake and what makes it so powerful – the decoupling of storage and compute is key! If you are coming from a Synapse background, the theory is the same (compute and storage are decoupled), but how Snowflake does it is completely different.