zero Copy Clone
Zero Copy Cloning
Before we jump in to zero copy Cloning, we need to understand a bit about partitions and micro partitions in Snowflake.
Partitions are not new in Snowflake alone. It is available even in traditional data warehouses as well.
What is Partitions :
Snowflake is columnize format. means it clusters data by columns
So when you select any columns, it will read and pull only required columns and it will not even touch other columns. It is called pruning. Since it is not reading / scanning complete rows, it is faster. But we can't see how data is getting partitioned internally. We will have millions of partitions created internally.
How the data is getting retrieved?
when we select a required columns and conditions in where class, snowflake just go that micro partitions using the snowflake metadata. So the pruning is very faster and reads only required files. This makes the micro-partitions very faster.
we call micro-partition as storage unit. Each storage unit size is from 50 MB to 500 MB of uncompressed data.
All micro-Partitions are immutable. Means they do not change.
If there is any change in data it creates new Micro Partitions and it will mark old data as inactive and put time travel info to it.
Once it passes through the time travel (data_retention_time_in_day) the Micro Partitions will be purged and it will not longer is be available.
In Snowflake, the following objects can be cloned:
- Databases
- schemas
- Tables
- Streams
Data Configuration and Transformation Objects
- Stages (external only - not internal)
- File Formats
- Sequences
- Tasks
The following account level objects cannot be cloned:
- Users
- Roles
- Grants
- Virtual Warehouses
- Resource monitors
- Storage integrations
/***************** Query to clone table **********************/
CREATE TABLE <New_clone_table> CLONE <Origianl_tableName>;
create table emp_clone clone TRAINING.DEMO.EMP;
/* Same as above we can clone we can clone databases, Schemas, tables etc .. */
once the clone happens the data will not be copied, Instead new tables / object will be created pointing to the the same micropatterns. That is why we call it zero copy cloning.
It is the snap shot of the object at the time of it created.
Once it is created, if any changes happens to the source, the cloned object will not get reflected.
But the cloned object will become the owner to the microoperation and storage cost will be on cloned object owner.
As long as there is no change to the original object , or cloned object, the storage cost is with original object.
This way even if you drop the original object / table, still cloned object will not get effected and the micro partitions will be owned by the cloned object.
We can apply time travel on cloned tables as well if you do any changes .
update CUSTOMERS_CLONE
set cust_key = 00000
where cust_key = 138788;
select * from CUSTOMERS_CLONE before (statement => '01ac6ab1-0004-8719-0062-a6070005a10a'); -- 138688 | Elizabeth Martinez
Query to retrieve list of cloned tables along with original table in table id :
where TABLE_NAME like 'emp' and clone_group_id in (
select clone_group_id from (
select clone_group_id, count(*) as Count_of_repeted_ids from SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS group by clone_group_id
having Count_of_repeted_ids >1
)
) order by original_table_id, originall_CLONE_GROUP_ID;
Comments
Post a Comment