Snowflake Tables

A table in Snowflake is a database object. We can see the list of tables under each database and schema:

To show all the tables in a Snowflake account:

show tables;
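
The scope can be narrowed as well; a quick sketch, assuming a database named my_db and a schema named my_schema (hypothetical names):

show tables in database my_db;           -- all tables in one database
show tables in schema my_db.my_schema;   -- all tables in one schema
show tables like 'customer%';            -- filter table names by a pattern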

We have 4 types of tables in Snowflake:

  • Permanent Table
  • Transient Table
  • Temporary Table
  • External Table

1. Permanent Table:

    These are standard, regular database tables. This is the default table type, so no additional syntax is needed to create them. Data stored in permanent tables consumes storage space, and Snowflake bills for it. Permanent tables come with additional data-protection features like Time Travel and Fail-safe.

create table <table name> (field1 datatype, field2 datatype, field3 datatype);

Sample Code :
create or replace table customer_table (
  id integer,
  last_name string,
  first_name string,
  company string,
  email string,
  workphone string,
  cellphone string,
  streetaddress string,
  city string,
  postalcode string);
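
As a quick illustration of Time Travel on a permanent table, a sketch assuming the customer_table above already holds data and the change happened within the retention period:

select * from customer_table at(offset => -60*5);  -- table contents as of 5 minutes ago

alter table customer_table set data_retention_time_in_days = 90;  -- extend retention (up to 90 days, Enterprise edition and above)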

2. Transient Table:

    Transient tables exist until you explicitly drop them, and they are available to other users based on the privileges you grant. They have a lower level of data protection than permanent tables: there is no Fail-safe, and the Time Travel period is limited (0 or 1 day). They are best suited for data that is not critical and can be recovered from external sources.

create transient table <table name> (field1 datatype, field2 datatype, field3 datatype);

create or replace transient table customer_table (
  id integer,
  last_name string,
  first_name string,
  company string,
  email string,
  workphone string,
  cellphone string,
  streetaddress string,
  city string,
  postalcode string);
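
Transient tables allow a Time Travel retention of only 0 or 1 day; a quick sketch of adjusting and verifying it on the table above:

alter table customer_table set data_retention_time_in_days = 0;  -- turn Time Travel off entirely

show parameters like 'DATA_RETENTION_TIME_IN_DAYS' in table customer_table;  -- verify the setting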

3. Temporary Table:

    These are specific to the session in which they are created. They hold non-permanent, transitory data and are not visible to other sessions or users. Once the session ends, the data is purged completely and is not recoverable. Although they are dropped automatically at session end, Snowflake recommends dropping them explicitly to prevent any unexpected storage charges.

A temporary table is session-specific. Time Travel is at most 24 hours (DATA_RETENTION_TIME_IN_DAYS = 1, applicable only while the session is open). Fail-safe is 0 days, and the table is dropped automatically when the session closes.

create temporary table <table name> (field1 datatype, field2 datatype, field3 datatype);

create or replace temporary table customer_table (
  id integer,
  last_name string,
  first_name string,
  company string,
  email string,
  workphone string,
  cellphone string,
  streetaddress string,
  city string,
  postalcode string);
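
Since Snowflake recommends dropping temporary tables explicitly once they are no longer needed, a minimal sketch:

drop table if exists customer_table;  -- release storage immediately instead of waiting for session close

Note that a temporary table can share a name with a permanent table in the same schema; within the session, the temporary table takes precedence.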

4. External Table (reading data from external sources):

An external table is a Snowflake feature that allows you to query data stored in an external stage as if it were inside a table in Snowflake. The underlying storage is not part of Snowflake, so Snowflake does not store or manage the data itself.

External tables are database objects, so we need to set the context (role, warehouse, database, and schema) before creating them. Since the data is not stored in Snowflake, there are no Snowflake storage charges for external tables.

  • The table stores file-level metadata such as the file name, a version identifier, and related properties.
  • These tables are read-only; no DML operations are allowed.
  • They can be used for queries and join operations.
  • Views can be created on top of these tables.
  • Querying data from external tables is slower than querying native tables.
  • Time Travel is not supported.
Use cases:
  • When you want to do some ad hoc analysis.
  • When you don't want to read and load the complete data into a Snowflake table.
  • When you want to do POCs.
  • When some static data needs to be used by multiple people/teams and does not need to be loaded into tables.

Steps to create an External table from the WebUI:

Assume that the external storage location (for example, an S3 bucket containing the data files) is already configured.

  • Go to database {my_database}.
  • Under Stages, select Create Stage.
  • This prompts us to select one among 4 options:
    • Snowflake Managed
    • Existing S3 Location
    • Existing Microsoft Azure Location
    • Existing Google Cloud Platform
  • Once you select one, it prompts you for the external stage location details based on the provider you selected. (We need to provide a name for the stage in this step.)
  • Now go to a worksheet and follow the steps below.
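
The same stage can also be created from a worksheet with SQL instead of the WebUI; a minimal sketch, assuming an S3 bucket and a pre-configured storage integration (the bucket URL and my_s3_integration are hypothetical):

create or replace stage my_csv_stage
  url = 's3://my-bucket/employee-data/'       -- external location holding the files
  storage_integration = my_s3_integration     -- integration that grants Snowflake access
  file_format = (type = csv);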

list @my_csv_stage/;  -- lists all files available in the stage

CREATE OR REPLACE EXTERNAL TABLE my_ext_table
  WITH LOCATION = @MY_CSV_STAGE/    -- external stage location
  FILE_FORMAT = (TYPE = CSV  SKIP_HEADER = 1)  -- file format to tell Snowflake that it is CSV
  PATTERN='.*employee.*[.]csv'; -- read only CSV files that have "employee" in the file name

Note: When we create an external table this way, it has a single VARIANT column named "VALUE" that holds all of the data for each row.

select * from my_ext_table;

{"c1":"100","c2":"Jennifer","c3":"4400","c4":"10","c5":"2017-01-05"}
{"c1":"101","c2":"Michael","c3":"13000","c4":"10","c5":"2018-08-24"}
{"c1":"102","c2":"Pat","c3":"6000","c4":"10","c5":"2018-12-10"}
{"c1":"103","c2":"Dan","c3":"11000","c4":"20","c5":"2019-02-17"}
{"c1":"104","c2":"Alexander","c3":"3100","c4":"20","c5":"2019-07-01"}

This output is like simple JSON.

Now we can query this using $ notation as below ($1 refers to the VALUE column):

SELECT
  $1:c1, $1:c2, $1:c3, $1:c4, $1:c5
FROM my_ext_table;

It is hard to write this type of query every time you want to read data, so it is better to create the external table with a schema definition; then we can query it like a normal table.

CREATE OR REPLACE EXTERNAL TABLE my_aws_ext_table(
 EMPLOYEE_ID varchar AS (value:c1::varchar),   -- each column is derived by casting a field of the VALUE variant
 NAME varchar AS (value:c2::varchar),
 SALARY number AS (value:c3::number),
 DEPARTMENT_ID number AS (value:c4::number),
 JOINING_DATE date AS TO_DATE(value:c5::varchar,'YYYY-MM-DD')
)
WITH
LOCATION=@MY_CSV_STAGE/
PATTERN='.*employee.*[.]csv'
FILE_FORMAT = (TYPE = CSV  SKIP_HEADER = 1);


SELECT * FROM my_aws_ext_table;
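
As noted earlier, views can be created on top of external tables; a minimal sketch over the table above (my_employee_view is a hypothetical name):

create or replace view my_employee_view as
select employee_id, name, salary, department_id, joining_date
from my_aws_ext_table;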

---- Partitioning the data ----
Upload the files into different department folders before creating the table.
CREATE OR REPLACE EXTERNAL TABLE my_aws_ext_table(
 DEPARTMENT varchar AS split_part(metadata$filename,'/',1),
 FILE_NM varchar AS split_part(metadata$filename,'/',2),
 EMPLOYEE_ID varchar AS (value:c1::varchar),
 NAME varchar AS (value:c2::varchar),
 SALARY number AS (value:c3::number),
 DEPARTMENT_ID number AS (value:c4::number),
 JOINING_DATE date AS TO_DATE(value:c5::varchar,'YYYY-MM-DD')
)
PARTITION BY (DEPARTMENT)
LOCATION=@MY_CSV_STAGE/
PATTERN='.*employee.*[.]csv'
FILE_FORMAT = (TYPE = CSV  SKIP_HEADER = 1) ;

In the above code, observe the lines mentioned below.

DEPARTMENT varchar AS split_part(metadata$filename,'/',1): This creates one field holding the name of the folder that the record's file belongs to.

FILE_NM varchar AS split_part(metadata$filename,'/',2): This creates one field holding the name of the file that the record belongs to.

PARTITION BY (DEPARTMENT): With this clause, queries become faster because Snowflake reads only the required folders of the partitioned data. With CSV and Parquet files it is common practice to store data in multiple folders based on date, department, etc., so using this option we can avoid reading the complete data.
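
A quick sketch of a query that benefits from this partitioning (the folder name 'sales' is hypothetical):

select name, salary
from my_aws_ext_table
where department = 'sales';  -- only files under the sales/ folder are scanned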

