External Table

External Table Explained 

An external table is a Snowflake feature that lets you query data stored in an external stage as if the data were inside a Snowflake table. The external stage itself is not part of Snowflake, so Snowflake does not store or manage that data.

In Hadoop there are two types of tables.

1. Managed tables : The data resides within Hadoop storage, so both the data and the metadata are stored in HDFS. When you drop the table, you lose both the data and the metadata.
2. External tables : The data stays at its source location outside the warehouse directory, and the table is created with CREATE EXTERNAL TABLE syntax. When you drop the table, only the metadata is dropped; the data remains at the source (see the Hive sketch below).
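
A minimal Hive sketch of the two table types (table and path names are illustrative, not from the original):

-- Managed table: Hive owns both data and metadata; DROP removes both.
CREATE TABLE emp_managed (emp_id INT, name STRING);

-- External table: Hive only tracks metadata; DROP leaves the files at the LOCATION.
CREATE EXTERNAL TABLE emp_external (emp_id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/emp/';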

Snowflake follows the same concept: the data sits in some other storage location and we create a table on top of it, which is called an external table (sometimes also referred to as a federated table).

What is the purpose of these external tables?

  • The data lies in a different environment outside of Snowflake, and we still get quick access to it.
  • The data can be in another file format (CSV, JSON, Parquet, etc.), while we keep working in SQL, which we are comfortable with.
Create external table :

CREATE OR REPLACE EXTERNAL TABLE my_ext_tbl
WITH LOCATION = @my_csv_stage/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
PATTERN = '.*employee.*[.]csv';
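
With no column definitions, Snowflake exposes each row of this table through a single VARIANT column named VALUE, along with the METADATA$FILENAME pseudo-column. A minimal query sketch (the c1/c2 positions are assumptions about the file layout):

-- Each CSV row arrives as a VARIANT; c1, c2, ... are the positional fields.
SELECT
    metadata$filename AS source_file,   -- which staged file the row came from
    value:c1::int     AS emp_id,        -- assumed first column
    value:c2::varchar AS name           -- assumed second column
FROM my_ext_tbl;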

Limitations and characteristics :

  • No DML actions are allowed, since the source of the data is external cloud storage; external tables are read-only.
  • They help you read data from outside of Snowflake.
  • We can use joins, and we can create views on top of these external tables (see the sketch after this list).
  • When you have huge data with a lot of partitions / files, there will be some latency, as Snowflake has to fetch and read the files on demand.
  • With an external table we can define a column schema, or we can skip it and query the raw VALUE column instead.
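
For example, a minimal sketch of a view on top of the external table created above (the column positions are assumptions about the CSV layout, and the self-join is only illustrative):

-- A view hides the VARIANT access pattern from downstream queries.
CREATE OR REPLACE VIEW employee_vw AS
SELECT
    value:c1::int     AS emp_id,
    value:c2::varchar AS name,
    value:c4::int     AS manager
FROM my_ext_tbl;

-- The view can then be joined like any regular table, e.g. a self-join
-- to resolve each employee's manager name.
SELECT e.name AS employee, m.name AS manager
FROM employee_vw e
LEFT JOIN employee_vw m ON e.manager = m.emp_id;
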
Steps for the end-to-end process :

Step 1 : Create an S3 bucket
Step 2 : Create an IAM policy ( <IAM policy Name> )
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<S3 Bucket Name>/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::<S3 Bucket Name>",
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "*"
                    ]
                }
            }
        }
    ]
}

Step 3 : Create an IAM role ( <Role Name> )

IAM Roles in AWS : <Role Name>
Trusted entity type : AWS account (since we are granting access to a third-party tool, Snowflake)
Require external ID : On the same page, select this checkbox and enter some random value for the time being; it will be replaced later with the external ID generated by the storage integration (step 9).
Permissions policies : Select the previously created policy ( <IAM policy Name> )
Once created, copy the ARN : ( <iam arn> )
Role name : Maintain a consistent naming convention when you create a role so that it is easy to manage ( <Role Name> )
Once created, copy the role ARN : <role arn>

Step 4 : Create a database or use an existing one ( ex : training )
Step 5 : Create a schema or use an existing one ( ex : demo )
Step 6 : Create a table ( ex : emp ); a minimal sketch for steps 4-6 follows below
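
A minimal sketch for steps 4-6, using the example names above (the column list simply mirrors the external table defined in step 11):

-- Step 4 : database
CREATE DATABASE IF NOT EXISTS training;

-- Step 5 : schema
CREATE SCHEMA IF NOT EXISTS training.demo;
USE SCHEMA training.demo;

-- Step 6 : a regular table named emp (optional for this flow; the external
-- table in step 11 reads directly from the stage and does not need it)
CREATE TABLE IF NOT EXISTS emp (
    emp_id  NUMBER,
    name    VARCHAR,
    job     VARCHAR,
    manager NUMBER,
    doj     DATE,
    sal     NUMBER,
    comm    NUMBER,
    dept    NUMBER
);
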
Step 7 : Create a file format :

create file format MY_CSV_FORMAT
    TYPE = 'CSV' 
    COMPRESSION = 'AUTO' 
    FIELD_DELIMITER = ',' 
    RECORD_DELIMITER = '\n' 
    SKIP_HEADER = 1 
    FIELD_OPTIONALLY_ENCLOSED_BY = '"' 
    TRIM_SPACE = FALSE 
    ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE 
    ESCAPE = 'NONE' 
    ESCAPE_UNENCLOSED_FIELD = '\134';

Step 8 : Create a storage integration object
It acts as a bridge between AWS and Snowflake.

create or replace storage integration customers_snowpipe_s3_integration
  type = external_stage
  storage_provider = 'S3'
  enabled = true
  storage_aws_role_arn = '<AWS role arn>'                 -- paste the role ARN copied in step 3
  storage_allowed_locations = ('s3://<s3 bucket name>/')  -- the bucket created in step 1, as an s3:// URL
;

Step 9 : Update the trusted entities of the IAM role created above

DESC INTEGRATION customers_snowpipe_s3_integration;

Copy the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values from the output and paste them into the IAM role's trust relationship in AWS, replacing the random external ID entered in step 3.


Step 10 : Create a stage object

create or replace stage emp_snowpipe_stage
  storage_integration = customers_snowpipe_s3_integration
  file_format = my_csv_format
  url = 's3://customers-snowpipe-integration-tiruven/';
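
To verify that the integration and stage are wired up correctly, the stage can be listed (a quick sanity check, not one of the original steps):

-- Should return the employee CSV files sitting in the S3 bucket.
LIST @emp_snowpipe_stage;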

Step 11 : Create the external table :

CREATE OR REPLACE EXTERNAL TABLE my_emp_ext_tbl (
    emp_id  number  as (value:c1::int),
    name    varchar as (value:c2::varchar),
    job     varchar as (value:c3::varchar),
    manager number  as (value:c4::int),
    doj     date    as (value:c5::date),
    sal     number  as (value:c6::int),
    comm    number  as (value:c7::int),
    dept    number  as (value:c8::int)
)
with location = @emp_snowpipe_stage
auto_refresh = false
file_format = (format_name = my_csv_format);

select * from my_emp_ext_tbl;
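
Because auto_refresh is false, the external table's file metadata must be refreshed manually whenever new files land in the bucket; a minimal sketch:

-- Re-scan the stage and register any new or changed files.
ALTER EXTERNAL TABLE my_emp_ext_tbl REFRESH;

-- Newly added rows are then visible to queries.
select count(*) from my_emp_ext_tbl;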
