working with stage / files
Working with files in Stage
Snowflake allows us to place files in any stage and query stage.
- Step 1 : Create a stage Object
- Step 2 : upload (put) file into the stage
- Step 3 : read file from Stage
- Step 4 : Copy data from stage to table (We are not discussing this now)
As I have already mentioned, stage environment is location where you can store your structured, semi-structured or unstructured files before loading into original tables.
Snow site (modern web UI) allows us to upload files to named internal stages from our local system.
upload file from Snowsite :
Data => Database => Schema => stage => +File (in top upper right corner click)
But these is some file size limitation to use this.
put files into Stage using CLI :
best and easy way to upload files to stage is snow SQL (CLI)
sample data emp.csv
put file://Downloads/dept.csv @training.demo.emp_stg;
It is applicable when you try to put, get, list, query files
list @%<table_name> -- table Stage
list @<Stage_name> -- for named stage
in this case one more folder will be created in emp_stg with name dept and upload the file to stage area.
list @training.demo.emp_stg -- this will show you all files in the stage location. if you want to see files in specific folder you need to mention that extensively.
list @training.demo.emp_stg/dept
Removing / deleting files from stage :
To remove files from any stage we need to use remove command. It works good either in snow SQL, Snow site or legacy webUI.
remove @training.demo.emp_stg/emp/dept.csv;
Querying files from stage :
we have flexibility to query files directly from stage on any file using $notaion. we can do that on both structed files like CSV or semi structured files like json.
select $1,$2,$3,$4,$5,$6,$7 from @emp_stg/emp.csv;
CREATE OR REPLACE STAGE my_stage
file_format = (type = 'CSV' FIELD_DELIMITER = '|' SKIP_HEADER = 1);TYPE = 'CSV'
COMPRESSION = 'AUTO'
FIELD_DELIMITER = '|'
RECORD_DELIMITER = '\n'
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
ESCAPE = 'NONE'
ESCAPE_UNENCLOSED_FIELD = '\134';
Working with JSON file format :
(we discuss more about this in upcoming session)
When you try to query semi structed files you will see the data as variant type where it is kind of json format.
sample json file as below with name nested_json.json
For better understanding showing like this for you[ ---------------- Record 1 --------- { <-- record is coming as a dictionary "book_title":"Food", <-- Field name Book title dictionary element Key : value "year_published":2001, <-- year published dictionary element Key : value "authors": [ <-- Authors details are coming in nested list { <-- Author name is a dictionary with key : value "first_name":"Fiona", "middle_name":null, "last_name":"Macdonald" }, { "first_name":"Gian", "middle_name":"Paulo", "last_name":"Faleschini" } ] }, ---------------- Record 2 --------- { "book_title":"Food", "year_published":2006, "authors": [ { "first_name":"Laura", "middle_name":"K", "last_name":"Egendorf" } ] } ]
step 1 : put file in stage
put file://Downloads/nested_json.json @training.demo.emp_stg/json;step 2 : create a file format
CREATE OR REPLACE FILE FORMAT books_json_format
TYPE = 'json'
STRIP_OUTER_ARRAY = TRUE;
Step 3 : query the json file using file format and resolve the nested json structure
SELECT
parse_json($1):book_title,
parse_json($1):year_published,
parse_json($1):authors[0]:first_name::varchar(20) as Author1_First_Name,
parse_json($1):authors[0]:last_name::varchar(20) as Author1_Last_Name,
parse_json($1):authors[0]:middle_name::varchar(20) as Author1_Middle_Name,
parse_json($1):authors[1]:first_name::varchar(20) as Author2_First_Name,
parse_json($1):authors[1]:last_name::varchar(20) as Author2_Last_Name,
parse_json($1):authors[1]:middle_name::varchar(20) as Author2_Middle_Name
FROM
@emp_stg/json/nested_json.json.gz (file_format => books_json_format);
we can think of it as combination of list and dictionary combined togetherparse_json($1) --> json will be read as variant filed:book_title --> get the book_title detailsparse_json($1):authors[0] --> will give the authors details in variant format (Dictionary type)in that we are taking first element by specifying [0] (it is list notation)parse_json($1):authors[0]:first_name --> In the first element we are taking frist_Name usingdictionary notation.



Comments
Post a Comment