snowflake-web-2025

Snowflake basic administration and management course notes.

Introduction to Snowflake. Notes from the basic Snowflake administration and management course by Marcelo Horacio Fortino. Version 2.0.3. September 2025.

This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this licence, visit http://creativecommons.org/licenses/by-sa/4.0/. Permissions beyond those granted under this licence can be found at https://fortinux.com/en. Suggestions and comments to info@fortinux.com/en

Snowflake Inc. All rights reserved. Snowflake, its logo and all other Snowflake product, feature and service names mentioned herein are either registered trademarks or trademarks of Snowflake Inc. in the United States and other countries. All trademarks are the property of their respective owners. Apache Hadoop, Hadoop, Apache, the Apache feather logo, and the Apache Hadoop project logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and other countries. Copyright © 2006-2025 The Apache Software Foundation.

Version Author Date Comments
1.0 Marcelo Horacio Fortino 2020/Nov Fundamentals of Big Data
1.1 Marcelo Horacio Fortino 2022/Feb Translated into English, updated and converted to markdown - ipynb
2.0 Marcelo Horacio Fortino 2023/Oct Updated contents

This work is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY, including without warranty of MERCHANTABILITY implied or without warranty of FITNESS FOR A PARTICULAR PURPOSE. The author assumes no responsibility for any misuse by the reader.

  • These notes are based on:
    • The bibliography presented at the end of this document, and.
    • Own documentation collected over the years from various sources.

Course objectives

  • Introduce the basic use and management of Snowflake as a cloud data storage platform.
  • Understand the fundamentals of the Snowflake architecture.
  • Learn basic operations such as data loading, queries and user administration.

Agenda

  • Big Data: Definition and characteristics: Big Data and data analytics, Definition, Market, Big Data in Spain, Trends, 2021 Machine Learning, AI and Data (MAD) Landscape, MAD 2024, Data governance, Apache Atlas, ISO/IEC 20547 series of standards, NIST Big Data Interoperability Framework, Data quality management, Benefits, Paradigm, Characteristics: The Vs of Big Data, Types of data, History, Data warehouses / Data lakes / Lakehouses, Apache Hadoop and Apache Spark, Use cases and Best practices in Big Data, Types of data analytics in Big Data: descriptive, predictive and prescriptive, Big Data analytics and Business Intelligence, Free data sources, ASF: Tools for Big Data, Apache Hadoop, Apache Hadoop Modules.

  • Introduction to Snowflake, What is Snowflake and what is Snowflake used for, Snowflake AI Data Cloud, Snowflake architecture, Benefits of using Snowflake, Snowflake navigation and environment, Introduction to SnowSQL, Loading and managing data, Basic data loading methods (CSV files, JSON), Creating tables and schemas, Basic SQL queries, Managing users and roles, Permission-based access control, Managing storage and costs, Configuring virtual warehouses, Business continuity and data recovery, Snowpark, Streamlit, Tips for optimising costs and performance.

Introduction to Snowflake

What is Snowflake and what is it used for?

  • According to Metha(2024)[1(p.9)]:

    A cloud data platform provides a unified, secure and fully governed cloud environment for data warehousing, integration analysis and other computing tasks. It is the best choice for application architecture, simplifying development and operations (DevOps) and bringing innovative applications to market faster and more efficiently.

  • The same author envisions the ideal cloud data platform with the following features:

    • Almost unlimited resource capacity.
    • Governance and secure collaboration.
    • Fully managed solution.
    • Integrated distribution capabilities.
    • Flexibility of development models.
  • Snowflake's unified AI and data cloud platform(AI Data Cloud) connects ecosystems of organisations that create, use and share data, applications and AI.

  • The AI Data Cloud allows you to eliminate silos within organisations and collaborate with business partners and vendors, while integrating external data and applications for up-to-date information.

  • It can connect businesses of any size and together with Snowflake Marketplace simplifies the sharing, collaboration and monetisation of thousands of datasets, services and complete data applications.

  • Source: https://www.snowflake.com/en/why-snowflake/what-is-data-cloud/.

  • In this video the creators of Snowflake tell the story of how the company came to be:

  • .

Snowflake AI Data Cloud

Snowflake Architecture

  • Snowflake's cloud architecture enables flexibility, speed and scalability.

  • It is designed to manage and analyse large volumes of data without slowing performance.

  • The separation of its three main functions: storage, compute, and services; allows you to adjust resources independently of workflows and data growth.

  • Snowflake takes care of all system maintenance, such as software updates, server administration and security.

  • The Snowflake architecture consists of five layers:

  • Storage: Where all data is stored in columnar and compressed format, optimised for the cloud.

  • Interoperable storage:

    • Access to structured, semi-structured and unstructured data for easy processing, without silos or delays.
    • Automations and optimisations feature encryption by default, integrated storage compression and provide fast access to even petabytes of data.
    • Various architectural patterns can be implemented, such as data lakes, data warehouses, data lakehouses or data meshes.
    • Support for on-premises data and open table formats such as Apache Iceberg.
  • Virtual Warehouses: Clusters of computational resources that perform processing tasks. They can be resized on an as-needed basis.

  • Elastic Compute:

    • The scalable and flexible performance of multiple workloads allows you to separate compute from storage, scale up or down instantly, and pay only for the resources you use.
    • The engine supports a variety of programming languages, including Python, SQL, Java and Scala, and supports workload isolation and high levels of concurrency.
    • Its design promotes speed and offers performance, security and flexibility to work with your favourite libraries and development interfaces.
  • Cortex AI: Provides instant access to advanced AI tools, including LLMs, directly within the Snowflake AI Data Cloud.

  • Cortex AI:

    • Take advantage of fully managed models, such as Snowflake Arctic and Mistral, for word processing, sentiment analysis, custom summaries and more.
    • Access to various development interfaces with code, also those using SQL and Python, and Snowflake Studio for developing without code.
    • Cortex AI has also enabled the creation of a suite of generative AI products including Snowflake Copilot - an AI-based assistant that helps complete tasks in Snowflake - and Document AI, a family of services that uses machine learning (ML) to understand data and extract it from different types of documents.
  • Services: Metadata management, query optimisation, security, statistics collection and maintenance, and other operational services.

  • Cloud services:

    • Near-zero maintenance: no upgrades, no downtime.
    • Integrated governance through the Snowflake Horizon catalogue provides compliance, security, privacy, discovery and collaboration capabilities without the need for additional configurations or protocols.
    • You can take advantage of transparent performance improvements that are usually applied automatically with each new release.
    • Consumption-based pricing model with an integrated cost management interface, providing net usage data and tagging capabilities to improve visibility and spend control.
  • Snowgrid: A system that connects regions and cloud platforms (AWS, Azure, GCP) for collaboration, governance, and global data replication.

  • Snowgrid:

    • A technology layer that interconnects enterprise ecosystems across regions and clouds.
    • It facilitates cloud-to-cloud collaboration to quickly discover, share, access and share content between clouds and regions without relying on extract, transform, load (ETL) processes.
    • It also supports cross-cloud governance to associate flexible policies, tags and lineage to data for consistent application across users, workloads, clouds and regions.
    • Moreover, it enables business continuity between clouds for unmatched resiliency that avoids disruptions, complies with regulations even as they change, or migrates between clouds.
  • Sources:

  • https://www.snowflake.com/es/resources.

  • https://www.snowflake.com/es/resources/solution-brief/snowflakes-data-cloud/.

Benefits of using Snowflake

  • Support for all data types: JSON, XML, Apache Avro, Apache ORC, Apache Parquet, etc.

  • Automatic encryption: Data is automatically encrypted at rest and in transit.

  • Automatic partitioning: Snowflake's micro-partition architecture https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions.html.

  • Automatic clustering: Manages all table regrouping as needed.

  • Automatic compression: Using state-of-the-art compression algorithms.

  • Granular access control: Provides role-based access control on table objects, which automatically translate into privileges on underlying micro-partitions.

  • Automatic versioning, Time Travel, and Fail-Safe, Automatic versioning, Time Travel, and Fail-Safe.

  • Cloud-to-Cloud Replication: Disaster recovery with replication, failover and client redirection capabilities across all regions and cloud platforms.

  • Source: https://www.snowflake.com/en/blog/snowflake-managed-data-lake-benefits/.

Practice 1 - Use case

Practice 1b - Snowflake Navigation and Environment

Introduction to SnowSQL

  • SnowSQL is the command line interface (CLI) tool that allows you to interact with Snowflake.

  • You can control all aspects of the Snowflake data cloud, including loading, querying, changing and deleting data.

  • The Snowflake CLI client instead focuses primarily on the management of workloads and applications that connect to Snowflake.

  • Snowflake CLI allows you to run and debug Snowflake applications locally with the following benefits:

  • You can search, build and upload python packages that are not in Anaconda.

  • Supports UDF, stored procedures, data warehouses, and Streamlit applications in python.

  • Allows you to define packages in the requirements.txt file to run on deployment.

  • When upgrading existing applications, code and necessary dependencies are automatically modified.

  • Artifact deployment is automatically managed and uploaded to the Snowflake stages.

  • To install them:

  • https://docs.snowflake.com/en/user-guide/snowsql-install-config.

  • https://docs.snowflake.com/en/developer-guide/snowflake-cli/installation/installation.

  • To log in:

snowsql -a <nombre-cuenta> -u <nombre-usuario>
  • To create a database and its schema:
create or replace database sf_tuts;
  • To verify the database used:
select current_database(),
current_schema();
  • To create a table:
create or replace table emp_basic (
  first_name string ,
  last_name string ,
  email string ,
  streetaddress string ,
  city string ,
  start_date date
  );
  • To create a data warehouse:
create or replace warehouse sf_tuts_wh with
  warehouse_size='X-SMALL'
  auto_suspend = 180
  auto_resume = true
  initially_suspended=true;
put file:///tmp/employees0*.csv @sf_tuts.public.%emp_basic;    # GNU/Linux
put file://c:\temp\employees0*.csv @sf_tuts.public.%emp_basic; # Windows
  • To list uploaded data:
list @sf_tuts.public.%emp_basic;
  • The next step is to copy them into the table:
copy into emp_basic
  from @%emp_basic
  file_format = (type = csv field_optionally_enclosed_by='"')
  pattern = '.*employees0[1-5].csv.gz'
  on_error = 'skip_file';
  • As in SQL language, to make a query of all data in a table is executed:
select * from emp_basic;
  • To include a condition in the query:
select * from emp_basic where first_name = 'Ron';
  • The LIKE function supports the special characters % and _.
select email from emp_basic where email like '%.au';
  • The INSERT function updates the data in a table by adding new data to the table.
insert into emp_basic values
  ('Clementine','Adamou','cadamou@sf_tuts.com','10510 Sachs Road','Klenak','2017-9-22') ,
  ('Marlowe','De Anesy','madamouc@sf_tuts.co.uk','36768 Northfield Plaza','Fangshan','2017-1-26');
  • DROP is used to delete the table and data store:
drop database if exists sf_tuts;
drop warehouse if exists sf_tuts_wh;
  • To exit the session, use !exit or !disconnect:
!exit

Practice 2 - Using SnowSQL

User management

  • Snowflake administrators can create and manage users in a variety of ways:

    • Using SQL: all tasks including changinglogin credentials.
    • Snowsight: The web interface supports most tasks. https://app.snowflake.com/.
    • Classic Console: Does not support all tasks but provides a wizard to create users and perform the most common activities, such as resetting a user key.
  • Some user objects correspond to human users, while others correspond to a service or application interacting with Snowflake.

  • When creating a user object, you specify the user type to differentiate between humans and services.

  • This is important because human users must log into multi-factor authentication (MFA), as opposed to services and applications.

  • The TYPE property of a user object determines the type of user.

  • Values can be:

    • person, null, service, snowflake_service, and legacy_service.
  • Source: https://docs.snowflake.com/en/user-guide/admin-user-management.

Role Management

  • Snowflake uses roles to control the objects(virtual warehouses, databases, tables, etc.) that users can access:

    • Provides a set of predefined roles, along with a framework for defining a hierarchy of custom roles.
    • Automatically assigns all users the predefined PUBLIC role that allows access to Snowflake and basic objects.
    • In addition to the PUBLIC role, each user can have other roles, one of those being thedefault.
  • Snowflake recommends strict control of ACCOUNTADMIN, although at least two users should be assigned.

  • On the other hand, ACCOUNTADMIN should never be the default role for a user, instead a custom role or a role with basic admin permissions should be assigned as the default.

  • More information at https://docs.snowflake.com/en/user-guide/security-access-control-considerations.

  • To create users you must have the USERADMIN role. For example using SQL via CREATE USER.

  • If you want to use a custom role, assign it the CREATE USER privilege.

  • To modify users the role must possess the OWNERSHIP permission. Using SQL: ALTER USER.

# Ejemplo SQL
CREATE USER juanpalomo PASSWORD = 'juan1234' DEFAULT_ROLE = mirol MUST_CHANGE_PASSWORD = TRUE;
GRANT ROLE mirol TO USER juanpalomo;
  • For information about one or more users use DESCRIBE USER or SHOW USERS:
DESC USER juanpalomo;
  • When a user is deleted, the user's folders, worksheets, and dashboards are no longer accessible unless they were previously shared.
  • To remove a user:
DROP USER juanpalomo;

Permission-based access control

Practice 3 - User management

Storage and cost management

  • Snowflake's storage layer manages all data storage tasks by ensuring that data is organised, compressed, and encrypted in an efficient manner.
  • This layer supports structured, semi-structured, and unstructured data, along with flexibility for various types of data formats.
  • Snowflake uses advanced compression algorithms to reduce storage costs and improve performance,
  • Therefore data is stored using the columnar format, which optimises query speed and efficiency.
  • All data in Snowflake is encrypted both in storage and in transit, ensuring a high level of security.

✅ A good practice is to compress data before storage: this reduces costs and increases performance.

Virtual warehouses configuration

  • The Snowflake processing layer is where all data is manipulated.
  • This layer contains virtualwarehouses, clusters of independent compute resources that can be scaled based on workload.
  • Virtual data warehouses can be resized, suspended or resumed to optimise costs and performance.
  • Each virtual data warehouse operates independently, ensuring that the performance of one does not affect others.

✅ A good practice is to regularly monitor the usage of each warehouse and adjust the configuration based on current needs.

Practice 4 - Creating a Warehouse

Tips for optimising costs and performance

Practice 5 - Getting Started with Cost and Performance Optimisation

Business Continuity and Data Recovery

  • Database and share replication is available for all accounts.

  • Replication of other account objects and failover/recovery require the Business Critical Edition (or higher) plan instead.

  • This feature allows you to replicate objects from a source account to one or more target accounts in the same organisation. The replicated objects in each target account are called secondary objects and are replicas of the primary objects in the source account.

  • Replication is compatible with different regions and cloud platforms (AWS, Azure, GCP).

  • Source: https://docs.snowflake.com/en/user-guide/account-replication-intro.

  • Resources:

  • How to Configure Failover & Replication Across Clouds

  • .

  • Replication and failover commands https://docs.snowflake.com/en/sql-reference/commands-replication.

Snowpark

Practice 6 - Getting started with tables in Apache Iceberg

streamlit

Practice 7 - Getting started with Snowpark for Python and Streamlit

Bibliography

  • [11] METHA, M. (2024). Building Applications with Snowpark. Snowflake Special Edition. John Wiley & Sons, Inc.
  • [12] BAUM, D. (2024). Generative AI and LLMs. Snowflake Special Edition. John Wiley & Sons, Inc.

Cover image: Photo of a clipping from the front page of the Snowflake website (2025/09).

If you are looking for a trainer to run this course or another training activity (webinar, workshops, bootcamps, etc.) in your organisation, you can find me through the contact page. Thank you very much.

If you liked the article you can help me by donating with cryptocurrencies. Thank you!!!