Snowflake basic administration and management course notes.
Introduction to Snowflake. Notes from the basic Snowflake administration and management course by Marcelo Horacio Fortino. Version 2.0.3. September 2025.
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this licence, visit http://creativecommons.org/licenses/by-sa/4.0/. Permissions beyond those granted under this licence can be found at https://fortinux.com/en. Suggestions and comments to info@fortinux.com/en
Snowflake Inc. All rights reserved. Snowflake, its logo and all other Snowflake product, feature and service names mentioned herein are either registered trademarks or trademarks of Snowflake Inc. in the United States and other countries. All trademarks are the property of their respective owners. Apache Hadoop, Hadoop, Apache, the Apache feather logo, and the Apache Hadoop project logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and other countries. Copyright © 2006-2025 The Apache Software Foundation.
| Version | Author | Date | Comments |
|---|---|---|---|
| 1.0 | Marcelo Horacio Fortino | 2020/Nov | Fundamentals of Big Data |
| 1.1 | Marcelo Horacio Fortino | 2022/Feb | Translated into English, updated and converted to markdown - ipynb |
| 2.0 | Marcelo Horacio Fortino | 2023/Oct | Updated contents |
This work is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY, including without warranty of MERCHANTABILITY implied or without warranty of FITNESS FOR A PARTICULAR PURPOSE. The author assumes no responsibility for any misuse by the reader.
- These notes are based on:
- The bibliography presented at the end of this document, and.
- Own documentation collected over the years from various sources.
Course objectives
- Introduce the basic use and management of Snowflake as a cloud data storage platform.
- Understand the fundamentals of the Snowflake architecture.
- Learn basic operations such as data loading, queries and user administration.
Agenda
-
Big Data: Definition and characteristics: Big Data and data analytics, Definition, Market, Big Data in Spain, Trends, 2021 Machine Learning, AI and Data (MAD) Landscape, MAD 2024, Data governance, Apache Atlas, ISO/IEC 20547 series of standards, NIST Big Data Interoperability Framework, Data quality management, Benefits, Paradigm, Characteristics: The Vs of Big Data, Types of data, History, Data warehouses / Data lakes / Lakehouses, Apache Hadoop and Apache Spark, Use cases and Best practices in Big Data, Types of data analytics in Big Data: descriptive, predictive and prescriptive, Big Data analytics and Business Intelligence, Free data sources, ASF: Tools for Big Data, Apache Hadoop, Apache Hadoop Modules.
-
Introduction to Snowflake, What is Snowflake and what is Snowflake used for, Snowflake AI Data Cloud, Snowflake architecture, Benefits of using Snowflake, Snowflake navigation and environment, Introduction to SnowSQL, Loading and managing data, Basic data loading methods (CSV files, JSON), Creating tables and schemas, Basic SQL queries, Managing users and roles, Permission-based access control, Managing storage and costs, Configuring virtual warehouses, Business continuity and data recovery, Snowpark, Streamlit, Tips for optimising costs and performance.
Introduction to Snowflake
What is Snowflake and what is it used for?
-
According to Metha(2024)[1(p.9)]:
A cloud data platform provides a unified, secure and fully governed cloud environment for data warehousing, integration analysis and other computing tasks. It is the best choice for application architecture, simplifying development and operations (DevOps) and bringing innovative applications to market faster and more efficiently.
-
The same author envisions the ideal cloud data platform with the following features:
- Almost unlimited resource capacity.
- Governance and secure collaboration.
- Fully managed solution.
- Integrated distribution capabilities.
- Flexibility of development models.
-
Snowflake's unified AI and data cloud platform(AI Data Cloud) connects ecosystems of organisations that create, use and share data, applications and AI.
-
The AI Data Cloud allows you to eliminate silos within organisations and collaborate with business partners and vendors, while integrating external data and applications for up-to-date information.
-
It can connect businesses of any size and together with Snowflake Marketplace simplifies the sharing, collaboration and monetisation of thousands of datasets, services and complete data applications.
-
Source: https://www.snowflake.com/en/why-snowflake/what-is-data-cloud/.
-
In this video the creators of Snowflake tell the story of how the company came to be:
-
.
Snowflake AI Data Cloud
-
Currently (September 2025) the most important Snowflake AI Data Cloud features are:
-
Snowpark: Create data pipelines with python, java, and scala.
-
streamlit: Build interactive applications and data dashboards.
-
https://www.snowflake.com/en/product/features/streamlit-in-snowflake/.
-
Cortex AI: Analyse data and build AI applications with LLM models and text-to-SQL services.
-
Copilot: AI assistant for SQL queries.
-
Document AI: Extracts information from PDF documents and images. Transforms unstructured data into structured data.
-
https://docs.snowflake.com/en/user-guide/snowflake-cortex/document-ai/overview.
-
Cortex Analyst: Interact with data using the common language.
-
https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst.
-
Cortex Search: Find relevant information in unstructured data using natural language queries.
-
https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/cortex-search-overview.
Snowflake Architecture
-
Snowflake's cloud architecture enables flexibility, speed and scalability.
-
It is designed to manage and analyse large volumes of data without slowing performance.
-
The separation of its three main functions: storage, compute, and services; allows you to adjust resources independently of workflows and data growth.
-
Snowflake takes care of all system maintenance, such as software updates, server administration and security.
-
The Snowflake architecture consists of five layers:
-
Storage: Where all data is stored in columnar and compressed format, optimised for the cloud.
-
Interoperable storage:
- Access to structured, semi-structured and unstructured data for easy processing, without silos or delays.
- Automations and optimisations feature encryption by default, integrated storage compression and provide fast access to even petabytes of data.
- Various architectural patterns can be implemented, such as data lakes, data warehouses, data lakehouses or data meshes.
- Support for on-premises data and open table formats such as Apache Iceberg.
-
Virtual Warehouses: Clusters of computational resources that perform processing tasks. They can be resized on an as-needed basis.
-
Elastic Compute:
- The scalable and flexible performance of multiple workloads allows you to separate compute from storage, scale up or down instantly, and pay only for the resources you use.
- The engine supports a variety of programming languages, including Python, SQL, Java and Scala, and supports workload isolation and high levels of concurrency.
- Its design promotes speed and offers performance, security and flexibility to work with your favourite libraries and development interfaces.
-
Cortex AI: Provides instant access to advanced AI tools, including LLMs, directly within the Snowflake AI Data Cloud.
-
Cortex AI:
- Take advantage of fully managed models, such as Snowflake Arctic and Mistral, for word processing, sentiment analysis, custom summaries and more.
- Access to various development interfaces with code, also those using SQL and Python, and Snowflake Studio for developing without code.
- Cortex AI has also enabled the creation of a suite of generative AI products including Snowflake Copilot - an AI-based assistant that helps complete tasks in Snowflake - and Document AI, a family of services that uses machine learning (ML) to understand data and extract it from different types of documents.
-
Services: Metadata management, query optimisation, security, statistics collection and maintenance, and other operational services.
-
Cloud services:
- Near-zero maintenance: no upgrades, no downtime.
- Integrated governance through the Snowflake Horizon catalogue provides compliance, security, privacy, discovery and collaboration capabilities without the need for additional configurations or protocols.
- You can take advantage of transparent performance improvements that are usually applied automatically with each new release.
- Consumption-based pricing model with an integrated cost management interface, providing net usage data and tagging capabilities to improve visibility and spend control.
-
Snowgrid: A system that connects regions and cloud platforms (AWS, Azure, GCP) for collaboration, governance, and global data replication.
-
Snowgrid:
- A technology layer that interconnects enterprise ecosystems across regions and clouds.
- It facilitates cloud-to-cloud collaboration to quickly discover, share, access and share content between clouds and regions without relying on extract, transform, load (ETL) processes.
- It also supports cross-cloud governance to associate flexible policies, tags and lineage to data for consistent application across users, workloads, clouds and regions.
- Moreover, it enables business continuity between clouds for unmatched resiliency that avoids disruptions, complies with regulations even as they change, or migrates between clouds.
-
Sources:
-
https://www.snowflake.com/es/resources/solution-brief/snowflakes-data-cloud/.
Benefits of using Snowflake
-
Support for all data types: JSON, XML, Apache Avro, Apache ORC, Apache Parquet, etc.
-
Automatic encryption: Data is automatically encrypted at rest and in transit.
-
Automatic partitioning: Snowflake's micro-partition architecture https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions.html.
-
Automatic clustering: Manages all table regrouping as needed.
-
Automatic compression: Using state-of-the-art compression algorithms.
-
Granular access control: Provides role-based access control on table objects, which automatically translate into privileges on underlying micro-partitions.
-
Automatic versioning, Time Travel, and Fail-Safe, Automatic versioning, Time Travel, and Fail-Safe.
-
Cloud-to-Cloud Replication: Disaster recovery with replication, failover and client redirection capabilities across all regions and cloud platforms.
-
Source: https://www.snowflake.com/en/blog/snowflake-managed-data-lake-benefits/.
Practice 1 - Use case
Practice 1b - Snowflake Navigation and Environment
Introduction to SnowSQL
-
SnowSQL is the command line interface (CLI) tool that allows you to interact with Snowflake.
-
You can control all aspects of the Snowflake data cloud, including loading, querying, changing and deleting data.
-
The Snowflake CLI client instead focuses primarily on the management of workloads and applications that connect to Snowflake.
-
Snowflake CLI allows you to run and debug Snowflake applications locally with the following benefits:
-
You can search, build and upload python packages that are not in Anaconda.
-
Supports UDF, stored procedures, data warehouses, and Streamlit applications in python.
-
Allows you to define packages in the requirements.txt file to run on deployment.
-
When upgrading existing applications, code and necessary dependencies are automatically modified.
-
Artifact deployment is automatically managed and uploaded to the Snowflake stages.
-
To install them:
-
https://docs.snowflake.com/en/user-guide/snowsql-install-config.
-
https://docs.snowflake.com/en/developer-guide/snowflake-cli/installation/installation.
-
To log in:
snowsql -a <nombre-cuenta> -u <nombre-usuario>
- To create a database and its schema:
create or replace database sf_tuts;
- To verify the database used:
select current_database(),
current_schema();
- To create a table:
create or replace table emp_basic (
first_name string ,
last_name string ,
email string ,
streetaddress string ,
city string ,
start_date date
);
- To create a data warehouse:
create or replace warehouse sf_tuts_wh with
warehouse_size='X-SMALL'
auto_suspend = 180
auto_resume = true
initially_suspended=true;
- To upload sample data, download it from https://docs.snowflake.com/en/_downloads/34f4a66f56d00340f8f7a92acaccd977/getting-started.zip, and unzip it in the /temp directory on Linux OS, or in c:temp on Windows OS. Once decompressed, they are sent using the following command:
put file:///tmp/employees0*.csv @sf_tuts.public.%emp_basic; # GNU/Linux
put file://c:\temp\employees0*.csv @sf_tuts.public.%emp_basic; # Windows
- To list uploaded data:
list @sf_tuts.public.%emp_basic;
- The next step is to copy them into the table:
copy into emp_basic
from @%emp_basic
file_format = (type = csv field_optionally_enclosed_by='"')
pattern = '.*employees0[1-5].csv.gz'
on_error = 'skip_file';
- As in SQL language, to make a query of all data in a table is executed:
select * from emp_basic;
- To include a condition in the query:
select * from emp_basic where first_name = 'Ron';
- The LIKE function supports the special characters
%and_.
select email from emp_basic where email like '%.au';
- The INSERT function updates the data in a table by adding new data to the table.
insert into emp_basic values
('Clementine','Adamou','cadamou@sf_tuts.com','10510 Sachs Road','Klenak','2017-9-22') ,
('Marlowe','De Anesy','madamouc@sf_tuts.co.uk','36768 Northfield Plaza','Fangshan','2017-1-26');
- DROP is used to delete the table and data store:
drop database if exists sf_tuts;
drop warehouse if exists sf_tuts_wh;
- To exit the session, use !exit or !disconnect:
!exit
- Sources:
- https://quickstarts.snowflake.com/guide/getting_started_with_snowsql/index.html.
- Tutorial https://docs.snowflake.com/user-guide/tutorials/snowflake-in-20minutes#introduction.
- Getting Started - Introduction to Snowflake Data Loading
- .
- How To Install, Configure & Use SnowSQL | Snowflake 101
- .
Practice 2 - Using SnowSQL
User management
-
Snowflake administrators can create and manage users in a variety of ways:
- Using SQL: all tasks including changinglogin credentials.
- Snowsight: The web interface supports most tasks. https://app.snowflake.com/.
- Classic Console: Does not support all tasks but provides a wizard to create users and perform the most common activities, such as resetting a user key.
-
Some user objects correspond to human users, while others correspond to a service or application interacting with Snowflake.
-
When creating a user object, you specify the user type to differentiate between humans and services.
-
This is important because human users must log into multi-factor authentication (MFA), as opposed to services and applications.
-
The
TYPEproperty of a user object determines the type of user. -
Values can be:
- person, null, service, snowflake_service, and legacy_service.
-
Source: https://docs.snowflake.com/en/user-guide/admin-user-management.
Role Management
-
Snowflake uses roles to control the objects(virtual warehouses, databases, tables, etc.) that users can access:
- Provides a set of predefined roles, along with a framework for defining a hierarchy of custom roles.
- Automatically assigns all users the predefined PUBLIC role that allows access to Snowflake and basic objects.
- In addition to the PUBLIC role, each user can have other roles, one of those being thedefault.
-
Snowflake recommends strict control of ACCOUNTADMIN, although at least two users should be assigned.
-
On the other hand, ACCOUNTADMIN should never be the default role for a user, instead a custom role or a role with basic admin permissions should be assigned as the default.
-
More information at https://docs.snowflake.com/en/user-guide/security-access-control-considerations.
-
To create users you must have the USERADMIN role. For example using SQL via CREATE USER.
-
If you want to use a custom role, assign it the CREATE USER privilege.
-
To modify users the role must possess the OWNERSHIP permission. Using SQL: ALTER USER.
# Ejemplo SQL
CREATE USER juanpalomo PASSWORD = 'juan1234' DEFAULT_ROLE = mirol MUST_CHANGE_PASSWORD = TRUE;
GRANT ROLE mirol TO USER juanpalomo;
- For information about one or more users use DESCRIBE USER or SHOW USERS:
DESC USER juanpalomo;
- When a user is deleted, the user's folders, worksheets, and dashboards are no longer accessible unless they were previously shared.
- To remove a user:
DROP USER juanpalomo;
Permission-based access control
-
Snowflake combines aspects of the following access control models:
- Discretionary Access Control (DAC): Each object has an owner, who can in turn provide access to it.
- Role-based Access Control (RBAC): Access privileges are assigned to roles, which are assigned to users.
- User-based Access Control (UBAC): Access privileges are assigned to users but only when USE SECONDARY ROLE is set to ALL. https://docs.snowflake.com/sql-reference/sql/use-secondary-roles.
-
Some key concepts to understand how access control works in Snowflake are:
- Protectable object: Entity to which access can be granted. Unless authorised, access is denied.
- Role: Entity to which privileges can be granted.
- Privilege: Defined level of access to an object. Multiple different privileges can be used to control the granularity of access granted.
- User: User identity recognised by Snowflake, either associated with a person or a service. A user is also an entity that can be granted privileges.
-
Each protectable object resides in a logical container within a container hierarchy.
-
The top container is the customer organisation.
-
Protectable objects, such as tables, views, functions and stages, are contained in a schema object, which in turn is contained in a database.
-
All databases in your Snowflake account are located in the account object.
-
Privileges assigned to roles or users allow access to protectable objects.
-
Roles can be assigned to users or other roles.
-
Assigning a role to another role creates a role hierarchy https://docs.snowflake.com/user-guide/security-access-control-overview#label-role-hierarchy-and-privilege-inheritance.
-
RBAC is generally used to manage access to protectable objects in Snowflake.
-
Source: https://docs.snowflake.com/user-guide/security-access-control-overview.
-
Authentication policies ultimately provide control over how a client or user is authenticated.
-
https://docs.snowflake.com/en/user-guide/authentication-policies.
-
Resources:
-
https://airbyte.com/blog/best-practices-snowflake-users-roles-and-permissions/.
-
Open-source project for managing Snowflake permissions and roles through YAML configuration https://gitlab.com/gitlab-data/permifrost/.
Practice 3 - User management
Storage and cost management
- Snowflake's storage layer manages all data storage tasks by ensuring that data is organised, compressed, and encrypted in an efficient manner.
- This layer supports structured, semi-structured, and unstructured data, along with flexibility for various types of data formats.
- Snowflake uses advanced compression algorithms to reduce storage costs and improve performance,
- Therefore data is stored using the columnar format, which optimises query speed and efficiency.
- All data in Snowflake is encrypted both in storage and in transit, ensuring a high level of security.
✅ A good practice is to compress data before storage: this reduces costs and increases performance.
Virtual warehouses configuration
- The Snowflake processing layer is where all data is manipulated.
- This layer contains virtualwarehouses, clusters of independent compute resources that can be scaled based on workload.
- Virtual data warehouses can be resized, suspended or resumed to optimise costs and performance.
- Each virtual data warehouse operates independently, ensuring that the performance of one does not affect others.
✅ A good practice is to regularly monitor the usage of each warehouse and adjust the configuration based on current needs.
-
Source: Data Warehouse Architecture and Design: Best Practices.
-
Resources:
-
The Essential Guide to Cloud AI and Data Platforms https://www.snowflake.com/resource/the-essential-guide-to-cloud-ai-and-data-platforms/.
-
Snowflake Cost Optimisation Videos
-
.
Practice 4 - Creating a Warehouse
Tips for optimising costs and performance
-
It is important to proactively manage consumption costs to ensure you get maximum value from your Snowflake investment.
-
Consideration should be given to implementing a Financial Operations (FinOps) framework to collaboratively manage Snowflake costs across business functions, such as finance, technology, operations and procurement teams.
-
This framework should take into account strategy, people, processes and technology to optimise costs, improve predictability and better align costs with business value.
-
The list of suggested best practices is:
-
Enable automatic suspend and resume.
-
Configure resource monitors.
-
Set up an account level budget.
-
Start with a small data warehouse.
-
Control usage.
-
Use UI-based cost exploration.
-
Choosing the right data warehouse type and size.
-
Consider using Snowpark-optimised data stores for memory-intensive workloads.
-
Source:
-
The Definitive Guide to Managing Spend in Snowflake https://www.snowflake.com/en/resources/white-paper/definitive-guide-to-managing-spend-in-snowflake/.
-
Resources:
-
https://www.startdataengineering.com/post/optimize-snowflake-cost/.
-
https://select.dev/posts/snowflake-query-optimization#snowflake-query-optimization-techniques.
-
Snowflake Optimisation Methods: Best Practices to Maximise ROI
-. -
https://seemoredata.io/blog/performance-tuning-in-snowflake/.
-
Essential Strategies for Optimizing Snowflake Performance and Reducing Costs.
-
https://www.montecarlodata.com/blog-snowflake-cost-optimization/.
Practice 5 - Getting Started with Cost and Performance Optimisation
Business Continuity and Data Recovery
-
Database and share replication is available for all accounts.
-
Replication of other account objects and failover/recovery require the Business Critical Edition (or higher) plan instead.
-
This feature allows you to replicate objects from a source account to one or more target accounts in the same organisation. The replicated objects in each target account are called secondary objects and are replicas of the primary objects in the source account.
-
Replication is compatible with different regions and cloud platforms (AWS, Azure, GCP).
-
Source: https://docs.snowflake.com/en/user-guide/account-replication-intro.
-
Resources:
-
How to Configure Failover & Replication Across Clouds
-
.
-
Replication and failover commands https://docs.snowflake.com/en/sql-reference/commands-replication.
Snowpark
-
Snowpark is a set of libraries and code execution environments that run Apache Sparkâ„¢, Python and other programming languages with the Snowflake vectorised engine.
-
Allows you to create data pipelines with python, java, and scala.
-
Its functionalities include:
-
Snowpark DataFrames: DataFrame style programming for developers.
-
https://docs.snowflake.com/en/developer-guide/snowpark/python/working-with-dataframes.
-
Pandas Library: Pandas API for data transformations and analytics.
-
https://docs.snowflake.com/en/developer-guide/snowpark/python/pandas-on-snowflake.
-
UDFs(User-defined functions) and stored procedures: Custom code.
-
https://docs.snowflake.com/en/developer-guide/stored-procedure/stored-procedures-overview.
-
Snowpark connection for Apache Spark: Run Apache Spark DataFrames, SQL and UDFs.
-
https://docs.snowflake.com/en/developer-guide/snowpark-connect/snowpark-connect-overview.
-
Source: https://www.snowflake.com/en/product/features/snowpark/.
-
According to Metha(2024)[11] the six benefits of building applications with Snowpark are:
-
One platform for all programming languages.
-
Bring compute and processing to the data.
-
Integrated governance and security.
-
Increase performance and reduce costs.
-
Using Streamlit to build interactive apps.
-
Distribution and monetisation with Snowflake Native Apps.
Practice 6 - Getting started with tables in Apache Iceberg
streamlit
-
Streamlit in an open source python framework that empowers data scientists to create data applications in minutes without front-end development skills.
-
You can build applications in a few lines of code with the API and view updates every time you save a change.
-
Finally, applications can be deployed and shared, managed directly from Streamlit.
-
Source: Data applications in python https://www.snowflake.com/resource/data-applications-in-python/.
-
Baum (2024)[12] argues that in order to create an application that makes use of LLM, the lifecycle of the project must be taken into account:
-
Development: Select and adopt the model that best suits the task.
-
Deployment: Ensure a pipeline with the right data and the right infrastructure.
-
Consumption: Expose the model to users via an application.
-
Resources:
-
https://docs.snowflake.com/en/developer-guide/streamlit/getting-started.
-
https://www.snowflake.com/en/product/features/streamlit-in-snowflake/.
-
https://www.youtube.com/playlist?list=PLavJpcg8cl1FA5cmCfdzdHyOBVT6VLhQX.
Practice 7 - Getting started with Snowpark for Python and Streamlit
Bibliography
- [11] METHA, M. (2024). Building Applications with Snowpark. Snowflake Special Edition. John Wiley & Sons, Inc.
- [12] BAUM, D. (2024). Generative AI and LLMs. Snowflake Special Edition. John Wiley & Sons, Inc.
Cover image: Photo of a clipping from the front page of the Snowflake website (2025/09).
If you are looking for a trainer to run this course or another training activity (webinar, workshops, bootcamps, etc.) in your organisation, you can find me through the contact page. Thank you very much.
If you liked the article you can help me by donating with cryptocurrencies. Thank you!!!


