Intro
- BigQuery is a serverless data warehouse.
- Like other cloud data warehouse (DWH) solutions, BigQuery decouples compute from storage.
- Separating the compute engine that analyzes your data from the storage layer maximizes flexibility, because each can scale independently.
- BigQuery uses a columnar storage format.
Computing options
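- On-demand pricing (analogous to a serverless SQL pool in Azure Synapse)
- $5 per TB of data processed
- Flat-rate pricing (analogous to a dedicated SQL pool in Azure Synapse)
- Based on the number of slots reserved in advance
- 100 slots cost $2,000/month, which equals the cost of processing 400 TB at on-demand pricing ($2,000 ÷ $5 per TB)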
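The break-even point between the two pricing models can be checked with a few lines of arithmetic. The sketch below is only an illustration that uses the figures quoted in these notes ($5 per TB on demand, $2,000/month for 100 slots); the function names are hypothetical, and actual prices may differ.

```python
# Figures taken from the notes above; treat them as assumptions, not current list prices.
ON_DEMAND_USD_PER_TB = 5.0
FLAT_RATE_USD_PER_MONTH = 2000.0  # cost of a 100-slot commitment


def break_even_tb(flat_rate_usd: float = FLAT_RATE_USD_PER_MONTH,
                  on_demand_usd_per_tb: float = ON_DEMAND_USD_PER_TB) -> float:
    """Return the monthly data volume (TB) at which both models cost the same."""
    return flat_rate_usd / on_demand_usd_per_tb


def cheaper_option(tb_per_month: float) -> str:
    """Return which pricing model is cheaper for a given monthly volume."""
    on_demand_cost = tb_per_month * ON_DEMAND_USD_PER_TB
    return "on-demand" if on_demand_cost <= FLAT_RATE_USD_PER_MONTH else "flat-rate"


print(break_even_tb())       # 400.0
print(cheaper_option(100))   # on-demand
print(cheaper_option(500))   # flat-rate
```

Under these assumptions, on-demand pricing is the cheaper choice for workloads that scan less than 400 TB per month.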
BigQuery Interface
Let's look at the BigQuery interface. In this example, "taxi-rids-ny" is our project, "nytaxi" is our dataset, and external_yellow_tripdata and external_yellow_tripdata_2019 are our tables.

- A project in Google Cloud Platform (GCP) is the fundamental organizing unit for all GCP resources and services. It provides a way to group and manage resources, billing, permissions, and settings.
- A dataset in Google BigQuery is a container used to organize and manage tables and views. It serves as a logical grouping of related data and is associated with a specific project.
- Tables: BigQuery offers several table types, each suited to different use cases depending on how the data is stored, accessed, or processed. The main types are:
- Native Tables: These are the standard tables where data is stored directly in BigQuery's internal storage.
- External Tables: These tables let you query data stored outside of BigQuery, such as in Google Cloud Storage or Google Drive. They are useful for querying large externally stored datasets without incurring BigQuery storage costs. Example: An external table pointing to CSV files stored in a Cloud Storage bucket.
- Views: A saved SQL query that acts like a virtual table.
- Partitioned Tables: Tables divided into smaller, more manageable segments (partitions) based on the values of a specific column. Example: A logs table partitioned by the log_date column.
- Clustered Tables: Tables optimized for query performance by sorting the data (within each partition, if the table is also partitioned) by the values of specific columns. Example: A table storing customer orders clustered by customer_id or region.
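The project, dataset, and table hierarchy described above is reflected in the fully qualified table name, which has the form project.dataset.table. As a minimal sketch, the hypothetical helper below (parse_table_id and TableRef are illustrative names, not part of any BigQuery library) splits such a name into its three levels, using the identifiers from the example in these notes.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """The three levels of a fully qualified BigQuery table name."""
    project: str
    dataset: str
    table: str


def parse_table_id(table_id: str) -> TableRef:
    """Split 'project.dataset.table' into its parts (hypothetical helper)."""
    parts = table_id.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'project.dataset.table', got {table_id!r}")
    return TableRef(*parts)


ref = parse_table_id("taxi-rids-ny.nytaxi.external_yellow_tripdata")
print(ref.project)  # taxi-rids-ny
print(ref.dataset)  # nytaxi
print(ref.table)    # external_yellow_tripdata
```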
Partitioning vs Clustering
BigQuery partitioning:
- When creating a partitioned table in BigQuery, you can choose between:
- Time-unit column
- Ingestion time (_PARTITIONTIME)
- Integer range
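Each of the three partitioning options corresponds to a different PARTITION BY expression in BigQuery's CREATE TABLE statement. The sketch below only builds the clause as a string so the three forms can be compared side by side; partition_clause is a hypothetical helper, and the column names (log_date, customer_id) and range bounds are chosen purely for illustration. It assumes the time-unit column is of type DATE and that daily partitions are wanted.

```python
def partition_clause(kind: str, column: str = "",
                     start: int = 0, end: int = 0, step: int = 1) -> str:
    """Build a BigQuery PARTITION BY clause for one of the three options."""
    if kind == "time_unit_column":
        # Assumes `column` is a DATE column; a TIMESTAMP column would need DATE(column).
        return f"PARTITION BY {column}"
    if kind == "ingestion_time":
        # Daily partitions based on the pseudocolumn BigQuery fills in at load time.
        return "PARTITION BY DATE(_PARTITIONTIME)"
    if kind == "integer_range":
        # Buckets of width `step` covering the range from `start` to `end`.
        return (f"PARTITION BY RANGE_BUCKET({column}, "
                f"GENERATE_ARRAY({start}, {end}, {step}))")
    raise ValueError(f"unknown partitioning kind: {kind!r}")


print(partition_clause("time_unit_column", "log_date"))
print(partition_clause("ingestion_time"))
print(partition_clause("integer_range", "customer_id", 0, 100, 10))
```

The resulting clause would be placed after the column list of a CREATE TABLE statement.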