Chapter 2 Basic Queries
In this chapter, we will explore the basics of SQL queries and syntax in BigQuery. We will cover the SELECT statement, filtering data using the WHERE clause, sorting data using the ORDER BY clause, limiting the number of results using the LIMIT clause and aggregating data using the GROUP BY clause.
2.1 The SELECT statement
The SELECT statement is used to retrieve data from one or more tables in a database. It has the following basic syntax:
SELECT column1, column2, ...
FROM table_name;In this syntax, column1, column2, … are the columns you want to retrieve data from, and table_name is the name of the table where the data is stored. For example, the following query retrieves the status, name columns from a table named tst_storms:
SELECT status, name
FROM `dft-stats-gcp-showcase.TS_Training.tst_storms`;SELECT * allows you to select all columns from a table. For example, the following query retrieves the all columns from a table named tst_storms:
SELECT status, name
FROM `dft-stats-gcp-showcase.TS_Training.tst_storms`;BigQuery stores table data in columnar format, meaning it stores each column separately. Column-oriented databases are particularly efficient at scanning individual columns over an entire dataset. Therefore, it is generally discouraged to use * as you are charged per column of data you retrieve from BigQuery, so it is best to explicitly state which columns you wish to select. Column-oriented databases are optimized for analytic workloads that aggregate data over a very large number of records. Often, an analytic query only needs to read a few columns from a table. For example, if you want to compute the sum of a column over millions of rows, BigQuery can read that column data without reading every field of every row.
SELECT DISTINCT is a variation of the SELECT statement that returns unique values of the specified columns. This can be useful when you want to get a list of all unique values in a column, or when you want to eliminate duplicate rows from your query results.
SELECT DISTINCT status, name
FROM `dft-stats-gcp-showcase.TS_Training.tst_storms`;2.2 Filtering data using the WHERE clause
The WHERE clause is used to filter data based on specific conditions. It has the following basic syntax:
SELECT column1, column2, ...
FROM table_name
WHERE condition;In this syntax, condition is the condition that must be met for a row to be included in the results. For example, the following query retrieves all storms where wind was greater than 70:
SELECT wind, status, name, hour, month, year
FROM `dft-stats-gcp-showcase.TS_Training.tst_storms`
WHERE wind > 70;One specific case worth mentioning is retrieving null values. The following query retrieves all storms where hurricane force diameter is not null.
SELECT hurricane_force_diameter, wind, status, name, hour, month, year
FROM `dft-stats-gcp-showcase.TS_Training.tst_storms`
WHERE hurricane_force_diameter IS NOT NULLAnd if you wanted to find storms that did have a null hurricane force diamater, you would remove the NOT.
2.3 Sorting data using the ORDER BY clause
The ORDER BY clause is used to sort data in ascending or descending order based on one or more columns. It has the following basic syntax:
SELECT column1, column2, ...
FROM table_name
ORDER BY column_name [ASC|DESC];In this syntax, column_name is the name of the column used to sort the data, and ASC or DESC specifies the sorting order. For example, the following query retrieves all storms sorted by wind in descending order:
SELECT wind, status, name, hour, month, year
FROM `dft-stats-gcp-showcase.TS_Training.tst_storms`
ORDER BY wind DESC;2.4 Limiting the number of results using the LIMIT clause
The LIMIT clause is used to limit the number of results returned by a query. It has the following basic syntax:
SELECT column1, column2, ...
FROM table_name
LIMIT number;In this syntax, number is the maximum number of rows to be returned by the query. For example, the following query retrieves the top 10 recorded wind speeds:
SELECT wind, status, name, hour, month, year
FROM `dft-stats-gcp-showcase.TS_Training.tst_storms`
ORDER BY wind DESC
LIMIT 10;2.5 Aggregating Data
You can use function such as COUNT, SUM, AVG, MIN, and MAX functions to aggregate data. The syntax for the aggregate functions is as follows:
SELECT aggregate_function(column)
FROM table_nameIn this syntax, column is the column used to aggregate data.
For example, the following query calculates the average wind for all storms in my dataset:
SELECT AVG(wind)
FROM `dft-stats-gcp-showcase.TS_Training.tst_storms`;2.6 Grouping Data
You can use the GROUP BY clause to group data based on one or more columns. The syntax for the GROUP BY clause is as follows:
SELECT column1, column2, ...
FROM table_name
GROUP BY column1, column2, ...;In this syntax, column1, column2, … are the columns used to group data. This works extremely well with aggregation functions.
For example, the following query gives the average wind speed by storm type.:
SELECT status, AVG(wind)
FROM `dft-stats-gcp-showcase.TS_Training.tst_storms`
GROUP BY status;In this chapter, we covered the basics of SQL queries and syntax in BigQuery. In the next chapter, we will explore more advanced queries, including creating tables, joining them, and inserting data.