BigQuery ML Overview

A recap of what BigQuery ML does and how it lets you build Machine Learning models using SQL.

When to Use BigQuery ML?

If a custom model on structured data is required, BigQuery ML can be used to build and train Machine Learning models using simple SQL.

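To easily determine whether you're forecasting or classifying, look at the label you want to predict:

If the label is a numeric value, you are forecasting; a typical example is a "linear regression" model. If the label is a discrete class, you are classifying; a typical example is a "logistic regression" model.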

Where to Find BigQuery?

BigQuery is available in the Big Data section of the Google Cloud console navigation menu.

Pros and Cons of External Data Sources in BigQuery

Pro: Once the connection to an external source such as a Google Sheets spreadsheet is set up, you can re-run the same query to get the latest data from the spreadsheet.

Con: Since external data is not stored natively in BigQuery, it does not benefit from BigQuery's caching mechanism, which can affect performance when rerunning the same queries.

Caching in BigQuery

Caching is a feature of BigQuery's native storage but does not apply to external data sources.
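One way to work around the caching limitation is to copy the external data into a native table and query the copy. The sketch below assumes an external table already exists; the names mydataset.sheet_external and mydataset.sheet_snapshot are hypothetical and do not come from the original lab.

```sql
-- Hypothetical names: mydataset.sheet_external is assumed to be an existing
-- external table backed by a spreadsheet; mydataset.sheet_snapshot is the
-- native copy created here.
CREATE OR REPLACE TABLE mydataset.sheet_snapshot AS
SELECT *
FROM mydataset.sheet_external;

-- Repeated identical queries against the native copy are eligible for
-- BigQuery's cached results.
SELECT COUNT(*) AS row_count
FROM mydataset.sheet_snapshot;
```

The trade-off is freshness: the snapshot no longer reflects later edits to the spreadsheet until the CREATE OR REPLACE TABLE statement is run again.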

Compare Training Between "Logistic Regression" and "Boosted Tree Classifier"

The comparison result is represented as a ROC (Receiver Operating Characteristic) curve. The closer the area under the curve (ROC-AUC) is to 1, the better the model separates the two classes.
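The points of the ROC curve can also be retrieved with SQL. The query below is a minimal sketch that assumes the logistic regression model from these notes has already been trained; with no input table given, ML.ROC_CURVE falls back to the evaluation data associated with the model.

```sql
-- Each row is one point on the ROC curve for a given decision threshold.
SELECT
  threshold,
  recall AS true_positive_rate,   -- y-axis of the ROC curve
  false_positive_rate             -- x-axis of the ROC curve
FROM ML.ROC_CURVE(MODEL `ecommerce.classification_model_2`)
ORDER BY threshold;
```

Running the same query against `ecommerce.classification_model_3` gives the curve for the boosted tree classifier, so the two can be plotted side by side.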

This process is part of experimentation and hyperparameter tuning.

Other Performance Metrics: Apart from ROC-AUC, available metrics include accuracy, precision, and recall.
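As a rough sketch of how the two models could be compared on these metrics, the query below puts the output of ML.EVALUATE for both models in one result. It assumes both models have finished training and relies on the evaluation data BigQuery ML associates with each model; the original lab may have passed its own evaluation query instead.

```sql
-- One row per model, with the metrics named above.
SELECT 'logistic_reg' AS model, roc_auc, accuracy, precision, recall
FROM ML.EVALUATE(MODEL `ecommerce.classification_model_2`)
UNION ALL
SELECT 'boosted_tree_classifier' AS model, roc_auc, accuracy, precision, recall
FROM ML.EVALUATE(MODEL `ecommerce.classification_model_3`);
```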

Linear Regression

Not discussed in detail here, but linear regression is another option for ML model training.
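For completeness, a linear regression model is created the same way as the classifiers, with a numeric label instead of a class. This is only an illustrative sketch: the table `mydataset.visits`, its columns, and the model name are hypothetical and are not part of the original lab.

```sql
-- Hypothetical table and columns, used only to show the shape of the statement.
CREATE OR REPLACE MODEL `mydataset.revenue_forecast`
OPTIONS (
  model_type = 'linear_reg',          -- regression: the label is numeric
  input_label_cols = ['revenue']
) AS
SELECT
  pageviews,       -- feature
  time_on_site,    -- feature
  revenue          -- numeric label to forecast
FROM `mydataset.visits`
WHERE revenue IS NOT NULL;
```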

Boosted Tree Classifier

CREATE OR REPLACE MODEL `ecommerce.classification_model_3`
OPTIONS
(model_type='BOOSTED_TREE_CLASSIFIER'
, l2_reg = 0.1
, num_parallel_tree = 8
, max_tree_depth = 10
, labels = ['will_buy_on_return_visit']) AS
WITH all_visitor_stats AS (

Logistic Regression

CREATE OR REPLACE MODEL `ecommerce.classification_model_2`
OPTIONS
(model_type='logistic_reg'
, labels = ['will_buy_on_return_visit']) AS
WITH all_visitor_stats AS (

Second Example: Logistic Regression

Create a dataset called advdata and build a logistic regression model called txtclass. The model is designed for classification, where the label is source.

CREATE OR REPLACE MODEL advdata.txtclass OPTIONS (model_type='logistic_reg', input_label_cols=['source']) AS

Evaluate the ML model:

SELECT * FROM ML.EVALUATE(MODEL advdata.txtclass)

Make predictions with the ML model:

SELECT * FROM ML.PREDICT(MODEL advdata.txtclass
,(SELECT 'government' AS word1, 'shutdown' AS word2, 'leaves' AS word3, 'workers' AS word4, 'reeling' AS word5
UNION ALL SELECT 'unlikely', 'partnership', 'in', 'home', 'gives'
UNION ALL SELECT 'fitbit', 's', 'fitness', 'tracker', 'is'
UNION ALL SELECT 'downloading', 'the', 'android', 'studio', 'project'
))
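To focus on the prediction itself rather than every output column, the result can be narrowed to the columns that ML.PREDICT adds for a classification model. The sketch below reuses the first headline from the query above and assumes the label column is source, as stated in these notes, so the added columns are named predicted_source and predicted_source_probs.

```sql
SELECT
  predicted_source,         -- most likely class for the input row
  predicted_source_probs    -- per-class probabilities (array of label/prob pairs)
FROM ML.PREDICT(MODEL advdata.txtclass, (
  SELECT 'government' AS word1, 'shutdown' AS word2, 'leaves' AS word3,
         'workers' AS word4, 'reeling' AS word5
));
```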

Supported Models in BigQuery ML

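BigQuery ML supports various models to perform different types of machine learning tasks, including the linear regression, logistic regression, and boosted tree classifier models covered above.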

Advantages of BigQuery ML

BigQuery ML offers several advantages over traditional ML approaches in cloud-based data warehouses:

Disadvantages of Data Export: