BigQuery — Serverless analytics warehouse

A serverless SQL data warehouse. Load billions of rows, run analytical queries in seconds, pay per byte scanned. Not a transactional database.

BigQuery is a fully managed, serverless, petabyte-scale analytical data warehouse. Storage is columnar (the Capacitor format) and separated from compute (the Dremel query engine), and data is automatically replicated across zones. You organize tables into datasets (namespaces). A table can be partitioned (by a date/timestamp column, by ingestion time, or by an integer range) and clustered (by up to 4 columns) to reduce the bytes scanned and speed up filtered queries.
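A minimal sketch of that table layout: a Python helper that emits GoogleSQL DDL for a date-partitioned, clustered table. The helper, the table name `mydataset.events` and the column names are hypothetical. It only checks the 4-column clustering limit and does not connect to BigQuery.

```python
MAX_CLUSTERING_COLUMNS = 4  # BigQuery allows at most 4 clustering columns


def build_table_ddl(table, columns, partition_col, cluster_cols):
    """Return CREATE TABLE DDL partitioned by a DATE column and clustered by cluster_cols.

    Hypothetical helper. A DATE column can be used directly in PARTITION BY;
    a TIMESTAMP column would need DATE(ts) or a truncation function instead.
    """
    if len(cluster_cols) > MAX_CLUSTERING_COLUMNS:
        raise ValueError(f"at most {MAX_CLUSTERING_COLUMNS} clustering columns allowed")
    known = {name for name, _ in columns}
    missing = [c for c in [partition_col, *cluster_cols] if c not in known]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    col_defs = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n  {col_defs}\n)\n"
        f"PARTITION BY {partition_col}\n"
        f"CLUSTER BY {', '.join(cluster_cols)}"
    )


ddl = build_table_ddl(
    "mydataset.events",
    [("event_date", "DATE"), ("country", "STRING"), ("user_id", "STRING"), ("amount", "NUMERIC")],
    partition_col="event_date",
    cluster_cols=["country", "user_id"],
)
print(ddl)
```

The order of clustering columns matters: put the column you filter on most often first, because data is sorted by the first clustering column before the next.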

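There are two pricing models for compute. (a) On-demand costs about $6.25 per TiB scanned and is billed per query, which suits sporadic use. (b) Editions (Standard / Enterprise / Enterprise Plus) bill for slot capacity (a slot is a unit of compute) that scales with autoscaling, which suits steady workloads or a need for predictable cost. Storage is billed separately: about $0.02 per GiB per month for active storage and $0.01 for long-term storage (tables or partitions not modified for 90 consecutive days). These are US list prices and vary by region.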
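The arithmetic behind those prices, as a rough sketch. It assumes the US list prices quoted in the paragraph and treats TB/GB as TiB/GiB (2^40 / 2^30 bytes). Real bills also depend on region, discounts and minimum billing rules that this ignores.

```python
# Rough BigQuery cost arithmetic. Assumption: US list prices as quoted, in TiB/GiB.
TIB = 2**40
GIB = 2**30

ON_DEMAND_USD_PER_TIB = 6.25
ACTIVE_STORAGE_USD_PER_GIB_MONTH = 0.020
LONG_TERM_STORAGE_USD_PER_GIB_MONTH = 0.010  # data not modified for 90+ days


def on_demand_query_cost(bytes_scanned: int) -> float:
    """USD cost of one on-demand query that scans `bytes_scanned` bytes."""
    return bytes_scanned / TIB * ON_DEMAND_USD_PER_TIB


def monthly_storage_cost(active_bytes: int, long_term_bytes: int) -> float:
    """USD storage bill for one month, split into active and long-term bytes."""
    return (active_bytes / GIB * ACTIVE_STORAGE_USD_PER_GIB_MONTH
            + long_term_bytes / GIB * LONG_TERM_STORAGE_USD_PER_GIB_MONTH)


# Full scan of a 10 TiB table vs. reading a single 30 GiB date partition.
print(on_demand_query_cost(10 * TIB))  # 62.5
print(on_demand_query_cost(30 * GIB))  # 0.18310546875
# 10 TiB stored, of which 8 TiB has aged into long-term pricing.
print(monthly_storage_cost(2 * TIB, 8 * TIB))
```

The same numbers show why pruning matters more than anything else on on-demand pricing. The partition-filtered query costs a few hundredths of a percent of the full scan.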

Loading data:

- Batch loads via `bq load` or the client SDKs carry no ingestion charge.
- Streaming uses the Storage Write API (~$0.025/GB ingested) and suits near-real-time pipelines.
- Federated queries run over Cloud Storage, Cloud SQL, Bigtable and Drive. The data is read in place, not copied.
- Change data capture (CDC) from operational databases runs via Datastream.
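A back-of-the-envelope comparison of the two ingestion paths, using only the figures in the paragraph. The assumptions are a flat $0.025 per GB streamed, no free allowance, no regional differences and a 30-day month. The function name is hypothetical.

```python
# Monthly ingestion cost: batch loads vs. Storage Write API (flat-rate assumption).
STORAGE_WRITE_USD_PER_GB = 0.025
BATCH_LOAD_USD_PER_GB = 0.0  # batch loads via `bq load` / SDK have no ingestion charge


def monthly_ingest_cost(gb_per_day: float, streaming: bool, days: int = 30) -> float:
    """USD spent on ingestion alone (storage and queries are billed separately)."""
    rate = STORAGE_WRITE_USD_PER_GB if streaming else BATCH_LOAD_USD_PER_GB
    return gb_per_day * days * rate


print(monthly_ingest_cost(50, streaming=True))   # 37.5
print(monthly_ingest_cost(50, streaming=False))  # 0.0
```

Streaming's charge buys freshness. If hourly or daily latency is acceptable, batch loads keep ingestion free.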

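Partitioning and clustering are your main optimization levers. A query on a 10 TB table that filters on the partitioning column (for example, a date) scans only the matching partition(s). Clustering sorts the data within each partition by the clustering columns, so filters on those columns let BigQuery skip storage blocks that cannot match. Never use SELECT * in production. List only the columns you need, because in a columnar store you pay for every column you read.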
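A toy model of the three pruning mechanisms together: partition pruning, block pruning from clustering, and column pruning. It is a simplification, not BigQuery's actual storage layout. `Block`, the byte sizes and the table contents are made up. Each block records its partition, the range of clustering-key values it holds, and bytes per column.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One storage block of a date-partitioned table clustered by `country` (toy model)."""
    partition: str     # partition the block belongs to, e.g. "2024-01-01"
    min_country: str   # clustering keeps each block's key range narrow
    max_country: str
    column_bytes: dict = field(default_factory=dict)  # bytes stored per column


def bytes_scanned(blocks, columns, dates, country=None):
    """Bytes read when selecting `columns` with a date filter and an optional country filter."""
    total = 0
    for b in blocks:
        if b.partition not in dates:  # partition pruning: other dates are never touched
            continue
        if country is not None and not (b.min_country <= country <= b.max_country):
            continue  # block pruning: this block cannot contain the country
        total += sum(b.column_bytes[c] for c in columns)  # only listed columns are read
    return total


SIZES = {"country": 10, "user_id": 40, "payload": 950}
TABLE = [
    Block(day, lo, hi, dict(SIZES))
    for day in ("2024-01-01", "2024-01-02")
    for lo, hi in (("AR", "FR"), ("GB", "US"))
]
ALL_DATES = {"2024-01-01", "2024-01-02"}

print(bytes_scanned(TABLE, list(SIZES), ALL_DATES))                        # 4000: SELECT *, no filter
print(bytes_scanned(TABLE, ["user_id"], {"2024-01-01"}))                   # 80: one partition, one column
print(bytes_scanned(TABLE, ["user_id", "country"], {"2024-01-01"}, "US"))  # 50: plus the cluster filter
```

Each lever cuts the bytes read independently: the date filter halves the blocks, dropping `payload` removes most of each block, and the country filter skips one of the remaining blocks.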

BigQuery ML: train and apply ML models (linear/logistic regression, k-means, boosted trees, ARIMA time series, neural networks, LLM embeddings) with CREATE MODEL ... SQL statements. Nothing is exported: training reads the data where it lives, and results land directly in BQ tables. This gives analysts a fast path to ML without building a full pipeline.
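A sketch of what that looks like in practice: Python helpers that build the two GoogleSQL statements an analyst would run. One trains a logistic-regression model with `CREATE MODEL`. The other writes `ML.PREDICT` output to a table. The helpers and all dataset, table, model and column names are hypothetical, and nothing here connects to BigQuery. You would submit the strings with `bq query` or a client library.

```python
def create_model_sql(model, label, features, source_table, model_type="logistic_reg"):
    """CREATE MODEL statement training `model_type` on `features` to predict `label`."""
    cols = ", ".join([*features, label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT {cols} FROM `{source_table}`"
    )


def predict_sql(model, input_table, output_table):
    """Score every row of `input_table` and save predictions plus input columns to a table."""
    return (
        f"CREATE OR REPLACE TABLE `{output_table}` AS\n"
        f"SELECT * FROM ML.PREDICT(MODEL `{model}`, (SELECT * FROM `{input_table}`))"
    )


train = create_model_sql(
    "mydataset.churn_model", "churned", ["tenure_months", "plan", "monthly_spend"],
    "mydataset.customers",
)
score = predict_sql("mydataset.churn_model", "mydataset.new_customers", "mydataset.churn_scores")
print(train)
print(score)
```

`ML.PREDICT` passes input columns through alongside the prediction columns, so an ID column in `new_customers` ends up in the output table for joining back.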

Common anti-patterns:

- Using BQ as an operational database. Query latency is too high for serving requests.
- Running many small queries. Each query carries a fixed overhead.
- Scanning columns you don't need.
- In shared projects, not setting a maximum_bytes_billed cap. With the cap, a query that would exceed it fails instead of running up the bill.
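One way to put the cap into practice, as a sketch. First do a dry run (free) to see how many bytes the query would process. Then submit it with `maximum_bytes_billed` so BigQuery itself enforces the limit. `check_budget` is a hypothetical pure helper. `run_capped_query` uses the `google-cloud-bigquery` client (`Client.query`, `QueryJobConfig(dry_run=..., use_query_cache=..., maximum_bytes_billed=...)`, `total_bytes_processed`) and needs that package plus credentials. The 10 GiB budget is an arbitrary example.

```python
def check_budget(estimated_bytes: int, cap_bytes: int) -> None:
    """Refuse to submit a query whose dry-run estimate exceeds the cap."""
    if estimated_bytes > cap_bytes:
        raise RuntimeError(f"query would scan {estimated_bytes:,} bytes; cap is {cap_bytes:,}")


def run_capped_query(sql: str, cap_bytes: int):
    """Dry-run `sql`, check the estimate, then run it with a server-side byte cap."""
    from google.cloud import bigquery  # optional dependency, imported only when used

    client = bigquery.Client()
    # 1) Dry run: validates the SQL and reports bytes it would process, at no cost.
    dry = client.query(
        sql, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    )
    check_budget(dry.total_bytes_processed, cap_bytes)
    # 2) Real run: if billing would exceed the cap, BigQuery fails the job instead.
    job = client.query(sql, job_config=bigquery.QueryJobConfig(maximum_bytes_billed=cap_bytes))
    return list(job.result())


CAP = 10 * 2**30  # hypothetical 10 GiB per-query budget
check_budget(2 * 2**30, CAP)  # within budget: returns None silently
```

The client-side check gives a readable error early. The server-side `maximum_bytes_billed` is the guarantee, because it also covers queries submitted without the helper's dry run.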

Grounded on https://cloud.google.com/bigquery/docs/introduction