#bigquery


🚀 DataTalksClub's Data Engineering Zoomcamp Week 3 - BigQuery as a data warehousing solution.

🎯 For this week's module, we used Google BigQuery to read Parquet files from a GCS bucket and compared querying across native, external, and partitioned/clustered tables (a rough sketch of the setup is below).

🔗 My answers to this module: github.com/goosethedev/de-zoom

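A minimal sketch of that kind of comparison, assuming hypothetical names throughout (dataset `zoomcamp`, bucket path `gs://my-bucket/yellow/*.parquet`, and illustrative NYC-taxi-style columns `tpep_pickup_datetime` / `VendorID`); the actual homework lives in the repo linked above:

```ts
import {BigQuery} from '@google-cloud/bigquery';

const bq = new BigQuery();
const ds = bq.dataset('zoomcamp'); // hypothetical dataset name

async function main() {
  // External table: BigQuery reads the Parquet files in place from GCS.
  await ds.createTable('trips_ext', {
    externalDataConfiguration: {
      sourceFormat: 'PARQUET',
      sourceUris: ['gs://my-bucket/yellow/*.parquet'], // hypothetical path
    },
  });

  // Native table materialised from the external one.
  await bq.query(`
    CREATE OR REPLACE TABLE zoomcamp.trips AS
    SELECT * FROM zoomcamp.trips_ext
  `);

  // Partitioned + clustered copy: partition pruning and clustering mean
  // selective queries scan far less data than a full-table scan.
  await bq.query(`
    CREATE OR REPLACE TABLE zoomcamp.trips_part
    PARTITION BY DATE(tpep_pickup_datetime)
    CLUSTER BY VendorID AS
    SELECT * FROM zoomcamp.trips
  `);

  // Compare estimated bytes processed with a dry run against each table.
  for (const table of ['trips_ext', 'trips', 'trips_part']) {
    const [job] = await bq.createQueryJob({
      query: `SELECT COUNT(DISTINCT VendorID) FROM zoomcamp.${table}
              WHERE DATE(tpep_pickup_datetime) = '2024-03-01'`,
      dryRun: true, // estimate only, nothing is billed
    });
    console.log(table, job.metadata.statistics.totalBytesProcessed, 'bytes');
  }
}

main().catch(console.error);
```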

FFS. Turns out (after I built a feature) that you can't supply a schema for BigQuery Materialised Views.

> Error: googleapi: Error 400: Schema field shouldn't be used as input with a materialized view, invalid

So it's impossible to have column descriptions for MVs? That sucks.
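For reference, a minimal sketch of what triggers that 400, using the Node.js client (`@google-cloud/bigquery`) and hypothetical dataset/table/column names; the MV must be defined purely by its query, with no schema attached:

```ts
import {BigQuery} from '@google-cloud/bigquery';

const bq = new BigQuery();

async function createMv() {
  // Works: a materialised view is defined only by its query.
  await bq.dataset('logs').createTable('daily_counts_mv', {
    materializedView: {
      query: `SELECT DATE(ts) AS day, COUNT(*) AS n
              FROM logs.events GROUP BY day`, // hypothetical source table
    },
  });

  // Fails with "Error 400: Schema field shouldn't be used as input with
  // a materialized view, invalid" -- a schema (and with it, column
  // descriptions) can't be supplied alongside the MV definition.
  await bq.dataset('logs').createTable('daily_counts_mv2', {
    materializedView: {
      query: `SELECT DATE(ts) AS day, COUNT(*) AS n
              FROM logs.events GROUP BY day`,
    },
    schema: [
      {name: 'day', type: 'DATE', description: 'UTC day bucket'},
      {name: 'n', type: 'INTEGER', description: 'Events per day'},
    ],
  });
}

createMv().catch(console.error);
```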

Whilst migrating our log pipeline to the BigQuery Storage API, and thus to end-to-end streaming of data from Cloud Storage (GCS) via Eventarc and Cloud Run (read, transform, enrich; Node.js) into BigQuery, I tested some big files, many times the size of the largest we've ever seen in the wild.

It runs at just over 3 log lines/rows per millisecond end-to-end (i.e. including the write to BigQuery) over 3.2M log lines. That's roughly 3,000 rows/second, so on the order of 18 minutes for the full file.

Would be interested to know how that compares with similar systems.
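For a sense of the shape of that pipeline, here's a heavily simplified sketch of such a Cloud Run handler, with hypothetical names throughout. For brevity it streams via the legacy `table.insert()` (insertAll) path rather than the Storage Write API the post actually uses, whose Node client (`@google-cloud/bigquery-storage`) is considerably more involved:

```ts
import express from 'express';
import {Storage} from '@google-cloud/storage';
import {BigQuery} from '@google-cloud/bigquery';
import * as readline from 'node:readline';

const app = express();
app.use(express.json());

const storage = new Storage();
const table = new BigQuery().dataset('logs').table('events'); // hypothetical

// Eventarc delivers the GCS object-finalized event as a JSON body
// containing the object's bucket and name.
app.post('/', async (req, res) => {
  const {bucket, name} = req.body;
  const rl = readline.createInterface({
    input: storage.bucket(bucket).file(name).createReadStream(),
  });

  let batch: object[] = [];
  for await (const line of rl) {
    // Transform/enrich each log line into a BigQuery row (stub).
    const row = {raw: line, ingested_at: new Date().toISOString()};
    batch.push(row);
    if (batch.length >= 500) {
      await table.insert(batch); // legacy streaming insert, for brevity
      batch = [];
    }
  }
  if (batch.length) await table.insert(batch);
  res.sendStatus(204);
});

app.listen(Number(process.env.PORT ?? 8080));
```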