The client is a digital marketing agency specializing in Amazon Marketing. Helping to scale brands through Amazon Advertising.
2 weeks of cooperation | Digital Marketing
BigQuery | Google Sheets
The client’s tech team created a new Google Sheets, querying data from a Big Query database. As the Sheet was used by 12 account managers managing more than 70 client accounts, the monthly Big Query bill shot up to $3350.
The company uses BigQuery as a Database. The Google Ads data from around 65 client accounts were coming into BigQuery on a daily basis. Big Query billing is based on query size. With the queries developed by the client tech team being executed, it used to result in a query data size of 34 GB ( and increasing daily ). We were tasked to optimize or reduce the data refresh size to reduce the billing cost.
On analysis of the problem, we realized that the queries written would give the results but were very heavy. The main improvements implemented –
- Partitioning was designed and implemented to ensure that the entire table is not queried.
- Scheduled queries were set up to insert new data automatically from the original table to the partitioned table on a daily basis.
- The queries were merged and optimized to remove duplication.
- ~30% of the columns were not used for further analysis. The SELECT * query – which is the most expensive query – was changed to include only the columns being used for further analysis.
Partitioning was used to reduce the data refresh size. Since partitioned tables have the advantage that they will not query the entire table. Moreover, Scheduled queries were set up to insert new data automatically from the original table to the partitioned table on a daily basis.
The data refresh size was reduced drastically from 134 GB to around 21 GB. This helped in reducing the billing cost for BigQuery. The bill was reduced from $3350 to around $430 which was a significant change for the client.
Data Before Optimisation
Data After Optimisation