Building a data team from the beginning

Key takeaways from this episode

There’s this funny thing when you’re a user of a software product but you also work for the company that makes it: you end up with a dual role. You have to be a data engineering manager, but you also have to explain and advocate for the platform you’re using.

How do you balance these responsibilities? Are you mostly spending your time delivering data outcomes to the business, or are you mostly spending your time on stages in front of audiences?

That’s one of the reasons I joined Snowflake. Before joining, I was a Snowflake customer; my team was implementing a data platform on Snowflake. It may sound a little cliché, but when I was introduced to the Snowflake platform, it was love at first sight: the ease of use, among so many other things.

At Snowflake, my core responsibility is building data products and data-driven solutions that help Snowflake’s internal businesses across different verticals. But additionally, one of the roles I play here is talking about use cases on the platform. I work closely with the marketing, sales engineering, and sales teams. I give many keynotes and breakout sessions at global events, and I stay close to the developer community.

That's explicitly a part of your role?

That’s not part of my role; my role is evolving into that. On paper, by the title I hold, it’s not part of my role. But I love doing it, and the leadership here is very, very appreciative. If you are a proud user of something, whether it’s a tech product or any day-to-day utility product, you automatically try to market it. The use cases I build here at Snowflake, I just go and talk about them to the world, to the data community.

Let's do it. Tell me, how does Snowflake do data engineering?

First of all, before I jump into it, I just want to mention that I’m here in my personal capacity; this is not sponsored by Snowflake. Since we’re a cloud data platform, we take data very, very seriously. Snowflake is the sixth organization I’ve worked at in the last 14 years, and it is truly a data-driven organization.

We practice data. We live and breathe data. Not only the data engineering team but all the functions, be it sales, sales engineering, marketing, finance, or workplace, try to have this data-driven mindset. My team is a horizontal team within Snowflake, and it supports different verticals: GTM, finance, legal, and others.

We have a centralized repository where all of Snowflake’s data lands in a single-tenant, single platform. Then, based on the domain (when I say domain, you can call it the verticals), we cater to them. Most of the time we create data models.

My team spends 80% of its time on analytics engineering: creating data models, common data models, and some aggregations, and then applying data quality, observability, and data governance. We share this with the business units so that they can create their own analytics if they have their own analytics team. Or sometimes we are engaged in enriching their source system data: we reverse-ETL this golden data back to their source systems, say Workday, Salesforce, ServiceNow, or Jira.

Snowflake has been using dbt for a long time. I'd be interested to hear if you feel like your use of dbt is different or novel based on your unique role in the ecosystem. I'd be curious to hear if there's other tooling that you use in your stack that's worth talking about.

Yeah, so the stack is very big, but we’ve used dbt since the beginning, especially for data modeling and analytics engineering. We are very, very satisfied users of dbt. My team in particular spends almost 70% to 80% of its time writing models in dbt and deploying them.
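To make that concrete, here is a minimal sketch of what a dbt model in a stack like this might look like; the source, model, and column names are hypothetical, and the surrounding project configuration (sources, profiles) is assumed to exist:

```sql
-- Hypothetical staging model, e.g. models/staging/stg_opportunities.sql.
-- A dbt model is just a SELECT statement; dbt compiles the Jinja,
-- generates the DDL, and deploys the resulting table or view.
{{ config(materialized='table') }}

SELECT
    opportunity_id,
    account_id,
    amount,
    close_date
FROM {{ source('salesforce', 'opportunity') }}
WHERE is_deleted = FALSE
```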

What do you think about the talent market for dbt in India? It's still a new-ish product. There are probably deep benches of talent in India using different ways to get similar jobs done. Do you have a hard time sourcing dbt talent in India, or do you think there's a lot of it there?

As you said, India still relies heavily on Spark, Informatica, and traditional ETL. dbt is picking up, especially with niche companies or new-age tech companies. But I would say I still find some difficulty sourcing talent.

This is where things get complicated. Snowflake made an announcement about Polaris at Snowflake Summit. There are internally managed catalogs, like Snowflake's managed catalog, and then there are externally managed catalogs. I'm curious if you could help us figure out the differences between these, and their advantages and limitations.

You said it right. Storing data in Iceberg format is one thing, but unless you have a catalog, you will not be able to query the latest data or keep track of the latest snapshots and the ACID properties you want to leverage, right?

When the Iceberg table format started getting traction, each platform, like Snowflake and Databricks, started creating its own catalog. And you need to understand what a catalog is not: a catalog does not store the actual data. It is just a pointer to the data, which is stored somewhere in the cloud in Iceberg format. The catalog just keeps the pointer to the latest data, the latest files.

You can think of it as metadata. It keeps track of metadata. Now, where do you keep this metadata? One way of doing this is you keep this metadata with Snowflake in the Snowflake Managed Catalog. You need not worry about the UI or the console or how you and your team will view the catalog.
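As a rough illustration, this is roughly what an Iceberg table backed by a Snowflake-managed catalog looks like in SQL; the table, column, and volume names are hypothetical, and the external volume is assumed to be configured already:

```sql
-- The data files live in your own cloud storage (via the external volume);
-- Snowflake only manages the catalog metadata that points at them.
CREATE ICEBERG TABLE customer_events (
    event_id  STRING,
    event_ts  TIMESTAMP_NTZ
)
    CATALOG = 'SNOWFLAKE'              -- Snowflake-managed catalog
    EXTERNAL_VOLUME = 'my_s3_volume'   -- hypothetical external volume
    BASE_LOCATION = 'customer_events/';
```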

I think of Snowflake's history with AI and LLMs as having two distinct phases. There's the pre-Sridhar phase and the post-Sridhar phase. And then the post-Sridhar phase is more like Cortex. Do you think that's an appropriate way of thinking about this?

Yeah, definitely. Sridhar comes with a lot of experience in artificial intelligence, especially in semantic search. He’s a technologist, as you can see from his past work at Google and at his own startup.

The moment Sridhar joined Snowflake, all of a sudden Cortex came into the picture.

The core philosophy of Snowflake is simplicity; that’s the way the platform was built. Cortex functions, whether machine-learning-powered functions or LLM functions, are so simple to use, and there is so much excitement within Snowflake about them. That was the shift that happened post-Sridhar: everybody is empowered to use these large language models, not directly but in the form of SQL functions. And there is a lot of talk about how to expand that and create more.
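For a sense of what that looks like, here is a minimal sketch calling a few Cortex LLM functions from plain SQL; the table and column names are hypothetical, and model availability depends on your region and edition:

```sql
-- Cortex exposes LLMs as ordinary SQL functions, callable row by row.
SELECT
    review_text,
    SNOWFLAKE.CORTEX.SENTIMENT(review_text)  AS sentiment_score,  -- roughly -1 to 1
    SNOWFLAKE.CORTEX.SUMMARIZE(review_text)  AS summary,
    SNOWFLAKE.CORTEX.COMPLETE(
        'llama3-8b',  -- one of the supported models
        CONCAT('Classify this review as positive or negative: ', review_text)
    ) AS classification
FROM product_reviews;  -- hypothetical table
```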

Since Cortex, have you seen adoption of AI in the Snowflake platform accelerate?

100%. Sometimes I think we are doing too much inside the company. Everyone, not just the data team but also project management and the other non-technical teams, can write SQL. We still have to figure out internally which use cases will have impact at scale, but we are already using these Cortex functions heavily.

Do you have any dbt pipelines that are just end-to-end Snowflake dynamic tables? I have not personally gone all in on dynamic tables, but I'm curious if you've pushed it hard.

We are right now in a phase where we are moving some of the pipelines that were managed through Airflow DAGs to dynamic tables using dbt. I would say we are not completely there with end-to-end pipelines using dynamic tables inside dbt, but one of our current initiatives is migrating those Airflow DAGs to dynamic tables on dbt itself. Those are mostly on the master data management side.
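For reference, a dbt model materialized as a Snowflake dynamic table is mostly a config change; this sketch assumes the dbt-snowflake adapter (1.6 or later) and uses hypothetical model and warehouse names:

```sql
-- With materialized='dynamic_table', Snowflake keeps the table refreshed
-- within the target lag, so no external orchestrator (e.g. Airflow) is needed.
{{ config(
    materialized='dynamic_table',
    target_lag='1 hour',
    snowflake_warehouse='TRANSFORM_WH'  -- hypothetical warehouse
) }}

SELECT
    account_id,
    MAX(updated_at) AS last_updated
FROM {{ ref('stg_accounts') }}  -- hypothetical upstream model
GROUP BY account_id
```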
