Welcome back to the third and final post in my series, Your First 60 Days as a First Data Hire. In my last post, I discussed advice around setting up foundational infrastructure, understanding the common metrics of your industry, and gently guiding your colleagues.
Today, I’ll explore the build, iterate, and test cycle, as well as documentation, process, and managing relationships.
As a reminder, Data Columns is a community-led newsletter by me (Pedram Navid!) with practical advice on leveling up as a data practitioner. My focus is bringing relevant, topical guidance on the matters data people care about straight to your inbox.
If you want to know more about the newsletter, or want to contribute, drop me a line on Twitter or by email.
By this point, you should have a solid understanding of the business, the product, your team, and your partners. You've come up with a preliminary view of what your team's purpose should be, and have planned out how you will maximize leverage by looking for areas where you can have the greatest impact.
The final weeks are spent on execution – and being able to execute well is much easier when the upfront work has been done already.
While the type of work can vary greatly from company to company, in my experience most of the early work of a first data hire falls into one of the following areas: infrastructure and instrumentation, modeling, and reporting and analytics.
We'll dive into each of these areas, but first: a general theory on execution. Much like software development, a data team's approach for executing is the build, test, and iterate model. There's much already written about this type of model and it's unlikely to be new to most of you, so I'll keep it brief.
It's rare that both you and a stakeholder have a clear vision of exactly what you need before development work has started. Even if you do, it's very likely that your ideas and understanding will evolve through the act of building. Whether it's a dashboard, a data model, or a new analysis, testing what you're building while it's being built helps validate that you're on the right track. It also builds confidence with your stakeholders and lets you continuously refine the development process. There's nothing worse than getting a request from an executive, thinking you understood it, and spending a few weeks building something you're proud of – only to have them disappointed with the result.
By tightening that loop and bringing stakeholders in more frequently, you'll avoid these disappointing, misaligned outcomes, which are still all too common.
Building The Infrastructure
As a first data hire, you need to optimize for infrastructure that's easy to maintain. In the early stages, you likely won't have time to operate a custom-built in-house stack, so being willing to trade money for time is critical.
You will need to select a data warehouse and a tool for ingesting data into your warehouse. I will avoid outright recommendations, but keep in mind that your time is often more limited than your company's budget. Unless you have a data engineering team at your disposal, or are confident in your capacity to quickly hire more engineers, I would lean towards buying before building.
Using off-the-shelf tools makes it easy to ingest data from your production database and various other systems. dbt is an obvious choice for a transformation layer and hard to argue against. You have several choices when it comes to running dbt. dbt Cloud makes it easy to schedule runs and integrate with your git repo to perform CI checks, but it's also possible to run dbt through the CLI. Either option works, as long as you're not spending too much time setting it up.
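To make this concrete, a first dbt model can be as small as a single SQL file plus a schema test. Here's a minimal sketch, assuming a hypothetical raw `users` table registered as a dbt source and a standard dbt project layout:

```sql
-- models/staging/stg_users.sql
-- Hypothetical staging model: rename and lightly clean raw columns.
select
    id as user_id,
    lower(email) as email,
    created_at
from {{ source('app', 'users') }}
```

```yaml
# models/staging/schema.yml
version: 2
models:
  - name: stg_users
    columns:
      - name: user_id
        tests:
          - unique
          - not_null
```

Whether you're on dbt Cloud or the CLI, `dbt run` builds the model and `dbt test` checks the assertions, which gives you the test half of the build, test, and iterate loop almost for free.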
Another question to ask yourself is whether or not you want to track product analytics events. Events are hard for a variety of reasons, as this great thread by Erika Pullum explores. I would recommend that you initially avoid events, because good event tracking requires coordination between engineering, product, and data teams that's not easy to master.
As you start to deliver value and build out a team, and develop a better understanding of pain points and gaps, you can shift some of your focus to instrumentation. It's very common for instrumentation to be subpar in the early days. I've met maybe two people who have a good system in place for product instrumentation, so don't fret if you're not perfect here out of the gate.
Modeling The Data

One of the unanswered questions of data modeling is what approach to take. There are some great Coalesce talks and good discussions online and, it seems, no consensus. Given that, take my advice with a grain of salt.
What I tend to do is focus on the entities that are core to your business, and find ways to define those. For example, at Hightouch we have users, and each user has a workspace. Within a workspace, there's a model, a sync, a source, and a destination. I created models for each of these within the core schema. Even this small step was really powerful, considering some of the odd joins and strange names I had to deal with in the data mines.
From there, higher-level concepts might emerge. You might have a notion of a customer, or you might have workspace-level features which are aggregates of the various entities that make up a workspace. This is where the real build, iterate, and test cycle occurs. It's hard to come up with the perfect data model upfront, but starting with some of the basic entities that matter is a great way to get your feet wet.
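As a sketch of what that layering can look like – assuming hypothetical core models named after the entities above – a workspace-level rollup might start as simply as:

```sql
-- models/core/workspace_features.sql
-- Hypothetical aggregate: one row per workspace with entity counts,
-- built on top of the core entity models via dbt's ref().
select
    w.workspace_id,
    count(distinct s.sync_id)        as sync_count,
    count(distinct src.source_id)    as source_count,
    count(distinct d.destination_id) as destination_count
from {{ ref('workspaces') }} w
left join {{ ref('syncs') }} s        on s.workspace_id = w.workspace_id
left join {{ ref('sources') }} src    on src.workspace_id = w.workspace_id
left join {{ ref('destinations') }} d on d.workspace_id = w.workspace_id
group by 1
```

The first version will be wrong in some way – that's fine. A model like this is cheap to rewrite as stakeholders react to it.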
From there, use cases will emerge. Marketing might ask for attribution (just say no), or sales might ask for product data in their CRM. You'll start to develop schemas for the various business functions that ask the most questions. Data models aren't set in stone, and they can always change.
Snapshots are one of those unavoidable things in life, but don't feel the need to go too heavy too quickly. Wait until an actual use case emerges, and then carefully consider if you need one. The fewer snapshots you have the better. Read this guide by Shopify for a great breakdown of why you might need one.
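For reference, when a use case does emerge, a dbt snapshot is a short file. Here's a hypothetical one over a `subscriptions` source table, using the `timestamp` strategy to capture row changes over time:

```sql
-- snapshots/subscriptions_snapshot.sql
{% snapshot subscriptions_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

-- dbt compares updated_at on each run and records changed rows,
-- giving you slowly-changing-dimension history for free.
select * from {{ source('app', 'subscriptions') }}

{% endsnapshot %}
```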
Incremental models: absolutely do not use them until you absolutely have to. Resist every urge. This is the only advice I will give on this topic.
Reporting and Analytics
Once you have some basic models created, you'll want to build light reporting and analytics to start gleaning insight from the data. Because this is highly contextual, it's hard to give concrete advice on what to build. As a general rule, I suggest focusing on iterative building as much as possible.
Tools like Hex make it easy to glean insights and build out analyses from your data warehouse. You can use these to further refine your data models as part of the build, test, iterate loop. If you prefer dashboards, Apache Superset is fairly lightweight although limited. Looker is likely too heavy a lift as a first tool in the stack; although it has a lot of great features around governance, it can be a pain to work with early on as the only data hire. There are also other great alternatives brewing – Lightdash is one I've been keeping an eye on, and it can be great for quickly prototyping various charts and tables of your business data.
Just remember: there's no perfect data model, and there's no perfect report. It's a continuously evolving process, and one you should always feel free to change or even toss. If a dashboard isn't being used, delete it. Don't let it linger for someone to accidentally find three months from now and wonder what some metric means. Save your future self the pain.
What The Future Holds
Throughout these last few weeks, you'll have your hands full executing on your plan. Be forewarned that, from this point on, things will only become more chaotic. If you're feeling imposter syndrome, don't worry, we're all imposters.
If you're feeling overwhelmed, join the club. It's normal. Requests for data will always outpace the data team's capacity. As you begin to deliver value to the organization and rethink the goals of your data team, you may start thinking about hiring to help with the workload. Be careful of the trap of thinking more FTEs will mean more time for you to do the work: it won't. You'll soon find yourself working as a manager, not a data practitioner. But a team of three great analysts will outpace your individual capacity very quickly, so the leverage you add will be net positive.
If you find yourself successful in your role, I'd also encourage you to think of ways you can give back. I've been fortunate enough to learn from the community and to have a place where I can try to give back what I've learned over the years. That might look like answering questions in dbt Slack, offering to do free mentorship, or contributing a newsletter or blog post on a subject you care about. I can tell you the most rewarding parts of my career have had very little to do with analyzing data, and everything to do with helping people be successful.