What every startup should have in place before hiring their first data person.
1. Instrument your product from day one (events, not page views)
Page views tell you almost nothing useful; events tell you what users actually do.
A page view tells you someone visited your pricing page. An event tells you they clicked "Start Free Trial" after viewing pricing for 45 seconds, having previously completed onboarding step 3 but not step 4. One is noise; the other is signal you can act on.
Start with a simple tracking plan: a spreadsheet listing every event you track, what it means, what properties it includes, and where it fires. This sounds bureaucratic, but it prevents the chaos that comes from engineers adding events ad hoc without coordination.
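Once the spreadsheet exists, the same plan can also live in code, so that events are checked against it before they are sent. The following is a minimal sketch, not a prescribed implementation: the event names, properties, and the `check_event` helper are all hypothetical, chosen to mirror the pricing and onboarding example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventSpec:
    """One row of the tracking plan spreadsheet."""
    name: str
    meaning: str
    properties: frozenset
    fires_on: str  # where in the product the event fires


# Hypothetical entries, for illustration only.
TRACKING_PLAN = {
    spec.name: spec
    for spec in (
        EventSpec(
            name="trial_started",
            meaning="User clicked 'Start Free Trial'",
            properties=frozenset({"plan", "seconds_on_pricing"}),
            fires_on="pricing page",
        ),
        EventSpec(
            name="onboarding_step_completed",
            meaning="User finished one onboarding step",
            properties=frozenset({"step"}),
            fires_on="onboarding flow",
        ),
    )
}


def check_event(name, properties):
    """Return a list of problems; an empty list means the event matches the plan."""
    spec = TRACKING_PLAN.get(name)
    if spec is None:
        return [f"unknown event: {name}"]
    sent = set(properties)
    problems = []
    for prop in sorted(spec.properties - sent):
        problems.append(f"missing property: {prop}")
    for prop in sorted(sent - spec.properties):
        problems.append(f"undocumented property: {prop}")
    return problems
```

Running a check like this in tests or in a development build is one way to catch ad hoc events before they reach production; whether it should block or only warn is a team decision.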
For implementation, pick one analytics tool and use it consistently. Mixpanel, Amplitude, and PostHog are all reasonable choices. PostHog has the advantage of being self-hostable if data residency matters, which is increasingly relevant in regulated industries. See my piece on technical leadership in regulated startups for more on this.
2. Pick one source of truth for metrics
This sounds obvious, but most startups fail at it. The CEO pulls revenue from Stripe. The CFO uses a spreadsheet. The product manager looks at the analytics dashboard. None of the numbers match.
Pick one system to be the source of truth for each key metric. Document it. Make sure everyone knows where to look. At seed stage, this might just be a Notion page with a list: "Revenue: Stripe dashboard. MRR: Finance spreadsheet, updated monthly. Active users: Mixpanel, 'Weekly Active Users' report."
The specific tools matter less than the consistency. What kills you is having three different definitions of "active user" in three different systems, and no one knowing which one is correct.
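One way to avoid competing definitions is to write the definition down once, in a form that can be executed and reviewed. The sketch below assumes a purely illustrative definition (at least one qualifying event in the last 7 days); the window, the event names, and the `active_users` function are assumptions, not a recommendation for what "active" should mean in your product.

```python
from datetime import datetime, timedelta

# Assumed definition, for illustration: a user is "active" if they fired at
# least one qualifying event in the 7 days up to and including `as_of`.
ACTIVE_WINDOW = timedelta(days=7)
QUALIFYING_EVENTS = frozenset({"project_created", "report_viewed"})


def active_users(events, as_of):
    """Return the set of active user IDs.

    `events` is an iterable of (user_id, event_name, occurred_at) tuples.
    """
    cutoff = as_of - ACTIVE_WINDOW
    return {
        user_id
        for user_id, name, occurred_at in events
        if name in QUALIFYING_EVENTS and cutoff < occurred_at <= as_of
    }
```

The point is less the code than the fact that there is exactly one place where the window and the qualifying events are defined.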
3. Structure your database for reporting, not just features
When engineers build features, they optimise for the application's needs: fast reads, efficient writes, simple queries. Reporting has different requirements: you need to aggregate data over time, join across tables, and answer questions the original schema was not designed for.
This does not mean you need a data warehouse. It means thinking ahead when designing your schema. A few principles help:
- Include timestamps on everything: created_at, updated_at, deleted_at (soft deletes are your friend).
- Store state changes, not just current state. If a user upgrades their plan, do not just update the plan field; create a record of the change.
- Use consistent ID formats across tables. If you use UUIDs, use them everywhere.
- Think about what questions you will want to answer in 12 months. "How many users who signed up in March are still active in June?" is a common one; make sure your schema can answer it.
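The principles above can be sketched with SQLite from Python's standard library. Everything here is an assumption made for illustration: the table and column names, the sample rows, and the choice to treat "active" as "fired at least one event in the month". Short IDs are used for readability; in practice you would use one format, such as UUIDs, everywhere.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id         TEXT PRIMARY KEY,
    created_at TEXT NOT NULL,   -- ISO 8601 timestamps sort correctly as text
    updated_at TEXT NOT NULL,
    deleted_at TEXT             -- soft delete: NULL means the row is live
);
CREATE TABLE plan_changes (     -- one row per change, never updated in place
    id         TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL REFERENCES users(id),
    old_plan   TEXT,
    new_plan   TEXT NOT NULL,
    changed_at TEXT NOT NULL
);
CREATE TABLE events (
    id          TEXT PRIMARY KEY,
    user_id     TEXT NOT NULL REFERENCES users(id),
    name        TEXT NOT NULL,
    occurred_at TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, NULL)", [
        ("u1", "2024-03-05T09:00:00", "2024-05-01T10:00:00"),
        ("u2", "2024-03-20T09:00:00", "2024-03-20T09:00:00"),
        ("u3", "2024-04-02T09:00:00", "2024-04-02T09:00:00"),
    ])
    conn.executemany("INSERT INTO plan_changes VALUES (?, ?, ?, ?, ?)", [
        ("c1", "u1", None, "free", "2024-03-05T09:00:00"),
        ("c2", "u1", "free", "pro", "2024-05-01T10:00:00"),
    ])
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
        ("e1", "u1", "report_viewed", "2024-06-10T14:00:00"),
        ("e2", "u2", "report_viewed", "2024-04-11T14:00:00"),
        ("e3", "u3", "report_viewed", "2024-06-12T14:00:00"),
    ])
    return conn


def march_cohort_active_in_june(conn):
    """Count users who signed up in March 2024 and fired any event in June 2024."""
    row = conn.execute("""
        SELECT COUNT(DISTINCT u.id)
        FROM users u
        JOIN events e ON e.user_id = u.id
        WHERE u.deleted_at IS NULL
          AND u.created_at  >= '2024-03-01' AND u.created_at  < '2024-04-01'
          AND e.occurred_at >= '2024-06-01' AND e.occurred_at < '2024-07-01'
    """).fetchone()
    return row[0]


def plan_on(conn, user_id, at):
    """Return the plan a user was on at time `at`, or None if no change is recorded."""
    row = conn.execute(
        "SELECT new_plan FROM plan_changes"
        " WHERE user_id = ? AND changed_at <= ?"
        " ORDER BY changed_at DESC LIMIT 1",
        (user_id, at),
    ).fetchone()
    return row[0] if row else None
```

Because plan changes are stored as rows rather than overwritten, `plan_on` can answer "what plan was this user on in April?", which a single mutable plan field cannot.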
This is the kind of thinking a technology roadmap should include: not just features, but the data infrastructure to understand whether those features work.
4. Own your data (avoid vendor lock-in)
Every analytics vendor wants you to send data directly to their system. This is convenient but creates dependency. If you want to switch tools, or use multiple tools, or build something custom, you are stuck.
The solution is to own your raw data. This means:
- Store events in your own database or storage before (or as well as) sending to third parties.
- Use a customer data platform like Segment or RudderStack that lets you route events to multiple destinations.
- Prefer tools with good export capabilities. If getting your data out is hard or expensive, that is a red flag.
- Avoid proprietary query languages and formats where standard alternatives exist.
At seed stage, this might just mean ensuring your analytics tool has an export function. As you grow, you might move to something more sophisticated. The key is to avoid making decisions now that trap you later.
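The "store it yourself, then forward it" idea can be sketched in a few lines. This is an illustration under stated assumptions, not a replacement for a customer data platform: the `EventPipeline` class and the `raw_events` table are hypothetical, and each destination is a plain callable standing in for whatever vendor SDK call you actually use.

```python
import json
import sqlite3
from datetime import datetime, timezone


class EventPipeline:
    """Persist every event locally, then fan it out to third-party destinations."""

    def __init__(self, conn, destinations):
        # `destinations` maps a label to a callable taking the event dict.
        # Each callable stands in for a vendor SDK call (hypothetical here).
        self.conn = conn
        self.destinations = destinations
        conn.execute(
            "CREATE TABLE IF NOT EXISTS raw_events ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " name TEXT NOT NULL,"
            " payload TEXT NOT NULL,"
            " received_at TEXT NOT NULL)"
        )

    def track(self, name, properties):
        """Store the event, forward it, and return labels of failed destinations."""
        event = {
            "name": name,
            "properties": properties,
            "received_at": datetime.now(timezone.utc).isoformat(),
        }
        # Write to our own storage first, so a vendor outage cannot lose the event.
        self.conn.execute(
            "INSERT INTO raw_events (name, payload, received_at) VALUES (?, ?, ?)",
            (name, json.dumps(properties), event["received_at"]),
        )
        self.conn.commit()
        failed = []
        for label, send in self.destinations.items():
            try:
                send(event)
            except Exception:
                failed.append(label)  # the raw copy is already safe; retry later
        return failed
```

Switching analytics vendors then becomes a matter of adding a destination and replaying `raw_events`, rather than starting history from zero.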
5. Document what you measure and why
This is the most neglected foundation, and often the most valuable. Six months from now, no one will remember why "user_activated" has the definition it does, or what the difference is between "subscription_created" and "payment_successful".
Documentation does not need to be elaborate. A simple tracking plan spreadsheet with columns for event name, definition, properties, and owner is enough. Update it when you add new events. Review it quarterly to remove events no one uses.
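The quarterly review can be partly mechanical. As a rough sketch, assuming you can export the event names from the spreadsheet and a count of how often each event fired during the quarter (the function name and input shapes below are hypothetical):

```python
def review_tracking_plan(documented, observed_counts):
    """Compare documented event names against events seen in the review period.

    `documented` is an iterable of event names from the tracking plan.
    `observed_counts` maps event name -> number of times it fired in the period.
    """
    documented = set(documented)
    observed = {name for name, count in observed_counts.items() if count > 0}
    return {
        # Documented but never fired: candidates to remove from the plan.
        "unused": sorted(documented - observed),
        # Fired but not documented: needs a definition, or should stop firing.
        "undocumented": sorted(observed - documented),
    }
```

The output is only a starting list for discussion; an event that fired zero times may be broken rather than unneeded.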
The discipline of documenting forces clarity. If you cannot write a clear definition of what an event means, you probably should not be tracking it. This connects to the broader principle of making deliberate decisions about what you build and why.