10 BIG Mistakes in Data Projects (Part 2 of 2)

The 2nd part of our 10 big mistakes in data projects, and how to avoid them.


By Chris Lubasch

This is the second part of our little series about 10 BIG mistakes that companies regularly make when they run data or analytics projects. I've worked with many clients over the years, and over time, some clear patterns have popped up.

In other words: typical problems that burn your money, staff and time. The good news: all of them can be avoided.

This is part 2 of 2 of the series, covering mistakes 6-10. Find the first part, covering mistakes 1-5, HERE. You don't have to read part 1 first, but it will help a bit with understanding the context.

Are you more of a video person?

You can also WATCH this article as a YouTube video. If you like it, please give it a thumbs up and subscribe to my YouTube channel.

10 BIG mistakes in data projects (part 2)

A small reminder: You want to avoid these mistakes, because otherwise you will likely waste a lot of money. Reality is neither black nor white, so take this advice with a grain of salt when it comes to the specific situation in your company. However, what I describe here has become a costly challenge for clients I've worked with.

[#6] Doing everything technical on your own

Number 6 isn't just a promotion for cloud computing. Rather, it's a request to challenge whether you should deal with all the technological topics and obstacles on your own, or whether there is a solution that is managed by someone else.

Running a flexible and scalable data architecture is difficult, more difficult than many think. All major cloud platforms like Amazon Web Services, Google Cloud or Microsoft Azure have managed solutions that will save you from dealing with the low-level details of, for instance, maintaining a database server on your own.

In fact, much external software is available from vendors in a managed fashion, such as business intelligence software and other pieces that have traditionally been installed on-premise (i.e. in your own datacenter).

This can save tremendous amounts of time, allowing you to focus more on VALUE CREATION instead of technical enablement. Most companies don't have extremely well-staffed data teams or earn money with building and developing data solutions by themselves.

Consequently, think twice about whether there's a real benefit to doing everything on your own. Of course, topics like data privacy and security play a major role in this consideration.

But from my experience, there's almost always a way to save resources and gain more focus time without compliance becoming an issue. This even holds true for industries like banking & finance.


[#7] Big data stacks are often not necessary

Some years ago, big data was probably the most used, and especially the most misused, term in the field of data. People were preaching that with big data, life as we knew it would come to an end, and that if you didn't jump on that train, you'd be forgotten in no time.

Yes, humanity produces much more data than ever before. Yet many companies still have plenty of business impact to unlock from data & analytics that has nothing to do with big data. But does that mean everything is a big data issue going forward?

No, not everything is a big data use case or issue. A very simple way to think about it: you'll need big data when all the typical / classic solutions you have can't cope anymore. Big data stacks, in turn, have been designed to deal with extreme amounts of data (and a few other scenarios that we will not cover today).

Unless you really need a classic big data stack, such as a Hadoop cluster on a data lake, you shouldn't go for one. It's way too complicated, way too expensive and super difficult to maintain.

Answering business questions from marketing, for instance, like "what should I do to increase my new user acquisition by 10% in the next quarter", doesn't require you to go all-in on big data.

Thankfully, lots of great solutions have appeared on the market that give you the benefits of traditional big data (simply speaking: dealing with very large amounts of data very quickly) while remaining as simple to use as classic solutions.

One area where you can see this is databases and data warehouse solutions. Players like Amazon, Microsoft and Google, or dedicated companies such as Snowflake, have brought data warehouse solutions to market that abstract away all the technical difficulties of big data, giving users the benefits but not the downsides.

These solutions do support classic SQL for database querying (which often isn't exactly the preferred way in traditional big data) and can host and transform huge amounts of data in little time, yet they don't require you to know anything about distributed computing, network throughput optimization, and so on.
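
To make this concrete, here is a minimal sketch of what such a query could look like from Python against Google BigQuery, picking up the marketing question from above. The project, dataset and table names ("my-project.marketing.signups") are hypothetical placeholders, and the same plain SQL would work on Snowflake, Redshift or Azure Synapse through their respective connectors, with no Hadoop or cluster knowledge required.

```python
# Illustrative sketch only: plain SQL against a managed cloud warehouse.
# Project, dataset and table names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # uses your default credentials

sql = """
    SELECT
      acquisition_channel,
      COUNT(DISTINCT user_id) AS new_users
    FROM `my-project.marketing.signups`
    WHERE signup_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
    GROUP BY acquisition_channel
    ORDER BY new_users DESC
"""

# The warehouse handles storage, distribution and scaling behind the scenes.
for row in client.query(sql).result():
    print(row.acquisition_channel, row.new_users)
```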

Small disclaimer though: what people tend to call big data does have its place in the picture. And of course, some large companies or use cases generate extreme amounts of data (for example, IoT sensors or dealing with images and videos) for which normal solutions won't suffice.

If you plan to unite every piece of data inside your company, you'll probably have to go down that route as well. But for many companies, especially the smaller ones, big data is just not necessary.


[#8] Focusing too much time on enablers instead of value creation

As discussed before, you should be clear about your value creation. How does your product or service generate value for your customers? Why is it better than others, or is it even unique?

Whatever it is, data is here to support you in making this value proposition and creation even stronger. It also helps you become more customer-centric and use all those insights to serve your customers even better.

However, you don't have unlimited resources available. In fact, in many companies, data teams (including those working with data from functional teams like marketing) spend too much of their time on "making it work" instead of using the data.

Let me make it even more concrete: Data teams often spend a considerable amount of their time (sometimes up to 80%, from my experience) building and maintaining the data technology stack.

What they should really be doing is spending as much time as possible helping the organisation use data, analyse it and come up with impactful insights and recommended actions.

If you look at Data Scientists, it's much the same story, except that they spend much of their time bringing data together, cleaning it, etc. They should instead spend more time analysing the data and coming up with advanced data models, predictive use cases or whatever it is they want to achieve. Right?

My point here is simple: try to find a setup and modus operandi, employing best practices both in technology and in the way you work inside your team or company, that allows you to focus on value generation.

I assume this makes sense to every one of you, but the obvious question is: how do we make it happen? I'll give you two examples, but I'm sure you can find more on your own.

Example 1: Don't waste more time than necessary on data integration (i.e. bringing data together from different sources). You shouldn't try to develop all connectivity on your own. There are solutions available on the market that can simplify this process tremendously, especially if you're dealing with common data sources.
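
Just to illustrate what "developing connectivity on your own" means, here is a minimal, hand-rolled extract-and-load sketch for a single, purely hypothetical REST source (everything in it, from the URL to the field names, is a placeholder). Multiply this by every source you have, then add authentication, pagination, retries, schema changes and monitoring, and the case for managed connectors becomes obvious.

```python
# Illustrative only: a hand-rolled extract-and-load script for ONE hypothetical source.
# The URL, fields and target file are placeholders, not a real API.
import csv
import requests

def extract_signups():
    """Pull raw records from a (hypothetical) REST endpoint."""
    response = requests.get("https://api.example.com/v1/signups", timeout=30)
    response.raise_for_status()
    return response.json()["results"]

def load_to_staging(records, path="staging_signups.csv"):
    """Write the records to a staging file that a warehouse could ingest."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["user_id", "signup_date", "channel"])
        writer.writeheader()
        for record in records:
            writer.writerow({key: record.get(key) for key in writer.fieldnames})

if __name__ == "__main__":
    load_to_staging(extract_signups())
```

A managed integration tool gives you the same outcome through configuration and keeps it working when the source changes, which is exactly the maintenance burden you want to avoid.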

Example 2: Consider carefully which functionality you really need to deliver to others inside the company. From what I have often seen, many reports and features (sometimes whole parts of the data model) are simply not used, or at least not used often, yet they still require you to maintain the complexity that comes with them. Maybe they were needed in the past, but are they still today? You could free up resources by sunsetting parts of the solution for a while; it doesn't have to be forever.


[#9] Treating it as an IT project

Please don't be offended by this one, especially if you're working in a technical role. But to be honest, I think running a data project with 100% ownership in the IT department is a bad idea.

I'm not saying this because the developers will screw it up (guys, I have a background in computer science myself! IT is a critical partner in digital!), but there are good reasons not to make data an IT-only project.

First, I strongly encourage you to seek business impact, value and requirements first, before you even think about the technical realization of your data architecture. While your IT can have their own use cases, of course, most of them typically come from other departments like marketing, sales, product management, finance and so on.

Don't build data landscapes or solutions because you can, or because it's cool. Do it because there is clear business value in it, coming ideally from very concrete use cases. In my opinion, functional business teams have a much closer connection to the value creation on the business side than IT (and hey, that's also not exactly their job).

See it like product management: those are the folks who make sure that skilled technical colleagues build what the market really needs.

Second: I talked about it earlier, but don't make the mistake of seeing data projects as a tech-only topic.

Third: the skills available in typical IT functions such as backend or frontend development, systems administration, DevOps, etc. are not enough to build a scalable data pipeline or architecture. Chances are that someone in IT has this additional experience, but it's a different field of work.

If I am a good tennis player, does that mean I'm good at football? No. It doesn't even mean I'm good at table tennis, because the technique is very different. But of course, there are also some similarities.

Depending on the scope of your data solution, you should make sure to have at least some advice on how to do it from a data perspective (whether this knowledge resides in your IT, data team or externally).

Let me wrap this up by saying that I think your IT can play a crucial role in data projects, and if you don't have a dedicated data team, they should!

Lots of areas are covered well by them, like, for instance, building infrastructure on a cloud platform. But when it comes to topics like data warehousing, parallel processing, data pipelines and several more, it becomes a matter of specialisation sooner or later.


[#10] No clear roadmap or strategy

Well, number 10 is easy to grasp, I think; still, I have seen many companies fail to follow it. As I outlined before, there should be a clear idea, ideally a strategy, of why and how you use data in the company to gain commercial benefits.

We call this Data Strategy, and I think it's one of the most important tools to get right. Anyway: even without such a strategy, there should at least be a roadmap of what you're about to do on the data project side.

I really prefer agile approaches, but that doesn't mean there's no plan. See it like this: building a data architecture with potentially multiple people is a lot of effort. That also means it's costly.

What you want to avoid is having to redo the entire thing 6-12 months down the road, because you realize it doesn't fit the big picture, fails to meet the business requirements or has otherwise become a bottleneck.

Once people have started to actively use your data solution, it also becomes much more difficult to make big changes, because you'll have to develop something new AND maintain the old one for the time being, both at the same time.

Hence, it makes a lot of sense to follow a plan that involves business stakeholders and creates a transparent, ideally agile and low-risk project for your organisation. We'll talk about best practices here as well in a future video.

Quick recap of key takeaways

Let's quickly recap what we have learned:

  • Try to avoid doing everything on your own on the technical side, just because you can; use managed solutions where possible to avoid losing too much time
  • Also, try to avoid dedicated big data architectures and concepts unless you have to; they are much more difficult, and therefore often more expensive, than necessary for many companies and use cases
  • Focus on value creation aspects, and don't spend your time on topics that don't contribute to it in the same manner
  • Don't forget that data is more than technology, so make sure business and technical stakeholders are on par in driving such an important topic
  • Come up with a proper roadmap, or even better a data strategy (combining tech and business as well), for the mid to long run

Bonus tip: Transparency.

Thanks for reading that far! I have one more bonus tip for you. It's all about transparency.

Data projects are often complex. Multiple stakeholders are involved, and both technology and business considerations need to be taken care of. Typically, the more complex the project becomes, the more of an issue transparency becomes, too.

While this doesn't apply only to data projects, I've seen it become an issue here very often. Hence I say: make sure to create transparency about the project.

In addition to mistake #10, which was about creating a clear roadmap, you should put additional effort into your project communication.


Here is what you don't want, because it's very risky and creates a black box for others: agreeing on a project scope, perhaps coming up with a project plan, and then going radio silent until you deliver your final result.

Meanwhile, the business requirements you gathered at the beginning might have changed, so you're better off checking in with your stakeholders regularly. Beyond that, you really want to get input and feedback early on. See it like this: develop the product or solution TOGETHER with your stakeholders.

What is indeed pretty specific and common to data projects is that you'll usually have multiple data sources which should be made available to users, whether in the form of a report, an analysis or even a full data model.

I would not advise connecting all data sources first, then doing your modeling and then building lots of final reports. Instead, I'd suggest doing a "full run" on a single source if possible.

So try to create business value from each data source, produce a semi-final outcome and connect with your stakeholders to get their feedback. Once you're done, go and connect the next source and start all over. This will ensure a highly transparent process, with lots of feedback.

Lastly, imagine you were the project sponsor of a data project, in other words: the one bringing the budget or money to the table. It is only natural that you would want a certain level of transparency about what's happening with your budget.

Since this is not supposed to be a project management course, I'll keep it short: you really want to make sure that transparency and communication towards your sponsors and stakeholders are a high priority.

I had a lot of fun going through memories upon memories of past projects to come up with these 10 mistakes that I have seen companies making over and over again.

If you found this content useful, feel free to subscribe to my Know What's Next Weekly newsletter, so you don't miss pieces like this.
