Data Lake Governance: The Part Most Consultants Skip

Stackademic

It may come as a surprise, but a consulting company that audited more than 40 data lake projects found that 90% of them had been launched without a governance system. In these projects, the infrastructure works and integrations are configured, but responsibility, change management, and quality control remain unregulated.

The consequences of building a lake without governance are not immediately apparent. At first, there are isolated discrepancies in reporting; later, different versions of key indicators appear. Eventually, the company has to spend additional money on consulting and platform restructuring. In practice, companies that invested about $1 million in a data lake have spent hundreds of thousands more a few years later to restore control over it.

In this article, we look at what a data lake governance system is and how it should be implemented. You will learn why this important part is often left out of consulting projects and what steps can help you avoid financial losses.

What is a data lake governance system, and how should it be organized?

Data lake governance is a system of defined responsibilities, rules, and controls that keeps metrics consistent and changes predictable. It consists of three interrelated parts:

  • Organizational part. This covers approving data handling rules, defining calculation standards, and appointing responsible people: who is accountable for the quality of indicators, who grants access, who adopts policies, and who approves changes.
  • Technical part. This is owned by the team that implements the tools enforcing the agreed rules: a data catalog with metric descriptions, access control, quality monitoring, versioning mechanisms, and data lifecycle automation.
  • Process part. Day to day, departments follow a defined routine: new sources are added through an agreed procedure, changes are documented, indicators are checked, and outdated data is archived.

Only the combination of these three elements forms a manageable system, rather than just a data storage infrastructure.
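The three parts can be illustrated with a minimal Python sketch of a data catalog that records an accountable owner (organizational part), a documented definition and access list (technical part), and refuses registrations that skip those fields (process part). `Catalog`, `CatalogEntry`, the role names, and the deny-by-default rule are hypothetical, chosen for illustration; real platforms usually delegate this to a dedicated catalog and access-control service.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """One data set in a hypothetical governance catalog."""
    name: str
    owner: str          # organizational part: accountable data domain owner
    definition: str     # technical part: documented meaning and calculation rule
    readers: frozenset[str] = field(default_factory=frozenset)  # roles allowed to read


class Catalog:
    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Process part: nothing enters the catalog without an owner and a definition.
        if not entry.owner or not entry.definition:
            raise ValueError(f"{entry.name}: owner and definition are required")
        if entry.name in self._entries:
            raise ValueError(f"{entry.name} is already registered; use the change procedure")
        self._entries[entry.name] = entry

    def can_read(self, role: str, name: str) -> bool:
        # Access control: deny by default for unknown data sets and unlisted roles.
        entry = self._entries.get(name)
        return entry is not None and role in entry.readers


catalog = Catalog()
catalog.register(CatalogEntry(
    name="sales_monthly",
    owner="finance_domain_owner",
    definition="Net revenue per month, excluding VAT",
    readers=frozenset({"analyst", "cfo"}),
))
print(catalog.can_read("analyst", "sales_monthly"))  # True
print(catalog.can_read("intern", "sales_monthly"))   # False
```

Making the owner and definition mandatory at registration is the point of the design: the check runs when a data set enters the lake, not after a discrepancy shows up in a report.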

Why data lake governance is frequently skipped in consulting projects

There are two sides to a typical project: the consulting company and the business client. Consultants are responsible for the technical implementation: architecture, integrations, and pipelines. The client expects a working platform within the agreed timeframe and budget.

Usually, governance is simply not included in the scope of work. The contract covers building the infrastructure, but no one decides who will be responsible, after launch, for indicator quality, change approval, access, and policies. Unless it is specified separately, a governance system does not appear on its own.

It is also worth considering that governance requires decisions at the company management level. It is necessary to appoint responsible people, approve uniform definitions of metrics, and establish a procedure for making changes. Consultants can offer a model, but without a business decision, it will not work. If the customer does not take on this part, the contractor completes the technical implementation, and the governance issue remains open.

In short, governance gets skipped for different reasons on each side: the customer wants speed and minimal paperwork, while the contractor wants to demonstrate technical results and win the tender. Both sides underestimate the fact that governance is what keeps a data lake useful in the long term.

The consequences of missing governance in enterprise data lakes

When a data lake is created without a management system, it quickly turns from a useful asset into a source of problems.

The main consequences:

  • Transformation into a “swamp”: data accumulates chaotically, without descriptions or classification, making it practically useless for analytics.
  • Loss of data quality: duplicates, outdated records, errors—all of this reduces the accuracy of business decisions.
  • Increased costs: companies are forced to spend significant resources on reprocessing, cleaning, and restructuring data.
  • Security risks: lack of access control can lead to leaks of confidential information.
  • Non-compliance with regulations: without management, it is difficult to comply with GDPR or local laws, which can result in fines.
  • Slowed development: analysts spend more time searching for and verifying data than working on new tasks.
  • Missed business value: while analytics remain inefficient, the company forgoes benefits and competitors pull ahead.

The cost of not managing your data lake

A well-implemented data lake can be a strategic asset for analytics. But without proper management, it only creates chaos and leads to additional costs.

Typical financial consequences:

  • Failed projects: an estimated 60–80% of unmanaged data lake projects do not deliver the expected value.
  • Additional rework costs: companies have to spend an additional 30–50% of the initial budget on restructuring, cleaning, and reintegrating data.
  • Operational losses: analysts spend up to 70% of their time searching for and preparing data instead of analyzing it, which directly reduces productivity.
  • Regulatory fines: for data leaks or GDPR non-compliance, fines can reach 4% of the company's global annual turnover, which can amount to millions of dollars.
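The 4% figure is the upper tier of GDPR fines (Article 83(5)): up to €20 million or up to 4% of total worldwide annual turnover for the preceding financial year, whichever is higher. Writing T for that turnover in euros, the ceiling works out as follows; the two turnover values are illustrative and not taken from any case in this article.

```latex
\begin{aligned}
F_{\max}(T) &= \max\left(20\,000\,000,\ 0.04\,T\right) \\
F_{\max}(3 \times 10^{8}) &= \max\left(2 \times 10^{7},\ 1.2 \times 10^{7}\right) = 2 \times 10^{7} \\
F_{\max}(2 \times 10^{9}) &= \max\left(2 \times 10^{7},\ 8 \times 10^{7}\right) = 8 \times 10^{7}
\end{aligned}
```

This is a ceiling, not a typical amount; regulators set actual fines case by case.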

An example of the scale of the problem

If a company invests $1 million in a data lake without governance, it may spend another $500,000–$1 million within 2–3 years to “save” it. In large corporations, a failed data lake can cost tens of millions due to productivity losses and penalties.

To check whether these were isolated cases, we spoke with data lake consultants at Cobit Solutions, a company that works on enterprise projects and complex analytical infrastructure.

The company's Chief Data Architect told us:

 "We have different types of clients—some want to design a structure from scratch, while others need restructuring. In most cases, these are businesses that have tried to set up a data lake and encountered additional difficulties in their work. Lately, we have analyzed more than 40 data lakes. In 90% of projects, management was not actually laid out at the start."

We asked for a specific example, and the Chief Data Architect at Cobit Solutions shared a case study of a manufacturing company with several regional divisions. Cobit Solutions was brought in to help resolve discrepancies in its management reporting. Technically, the system seemed to be working, but the lack of defined responsibilities and change management rules had eroded the consistency of the indicators. After the audit, data domain owners were identified and uniform calculation rules were established.

Customer feedback:

“We launched a data lake to centralize financial and operational reporting. The system worked stably for the first few months, and the discrepancies seemed insignificant: 1–2% in different areas. We attributed them to methodology or departmental specifics. A year later, these deviations became more frequent. And when the indicators began to point to different scenarios during budgeting, it became clear that the problem was systemic.

We had to audit the sources and calculation logic, establish uniform definitions of key metrics in the data catalog, and block changes made without approval. We assigned responsibility for indicators, set up version control for transformations, and introduced regular data quality checks. It took several months, but it allowed us to stabilize reporting and restore confidence in the figures,” said the company's chief operating officer.
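Regular data quality checks like the ones described in this case can start very simply. The Python sketch below compares the same indicator across several reports against the value produced from the agreed definition and flags any report that deviates by more than a relative tolerance. The report names, the 0.5% tolerance, and the use of a single reference report are assumptions for illustration; with a check like this, a 1–2% drift would be flagged on the first run instead of being explained away.

```python
def find_discrepancies(
    indicator: str,
    reported: dict[str, float],
    reference_report: str,
    tolerance: float = 0.005,
) -> list[str]:
    """Flag reports whose value for `indicator` differs from the reference
    report by more than `tolerance` (relative; 0.005 means 0.5%)."""
    reference = reported[reference_report]
    if reference == 0:
        raise ValueError(f"{indicator}: reference value is zero, use an absolute check")
    flagged = []
    for report, value in sorted(reported.items()):
        if report == reference_report:
            continue  # the reference cannot disagree with itself
        deviation = abs(value - reference) / abs(reference)
        if deviation > tolerance:
            flagged.append(
                f"{indicator} in {report}: {deviation:.1%} off the reference value {reference:,.0f}"
            )
    return flagged


# Hypothetical quarterly revenue figures from three reports.
reports = {
    "finance_catalog": 1_000_000,
    "region_north": 1_015_000,
    "region_south": 1_000_400,
}
for message in find_discrepancies("quarterly_revenue", reports, "finance_catalog"):
    print(message)
# prints: quarterly_revenue in region_north: 1.5% off the reference value 1,000,000
```

In a governed lake, the reference would come from the metric definition approved in the catalog; before such a definition exists, choosing the reference is itself one of the decisions the domain owner has to make.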

How to prevent a lack of management

As always, it is better to prevent a problem than to deal with its consequences. For a data lake, this means establishing governance before the platform is launched, not after the first discrepancies in reporting appear. Companies that are only planning a data lake should take several steps before the technical launch:

  • Establish responsibility before the project starts. Identify the owners of data domains responsible for the quality of information and the individuals who approve changes. Roles must be formally documented.
  • Approve rules for making changes. Adding new sources and adjusting the structure or calculation logic must go through an agreed procedure.
  • Describe data sets and key indicators. Each data set must be assigned a fixed source, update frequency, and metric definition.
  • Provide for regular monitoring of access and data quality. Checks should be integrated into the workflow rather than performed after discrepancies arise.
  • Establish governance as an ongoing function. A data lake is an infrastructure system in which new sources are constantly being added, structures are changing, and the volume of information is growing. The governance model must support these changes and ensure the stability of the system.
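Two of these steps, describing each data set and approving changes, can be enforced in code rather than left to memory. The Python sketch below is a hypothetical metric registry: each metric has one owner, every definition carries a fixed source and update frequency, and a change is accepted only if the owner approves it, producing a new version instead of overwriting the old one. `MetricRegistry`, `MetricVersion`, and the owner-only approval rule are assumptions; many teams implement the same idea with reviewed pull requests against versioned transformation code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricVersion:
    version: int
    formula: str      # agreed calculation rule, in plain words or SQL
    source: str       # fixed source the metric is calculated from
    refresh: str      # update frequency, e.g. "daily"
    approved_by: str
    effective: date


class MetricRegistry:
    """Hypothetical registry: every change to a metric is a new, approved version."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}
        self._history: dict[str, list[MetricVersion]] = {}

    def define(self, metric: str, owner: str) -> None:
        if metric in self._owners:
            raise ValueError(f"{metric} already has an owner: {self._owners[metric]}")
        self._owners[metric] = owner
        self._history[metric] = []

    def change(self, metric: str, formula: str, source: str, refresh: str,
               approver: str, effective: date) -> MetricVersion:
        owner = self._owners.get(metric)
        if owner is None:
            raise LookupError(f"{metric} is not registered; assign an owner first")
        if not (formula and source and refresh):
            raise ValueError(f"{metric}: formula, source, and refresh are all required")
        if approver != owner:
            raise PermissionError(f"{metric}: changes must be approved by {owner}")
        history = self._history[metric]
        version = MetricVersion(len(history) + 1, formula, source, refresh, approver, effective)
        history.append(version)  # earlier versions are kept for audit, never overwritten
        return version

    def current(self, metric: str) -> MetricVersion:
        history = self._history.get(metric)
        if not history:
            raise LookupError(f"{metric} has no approved definition")
        return history[-1]


registry = MetricRegistry()
registry.define("net_revenue", owner="finance_domain_owner")
registry.change("net_revenue", "invoiced amount minus refunds, excluding VAT",
                "erp.invoices", "daily", "finance_domain_owner", date(2024, 1, 1))
try:
    registry.change("net_revenue", "invoiced amount", "erp.invoices", "daily",
                    "region_north_analyst", date(2024, 2, 1))
except PermissionError as err:
    print(err)  # net_revenue: changes must be approved by finance_domain_owner
print(registry.current("net_revenue").version)  # 1
```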

And remember, don't wait until management reporting loses credibility. You can establish order right away.

What to do if the problem already exists

Once the consistency of indicators in a data lake has been compromised, it is hard in practice to fix the problem on your own. A sensible first step is to consult advisors who have experience in restoring control over data. They can guide you through the entire process, from diagnosis to technical stabilization.

Steps that consultants should implement together with you:

  • Inventory of the current state of the system. This includes a list of active data sets and reports, a description of sources, update frequencies, and dependencies between them.
  • Prioritization. This refers to the selection of 3–5 critical indicators and domains that most influence management decisions.
  • Agreeing on definitions and calculation logic. This involves establishing uniform calculation rules for priority indicators and determining where they will be applied.
  • Assigning roles and areas of responsibility. This includes appointing those responsible for data domains and those who approve or agree on changes.
  • Implementation of a change procedure. This defines how new sources are added and how changes to the structure or logic of metrics are made, with decisions and versions recorded.
  • Targeted technical stabilization. If necessary, the consultants add or configure tools that support the agreed rules: a data catalog, access control, quality checks, and monitoring.
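The inventory step pays off as soon as dependencies are recorded in a machine-readable form: you can then list every downstream data set and report that a change would touch, which feeds both prioritization and the change procedure. The Python sketch below assumes a hypothetical inventory in which each node lists the inputs it reads from, and walks it breadth-first. The node names are invented for illustration; catalog and lineage tools often provide this graph directly.

```python
from collections import deque

# Hypothetical inventory: each data set or report lists the inputs it reads from.
INPUTS: dict[str, list[str]] = {
    "erp.invoices": [],
    "crm.accounts": [],
    "staging.sales": ["erp.invoices", "crm.accounts"],
    "mart.revenue_by_region": ["staging.sales"],
    "report.board_pack": ["mart.revenue_by_region"],
    "report.regional_kpis": ["mart.revenue_by_region", "crm.accounts"],
}


def downstream(changed: str, inputs: dict[str, list[str]]) -> set[str]:
    """Return every data set or report that directly or indirectly reads `changed`."""
    # Invert the inventory: for each input, who reads from it.
    readers: dict[str, list[str]] = {}
    for node, deps in inputs.items():
        for dep in deps:
            readers.setdefault(dep, []).append(node)
    # Breadth-first walk; `seen` also protects against cycles.
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        for reader in readers.get(queue.popleft(), []):
            if reader not in seen:
                seen.add(reader)
                queue.append(reader)
    return seen


print(sorted(downstream("crm.accounts", INPUTS)))
# ['mart.revenue_by_region', 'report.board_pack', 'report.regional_kpis', 'staging.sales']
```

A change to `crm.accounts` turns out to reach both management reports, so under this sketch it would need sign-off from the owners of both.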

When the consultant has the relevant experience, these steps turn into a practical plan for change: the company gets transparent processes, a clear system of accountability, and tools that really work.

How to find good specialists

When choosing a team to work with a data lake, it is important to look beyond the list of technologies in the presentation. Technically, a lake can be launched quickly, but management requires experience in organizing processes, assigning responsibilities, and working with business units. If the team of specialists implementing the data lake does not ask questions about roles, the order of changes, and the consistency of indicators, it is focused only on infrastructure. In the long term, this approach leads to a loss of system controllability.

To assess the actual level of expertise of consultants, pay attention to the following criteria:

  • Experience in implementing the governance model. Ask them to describe the approach and typical steps the team takes when assigning roles, regulating changes, and launching quality control. Client details may be confidential, but even without naming clients, consultants should be able to explain their methodology and the logic behind their actions.
  • Experience working with problematic systems. Find out if the specialists have restored the consistency of indicators in already launched data lakes and what steps they took to do so.
  • Clarity in the distribution of responsibilities. Pay attention to whether consultants can specifically explain who is responsible for data domains, who approves changes, and how decisions are recorded.
  • Focus on governance outcomes. Professionals frame their goals in terms of stable reporting and predictable indicators.
  • Plan for further support. Clarify what the work will look like after the project is completed and who will be responsible for maintaining order in the data.

This approach will help you choose a team that will ensure not only implementation but also stable system management.

Building data lakes without a proper governance framework: lessons learned

The main lesson is that technology alone does not guarantee results. A data lake may seem like a modern solution, but without management, it quickly becomes a source of confusion and expense. Companies need to know that data only has value when there is order: defined roles, transparent processes, and tools that support the rules.

An experienced consultant can help you carry out the necessary transformation. Introducing governance lets you use data as a resource for growth rather than an object of daily “firefighting.”

Frequently asked questions

How can you tell if a lake is set up without governance?

If there are no defined data set owners, no table registry, no formal change process, and metrics are interpreted differently, then there is effectively no governance.
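The four warning signs in this answer can be turned into a quick self-check. The Python sketch below assumes you can export a few counts from your catalog or inventory (the `LakeSnapshot` fields are hypothetical) and reports which signs are present. The thresholds are deliberately strict, so a single data set without an owner already counts as a gap.

```python
from dataclasses import dataclass


@dataclass
class LakeSnapshot:
    """Hypothetical summary of a lake's state, e.g. exported from a catalog."""
    datasets: int
    datasets_with_owner: int
    datasets_in_registry: int
    has_change_process: bool
    metric_definitions: dict[str, set[str]]  # metric -> distinct formulas in use


def governance_gaps(s: LakeSnapshot) -> list[str]:
    gaps = []
    if s.datasets_with_owner < s.datasets:
        gaps.append(f"{s.datasets - s.datasets_with_owner} of {s.datasets} data sets have no owner")
    if s.datasets_in_registry < s.datasets:
        gaps.append(f"{s.datasets - s.datasets_in_registry} of {s.datasets} data sets are missing from the registry")
    if not s.has_change_process:
        gaps.append("no formal change process")
    # The same metric calculated more than one way is the "interpreted differently" sign.
    for metric, formulas in sorted(s.metric_definitions.items()):
        if len(formulas) > 1:
            gaps.append(f"{metric} is calculated {len(formulas)} different ways")
    return gaps


snapshot = LakeSnapshot(
    datasets=120,
    datasets_with_owner=45,
    datasets_in_registry=120,
    has_change_process=False,
    metric_definitions={
        "gross_margin": {"(revenue - cogs) / revenue", "(revenue - cogs - freight) / revenue"},
        "headcount": {"count(active_employees)"},
    },
)
for gap in governance_gaps(snapshot):
    print(gap)
# 75 of 120 data sets have no owner
# no formal change process
# gross_margin is calculated 2 different ways
```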

Can governance be added after the lake has been launched?

Yes, but it is always more difficult and expensive than establishing a governance model at the outset. It is necessary not only to implement rules but also to correct accumulated errors and reconcile conflicting definitions.

Is a separate management team necessary?

It is not necessary to create a separate department. The main thing is to assign management functions to specific roles. This allows governance to be integrated into daily work without bureaucracy.

Can technical tools alone solve the problem?

No. Tools only help enforce the rules; the rules and responsibilities themselves are set by the organization. Without a clear governance model, even the best technology will not eliminate chaos.

Is a complex governance system always necessary?

The model should correspond to the scale of the business. For small companies, basic rules and transparent processes are sufficient, while larger companies require a more complex system. However, a complete lack of formalization creates risks even for medium-sized organizations: data quickly loses its value, and decisions become unreliable.