UCOVI Stage 3: Organise

Business Strategy & People Management: ★☆☆☆☆
Technical Work: ★★★★★
Analytical/Scientific: ★★☆☆☆

Discipline Overview

Organise your data into a trustworthy analytical model that respects data management principles and best practice but fits the needs of your business. Design and build the model so that processing errors are simple to flag up and rectify, and new features are straightforward to build.

Write documentation at this stage and treat it as a priority. Your company's data warehouse, lake or lakehouse is the museum for your business. With poor or missing documentation of what your core data tables are, which reports they serve, and where their data comes from, it becomes an art exhibition where none of the paintings have labels giving the artist, title, or date painted.

The model you choose should be adapted to the questions you wish to answer with your data and to the size, structure and update frequency of the data you collect. (You should know your questions and the anatomy of your data, having passed the Understand and Collect stages.) But for an intuitive, tried-and-tested default, it's hard to better the Kimball methodology of dimensional modelling, which organises data into fact tables and dimension tables.

As your analytical model takes shape, prepare for layers of abstraction. Some data objects may be direct copies or replications of live data with no abstraction (e.g. raw sales orders from a billing system). Others may be multi-purpose aggregations of raw data (e.g. customer purchases by product by week; let's call these layer-1 abstractions). Others still may be filtered or otherwise transformed derivations of layer-1 abstractions that serve specific reports or dashboards (layer-2). Document these data objects as such, and consider a naming convention for your data tables that reflects the layer. The higher, or more downstream, the abstraction layer, the greater the potential for deviation from the truth.
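The layering idea can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module: the table names, the raw_/l1_/l2_ prefixes and the sample rows are all assumptions made up for the example, not a prescribed convention. It also shows a simple reconciliation check between a layer-1 table and the raw data it was built from.

```python
import sqlite3

# In-memory database so the sketch runs anywhere; all names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- No abstraction: a straight copy of orders from a billing system.
    CREATE TABLE raw_sales_orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        product     TEXT,
        order_date  TEXT,
        amount      REAL
    );
    INSERT INTO raw_sales_orders VALUES
        (1, 10, 'widget', '2024-01-01', 20.0),
        (2, 10, 'widget', '2024-01-03', 30.0),
        (3, 11, 'gadget', '2024-01-02', 15.0),
        (4, 10, 'gadget', '2024-01-09', 5.0);

    -- Layer 1: multi-purpose aggregation (customer purchases by product by week).
    CREATE TABLE l1_customer_product_week AS
    SELECT customer_id,
           product,
           strftime('%Y-%W', order_date) AS year_week,
           COUNT(*)    AS order_count,
           SUM(amount) AS revenue
    FROM raw_sales_orders
    GROUP BY customer_id, product, strftime('%Y-%W', order_date);

    -- Layer 2: filtered derivation of layer 1 that serves one specific report.
    CREATE VIEW l2_widget_weekly_revenue AS
    SELECT year_week, SUM(revenue) AS revenue
    FROM l1_customer_product_week
    WHERE product = 'widget'
    GROUP BY year_week;
    """
)

# Reconciliation: the layer-1 total should match the raw total it was derived from.
raw_total = conn.execute("SELECT SUM(amount) FROM raw_sales_orders").fetchone()[0]
l1_total = conn.execute("SELECT SUM(revenue) FROM l1_customer_product_week").fetchone()[0]
widget_rows = conn.execute(
    "SELECT year_week, revenue FROM l2_widget_weekly_revenue"
).fetchall()

print("raw total:", raw_total, "layer-1 total:", l1_total)
print("widget weekly revenue:", widget_rows)
```

Encoding the layer in the table name makes it obvious how far an object sits from the raw data, and a reconciliation query like the one above is a cheap way to catch drift as layers are added.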

Questions and Considerations

What manual processes can we automate? What checks are in place for errors and gaps in data? Do we have a warehouse load error reporting system? How do we validate the values in our lookup tables and minimise data redundancy? If we lost some of our production data, would the data in our lake or warehouse be recent and complete enough to act as a backup?
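Two of these questions, checking for gaps in data and validating values against lookup tables, lend themselves to small automated checks. The sketch below is a minimal illustration in plain Python; the function names, the daily load cadence and the sample values are assumptions, and a real warehouse would more likely run equivalent checks in SQL or in its orchestration tool.

```python
from datetime import date, timedelta


def missing_dates(loaded_dates, start, end):
    """Return the dates between start and end (inclusive) with no loaded data."""
    expected = {start + timedelta(days=i) for i in range((end - start).days + 1)}
    return sorted(expected - set(loaded_dates))


def unknown_keys(fact_keys, lookup_keys):
    """Return keys used in a fact table that are absent from the lookup table."""
    return sorted(set(fact_keys) - set(lookup_keys))


# Hypothetical daily loads: 3 and 4 January never arrived.
loaded = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)]
gaps = missing_dates(loaded, date(2024, 1, 1), date(2024, 1, 5))

# Hypothetical country codes: 'XX' appears in orders but not in the lookup.
orphans = unknown_keys(["GB", "FR", "XX", "GB"], ["GB", "FR", "DE"])

print("missing load dates:", gaps)
print("codes missing from lookup:", orphans)
```

Checks like these are most useful when they run after every load and feed an error report, rather than being run by hand when a dashboard looks wrong.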

People and Teams to Involve

IT and security teams, data engineers, data scientists and analysts.

Discipline Overview

Coding and engineering with data: designing data models and architecture that derive insightful features from raw data points.

Common situations faced

Debugging and fixing ETL failures, designing incremental data processing systems, building history and dimension tables in ways that minimise storage and maximise information retained.
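One common way to build a history table that keeps every change while storing little is to write a new row only when a value changes, and to close off the previous row with an end date (often called a type 2 slowly changing dimension in Kimball terms). The sketch below is a minimal, in-memory illustration of that idea; the HistoryRow class, the apply_snapshot function and the customer tiers are hypothetical, and it deliberately ignores deletions and late-arriving data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class HistoryRow:
    key: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_snapshot(
    history: List[HistoryRow], snapshot: Dict[str, str], as_of: date
) -> List[HistoryRow]:
    """Merge a daily snapshot into the history, adding rows only for changes."""
    current = {row.key: row for row in history if row.valid_to is None}
    for key, value in snapshot.items():
        row = current.get(key)
        if row is not None and row.value == value:
            continue  # unchanged: store nothing new
        if row is not None:
            row.valid_to = as_of  # close the superseded version
        history.append(HistoryRow(key, value, as_of))
    return history


# Three daily snapshots of a hypothetical customer tier attribute.
history: List[HistoryRow] = []
apply_snapshot(history, {"c1": "bronze", "c2": "silver"}, date(2024, 1, 1))
apply_snapshot(history, {"c1": "bronze", "c2": "gold"}, date(2024, 1, 2))
apply_snapshot(history, {"c1": "bronze", "c2": "gold"}, date(2024, 1, 3))

for row in history:
    print(row)
```

Storing full daily snapshots here would take six rows; the change-only history takes three and still answers what each customer's tier was on any given day.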

Technologies to learn and specialisms to hone

SQL, Python, Alteryx, dbt, Databricks. Database normalisation principles, batch processing, debugging, query optimisation.

Next Stage: Visualise