A blog series distilling quantitative data concepts and use cases in the Einstein Analytics platform.
Essential Analytics 101: Databases in normalized vs. denormalized form, and the 'lowest grain' in Einstein Analytics (EA).
Table 1 below shows a Contact object (or table) in denormalized form. It contains redundant information such as email: John's email is stored three times and Jane's twice. Table 2 below shows the same information in a more efficient manner, split into two tables. The first table, Contact, stores ID, contact name, and email; the second table stores roles. The two tables are connected to each other by a 'key', and in this example the ID field serves as that key. Normalized data is the main data structure used in modern databases today, adopted primarily to avoid redundancy, improve performance, and minimize storage requirements.
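A minimal sketch of the two forms in plain Python may help. The field names, emails, and role values here are illustrative placeholders, not the actual contents of the tables above; the point is only the shape of the data:

```python
# Denormalized (Table 1 style): the email repeats on every role row.
denormalized = [
    {"id": 1, "contact": "John", "email": "john@acme.com", "role": "Buyer"},
    {"id": 1, "contact": "John", "email": "john@acme.com", "role": "Approver"},
    {"id": 1, "contact": "John", "email": "john@acme.com", "role": "User"},
    {"id": 2, "contact": "Jane", "email": "jane@acme.com", "role": "Buyer"},
    {"id": 2, "contact": "Jane", "email": "jane@acme.com", "role": "Signer"},
]

# Normalized (Table 2 style): each email is stored once; 'id' is the key.
contacts = {
    1: {"contact": "John", "email": "john@acme.com"},
    2: {"contact": "Jane", "email": "jane@acme.com"},
}
roles = [
    {"id": 1, "role": "Buyer"},
    {"id": 1, "role": "Approver"},
    {"id": 1, "role": "User"},
    {"id": 2, "role": "Buyer"},
    {"id": 2, "role": "Signer"},
]

# Joining the normalized tables on the key reproduces the denormalized table.
joined = [{"id": r["id"], **contacts[r["id"]], "role": r["role"]} for r in roles]
assert joined == denormalized
```

Note that the join is lossless: nothing is gained or lost between the two forms, only the amount of repetition changes.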
However, some use cases in Einstein Analytics call for denormalizing these database structures, primarily to speed up the queries and analytics run against the data. As an example, an EA user might need information on roles. In that case, the user needs to 'munge' (join) the Contact object to the Role object, with the final table looking like Table 1. In such a scenario, the augment node is used in an EA dataflow. The augment node has a left and a right component, with the left component containing the 'lowest grain' of the finished dataset. What is the 'lowest grain'? It is the more granular of the two tables. In the requirement above, we need to see people's roles, so the lowest grain is the Roles table. More discussion of the augment transformation and dataflows will follow in later posts.
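To make the left/right mechanics concrete, here is a toy Python version of an augment-style lookup. This is only a sketch of the semantics; in EA the augment transformation is configured in the dataflow definition, not hand-coded. The function and parameter names (`left_key`, `right_select`, etc.) are illustrative. The left side is the Roles table, the lowest grain, and each left row looks up its matching Contact row by key:

```python
def augment(left_rows, right_rows, left_key, right_key, right_select):
    """Toy augment: for each left row, attach selected fields from the
    first matching right row (a left-outer lookup, keyed by a single field)."""
    index = {r[right_key]: r for r in right_rows}
    out = []
    for row in left_rows:
        match = index.get(row[left_key], {})
        enriched = dict(row)
        for field in right_select:
            enriched[field] = match.get(field)  # None when no match is found
        out.append(enriched)
    return out

# Left = Roles (the lowest grain); right = Contact.
roles = [{"id": 1, "role": "Buyer"}, {"id": 2, "role": "Signer"}]
contacts = [
    {"id": 1, "contact": "John", "email": "john@acme.com"},
    {"id": 2, "contact": "Jane", "email": "jane@acme.com"},
]

result = augment(roles, contacts, "id", "id", ["contact", "email"])
# Each role row now carries the matching contact's name and email.
```

Because the left side drives the output, one row per role survives the join, which is exactly why the lowest-grain table must sit on the left.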