Analytics Lab:
A blog series distilling quantitative data concepts and use-cases in the Einstein Analytics platform.
The power of augmented AI: from the Harvard Business Review article by C. Longoni and L. Cian, October 14, 2020: "In fact, we found that people embrace AI's recommendations as long as AI works in partnership with humans. For instance, in one experiment, we framed AI as augmented intelligence that enhances and supports human recommenders rather than replacing them. The AI-human hybrid recommender fared as well as the human-only recommender even when experiential and sensory considerations were important. These findings are important because they represent the first empirical test of augmented intelligence that focuses on AI's assistive role in advancing human capabilities, rather than as an alternative to humans, which is how it is typically perceived."
At first glance, this seems like a trivial use-case. First, contacts that were sent emails in 2018 and 2019 need to be counted. Then, those contacts are considered 'lapsed' if their 2020 emails bounced or were never opened. I had to extract the year from the email-opened and email-bounced date fields, which were stored as strings; because dates can appear as xx/xx/2020, x/x/2020, or x/xx/2020, I used the len() function to grab the last four characters: substr('Date_Email_Opened', len('Date_Email_Opened') - 3, 4). Two fields were created, openedYr and bouncedYr. The left part of the snip below shows the raw data, with the derived fields from above added on the right. In addition, two computeRelative transformations were added to create the 'bouncedIn2020' and 'openedIn2020' fields (one possible implementation is sketched below). After dataset creation, it becomes a simple cogroup in SAQL to bring out the different totals: 1) # of contacts with emails sent in 2018-19; 2) from that list, the number of contacts whose emails were not opened in 2020 or bounced in 2020; 3) % lapsed = list 2 / list 1.
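For reference, here is a minimal sketch of how derived fields like these could be produced in a dataflow. The post mentions computeRelative transformations; for simple row-level flags such as these, a computeExpression node like the one below would also work, and that is what is shown here. The node name, source name, and field names are my assumptions, not values from the original dataflow.

"deriveEmailYears": {
  "action": "computeExpression",
  "parameters": {
    "source": "extractContacts",
    "mergeWithSource": true,
    "computedFields": [
      {
        "name": "openedYr",
        "type": "Text",
        "saqlExpression": "substr('Date_Email_Opened', len('Date_Email_Opened') - 3, 4)"
      },
      {
        "name": "bouncedYr",
        "type": "Text",
        "saqlExpression": "substr('Date_Email_Bounced', len('Date_Email_Bounced') - 3, 4)"
      },
      {
        "name": "openedIn2020",
        "type": "Text",
        "saqlExpression": "case when substr('Date_Email_Opened', len('Date_Email_Opened') - 3, 4) == \"2020\" then \"true\" else \"false\" end"
      },
      {
        "name": "bouncedIn2020",
        "type": "Text",
        "saqlExpression": "case when substr('Date_Email_Bounced', len('Date_Email_Bounced') - 3, 4) == \"2020\" then \"true\" else \"false\" end"
      }
    ]
  }
}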
q = load "zds_processedYears";
q_B = filter q by 'OpenedYear' in ["2018", "2019"];
q_C = filter q by 'OpenedYear' in ["2018", "2019"] && ('openedIn2020' == "false" || 'bouncedIn2020' == "true");
result = group q by all full, q_B by all full, q_C by all;
result = foreach result generate
    count(q) as 'RawData',
    unique(q_B.'Contact_Id') as 'EmailSent2yrs',
    unique(q_C.'Contact_Id') as 'SentButNo2020OrBounced2020',
    unique(q_C.'Contact_Id') / unique(q_B.'Contact_Id') as 'lapsedPercentage';

To calculate the elapsed number of days between a date field in an Einstein dataset and a string date field, use the daysBetween and toDate functions. The number multipliers have to do with epoch seconds. Thanks to Pedro Gagliardi for his blog "Mastering Dates on Einstein Analytics using epochs". Here are the calculations.
1 hour = 3600 seconds
1 day = 86400 seconds
1 week = 604800 seconds
1 month (30.44 days) = 2629743 seconds
1 year (365.24 days) = 31556926 seconds

Here is a compute expression transformation in a dataflow, of type Number:

daysBetween(
    toDate('hdr2OLI.user2Oppty.CreatedDate_sec_epoch'),
    toDate(
        ( ( string_to_number(substr("09/01/2020", 1, 2)) - 1 ) * 2629743 ) +
        ( ( string_to_number(substr("09/01/2020", 4, 2)) - 1 ) * 86400 ) +
        ( ( string_to_number(substr("09/01/2020", 7, 4)) - 1970 ) * 31556926 )
    )
)
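To see how the multipliers approximate an epoch value, here is the arithmetic for the hard-coded date "09/01/2020", using the month constant from the list above. This is my own worked check, not taken from the original post:

(9-1)\times 2629743 \;+\; (1-1)\times 86400 \;+\; (2020-1970)\times 31556926 \;=\; 21{,}037{,}944 + 0 + 1{,}577{,}846{,}300 \;=\; 1{,}598{,}884{,}244

The true epoch value for midnight UTC on September 1, 2020 is 1,598,918,400 seconds, so the month and year multipliers land within roughly 34,000 seconds of the real value. That is under half a day, but it can still shift a daysBetween result by one day near boundaries, so treat it as an approximation.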
{"dataset":{},"dates":[],"derivedDimensions":[],"derivedMeasures":[],"dimensions":[],"measures":[ { "field": "TotValueUSD", "format": { "customFormat": "[\"$#,###,###\",1]" }, "label": "Tot Value (USD)", "showInExplorer": true },{ "field": "GPDollarsUSD", "format": { "customFormat": "[\"$#,###,###\",1]" }, "label": "GP$ (USD)", "showInExplorer": true}],"organizations":[],"showDetailsDefaultFields":[]} Great video by Salesforce about different model metrics in Einstein Discovery. Precision, recall, accuracy and the versatile F1 score are discussed.
A must-view for aspiring data scientists: https://salesforce.vidyard.com/watch/5UpTbk6D24GdBzyZof2dWf?video_id=8801025

Below are the steps for creating a Salesforce Einstein external connector, which enables you to connect to an external instance of Salesforce. It is a great way to achieve roll-up reporting, or to develop in a sandbox while keeping fresh data for Discovery development.
1) Analytics Studio > Data Manager >Connect 2)Create Connection and find appropriate connector-ie Heroku, , Azure… for this use-case pick Salesforce external connector. 3)Go to the destination instance and generate a securitytoken by clicking on your user profile and settings then ‘reset my security token’ 4) wait for the email, cut the security token (eg. 8ItzDv0matM8stuyuhT5uXXXs) 5)in the Setup your connection dialogue box, create a connection name/developer name/description and enter your username and password that you use to login to the destination instance. 6)append the security token to your password. So if your password is ‘mypassword’, that box should be ‘mypassword8ItzDv0matM8stuyuhT5uXXXs’ 7)click ‘save and test’ and hope to get ‘connection successful’ message. 8) Once the connector is up and running , you can add the objects that you need and their fields. Run the connector or schedule it to run periodically. 9) In a dataflow, create a digest node and pick the connector name and object from the list. I like the flexibility of the non-compact SAQL, however when it comes to having flexible groupings using bindings, I haven't had much success in using the non-compact form so below is the code for the aggregateflex compact form. Let's say you want a table that shows revenues but you want to create a toggle switch composed of 'region' , 'sales office' and 'country'. One day you want to see revenues by 'region' so table will have 2 columns (region and revenues), other times you want to see it by region and country--ie 3 columns > region , country and revenue.. and so on. this can be accomplised by creating a query, loading the dataset, creating groups, then tweaking the "grouping" by inserting a binding. Code below has 2 bindings- a string filter for fiscal period and grouping. BOTH are selection bindings. Be aware of the line "groups": "{{column(static_3.selection,[\"Valu\"]).asObject()}}",
I like the flexibility of non-compact SAQL; however, when it comes to flexible groupings using bindings, I haven't had much success with the non-compact form, so below is the code for the aggregateflex compact form. Let's say you want a table that shows revenues, but you want a toggle switch composed of 'region', 'sales office', and 'country'. One day you want to see revenues by 'region', so the table has two columns (region and revenues); other times you want to see it by region and country, i.e., three columns (region, country, and revenue); and so on. This can be accomplished by creating a query, loading the dataset, creating groups, then tweaking the "groups" property by inserting a binding. The code below has two bindings, a string filter for fiscal period and a grouping; both are selection bindings. Be aware of the line "groups": "{{column(static_3.selection,[\"Valu\"]).asObject()}}" — when using the UI, groups will resolve to [], and you need to take out the [] so it does not expect a list.

{
  "query": {
    "measures": [ [ "sum", "GP_IOT__c" ] ],
    "groups": "{{column(static_3.selection,[\"Valu\"]).asObject()}}",
    "filters": [
      [
        "IoT_Fiscal_Period__c",
        [ "{{column(cFiscalPeriod_3.selection,[\"FiscalPeriod\"]).asObject()}}" ],
        "=="
      ]
    ]
  }
},
{
  "query": {
    "measures": [ [ "count", "*" ] ],
    "formula": "B - A",
    "groups": "{{column(static_3.selection,[\"Valu\"]).asObject()}}"
  },
  "header": "GP$ Delta"
},

Table 1 below shows a Contact object (or table) in denormalized form. It has some redundant information such as email (e.g., John's email needs to be stored 3x and Jane's 2x). Table 2 below shows the same information in a more efficient manner. Now there are two tables: the first table, Contact, stores ID, contact, and email, with the second table storing roles. The two tables are connected to each other by a 'key'; in this example, the ID field serves as the key. Normalized data is the main type of data structure used in modern databases today, primarily adopted to avoid redundancies, improve performance, and minimize storage requirements.
However, there are some use-cases in Einstein Analytics that call for denormalizing these database structures, primarily to speed up the queries and analytics that need to be run on the data. As an example, an EA user might need information on role. In that case, the user needs to 'munge' the contact object with the role object, with the final table looking like Table 1. In such a scenario, the augment node is used in an EA dataflow. The augment node has a left and a right component, with the left component containing the 'lowest grain' of the finished dataset. What is the 'lowest grain'? It is the most granular of the two tables. In the above requirement, we need to see the roles of people, so the lowest grain needs to be the Roles table. More discussion later on the augment transformation and dataflows.
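As a preview, here is a minimal sketch of what an augment node joining the Roles table (left, the lowest grain) to the Contact table (right) on the ID key could look like in the dataflow JSON. The node names, keys, and selected fields are assumptions for illustration, not from an actual dataflow:

"augmentRolesWithContact": {
  "action": "augment",
  "parameters": {
    "left": "digestRoles",
    "left_key": [ "ID" ],
    "right": "digestContacts",
    "right_key": [ "ID" ],
    "relationship": "Contact",
    "right_select": [ "Contact", "Email" ],
    "operation": "LookupSingleValue"
  }
}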