Training Module 13: Single Step Lookups

The slides used in the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR. This is Module 13, Single Step Lookups.

Slide 2

Upon completion of this module, you should be able to:

  • Describe the Join Logic Table
  • Describe the advantages of SAFR’s lookup techniques
  • Read a Logic Table and Trace with lookups
  • Explain the Function Codes used in single step lookups
  • Debug a lookup not-found condition

Slide 3

Remember from Module 4 that SAFR allows users to combine or “join” data together from different files for selection tests or inclusion in an output file. Joins or Lookups require a Lookup Path to instruct SAFR which data from the two files to match, the source and the target. The fields in the lookup path are used in the Logic Table to build a key which is then used to search the target table for a match.  All lookups begin by using fields from the Source or Event LR or constants.

Slide 4

The prior modules covered the function codes highlighted in blue.  Those to be covered in this module are highlighted in yellow.

Slide 5

The Performance Engine flow changes when joins are performed.  GVBMR90 generates two logic tables, one for use in the Extract Phase, called the XLT for Extract Phase Logic Table. This logic table contains the views selected for processing from the SAFR meta data repository. We’ve analyzed XLT Logic Tables in prior modules.

When lookups are required, GVBMR90 also generates a Join Logic Table, called a JLT.  This logic table contains artificially generated views used to preprocess the reference data.  These views are never stored in the SAFR Metadata repository, and are regenerated every time GVBMR90 is run. 

In Join Processing, the Extract Engine is executed twice, once with the JLT and a second time with the XLT. The Join Phase GVBMR95 reads the raw Reference Data files as its source data. The outputs from the Join Phase are used as in-memory datasets by the Extract Phase to perform high-speed joins.

Slide 6

In this and following modules we’ll use a simple order example.  We’ll create a file that contains the Order ID, Customer E-mail Address, and Customer Gender Code. 

Available data includes

  • Customer’s orders are recorded on the Order file with an Order ID as the key. 
  • Customer data is recorded in a separate file, keyed by Customer ID. 

Each customer can make more than one order, so the records in the Order file are more numerous than those in the Customer file. SAFR is designed to scan the more detailed file as the source event file and look up keyed data in another file. Thus we'll define the Order file as the Source Event File and the Customer file as a Reference File. We'll use these files to produce a very simple output.

Slide 7

SAFR loads target join data into memory in order to increase efficiency. Memory access is thousands of times faster than disk access. However, memory is more expensive than disk. Loading the entire source reference data, including fields that are not needed, into memory can be difficult or expensive. The purpose of the JLT is to create a Core Image File containing only those fields needed for extract phase processing. In our example, only the Customer ID, which is the key to the table, and the Customer E-mail Address and Customer Gender Code fields are needed in the core image file during Extract Phase processing.
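The core image idea can be sketched in Python. This is illustrative only, not SAFR code; the field positions are hypothetical, loosely following the example Customer LR described in these slides.

```python
# Illustrative sketch of building a core image (RED) table: each reference
# record is projected down to just the key plus the fields the Extract
# Phase needs, then sorted by key for the later binary search.
# Field positions here are hypothetical, not an actual Customer LR.

def build_core_image(customer_records):
    core = []
    for rec in customer_records:
        customer_id = rec[0:10]    # key field (10 bytes)
        email = rec[38:78]         # Customer E-mail Address (40 bytes)
        gender = rec[78:79]        # Customer Gender Code (1 byte)
        core.append(customer_id + email + gender)
    return sorted(core)            # ascending key order

# Two hypothetical 79-byte customer records, deliberately out of order
raw = [
    "0000000002" + "X" * 28 + "bob@example.com".ljust(40) + "M",
    "0000000001" + "X" * 28 + "ann@example.com".ljust(40) + "F",
]
red = build_core_image(raw)
```

Each core record is 51 bytes (10 + 40 + 1) instead of the full reference record, which is what makes holding the whole table in memory affordable.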

Slide 8

When no joins are required by any view being executed, the MR90 Control Report will show an empty Join Logic Table.  When joins are required, the MR90 Control Report will contain a printed copy of the JLT. 

For our example, the JLT will contain two Extract Only type views: one to extract the reference data needed in a core image file for the Extract Phase, and one to create a header record. These Extract Only views are generated by GVBMR90 at run time only; they are never stored in the SAFR Metadata Repository. In this example, the JLT contains two temporary views, numbered 25 and 26.

The core image view has three DTE Logic Table functions: one for the Customer ID field, one for the Customer E-mail Address, and one for the Gender Code. It will also have some DTC and other functions to properly format the core image file.

Slide 9

In addition to the core image files, which are formally named Reference Extract Data (RED) files, the second JLT view produces the Reference Extract Header file, or REH. The REH is used in the Extract Phase to allocate memory to load the Core Image Files into memory.

Each REH record contains counts of RED records and other information, such as the key length and offset and the total bytes required. These records are optionally printed after the Join Phase by GVBMR71.

These control records are produced by an additional view for each reference file in addition to the Extract Only view in the JLT.   This view contains some special Logic Table function codes that allow it to accumulate record counts.  It also only writes one record no matter how many input records are in the reference file.
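A sketch of what an REH control record carries, in Python. The field names here are hypothetical; the real REH layout may differ, but the slide above names the kinds of values it holds.

```python
# Illustrative sketch of producing one REH control record from the RED
# records of a single reference file. Field names are hypothetical.

def build_reh_record(red_records, key_offset=0, key_length=10):
    return {
        "record_count": len(red_records),                 # count of RED records
        "key_offset": key_offset,                         # where the key starts
        "key_length": key_length,                         # how long the key is
        "total_bytes": sum(len(r) for r in red_records),  # memory to allocate
    }

# One header record regardless of how many RED records exist
reh = build_reh_record(["A" * 51, "B" * 51])
```

Note how a single record summarizes the whole file: this is why the header view writes only one record no matter how many input records it reads.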

Slide 10

Our example look up view reads the Order logical record as the source event file.  It has three columns, containing the:

  • Order ID from the source event file
  • Customer E-mail Address and
  • Customer Gender Code, both from the Customer file

Obtaining the Customer E-mail Address and Gender requires a join to the Customer logical record.  The join requires using the Customer ID on the Order logical record to search for the corresponding customer record in the Customer reference file.

As noted, the JLT will read the Customer reference file.  One view will create the core image RED file which will primarily have the Customer ID and Customer Email Address field data in it.  Another view will produce the REH record with a count of the number of records in the Customer core image file.

In the remainder of this module, we'll focus exclusively on the Extract Logic Table, the XLT, for this view.

Slide 11

The XLT contains the standard HD Header, RENX Read Next, NV New View functions as well as the concluding ES End of Source and EN End of Logic Table functions.

Additionally, it also contains a DTE Data from Event File field function.  This data is for column 1 of the view, indicated by the Sequence Number.  The source data starts at position 1 for a length of 10 bytes.  Under the Target section of the report, we can see that this data consumes 10 bytes in the target extract record.

Slide 12

Lookups always begin with a Join function code. The Join indicates where the Lookup Path information has been inserted into the view.  The steps required to perform the join are not coded in the view directly. 

The Sequence number following the Join indicates which column requires this join if the logic is column specific. If the logic is in general selection, the sequence number will be blank. The Join also indicates the ultimate Target LR for the join.

The Join also includes Goto rows.  We’ll discuss these later in this module. 

In our example, the output from Column 2 is the Customer E-mail Address, which requires a join to the Customer file. Note that the NV indicates the source Event file LR is 10, the Order LR. The join target is LR ID number 7, the Customer reference file.

Slide 13

A Join is almost always followed by one or more LKE functions.  An LKE function instructs the Extract Engine to build a Lookup Key from an Event File field. The LKE lists which LR the data should be taken from.  The Lookup Key, as opposed to the Extract record, is temporary memory containing a key used to search the core image file for a matching value. 

Note in our example that although the Join function Customer table Target LR ID is 7, the data for the key build should be taken from LR ID 10, the Order file Customer ID field.  This data is found on the source record at position 11 for a length of 10 bytes. 
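The key-build step can be sketched in Python. This is illustrative, not SAFR code; the positions (Customer ID at position 11 for a length of 10) follow the example in the slide, and Logic Table positions are 1-based.

```python
# Illustrative LKE sketch: copy a field from the source event record into
# a temporary lookup key buffer. Start positions are 1-based, matching the
# Logic Table report; Python slicing is 0-based, hence the "start - 1".

def lke(event_record, start, length, key_buffer=""):
    return key_buffer + event_record[start - 1 : start - 1 + length]

# Hypothetical Order record: Order ID in positions 1-10,
# Customer ID in positions 11-20
order = "ORD0000001" + "0000000001"
key = lke(order, start=11, length=10)
```

With multi-field keys, successive LKE calls would append each field to the same buffer, which is why a path's source fields must fill the key exactly.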

Slide 14

The end of a Join typically has an LUSM.  The LUSM function does the binary search with the key against the Join’s target table in the RED Core Image file to find a matching record.  The function code stands for Lookup to Structure in Memory.

The GOTO rows for the LUSM function are Found and Not Found conditions. Execution branches to GOTO Row 1 if a matching reference table record is found for the key built by the LKx functions. Execution branches to GOTO Row 2 if no matching record is found in the reference structure.

In this example, if a matching Customer record is found for the supplied Customer ID, execution will continue at LT Row 8.  If no customer is found, row 10 will be executed.
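The LUSM search can be sketched as a binary search over the sorted core image keys. This is an illustrative Python sketch, not SAFR code; the GOTO row comments echo the example rows in the slide.

```python
import bisect

# Illustrative LUSM sketch: binary-search the sorted RED keys for the key
# built by the LKE functions, then take the Found or Not Found branch.

def lusm(key, red_keys):
    i = bisect.bisect_left(red_keys, key)
    if i < len(red_keys) and red_keys[i] == key:
        return ("found", i)      # continue at GOTO Row 1 (e.g. the DTL)
    return ("not_found", None)   # branch to GOTO Row 2 (e.g. the DTC)

keys = ["0000000001", "0000000002", "0000000003"]
hit = lusm("0000000002", keys)
miss = lusm("0000000009", keys)
```

The binary search is why the core image file must be in sorted key order: `bisect` (like LUSM) assumes an ascending sequence.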

Slide 15

LUSM functions are typically followed by a set of functions similar in structure to the CFxx functions. You'll remember that if the comparison proves true, a move of Data from the Event field (DTE) or similar function often follows, then a GOTO function to skip over the ELSE condition. The ELSE condition is often a DTC, a move of Data from a Constant default value.

In this lookup example, if the LUSM finds a corresponding Customer ID in the Customer table the LUSM is followed by a DTL, move Data from a Looked-up field value.  If no matching Customer ID is found, a constant of spaces will be placed in the extract file by the DTC function.

Slide 16

The field Source start positions on DTE functions reflect where the data resides in the event file.  This is not true for the DTL functions, because the data is pulled from the RED. The data positions in the RED are not the same as the Reference Data Logical Record. The JLT process moves fields from the LR positions to the RED position, and these positions are determined by the order in which the views use the data.

In our example, the DTL function moves the 40 byte Customer Email Address on the Customer LR to the extract record from a starting position of 1 whereas on the Customer Record, it begins in Position 39. 

If the LR definition of the reference file is incorrect, the RED Key and all the data may be wrong.  At times it is necessary to inspect the RED file to see if the data exists.

Slide 17

Looked-up fields can be placed in columns in two ways.  They may be assigned directly, as shown in this view example and on this logic table.  Or they may be coded in Logic Text. 

If they are placed directly in the View editor and not in logic text, then SAFR automatically generates a default DTC in case of a not-found condition. The default value depends upon the format of the target column: alphanumeric columns receive a default value of spaces, and numeric columns a default value of zeroes.

If the field is coded in Logic Text, the user can define a default ELSE condition, which would be assigned to the DTC value.

In our example, the default value is spaces supplied by the DTC function if the lookup fails and no customer Email Address is found.
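The default rule can be sketched as follows (Python, illustrative only; the real DTC carries the constant in the logic table itself):

```python
# Illustrative sketch of the generated DTC default on a not-found lookup:
# spaces for alphanumeric target columns, zeroes for numeric ones.

def dtc_default(column_format, length):
    if column_format == "alphanumeric":
        return " " * length   # e.g. a blank Customer E-mail Address
    return "0" * length       # numeric columns default to zero

email_default = dtc_default("alphanumeric", 40)
amount_default = dtc_default("numeric", 8)
```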

Slide 18

Lookup paths are often reused within views, which means lookup logic is often repeated. 

Our view requires two lookups: one for the Customer Email Address for column 2, and a second for the Customer Gender Code in column 3. After the first lookup, the logic table contains very similar logic for the second lookup. Except for changes in logic table and GOTO rows and Sequence (column) numbers, the second set of Join, LKE, and LUSM functions is identical to the first set.

The DTL, GOTO and DTC functions are also very similar between the two lookups, with same row and sequence number differences, yet their source field and target positions and lengths also differ.

Slide 19

The first goal of SAFR is efficiency.  Developers recognized that repeating LKx key build and the binary search LUSM function over and over again for the same key was inefficient, particularly when this lookup had just been performed and would have the same results.  SAFR has just found the Customer record with the Customer E-mail Address, and can use this same record to find the Customer Gender.  Thus SAFR has automatic Join Optimization built-in to every lookup. 

This is done through the Join function GOTO rows. Note that the GOTO Row 1 and Row 2 numbers on the Join function are the same as those on the LUSM, the Found and Not Found rows. Before the Join executes, it very efficiently tests whether this join has already been performed for this record. If it has, the Join immediately branches to either the Found or Not Found row based upon the last lookup. This avoids using CPU cycles to do unnecessary work.

This optimization is not limited to repeat joins within a view, but operates across all joins for all views being run against the same source event file.  The greater the number of views and joins, the greater the efficiencies gained.
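The optimization can be sketched as a cache of lookup results keyed by target LR, consulted before any key build or search (Python, illustrative; in SAFR the mechanism is built into the Join function, and the cache notionally resets for each new event record):

```python
# Illustrative sketch of Join optimization: remember the result of the
# most recent lookup against each target LR for the current event record,
# so a repeated Join branches straight to Found / Not Found without
# running the LKE / LUSM steps again.

def join(target_lr, cache, build_and_search):
    if target_lr in cache:
        return cache[target_lr]      # reuse prior result: no search done
    result = build_and_search()      # otherwise perform the LKE/LUSM work
    cache[target_lr] = result
    return result

cache = {}                           # cleared for each new event record
searches = []
def expensive_search():
    searches.append("LUSM")          # track how often we actually search
    return ("found", 0)

first = join(7, cache, expensive_search)   # Column 2: does the search
second = join(7, cache, expensive_search)  # Column 3: reuses the result
```

Two Joins against target LR 7 cost only one search, matching the module's example where the Gender Code join reuses the E-mail Address lookup.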

Slide 20

The last view-specific logic table function is the WRDT, which Writes the Data section of the Extract record. The record will include the DTE data for Order ID, the DTL Customer Email Address (or DTC spaces if there is not one), and the DTL Customer Gender Code (or DTC spaces if that is missing).

Having examined the simple lookup logic table, let's turn to the Logic Table TRACE and examine the flow.

Slide 21

For the first record from the Order File, rows 3 through 9 are executed in order. No rows are skipped.  (The HD and RENX functions of rows 1 and 2 are not shown in the trace). 

The Trace shows the DTE function moving the Order ID of 0000000001  from the Order Source Event File to the Extract record.

Slide 22

Because this is the first source event file record read, and the Join function on row 5 is the first join in the logic table, no join has been performed.  The Join performs a test to see if the join has been performed, and finding that it has not, the Join falls through to the LKE function to build the key for the join. No branching from the Join is performed.

Slide 23

Because this is the first Join, the LKE function is used to build a key to search for a matching customer record. The trace shows the LKE value in the Customer ID field on the Order file, which is moved to the lookup key buffer prior to doing the search. The LUSM function will then search for customer number 1.

Slide 24

The results of the LUSM binary search can be seen by noting the next row executed.  In our example row 8 is the next row in the trace; from this it is clear that the LUSM function found a corresponding customer record with customer ID 1 in the Customer Reference File.  Now any Logic Table functions that reference Customer Logical Record ID 7 (called Record ID in the Trace Report) will use this record until another LUSM is performed for the Customer Logical Record.

Slide 25

The DTL Function moves looked up data from  the Customer Email Address field on the Customer Reference File to the Extract record.  The trace report shows a portion of the value that is moved to the extract record.

The GOTO function on Logic Table Row 9 then skips the DTC function which would have placed spaces in the Extract record if no Customer Record had been found. Remember that the trace only shows executed logic table rows.  Rows skipped are not printed in the trace.

Slide 26

The second Join is required for the Gender code. It begins with the Join function testing to see if a join to Logical Record ID 7 has been performed previously. Since the lookup for the Customer E-mail Address was just performed and resulted in a Found condition, the LKE and LUSM functions do not need to be performed. Rather, the GOTO Row 1 on the Join is used, just as if the second LUSM had been performed and found Customer ID 1 again in the Customer record. In this particular logic table, logic table rows 12 and 13 will likely never be executed during processing.

Slide 27

The second DTL function moves the Customer Gender Code from the Customer Reference File to the Extract record. The trace report shows the “M” value that is moved to the extract record. The GOTO function on Logic Table Row 16 then skips the DTC function which would have placed spaces in the Extract record if no Customer record had been found.

The WRDT function writes this completed Extract Record to the Extract File.  Execution then resumes at the RENX function for the next Event Record.

Slide 28

Processing for the next four records continues following the same execution pattern as record number 1.  The only changes in the trace are the Event Record number, and the Values on the DTE, LKE, and both DTL functions.  The values from the Order and Customer files are moved to the Extract File.

Slide 29

The last two Orders in our example Source Event File do not have corresponding Customer records.  Thus on these two records, the first LUSM functions result in Not Found conditions, and branch to the DTC functions to populate the extract record Customer Email Address with the default value of spaces.  Likewise, the following Join functions detect that a Not Found occurred on the lookup, and also branch to the DTC functions to populate the extract record with a default space for the Customer Gender code.

We can manually perform the search function by taking the value shown in the LKE function, “0000000000”, and searching the Customer Reference File to find that there is no Customer record for Customer ID “0000000000”. There can be numerous causes for this, such as non-SAFR issues of mismatches between the Order and Customer Reference files.

If all lookups fail for a particular reference file it is likely a structural problem.  For example, the Join LR may be wrong.  Inspecting the RED can highlight this problem.  If data displayed in the LKE functions is incorrect, either the Event LR or the path may be incorrect.  Partially found lookups may indicate a data issue external to the SAFR configuration. 

Slide 30

The Extract Engine GVBMR95 control report shows the results of processing.  Twelve records were read from the input file, and twelve records written to the output file.  For each view against each event file, it reports the total lookups found and not found.  The process required 12 total lookups, 10 of which found corresponding records, and 2 which did not.

Note that these counts are the results of LUSM functions, and do not include Join optimizations. If Join optimization had not been performed, the total number of lookups would have been 24: two lookups for each of the 12 extract records written.

Slide 31

There are many advantages to SAFR’s lookup processes, allowing it to perform millions of joins or lookups each CPU second. The following are some reasons for this:

  • Memory-based lookups are more efficient and rapid than lookups to data on disk, but require efficient memory usage
  • Join optimization and the single-pass architecture allow reuse of lookups, including across different views
  • All reference data is loaded once, even when performing parallel processing; multiple threads share the same table in memory

Slide 32

This logic table and module has introduced the following Logic Table Function Codes:

  • JOIN, identifies the join target and tests for already-executed joins
  • LKE, builds a lookup key using data from the source Event record
  • LUSM, performs a binary search of an in-memory reference data table using the Lookup Key
  • DTL, moves a looked-up field to the output buffer

Slide 33

This module described single step lookup processes. Now that you have completed this module, you should be able to:

  • Describe the Join Logic Table
  • Describe the advantages of SAFR’s lookup techniques
  • Read a Logic Table and Trace with lookups
  • Explain the Function Codes used in single step lookups
  • Debug a lookup not-found condition

Slide 34

Additional information about SAFR is available at the web addresses shown here. This concludes Module 13, Single Step Lookups.

Slide 35

Training Module 4: Introduction to Lookups

The slides of this video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR. This is module 4: Introduction to Lookups.

Slide 2

This module provides you with an overview of SAFR lookup processing. Upon completion of this module, you should be able to:

  • Describe a typical SAFR lookup path
  • Create a lookup path in the SAFR Workbench
  • Include join data, using the lookup path in a view
  • Update and run the SAFR Join phase JCL
  • Interpret the results of join processing

Slide 3

SAFR allows users to combine or “join” data together from different files for selection tests or inclusion in an output. This requires telling SAFR which data from the two files to match, the source, and the target. These files typically have different layouts, described by differing SAFR logical records, or LRs. SAFR requires these source and target relationships to be predefined in a SAFR lookup path. The lookup path is similar to an SQL “where” clause in a relational database.

Slide 4

Lookup paths can be used by many different views. To view existing lookup paths, click Lookup Paths in the Navigator pane and open a lookup path in the metadata list at the bottom of the screen.

To create a new lookup path, select Administration, then New, and then Lookup Path from the menu.

Slide 5

The General tab of the selected lookup path shows the lookup path ID, name, comments, and status, either active or inactive.

When you create a new lookup path, the lookup path ID is assigned by the Workbench and is not editable. It is used to make each lookup path unique and appears along with the lookup path name in views.

Slide 6

The name of the lookup path is visible when you are using it in views. When you name a lookup path, it is good practice to include the source-LR-to-target-LR mapping performed by the path. Also, if more than one lookup path in your environment is used to join the same source LR and target, you should add additional qualifiers to the name to provide uniqueness.

Slide 7

Lookup paths are created with an Inactive status and remain inactive until they are completely defined. They can then be activated, allowing for use in views.

If an active lookup path changes, all views referencing that lookup path are deactivated to prevent views from performing inaccurate joins. Select the Make Lookup Path Inactive check box to deactivate the lookup; this also deactivates all the views using the lookup path.

Slide 8

From the General tab, we’ve clicked the Lookup Path Definition tab, where we can select the source logical record and the target LR and its specific logical file. We can then select the source fields that should be used to create a key to search the target to find a matching record. The accumulated length of the source fields must equal the total key length before the lookup path can be activated and used for processing in a view.
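The activation rule for a path's key can be sketched as (Python, illustrative; the Workbench performs this check internally):

```python
# Illustrative sketch: a lookup path can be activated only when the
# accumulated length of its source fields (and constants) exactly equals
# the total key length of the target.

def can_activate(source_field_lengths, target_key_length):
    return sum(source_field_lengths) == target_key_length

exact = can_activate([10], 10)       # one 10-byte field fills a 10-byte key
short = can_activate([4, 4], 10)     # 8 bytes cannot fill a 10-byte key
```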

Slide 9

The Source Field Properties area is displayed when the Selected Source Fields list is clicked. Here you can select a source field from the source logical record or specify a constant that should be used in the key. Constants can be either static, as shown here, or symbolic, which allows setting the value in the view when the lookup path is used. When specifying a constant, carefully consider the format of the target key when setting the format (data type), length, and so on.

We’ve now completed the lookup path. Next, we’ll see how to use it in the view.

Slide 10

On this slide, we’ve switched to the View Editor, creating a view that reads a file that is described by the Order LR. In column 3, we require a field that is on the Customer LR. After inserting the column, we select Lookup Field as the source for the column.

Slide 11

Next, the Lookup LR box lists all the LRs that can be accessed from the Order LR, based upon the lookup paths that have been created. Because we created the Order to Customer lookup path, the Customer LR is available in the list.

Slide 12

Next, we select the lookup path that we want to use to perform the join. This is necessary because multiple methods of joining from one LR to another might exist. For example, the Order LR might have a Purchasing Customer ID field and a Ship to Customer ID field, and either of these might be used as the basis for a different lookup path to find the customer address. In this instance, we’ve selected the lookup path that we have just created, with a lookup ID of 3.

Slide 13

Finally, we select the actual field we want to place on the extract file. All the fields on the Customer LR are available for use. In this instance we’ve selected the customer email address to be put out on the extract file.

Note that the Column Source Value field is populated with Customer[7].Order_to_Customer[3].Customer_Email_Address[70]. This is the LR, path, field names, and IDs. This pattern is used in numerous places in the Workbench.

Slide 14

We’ve used the lookup path in the view, resulting in a join. Looked-up fields can be placed in output columns, used as sort and aggregation fields, or used in filtering processes or calculations using similar steps.

We’ve now completed the view. Next, we’ll see how to run it in the Performance Engine.

Slide 15

Recall from module 2 that the first step in running the Performance Engine is running MR86, which takes in the View List and the SAFR Metadata Repository and creates an XML file of the view.

Slide 16

The next program is GVBMR84, the COMPILE, which produces the MR84 VDP. The addition of join processing makes no significant changes to this process.  

Slide 17

Next, MR90 is run. Because our view requires a lookup, two logic tables are produced. The Extract Logic Table (XLT) is used in the actual Extract phase as the view scans the Order file and performs joins to the Customer LR. The Join Logic Table, or JLT, on the other hand, simply prepares the Customer lookup file for use in the Extract phase.

The VDP tells MR90 that only the customer email address and the customer ID (the join key) are required for extract processing. The Join phase will use the JLT to extract only these two fields to be loaded into memory during Extract phase processing to perform the joins.

Slide 18

The MR90 Control Report in the “Performance Engine Overview” module showed an empty JLT when no joins were performed in the view. Now the MR90 Control Report shows the JLT statistics and a printed version of the logic table.

Note that the view numbers listed in the JLT are run-time-only views for this execution of the Performance Engine. These example JLT views, 25 and 26, are never stored in the SAFR Metadata Repository.

Slide 19

GVBMR95 produces two files in the Join phase. The RED file contains one row for every row in the input join file (in this example, the customer file), but only the fields required for extract processing. The REH file contains a single control record for each join file.

The following slides detail how to locate the DD name for the input join files for the Join phase GVBMR95 step and modify the JCL.

Slide 20

First, within the view, locate the lookup path used for any joins. For this example view, we are using the Order to Customer lookup path ID 3. The path name is highlighted on the slide.

Columns containing a joined field are highlighted with a Join icon to speed identification of required joins.

Slide 21

  • Next, click the Lookup Path icon in the Navigator pane. A list of all paths is shown at the bottom of the screen.
  • Select the path from the list. The Edit Lookup Path screen opens.
  • Click the Lookup Path Definition tab.
  • Note the target logical file defined for the lookup path. This value is used in the next step.

Slide 22

Next, locate the logical file by performing the following steps.

  • First, select Logical Files in the Navigator pane and then select the appropriate logical file from the list.
  • Next, find the associated physical file. The DD name is listed in the physical file definition, which we’ll find next.

Note that join files can be associated with only one physical file.

Slide 23

  • Next, click the Physical File icon and select the specific file from the Associated Physical Files list.
  • Last, locate the input DD name.

This DD name should be placed in the join file job for use by GVBMR95, as shown at the bottom of this slide. The DD name must point to the file containing customer data. Note that join files must be in sorted order by the key specified in the logical record. An error is issued in the Extract phase if the files are not in sorted order.
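The sorted-order requirement can be checked with a sketch like this (Python, illustrative only; key offset and length here are hypothetical, taken from the logical record definition):

```python
# Illustrative sketch: verify a join file is in ascending order by the key
# defined in its logical record, as the in-memory binary search requires.

def is_sorted_by_key(records, key_offset, key_length):
    keys = [r[key_offset : key_offset + key_length] for r in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))

good = is_sorted_by_key(["001AAA", "002BBB"], key_offset=0, key_length=3)
bad = is_sorted_by_key(["002BBB", "001AAA"], key_offset=0, key_length=3)
```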

Slide 24

After execution of the MR95 in the Join phase job, the MR95 Control Report shows the number of records read from the input Customer join file. It also shows how many records were written to the RED file (DD name GREF003) and the REH file (DD Name GREFREH). These files are then used in the Extract phase.

Slide 25

Following the Join phase execution of MR95, the Extract phase is run. Its inputs include the XLT, the REH and RED files, and the source or event file data (in this example, the Order file).

Slide 26

The Extract phase MR95 Control Report shows the processing of view 24 reading the Order file. Note that, on the same row showing how many records were extracted from the Order file (12), the control report also shows 10 lookups found (designated by the letter “F”) and 2 lookups not found (NF). This means two orders were associated with at least one customer ID on the Order file, which did not have corresponding customer records in the Customer file.

The report also shows that a total of 12 lookups were performed.

Slide 27

This module provided an introduction to lookups. Now that you have completed this module, you should be able to:

  • Describe a typical SAFR lookup path
  • Create a lookup path in the SAFR Workbench
  • Include join data, using the lookup path in a view
  • Update and run the SAFR Join phase JCL
  • Interpret the results of join processing

Slide 28

Additional information about SAFR is available at the web addresses shown here.

Slide 29

Slide 30

This concludes Module 4. Thank you for your participation.