Training Module 22: Using Pipes and Tokens

The slides used in the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 22, Using Pipes and Tokens

Slide 2

Upon completion of this module, you should be able to:

  • Describe uses for the SAFR piping feature
  • Read a Logic Table and Trace with piping
  • Describe uses for the SAFR token feature
  • Read a Logic Table and Trace with tokens
  • Debug piping and token views

Slide 3

This module covers three special logic table functions: WRTK, which writes a token; RETK, which reads a token; and ET, which signifies the end of a set of token views.  These functions are analogous to the WR functions, the RENX function, and the ES function used for non-token views.

Using SAFR Pipes does not involve special logic table functions.

Slide 4

This module gives an overview of the Pipe and Token features of SAFR.

SAFR views often read data from, or pass data to, other views, creating chains of processes, with each view performing one portion of the overall transformation.

Pipes and Tokens are ways to improve the efficiency with which data is passed from one view to another.

Pipes and Tokens are only useful to improve efficiency of SAFR processes; they do not perform any new data functions not available through other SAFR features.

Slide 5

A pipe is a virtual file passed in memory between two or more views. A pipe is defined as a Physical File of type Pipe in the SAFR Workbench.

Pipes save unnecessary input and output. If View 1 outputs a disk file and View 2 reads that file, then time is wasted writing output from View 1 and reading input into View 2. This configuration would typically require two passes of SAFR, one pass processing View 1 and a second pass processing View 2.

If a pipe is placed between View 1 and View 2, then the records stay in memory, no I/O time is wasted, and both views are executed in a single pass.

The pipe consists of blocks of records in memory.

Slide 6

Similarly, a token is a named memory area. The name is used for communication between two or more views.

Like a pipe, it allows passing data in memory between two or more views.

But unlike pipes, which are virtual files and allow asynchronous processing, tokens operate synchronously, one record at a time, between the views.

Slide 7

Because Pipes simply substitute a virtual file for a physical file, views writing to a pipe run in a thread independent of the views reading the pipe.

Thus, for pipes, the two threads run asynchronously.

Tokens, though, pass data one record at a time between views.  The views writing the token and the views reading the token are in the same thread.

Thus, for tokens, there is only one thread, and both views run synchronously.
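
To make the contrast concrete, the following Python sketch models the two mechanisms conceptually. It is only an illustration of the threading behaviour described above, not SAFR code; the block size, record values, and view functions are invented for the example.

```python
import queue
import threading

# --- Pipe: asynchronous, block-at-a-time (conceptual model only) ---
def pipe_writer(records, pipe, block_size=4):
    block = []
    for rec in records:
        block.append(rec)
        if len(block) == block_size:      # hand a full block to the reading thread
            pipe.put(block)
            block = []
    if block:
        pipe.put(block)
    pipe.put(None)                        # signal end of file

def pipe_reader(pipe, results):
    while (block := pipe.get()) is not None:
        for rec in block:                 # reader works while the writer fills the next block
            results.append("read " + rec)

# --- Token: synchronous, one record at a time (conceptual model only) ---
def token_writer(records, token_readers, results):
    for rec in records:
        token = rec                       # the "token" holds exactly one record
        for read_view in token_readers:   # every reader runs before the next write
            results.append(read_view(token))

if __name__ == "__main__":
    records = ["rec%02d" % i for i in range(1, 13)]

    pipe, piped = queue.Queue(maxsize=2), []
    writer = threading.Thread(target=pipe_writer, args=(records, pipe))
    reader = threading.Thread(target=pipe_reader, args=(pipe, piped))
    writer.start(); reader.start(); writer.join(); reader.join()   # two asynchronous threads

    tokened = []
    token_writer(records, [lambda r: "view2 " + r, lambda r: "view3 " + r], tokened)
    print(len(piped), len(tokened))       # 12 piped reads, 24 token reads (2 readers x 12)
```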

Slide 8

An advantage of pipes is that they enable parallelism, which can decrease the elapsed time needed to produce an output.  In this example, View 2 runs in parallel with View 1. After View 1 has filled block 1 with records, View 2 can begin reading from block 1.  While View 2 is reading from block 1, View 1 is writing new records to block 2. This is one form of parallelism in SAFR which improves performance. Without it, View 2 would have to wait until all of View 1's processing is complete.

Slide 9

Pipes can be used to multiply input records, creating multiple output records in the piped file, all of which will be read by the views reading the pipe.  This can be used to perform an allocation-type process, where a value on one record is divided across multiple other records.

In this example, a single record in the input file is written by multiple views into the pipe.  Each of these records is then read by the second thread.

Slide 10

Multiple views can write tokens, but because a token can only contain one record at a time, token reader views must be executed after each token write.  Otherwise the token would be overwritten by a subsequent token-writing view.

In this example, on the left a single view, View 1, writes a token, and views 2 and 3 read the token. View 4 has nothing to do with the token, reading the original input record like View 1.

On the right, both views 1 and 4 write to the token.  After each writes to the token, all views which read the token are called. 

In this way, tokens can be used to “explode” input event files, like pipes.

Slide 11

Both pipes and tokens can be used to conserve or concentrate computing capacity.  For example, CPU-intensive operations like look-ups, or lookup exits, can be performed one time in the views that write to pipes or tokens, and the results added to the record that is passed on to dependent views.  Thus the dependent, reading views will not have to perform these same functions.

In this example, the diagram on the left shows the same lookup being performed in three different views.  The diagram on the right shows how, using piping or tokens, this lookup may be performed once, with the result of the lookup stored on the event file record and used by subsequent views, thus reducing the total number of lookups performed.
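
As a rough sketch of this pattern in plain Python (the field names and lookup table are invented for the illustration), the expensive lookup is performed once in the writing view and its result is carried on the record handed to the reading views:

```python
def expensive_lookup(customer_id):
    # Stand-in for a CPU-intensive join or lookup exit (table invented for the example).
    return {"CUST1": "EAST", "CUST4": "WEST"}.get(customer_id, "UNKNOWN")

def writer_view(event_records):
    for rec in event_records:
        region = expensive_lookup(rec["customer_id"])   # lookup performed once per record
        yield {**rec, "region": region}                 # result carried on the passed record

def reader_view_a(rec):
    # Readers reuse rec["region"]; no second lookup is needed.
    return (rec["customer_id"], rec["region"], rec["amount"])

def reader_view_b(rec):
    return (rec["region"], rec["amount"] * 2)

for enriched in writer_view([{"customer_id": "CUST1", "amount": 100}]):
    print(reader_view_a(enriched), reader_view_b(enriched))
```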

Slide 12

A limitation of pipes is the loss of visibility to the original input record read from disk; the asynchronous execution of thread 1 and thread 2 means the record being processed in thread 1 at any point in time is unpredictable.

There are instances where it is important for a view using the output of another view to have visibility to the original input record.  As will be discussed in a later module, the Common Key Buffering feature can at times allow for use of records related to the input record.  Also, exits can be employed to preserve this type of visibility.  Token processing, because it is synchronous within the same thread, can provide this capability.

In this example, on the top the token View 2 can still access the original input record from the source file if needed, whereas the Pipe View 2 on the bottom cannot.

Slide 13

Piping can be used in conjunction with Extract Time Summarization, because the pipe writing views process multiple records before passing the buffer to the pipe reading views.  Because tokens process one record at a time, they cannot use extract time summarization.  A token can never contain more than one record.

In the example on the left, View 1, using Extract Time Summarization, collapses four records with like sort keys from the input file before writing a single record to the pipe buffer.  Token processing, shown on the right, passes each individual input record to all token-reading views.

Note that to use Extract Time Summarization  with Piping, the pipe reading views must process Standard Extract Format records, rather than data written by a WRDT function; summarization only occurs in CT columns, which are only contained in Standard Extract Format Records. 
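
A minimal sketch of the summarization idea, in Python rather than SAFR's Standard Extract Format (the sort keys and CT amounts are invented): records with like sort keys are collapsed in the writer's buffer, so a single summarized record reaches the pipe readers, whereas token processing would pass every record individually.

```python
from collections import OrderedDict

def extract_time_summarize(records):
    """Collapse records with like sort keys, accumulating the CT (amount) values."""
    buffer = OrderedDict()
    for sort_key, ct_amount in records:
        buffer[sort_key] = buffer.get(sort_key, 0) + ct_amount
    return list(buffer.items())            # one record per distinct sort key

# Four input records with the same sort key become one piped record;
# token processing would instead pass all four records individually.
records = [("ORDER-A", 10), ("ORDER-A", 20), ("ORDER-A", 5), ("ORDER-A", 15)]
print(extract_time_summarize(records))     # [('ORDER-A', 50)]
```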

Slide 14

Workbench set-up for writing to, and reading from a pipe is relatively simple.  It begins with creating a pipe physical file and a view which will write to the pipe.

Begin by creating a physical file with a File Type of “Pipe”.

Create a view to write to the Pipe, and within the View Properties:

  • Select the Output Format of Flat File, Fixed-Length Fields,
  • Then select the created physical file as the output file

Slide 15

The next step is to create the LR for the Pipe Reader Views to use.

The LR used for views which read from the pipe must match exactly the column outputs of the View or Views which will write to the Pipe.  This includes data formats, positions, lengths and contents.

In this example, Column 1 of the view, “Order ID”, is a 10-byte field beginning at position 1.  In the Logical Record this value will be found in the Order_ID field, which begins at position 1 for a length of 10 bytes.

Preliminary testing of the new Logical Record is suggested: make the view write to a physical file first, inspect the view output to ensure the data matches the Pipe Logical Record, and then change the view to actually write to the pipe.  Looking at an actual output file to examine positions is easier than using the trace to detect whether positions are correct, because the Pipe only shows data referenced by a view, not the entire written record produced by the pipe-writing view.
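
Conceptually, the required match is a field-by-field comparison of positions, lengths, and formats between the view's columns and the pipe Logical Record. The following Python check is purely illustrative; the layouts shown are invented for the example.

```python
# Hypothetical layouts: the pipe-writing view's column outputs vs. the pipe LR fields.
view_columns = [
    {"name": "Order ID", "start": 1,  "length": 10, "format": "ALNUM"},
    {"name": "Cust ID",  "start": 11, "length": 8,  "format": "ALNUM"},
    {"name": "Amount",   "start": 19, "length": 12, "format": "PACKED"},
]
pipe_lr_fields = [
    {"name": "ORDER_ID", "start": 1,  "length": 10, "format": "ALNUM"},
    {"name": "CUST_ID",  "start": 11, "length": 8,  "format": "ALNUM"},
    {"name": "AMOUNT",   "start": 19, "length": 12, "format": "PACKED"},
]

def layouts_match(columns, fields):
    """True only if every column lines up with an LR field in position, length and format."""
    if len(columns) != len(fields):
        return False
    return all(c["start"] == f["start"] and c["length"] == f["length"]
               and c["format"] == f["format"]
               for c, f in zip(columns, fields))

print(layouts_match(view_columns, pipe_lr_fields))   # True: the view output fits the pipe LR
```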

Slide 16

The view which reads the Pipe uses the Pipe Logical Record.  It also must read from a Logical File which contains the Pipe Physical File written to by the pipe writer view.

In this example, the view reads the pipe Logical Record (LR ID 38) and the pipe Logical File (LF ID 42), which contains the Physical File of type Pipe written to by the pipe-writing view.  The view itself uses all three of the fields on the Pipe Logical Record.

Slide 17

This is the logic table generated for the piping views.  The pipe logic table contains no special logic table functions.  The only connection between the ES sets is the shared physical file entity.

The first RENX – ES set is reading Logical File 12.  This input event file is on disk.  The pipe writing view, view number 73, writes extracted records to Physical File Partition 51. 

The next RENX – ES set reads Logical File 42.  This Logical File contains the same Physical File partition (51) written to by the prior view.  The Pipe Reader view is view 74.

Slide 18

This is the trace for piped views.  In the example on the left, the thread reading the event file, with a DD Name of ORDER001, begins processing input event records.  Because the pipe writer view, number 73, has no selection criteria, each record is written to the Pipe by the WRDT function.  All 12 records in this small input file are written to the Pipe.

When all 12 records have been processed, Thread 1 has reached the end of the input event file; it stops processing and turns the pipe buffer over to Thread 2 for processing.

Thread 2, the Pipe Reader thread, then begins processing these 12 records. All 12 records are processed by the pipe reader view, view number 74.

As shown on the right, if the input event file had contained more records, enough to fill multiple pipe buffers, the trace still begins with Thread 1 processing, but after the first buffer is filled, Thread 2, the pipe-reading view, begins processing.  From this point, the printed trace rows for Thread 1 and Thread 2 may be interspersed as both threads process records in parallel against differing pipe buffers.  This randomness is visible at each NV function against a new event file record.

Slide 19

Workbench set-up for writing to, and reading from a token is very similar to the set-up for pipes. It begins with creating a physical file with a type of token, and a view which will write to the token.

Begin by creating a physical file with a File Type of “Token”.

Create a view to write to the token, and within the View Properties, select the Output Format of Flat File, Fixed-Length Fields.

Then, in the properties for the Logical Record and Logical File to be read, select the output Logical File and the created Physical File as the destination for the view.

Slide 20

Like pipe usage, the next step is to create the Logical Record for the Token Reader Views to use. In our example, we have reused the same Logical Record used in the piping example.  Again, the logical record which describes the token must match the output from the view exactly, including data formats, positions, lengths and contents.

Slide 21

Again, the view which reads the Token uses this new Logical Record.  It also must read from a Logical File which contains the Token Physical File written to by the token writer view.

Note that the same Logical File must be used for both the token writer and token reader; using the same Physical File contained in two different Logical Files causes an error in processing. 

Slide 22

This is the logic table generated for the token views.

Any views which read a token are contained within an RETK – Read Next Token / ET – End of Token set at the top of the logic table.

The token writer view or views are listed below this set in the typical threads according to the files they read; the only change in these views is the WRTK logic table function, which writes a token, shown at the bottom of the logic table in red.

Although the token reading views are listed at the top of the Logic Table, they are not executed first in the flow.  Rather, the regular threads process, and when they reach a WRTK Write Token logic table function, the views in the RETK – ET set are called and processed.  Thus the RETK – ET set acts like a subroutine, executed from within the regular thread processing whenever a WRTK function is reached.

In this example, view 10462 is reading the token written by view 10461.  View 10462 is contained within the RETK – ET set at the top of the logic table.  View 10461 has a WRTK function to write the token.  After execution of this WRTK function, all views in the RETK – ET set are executed.

Slide 23

This is the Logic Table Trace for the Token Views.

Note that, unlike the pipe example, which has multiple Event File DD Names within the trace, tokens execute in one thread, so the Event File DD Name is the same throughout, ORDER001 in this example.

The WRTK Write Token function, on view 10461 in this example, ends the standard thread processing and is immediately followed by the token-reading view or views, view 10462 in this example.  The only thing that makes these views look different from the views reading the input event file is that they process against a different Logical File and Logical Record, Logical File ID 2025 and Logical Record ID 38 in this example.

Slide 24

The trace on the right has been modified to show the effect of having multiple token writing views executing at the same time. 

Note the multiple executions of View 10462, the token-reading view, processing the same event record 1 from the same event DD Name ORDER001, after each of the token-writing views, 10460 and 10461.

Slide 25

The GVBMR95 control report shows the execution results from running both the Pipe and Token views at the same time.

  • At the top of the report, the input files listed include:
    • the Disk File (the event file for both the pipe- and token-writing views),
    • the Token (for the Token Reading views), and
    • the Pipe (for the Pipe Reading views).
  • The bottom of the report shows the final output disposition, showing records were written to:
    • an Extract File (shared by the pipe and token reading views),
    • the Token (for the Token Writing View), and
    • the Pipe (for the Pipe Writing View).
  • The center section shows more details about each one of the Views.

Slide 26

The top of this slide shows the final statistics for the run from the GVBMR95 control report.

Note that a common error message may be encountered, for either Pipes or Tokens, if view selection for the pass is not carefully constructed.  Execution of a pass which contains only Pipe Writers and no Pipe Readers will result in this message.  The opposite is also true, with no Pipe Writers and only Pipe Readers selected.  The same is true for token processes.  For either process, views must be executed in pairs, matched by Logical Files containing matching Physical Files, for the process to execute correctly.

Slide 27

This module described using piping and tokens. Now that you have completed this module, you should be able to:

  • Describe uses for the SAFR piping feature
  • Read a Logic Table and Trace with piping
  • Describe uses for the SAFR token feature
  • Read a Logic Table and Trace with tokens
  • Debug piping and token views

Slide 28

Additional information about SAFR is available at the web addresses shown here. This concludes Module 22, The SAFR Piping and Token Functions

Slide 29

Training Module 21: The SAFR Write Function

The slides used in the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 21, The SAFR Write Function

Slide 2

Upon completion of this module, you should be able to:

  • Describe uses for SAFR Write Function
  • Read a Logic Table and Trace with explicit Write Functions
  • Properly use Write Function parameters
  • Debug views containing Write Functions

Slide 3

This module covers the WR function codes in more detail.  These function codes were covered in training modules 11, 12, 17 and 18. 

Slide 4

Standard SAFR views always contain one of these four write functions, writing either (1) a copy of the input record, (2) the DT area of an extract record, (3) a standard extract record, or (4) a summarized extract record.  Typically views contain an implicit WR function in the Logic Table, almost always as the last function of the view.  This can be seen in the logic table on the left.

In constructing more complex SAFR applications, greater control is at times required over what type of records are written, where and when they are written, and how many.  The Write function is a logic text verb which gives greater control over where WR functions are placed in the Logic Table.  The Write function, inserted into a view as logic text, creates additional WR functions, like the logic table shown on the right.

Let’s look at two examples of problems the Write Verb can help solve.

Slide 5

The Write Function can be used to “flatten” records that have embedded arrays, creating normalized records.  This is possible because for each record read by GVBMR95, multiple records can be written.  In this example, from Customer record 1, three records can be written, for products A, Q and C. 

Slide 6

Another typical use of the Write Verb is to split a file into multiple partitions.  Partitioning is important for increasing the parallelism of the system in later steps.

For example, suppose the day’s transactions are delivered in a single file, yet the SAFR environment has multiple partitions, a master file for each set of customers, perhaps by state or by a range of customer IDs.  To be able to add the day’s transactions to each customer master file, the daily file must be split into multiple files, one to be used in each master file update process.

To do this, the Write Verb is inserted into Logic Text “If” statements, testing the Customer ID and writing the record to the appropriate file.

The Write Verb and associated parameters control all aspects of writing records, including specifying which record type to write, which file (DD Name) to write it to, and whether a Write User Exit should be called.  Let’s first review how these parameters are controlled on standard views, when the Write Verb is not used.

Slide 7

The following rules determine which Logic Table Function is generated when a view contains no Write Verb and calls no Write Exit: 

  • Views which use the Format Phase, either for File Format Output or for Hard Copy Reports, contain an implicit WRXT function to write standard extract records. 
  • Views which use Extract Time Summarization, which requires standard extract file format records, contain an implicit WRSU function to write summarized extract records.
  • File output views which DO NOT use the Format Phase are Extract Only views.  Views which are Copy Input views—specified by selecting “Source-Record Structure”—contain WRIN functions. All others contain WRDT functions to write the DT Data area of the extract record.

Almost all use of the Write Verb is for extract only views—views which do not use the Format Phase.  Because the WRXT and WRSU functions write standard extract file records (which are typically processed in the Format Phase), they are rarely used in conjunction with the Write Verb. 
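
The rules above can be restated as a small decision function. This is simply the slide's logic expressed in Python for clarity, not SAFR source code.

```python
def implicit_write_function(uses_format_phase, uses_extract_time_summarization,
                            is_copy_input):
    """Return the implicit WR function generated for a view with no Write verb or exit."""
    if uses_extract_time_summarization:
        return "WRSU"   # summarized standard extract records
    if uses_format_phase:
        return "WRXT"   # standard extract records for the Format Phase
    if is_copy_input:
        return "WRIN"   # copy of the input record (source-record structure)
    return "WRDT"       # DT data area only (extract-only view)

print(implicit_write_function(True, False, False))   # WRXT
print(implicit_write_function(False, True, False))   # WRSU
print(implicit_write_function(False, False, True))   # WRIN
print(implicit_write_function(False, False, False))  # WRDT
```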

Slide 8

The DD Name used for Extract Only views can be specified three ways, as reviewed on this slide, and the next. 

If an output physical file is specified in the View Properties, the Output DD Name associated with the physical file will be used.  In this example, the output physical file is OUTPUT01.  To find the DD name for physical file OUTPUT01, select Physical Files from the Navigator pane; opening the OUTPUT01 physical file shows the output DD name of OUTPUT01.

Slide 9

If the physical file value is blank, a new DD name will be generated at runtime.  The most common generated name is the letter “F” followed by the last seven digits of the view number, using leading zeros if required.   In this left hand example, the DD Name is “F0000099”.

Alternatively, an Extract File Suffix can be used.  If the Extract File Number is assigned a value, the Extract Engine appends the assigned number to the value “EXTR” as three digits representing the Extract Work File Number, with leading zeros if necessary, to form the DD Name to be used.  In the right-hand example, the extract records will be written to the file assigned to the DD Name EXTR001.
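
The two generated-name conventions just described amount to simple string formatting. A sketch in Python, assuming the rules exactly as stated on this slide:

```python
def generated_dd_name(view_number):
    """'F' followed by the last seven digits of the view number, zero-padded."""
    return "F" + ("%07d" % view_number)[-7:]

def extract_suffix_dd_name(extract_file_number):
    """'EXTR' followed by the three-digit extract work file number."""
    return "EXTR%03d" % extract_file_number

print(generated_dd_name(99))          # F0000099, as in the left-hand example
print(extract_suffix_dd_name(1))      # EXTR001, as in the right-hand example
```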

Slide 10

The last parameter the Write Verb controls is whether a Write Exit is called.  When the Write Verb is not used, write exits are assigned in view properties, in the Extract Phase tab.  The write exit is called as part of the implicit WR function generated for the view.

Slide 11

These are the elements of the Write Function syntax.  The Write verb keyword can be followed by three parameters which specify:

  • Source
  • Destination
  • User Exit

The source parameter can be used to specify three possible sources:

  • Input data, which means the view is a Copy Input view, the extract file containing a copy of the input record
  • Data, which means the DT area data is written to the output file
  • View, which means the standard view extract record is written to the output file.

The destination parameter allows modification of the target for the write:

  • The Extract parameter allows specifying the DD Name suffix of the extract file
  • The File parameter allows associating the write to a specific logical file

The user exit parameter allows specifying a user exit to be used

Slide 12

The following are examples of Write Functions.  Each of these example statements first tests field 4 to see if it is numeric.  If it is, then data will be written in some way:

  • In example 1, the input data will be copied to an assigned Copy Input (Source=Input) Physical File associated with a specific Logical File (Destination=File)
  • In example 2, the write verb causes a call to the DB2_Update User Exit (Userexit=DB2_Update).  It passes the exit the DT area data of the view. (Source=Data)
  • In example 3, the write verb writes a Standard Extract Record (Source=View) to DD Name EXTR003 (Destination=Extract=03)

Slide 13

The Write Verb can be used in either Record Level Filtering Logic, or Column Assignment Logic.  When a Write Verb is used in Record Filtering, the Select and Skip verbs cannot be used.  The Write Verb replaces the need for these keywords.

The logic shown here is Record Level Filtering logic, and could be used to partition a file containing multiple customers into individual files.  Record Level Filtering requires the use of the Source=Input parameter.  This parameter copies input records to the extract file.  This is required because no columns are created prior to executing Record Level Filtering logic; the only valid output is to copy the input record.

In this example, the logic in the record filter first tests the Customer ID.  If it is “Cust1,” then the record is copied to the DD Name EXTR001 because of the Destination=01 parameter.  The second If statement will copy Customer 4 records to EXTR004.  Because the logic is mutually exclusive (the Customer ID can only be Cust1 or Cust4, not both), the total output records will equal the input records, except that they will be split across two different files.
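
The partitioning described here amounts to routing each input record, unchanged, to one of two extract files keyed by Customer ID. A conceptual Python sketch of that routing follows; the record layout is invented, EXTR001 and EXTR004 are the DD Names from the example, and the 04 suffix is inferred from the DD Name EXTR004.

```python
def partition_by_customer(input_records):
    """Route each input record, unchanged, to an extract file chosen by Customer ID."""
    extract_files = {"EXTR001": [], "EXTR004": []}
    for rec in input_records:
        if rec["customer_id"] == "Cust1":
            extract_files["EXTR001"].append(rec)      # first If statement: Destination=01
        elif rec["customer_id"] == "Cust4":
            extract_files["EXTR004"].append(rec)      # second If statement: Destination=04
    return extract_files

records = [{"customer_id": "Cust1", "amount": 10},
           {"customer_id": "Cust4", "amount": 20},
           {"customer_id": "Cust1", "amount": 30}]
parts = partition_by_customer(records)
print({dd: len(recs) for dd, recs in parts.items()})  # output record count equals input count
```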

Slide 14

Column Assignment Logic can use the Source=Input parameter if desired.  More frequently, it uses the Source=Data parameter.  This parameter writes all DT columns that have been built in the extract record to that point.  Thus logic can be used to select particular fields from the event file, and write a fixed length record to the output extract file.

Remember that the logic table creates fields in order by column numbers, from left to right.  This affects which column should contain the Write Verb. With a Write Verb in multiple columns, multiple records, of different lengths, will likely be written to the extract file. 

This example attempts to break a record with an array into multiple individual records.  The Write Verb in each column causes an extract record to be written with the DT data built to that point in the Logic Table.  As each field is added, the record becomes longer.  This is not typically a desired SAFR output.

Slide 15

Alternatively, the Write Verb is often placed only in the last column of the view, thus acting upon the extract record after all columns have been built.  In order to change previous columns, the logic text can reference those columns specifically.  Thus values in prior columns can be manipulated within the last column, prior to use of the Write Verb.

This example again attempts to break the array into multiple records.  Because logic text can refer to previous columns, logic can be placed in column four which changes the value of column 3.  Thus both the Product ID and the Amount fields are set in column 4, and then a record is written.  Then these values are overwritten by the next Product ID and Amount combination, and another record is written.  Finally the last two fields are moved, and the last record written.  This is a more common design of a view containing multiple Write Verbs.
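
The following Python sketch mirrors this design: the common fields are set once, the product and amount values are overwritten for each array element, and a record is emitted after each overwrite. It is an illustration of the pattern, not generated SAFR code, and the field names are invented.

```python
def flatten_array_record(input_record):
    """Emit one fixed-layout record per (product, amount) pair in the embedded array."""
    output = {
        "customer_id": input_record["customer_id"],   # columns 1 and 2: set once
        "tran_date":   input_record["tran_date"],
    }
    for product_id, amount in input_record["array"]:
        output["product_id"] = product_id             # overwrite the product column
        output["amount"] = amount                     # overwrite the amount column, then write
        yield dict(output)

record = {"customer_id": "CUST1", "tran_date": "2023-01-31",
          "array": [("A", 10), ("Q", 20), ("C", 30)]}
for out in flatten_array_record(record):
    print(out)      # three normalized records from one input record
```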

Slide 16

Let’s take a moment to contrast the prior view, which has multiple write statements in one view, with using multiple views.  Because each view contains its own implicit Write function, by writing multiple views we could break apart the array.  To do so, the first view would reference the 1st array fields, Product and Amount 1; the second view the 2nd set of fields; and the third the 3rd set of fields.  Thus by reading one event file record, three extract records would be written.

The downside of this approach is that the logic to populate fields 1 and 2, the Customer ID and Tran Date, would be replicated in all three views; changes to these fields would need to be made in three places. 

Slide 17

Using the Write Function keywords, any of the WR functions can be generated in a logic table.

In these example logic tables, a standard extract only view without a Write Verb is shown on the left.  This view has one WR function, as the last function in the logic table.

The logic table on the right shows that the Write verb, with Source=Data, has been coded multiple times in the logic text.

Note that the sequence number is 001, meaning all these write verbs are contained in the logic text of column 1.  Each of the logic table functions immediately preceding these WRDT functions may modify the extracted data during the Extract Phase.  Each will result in a different extract record being written to the extract file.

Slide 18

This module described SAFR Write Function. Now that you have completed this module, you should be able to:

  • Describe uses for SAFR Write Function
  • Read a Logic Table and Trace with explicit Write Functions
  • Properly use Write Function parameters
  • Debug views containing Write Functions

Slide 19

Additional information about SAFR is available at the web addresses shown here. This concludes Module 21, The SAFR Write Function

Slide 20

Training Module 20: Overview of SAFR User-Exit Routines

The slides used in the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 20, Overview of SAFR User-Exit Routines

Slide 2

Upon completion of this module, you should be able to:

  • Describe uses for SAFR user-exit routines
  • Read a Logic Table and Trace which use exits
  • Explain the Function Codes used in the example
  • Debug views using exits

Slide 3

The prior modules covered the function codes highlighted in blue.  Those to be covered in this module are highlighted in yellow.  These include the RENX, LUEX and WREX functions.

Slide 4

Let’s briefly review the SAFR Performance Engine processes.

Slide 5

SAFR begins with developers creating, in the Workbench, metadata describing records, files, fields, relationships between files and fields, and user exits which might be called to perform processing.  They then create views, which specify the logic to be applied to the data through the scan processes.  The metadata and the views are stored in the SAFR View and Metadata Repository, awaiting the start of the Performance Engine.

Slide 6

The first parts of the Performance Engine perform the Select, Logic and Reference File phases.  The Select Phase selects the views and associated metadata for those views from the View and Metadata Repository.  Alternatively, this phase can read data provided in a SAFR XML schema, which defines the metadata and functions to be performed against the data.  It bundles either of these into a VDP, the View Definition Parameter file.

The VDP is used in the Logic Phase to produce two Logic Tables, one for the Reference File Phase, and one for the Extract Phase.  The logic tables contain the function codes described in this and the prior modules.

The Extract Phase Logic Table file is used to process event files, but before that the Reference File Logic Table is used to extract a core image file from the reference files.  The Reference File Logic Table does not contain any selection logic, thus it does not remove any rows of data.  Rather, the core image file contains only those fields (columns) needed to be loaded into memory for joins, plus the keys required for the join processes and any effective dates.

Slide 7

The Extract Phase begins by loading the VDP, Extract Phase Logic Table and Reference Data from disk.  It then uses the logic table to generate machine code for each thread, each input file partition. It then opens input and output files, and executes these threads according to the thread governor, in parallel.  Each event file record is read and processed through the logic table, performing joins, and writing selected data to output extract files.

The Extract Phase includes multiple types of user exits, including read, write and look-up exits, highlighted on this graphic by the green balls.

Slide 8

The format phase, which is optional depending on the view requirements, begins by sorting each extract file using a custom generated sort card based upon the sort criteria of the views written to that extract file.  During the final phase of sort processing—that which writes data to disk—the records are passed in memory to the format engine.  These records are formatted according to the VDP specifications.  The data is written to the final output file.

The Format Phase includes the Format Exit, shown here at the end of the Format process.  Sort utilities also allow creating sort input exits, shown here over the Sort process.

Let’s overview each of these user exits.

Slide 9

User exits, which can also be thought of as API points, are custom programs to perform functions SAFR does not do natively.  For example, SAFR does not perform an exponential mathematical calculation.  A customer could create a program to perform this function, and call it from SAFR to return the results of the calculation.

As noted, SAFR has four major points at which a user exit can be invoked: read, lookup, write and format exits.  The first three are Extract Phase exits, and are used much more frequently than the fourth, the Format Phase exit.  They are:

  • Read Exits stand between the actual input event file and the SAFR views.  These exits can modify input records to be presented to SAFR threads for processing.
  • Lookup Exits stand between SAFR views and the look-up data loaded into memory.   Lookup exits accept join parameters and return looked up records in response to individual joins. These exits can also be used as simple function calls which do not actually perform any look-up.  For example the exponential calculation discussed above could be written as a look-up exit.
  • Write Exits stand between SAFR and the extract files.  They receive extract records and can manipulate them before being written to extract files. 
  • Format Exits, the only GVBMR88 exit, accept summarized and formatted Format Phase output records before they are written to files. Format exits are very similar to write exits, except that the record used is the final output record, rather than the extract record. 

At the end of this module we will touch on a non-SAFR exit, the Sort Input Exits.

Slide 10

As noted, read exits sit between the extract engine and the input event file.  The SAFR views, inside GVBMR95, are only aware of the virtual file which is output by the read exit.  Read Exits are assigned to a physical file, and are placed on the RENX logic table function.  The exit is called each time the RENX function is executed.

Examples of read exits that have been written for SAFR applications include:

  • Read a specialized database structure and extract every record to be passed to SAFR to allow SAFR to produce reports against those records
  • Merge multiple sequential files, comparing snapshot records with the history file, into a single master file, which is then used to produce reports
  • Process sets of records, and perform functions across the entire set, where one record can affect other records, either later or earlier ones.  SAFR does not easily perform these functions in views.  After all effects are determined and applied, the file is passed to SAFR for report generation.

Read exits are typically the most complex to write, because they must perform I/O of some kind.  They are further complicated because it is very inefficient to call a read exit for each record, so instead they are usually written to do block-level processing.

Slide 11

Look-up exits sit between the extract engine, and a potential look-up file. The SAFR views are only aware of the virtual lookup record output by the lookup exit.  A sample application could be doing direct reads to a database table to retrieve a join value for processing.  However, most often look-up exits do not actually load any data from disk; rather they simply use input parameters passed to them by the views to do some function.  Thus the exit is basically a simple function call. 

Look-up exits are the easiest type of exit to create. The parameters passed to the lookup exit are the values placed in the fields of the join key.  These can be constants, fields from the event file, or fields from another lookup, including calls to other exits.  The output from the lookup exit is a record that must match the LR for the “reference file” record it is to return.   Although it appears to a SAFR developer as if SAFR has taken the keys and performed a search of a reference table to find the appropriate record, the exit may have done no such thing.  In fact, it could do something as simple as reordering the fields passed to it and returning the record.

Lookup Exits are assigned to a target look-up LR.  When used, the typical LUSM functions are changed to LUEX functions.   The exit is called each time the LUEX function is executed. 

Certain modules are delivered with the product, called SAFR Standard Look-up Exits.  These perform a common set of functions not done by native SAFR.   For example:

  • GVBXLEXP      An exponential calculator.  If passed a number, format and exponent, it will perform the calculation
  • GVBXLMD       Accepts two dates, and computes days between those dates
  • GVBXLRUD     Simply returns the SAFR Fiscal Year data, passed in through the VDP
  • GVBXLST         Accepts strings and concatenates them and returns a single string value
  • GVBXLTM        Performs a trim function against passed string parameters

Slide 12

Write exits sit between the extract engine and a potential extract or output file. Each write exit is tightly coupled to its SAFR view, because the exit receives the view output.  Write exits are called whenever a view is to write an extract record.  In addition to the view’s extract record, the write exit is also passed, and can see, the event record.

Write Exits are associated with a view, since the view, or a write statement within the view, generates the WREX logic table function (rather than a WRIN, WRDT, or WRSU function) and creates the extract record.

The write exit can tell GVBMR95 to do any one of the following:

  • Write a record the exit specifies (it could be any record) and continue with event file processing
  • Skip this extract record and go on
  • Write a record the exit specifies (it could be any record) and return to the exit to do more processing

The exit can manipulate the extract record, substitute a new record, table the extract record in some way and then dump the table at the end of event file processing, or do any number of other things.  Note, though, that unlike read exits, which do open and actually read files, write exits typically do not.  They return records to SAFR to write to the extract file.  They could do their own I/O, but there is typically no benefit to doing so; SAFR’s write routines are very efficient.

Some examples of write exits include the following:

  • Table multiple records, summarize them if they have common sort keys, and write a record when the key changes.
  • Read a set of parameters giving scoring requirements, table multiple records and upon a key change, score records.  Create a completely new record with the scoring results, write this new record, and drop all the tabled records

Write exits are in between read and lookup exits in complexity.  This is mainly because of the complexity of dealing with extract records.  The exit must know what the extract record will look like for a particular view.  This might be easiest to determine by actually writing the view and inspecting the extracted records to find positions and lengths.  Any changes in the views can create a need to update the write exit.

Format exits are very much like write exits, except that they are called at the end of GVBMR88.  They are also dependent upon the view specification.  They are called for each format output record.  Like write exits, they are specified on each view.  They are specified only in the VDP; there is no logic table impact for them.

Slide 13

Using exits requires that they first be described in the User Exit Routine screens within the Workbench.  The name can be anything desired. The type can be either Read, Look-up, Write, or Format.  The language and path are for documentation only.  The executable must match the load module name stored within an accessible loadlib for either GVBMR95 or GVBMR88.

Slide 14

The Optimizable flag is only applicable for look-up exits.  Remember that SAFR bypasses certain look-ups when the look-up has already been performed.  In these cases, the look-up exit would not be called for the subsequent look-ups.  If the look-up exit is stateless (in other words, it does not function differently from one execution to another given the same input parameters), the exit can be optimized.  If the exit retains its state from one call to another, then it must be called each time and cannot be optimized.

For example, one exit was written to detect the first time it was called for a particular event file record.  In this case, it would return a return code of 0; every subsequent call would return a code of 1.  This exit cannot be optimized; each potential call must actually call the exit.  Otherwise the exit would always return a code of 0.
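
A rough analogy in Python: a stateless exit behaves like a pure function, so a cached (skipped) call returns the same answer, while a stateful exit such as the first-call detector described above would give different answers if calls were skipped. This is only an analogy for the Optimizable flag, not the SAFR optimization mechanism itself.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stateless_exit(key):
    """Same inputs always give the same output, so skipping (caching) calls is safe."""
    return key.upper()

class FirstCallDetector:
    """Stateful: returns 0 on the first call for a record, 1 afterwards.
    Skipping any call would change later answers, so it cannot be optimized."""
    def __init__(self):
        self.seen = set()
    def __call__(self, record_id):
        if record_id in self.seen:
            return 1
        self.seen.add(record_id)
        return 0

detector = FirstCallDetector()
print(stateless_exit("d"), stateless_exit("d"))   # 'D' 'D' (second call served from cache)
print(detector(1), detector(1), detector(2))      # 0 1 0
```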

Slide 15

Once the user exits are entered into the User Exit table, they can be assigned to the other appropriate metadata components.  For Read Exits, the exit must be assigned to specific Physical File entities.  Remember that for each physical file read by views being processed, the Logic Table contains an RENX logic table row for that physical file.  By assigning a read exit to the physical file, the generated RENX entry will contain the exit to be called each time a new record or block of data is needed.

The data returned by the read exit must match the Logical Records that are assigned to this physical file.  Thus when a view accesses a field to perform, perhaps, a selection function, the data must match the logical record layout for the SAFR “physical” file entity, not the file read by the read exit.

Slide 16

Standard look-ups require the data to be joined to be defined by an LR.  For look-up exits, the logical record to be returned by the exit must match a specific LR (not the data read from file by the look-up exit, if there is any).  In the example shown here, the “phase code” value must be returned by the look-up exit in position 3 for a length of 2.

When defining the logical record to be returned by the exit, define any of the input parameters the exit will require as “keys,” as if the exit were going to search a reference file table to find the required answer.

Next define a path that will provide the needed inputs to the exit.  The values in the path can be provided as constants in the views, or in the path, or as values passed from the input file or looked up from another look-up table, requiring a multi-level look-up.

Slide 17

After defining the LR and the look-up path, to assign a lookup exit to the Logical record, select the LR Properties tab.  Then select the appropriate exit.  When a field from the target LR is used in a view, this exit will be called to return data in the format of the defined logical record.  Thus when the Logic Table is generated, the LUSM will be changed to LUEX.  This logic table function contains the user module name to be called.

Slide 18

Write Exits and Format Exits are assigned in view properties.  Write exits are assigned in the Extract Phase tab; Format Exits are assigned in the Format Phase tab. 

Because both of these exits sit potentially at the end of the process, these exits do not return data to SAFR views; therefore no logical record defines the outputs from these exits.  Rather the view columns and format (file, hardcopy, etc.) define what these exits will receive.  Changes to the view layout will affect the exits.

Often the “Write DT Area” option is used with write exits, to eliminate the complexity of the extract record layout.  Only the column data is passed to the write exit.

Slide 19

Each exit has the potential to receive a fixed set of parameters upon start up.  These parameters are assigned for each instance that an exit is invoked. 

For example, a look-up exit may function differently depending on which LR it is supposed to return; perhaps data can be returned in compressed format in one instance, and not in another.  The LR for the compressed data may pass in a start-up parameter of “CMPSD” and the uncompressed LR would pass in a start-up parameter of “UNCMPSD”.  In this way the same exit program can be used, and which LR should be returned can be indicated as a parameter to the exit. 

Slide 20

Read exits receive only the start-up parameters.  Write and Format exits also receive start-up parameters; they additionally receive the extracted record from the view.  Write exits have visibility to the event file record as well.

Look-up exits, by contrast, receive start-up parameters; they also have visibility to the event file record.  Additionally, Look-up exits receive all the parameters built in the look-up path.  These values must match the required key for the logical record.

Note the difference between these two types of parameters:

  • The start-up parameters do not change throughout the entire run of SAFR.  They are constant.  They are typically only used by the exit at start up to determine which mode the program should function in.
  • The look-up key values can change based upon every event file record processed.  Customer ID 1 on the first record may become Customer ID 10,000 on the next record. 

Slide 21

We will use this view to show the GVBMR95 Logic Table and MR95 Trace.  Our example will use a look-up exit, which returns various thread parameters that can be of interest for technical reasons in a view.  The look-up exit is assigned on the LR and path we just examined.

Slide 22

This is the logic table for the example view.  Note that lookups, which would normally have the function code of LUSM, have been changed to the function code of LUEX.  Also, for each LUEX, the ID associated with the User Exit is assigned, in this case, ID 13.  This is the ID assigned to the module GVBXLENV. 

Our path required a single character value be passed to the exit.  This “key” value—a constant of “D”—built by the LKC function, will be passed to the exit as part of the lookup.

Note also that the view has no write exit, because the logic table function is WRXT, not WREX.  Also note there is no Exit ID assigned to the WRXT row.

Slide 23

This is the GVBMR95 Trace for the logic table. 

For event file record 1, the LKC function builds a key with a value “D” in it.  This value is passed to the exit GVBXLENV during the LUEX function.  The exit is called.  The view then uses the data, through a DTL function, placing a value 0001 into column 2.  GVBXLENV did not search any data.  It simply queried the SAFR thread number to return a value of 0001. 

Also, note the number of times the exit is called in this trace.  Each lookup actually required calling the exit.

Slide 24

This is the same view but a new trace.  In this trace the “Optimize” flag was turned on.  This means that the trace shows only one executed LUEX function for the entire first event file record.  Because the LKC value of “D” did not change between calls to the exit, the exit is not called again. 

Slide 25

The difference between the traces for optimized and non-optimized processing is very clear. The optimized trace on the right saves significant CPU time, including the overhead of linking to the user exit multiple times for each event file record.  Exits require CPU time by their nature, and the efficiency of the language run-time can also have an impact.  Efficiency should be carefully considered when creating any exit.

Slide 26

The trace shows the values placed in the key by the LKDT, LKC and other look-up key functions, the value of “D” in our example view.  These parameters are passed to the look-up exit.  It also shows the static parameters passed.  These are shown at the end of the LUEX function row, after the look-up exit module name.  Similar parameters can be seen in the trace for Read and Write exits if start-up parameters are passed.

Slide 27

Exits must be written following the SAFR User Exit guidelines.   These specify a standard set of linkage parameters to interact with GVBMR95 and GVBMR88.  They include:

  • A standard set of pointers, used to access various data provided by SAFR, including the event record, the extract record, look-up keys, and a work area for maintaining working storage parameters for the exit. 
  • Environment data, including the Phase Code (Open, Read, or Close), informing the exit of the status of processing.  Exits are called during the:
    • Open phase, to prepare for processing,
    • Read phase, as event file records are processed, and
    • Close phase, to print control reports, flush final records, or clean up.
  • Return code values, informing SAFR of the results of processing.  These can include:
    • a Found, Not Found, or Skip Event Record condition for a Look-up Exit,
    • an End of File condition for a Read Exit, or
    • for Write Exits: write the standard extract record, write a different record and then return to the exit for more processing, or skip the extract record and continue processing.
  • All exits may signal view or process level errors as well.

Slide 28

As noted earlier this graphic shows the potential for a Sort Utility Input or Read Exit.  SAFR uses standard sort utilities provided with the operating system or otherwise.  These utilities typically allow for a read exit to the sort utility, a program which stands between the sort utility and the file to be sorted.  Procedures for writing these exits depend upon the sort utility used.  Refer to the sort utility documentation for instructions.

As an example of this type of exit, SAFR provides a module GVBSR02.  This module accepts a parameter file instructing it which view(s) to create multiple permutations for.  Before the utility sorts the records, the Sort Exit will replicate the extracted data, creating new records with permuted sort keys.  For example, a record with a sort key of A, B and C will be duplicated, and records with each of the following key orderings will be created (see the sketch after this list):

  • B, C, A,
  • C, A, B,
  • C, B, A
  • A, C, B
  • B, A, C
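
The permuted keys listed above are simply the other orderings of the original key A, B, C. In Python they could be generated as shown below; this is illustrative only and does not show GVBSR02's actual record handling.

```python
from itertools import permutations

def permuted_sort_keys(sort_key):
    """Return every ordering of the sort key fields other than the original."""
    original = tuple(sort_key)
    return [p for p in permutations(original) if p != original]

for keys in permuted_sort_keys(["A", "B", "C"]):
    print(keys)
# ('A', 'C', 'B'), ('B', 'A', 'C'), ('B', 'C', 'A'),
# ('C', 'A', 'B'), ('C', 'B', 'A')  -- five permuted records per original record
```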

This produces many more possible outputs without creating all the different views or exploding the extract files with all the possible combinations. 

Using this exit requires concatenating special sort cards to the GVBMR95 generated sort cards, and creating parameters for the Logic Table programs instructing them to generate permuted VDP views, similar to the process it undertakes when it creates the JLT for the reference file phase. These views are never found in the Metadata Repository; they are temporary views only in that run’s VDP.  They are required in the VDP for GVBMR88 to refer to in processing the permuted records.

Slide 29

In this module, we examined the following logic table functions:

  • REEX, Read a new Event Record with a user exit
  • LUEX, Lookup calling a user exit
  • WREX, Write calling a user exit

Slide 30

This module described SAFR User Exit Routines. Now that you have completed this module, you should be able to:

  • Describe uses for SAFR user-exit routines
  • Read a Logic Table and Trace which use exits
  • Explain the Function Codes used in the example
  • Debug views using exits

Additional information about SAFR is available at the web addresses shown here. This concludes Module 20, Overview of SAFR User-Exit Routines

Training Module 19: Parallel Processing

The slides for the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 19, Parallel Processing

Slide 2

Upon completion of this module, you should be able to:

  • Describe SAFR’s Parallel Processing feature
  • Read a Logic Table and Trace with multiple sources
  • Set Trace parameters to filter out unneeded rows
  • Configure shared extract files across parallel threads
  • Set thread governors to conserve system resources
  • Debug a SAFR execution with multiple sources

Slide 3

In a prior module we noted that the Extract Engine, GVBMR95, is a parallel processing engine, while the Format Phase engine, GVBMR88, is not.  It is possible to run multiple Format Phase jobs in parallel, one processing each extract file, yet each job is independent of the others.  They share no resources, such as look-up tables for join processing.  In this module we will highlight the benefits of GVBMR95 multi-thread parallelism, and how to enable, control, and debug it.

Slide 4

Parallelism allows the use of more resources to complete a task in a shorter period of time.  For example, if a view requires reading 1 million records to produce the appropriate output and the computer can process 250,000 records a second, it will require at least 4 seconds to produce this view.  If the file is divided into 4 parts, then the output could be produced in 1 second.

Doing so requires adequate system resources, in the form of I/O channels, CPUs, and memory.  If, for example, our computer has only 1 CPU, then each file will effectively be processed serially, no matter how many files we have.  All parallel processing is resource constrained in some way.

Slide 5

GVBMR95 is a multi-task parallel processing engine.  It does not require multiple jobs to create parallelism.  Using the Logic Table and VDP, GVBMR95 creates individual programs from scratch and then asks the operating system to execute each program as a subtask, one for each input Event File to be scanned for the required views.  This is all done within one z/OS job step.  Each of these subtasks independently reads data from its Event File, and writes extract records to extract files for all views reading that event file.

In this example, the combination of views contained in the VDP and Logic Table require reading data from four different event files.  These views write output data to six different extract files.  The main GVBMR95 program will generate four different sub tasks, corresponding to the four different input event files to be read.

Slide 6

Multi-threaded parallelism can provide certain benefits over Multi-job parallelism.  These benefits include:

  • Shared Memory.  Only one copy of the reference data to be joined to must be loaded into memory.  Sharing memory across different jobs is much less efficient than within a single job.  Thus all sub tasks can efficiently access the memory resident tables
  • Shared I/O.  Data from multiple input files can be written to a single extract file for that view, allowing for shared output files
  • Shared Control.  Only one job needs to be controlled and monitored, since all work occurs under this single job step
  • Shared Processing.  In certain cases, data may be shared between threads when required, through the use of SAFR Piping or exit processing

Slide 7

Recall from an earlier lesson that:

  • A physical file definition describes a data source like the Order Files
  • A logical file definition describes a collection of one or more physical files
  • Views specify which Logical Files to read

The degree of potential parallelism is determined by the number of Physical Files.  GVBMR95 generates subtasks to read each Physical File included in every Logical File read by any View included in the VDP.

Slide 8

The SAFR metadata component Physical File contains the DD Name to be used to read the input event file.  This DD Name is used throughout the GVBMR95 trace process to identify the thread being processed.

Slide 9

Remember that GVBMR95 can also perform dynamic allocation of input files.  This means that even if a file is NOT listed in the JCL, if the file name is listed in the SAFR Physical File entity, GVBMR95 will instruct the operating system to allocate this named file.  GVBMR95 will first test whether the file is listed in the JCL before performing dynamic allocation.  Thus the SAFR Physical File entity file name can be overridden in the JCL if required.  Dynamic allocation can allow an unexpected view included in the VDP to run successfully even though no updates were made to the JCL. 

Slide 10

The same Physical File can be included in multiple Logical Files.  This allows the SAFR developer to embed meaning in files that may be useful for view processing, such as a partition for stores within each state.  Another Logical File can be created which includes all states in a particular region, and another to include all states in the US.

In this module’s example, we have the ORDER001 Logical File reading the ORDER001 physical file, the ORDER005 Logical File reading the ORDER005 physical file, and the ORDERALL Logical File reading the ORDER001, 002, 003, 004 and 005 physical files.

Slide 11

In this module we have three sample views, each an Extract-Only view with the same column definitions (3 DT columns).  The only difference is the Logical File each is reading.  View 148 reads only the ORDER001 file, view 147 reads only the ORDER005 file, and view 19 reads all Order files, including 1, 2, 3, 4 and 5.

Slide 12

In the first run, each view in our example writes its output to its own extract file.  Output files are selected in the view properties tab.  The SAFR Physical File meta data component lists the output DD Name for each, similar to the Input Event Files. 

Slide 13

As GVBMR90 builds the logic table, it copies each view into the “ES Set” which reads a unique combination of source files.  Doing so creates the template needed by GVBMR95 to generate the individualized program needed to read each source. Note that Output files are typically a single file for an entire view, and thus typically are shared across ES sets.

In our example, three ES Sets are created.  The first includes views 148 and 19, each with their NV, DTEs, and WRDT functions.  The second ES Set is only view 19.  And the third is views 19 and 147.  The Output04 file will be written to by multiple ES Sets, which contain view 19.

Slide 14

This is the generated logic table for our sample views.  It contains only one HD Header function, and one EN End function at the end.  But unlike any of the previous Logic Tables, it has multiple RENX Read Next functions and ES End of Set functions.  Each RENX is preceded by a File ID row, with a generated File ID for that unique combination of files which are to be read.  Between the RENX and the ES is the logic for each view; only the Goto True and False rows differ for the same logic table functions between each ES set.

Our example begins with the HD function.  ES Set 1, reading file “22” (a generated file ID for the ORDER001 file), contains the view logic for views 148 and 19.  The second ES Set, for file “23” (Order files 2, 3, and 4), contains only view 19, and the third set, for file ID “24” (the ORDER005 file), contains the view logic for views 147 and 19 and ends with the EN function.

Next we’ll look at the initial messages for the Logic Table Trace.

Slide 15

When Trace is activated, GVBMR95 prints certain messages during initialization, before parallel processing begins.  Let’s examine these messages.

  • MR95 validates any input parameters listed in the JCL for proper keywords and values
  • MR95 loads the VDP file from disk to memory
  • MR95 next loads the Logic Table from disk to memory
  • MR95 clones ES Sets to create threads, discussed more fully on the next slide
  • MR95 loads the REH Reference Data Header file into memory, and from this table then loads each RED Reference Data file into memory.  During this process it checks for proper sort order of each key in each reference table
  • Using each function code in the logic table, MR95 creates a customized machine code program in memory.  The data in this section can assist SAFR support in locating in memory the code generated for specific LT Functions
  • Next MR95 opens each of the output files to be used.  Opening of input Event files is done within each thread, but because threads can share output files, they are opened before beginning parallel processing
  • MR95 updates various addresses in the generated machine code in memory
  • Having loaded all necessary input and reference data, generated the machine code for each thread, and opened output files, MR95 then begins parallel processing.  It does this by instructing the operating system to execute each generated program.  The main MR95 program (sometimes called the “Mother Task”) then goes to sleep until all the subtasks are completed.  At this point, if no errors have occurred in any thread, the mother task wakes up, closes all extract files, and prints the control report.
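As a rough analogy only (illustrative Python threading, not the actual GVBMR95 implementation), the “Mother Task” starting one subtask per thread and sleeping until all of them complete might look like this:

  import threading

  def run_thread(thread_name, process_event_file):
      # Each generated program reads its own input Event File(s).
      process_event_file(thread_name)

  def run_extract_phase(thread_names, process_event_file):
      subtasks = [threading.Thread(target=run_thread, args=(name, process_event_file))
                  for name in thread_names]
      for task in subtasks:
          task.start()   # begin parallel processing
      for task in subtasks:
          task.join()    # the "Mother Task" sleeps until all subtasks complete
      # After all subtasks finish: close extract files and print the control report.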

Slide 16

The GVBMR90 logic table only includes ES Sets, not individual threads.  When GVBMR95 begins, it detects if multiple files must be read by a single ES Set.  If so, it clones the ES Set logic to create multiple threads from the same section of the logic table.

In our example views, the single ES Set for the Order 2, 3 and 4 files is cloned during MR95 initialization to create individual threads for each input file.

Slide 17

During parallel processing GVBMR95 prints certain messages to the JES Message Log:

  • The first message shows the number of threads started, and the time parallel processing began
  • As each thread finishes processing, it prints a message showing:
  • The time it completed
  • The thread number
  • The DD Name of the input file being read, and
  • The record count of the input file records read for that thread

Each thread is independent of (asynchronous with) every other thread, and of the sleeping Mother Task during thread processing.  The order and length of work each performs is under the control of the operating system.  A thread may process one or many input event records in bursts, and then be swapped off the CPU to await some system resource or higher priority work.  Thus the timing of starting and stopping for each thread cannot be predicted.

Slide 18

The Trace output is a shared output for all processing threads.  Because threads process independently, under the control of the operating system, the order of the records written to the trace is unpredictable.  There may be long bursts of trace records from one thread, followed by a long burst of processing within another thread.  This would be the case if only two threads were being processed (for two input event files) on a one-CPU machine (in which case there would be no true parallel processing either).  How long each thread remains on the CPU is determined by the operating system. 

Alternatively, records from two or more threads may be interleaved in the trace output, as they are processed by different CPUs.  Thus the EVENT DDNAME column becomes critical to determining which thread a specific trace record is for.  The DD Name value is typically the input Event file DD name.  The Event Record count (the next column) always increases for a particular thread.  Thus record 3 is always processed after record 2 for one specific DD Name.

In this example,

  • The ORDER002 thread begins processing, and processes input event records 1 – 13 for view 19 before any other thread does any work. 
  • It stops processing for a moment, and ORDER001 begins processing, but completes only 1 input event record for views 148 and 19 before it, too, stops. 
  • ORDER002 picks back up and processes record 14 for its one view. 
  • ORDER005 finally begins processing, completing input record 1 for both views 147 and 19. 
  • ORDER001 begins processing again, record 2 for both views 148 and 19.

Note that this portion of the trace does not show any processing for thread ORDER003 and ORDER004.
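Because only the ordering within a single DD Name is guaranteed, reconstructing per-thread activity from an interleaved trace amounts to grouping rows by the EVENT DDNAME column and ordering them by the event record count.  A hypothetical Python sketch of that idea (the field names are assumptions, not the actual trace layout):

  from collections import defaultdict

  def group_trace_by_thread(trace_rows):
      # trace_rows: iterable of dicts with "event_ddname" and "event_record_count" keys.
      per_thread = defaultdict(list)
      for row in trace_rows:
          per_thread[row["event_ddname"]].append(row)
      # Within a thread, the event record count always increases,
      # so sorting by it restores that thread's processing order.
      for rows in per_thread.values():
          rows.sort(key=lambda r: r["event_record_count"])
      return per_thread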

Slide 19

In a prior module we showed examples of Trace parameters which control what trace outputs are written to the trace file.  One of the most useful is the TRACEINPUT parameter, which allows tracing a specific thread or threads.  It uses the thread DD Name to restrict output to that thread.

Slide 20

The heading section of the GVBMR95 control report provides information about the GVBMR95 environment and execution.  It includes:

  • SAFR executable version information
  • The system date and time of the execution of the SAFR process.  (This is not the “Run Date” parameter, but the actual system date and time)
  • The GVBMR95 parameters provided for this execution, including overall parameters, environment variables, and trace parameters
  • zIIP processing status information

Slide 21

The remainder of the control report shows additional information when running in parallel mode:

  • The number of threads executed, 5 in this example, is shown
  • The number of views run is not simply a count of the views in the VDP, but a count of the replicated ES Set views.  In this example, because view 19 read 5 sources, it counts as 5, which plus view 147 and view 148 equals 7
  • The size of the generated machine code programs, in bytes, is also shown; the greater the number of views and threads run, the larger these programs are 
  • The input record count for each thread is shown, along with its DD name.  Because GVBMR95 ran in full parallel mode, the record counts for the threads are the same as the record counts for each input file.  Later we’ll explain how these numbers can be different.
  • The results of each view against each input DD Name (thread in this case) are shown, including lookups found and not found, records extracted, the DD name those records were written to, and the number of records read from that DD Name.  Certain additional features, explained in a later module, can cause the record counts read for a view to be different from the record counts for the input file.

Slide 22

Because SAFR allows sharing output files across threads, it is possible to have many views writing to a single output file.  The outputs may either all be the same record format, or differing formats written as variable-length records.  This can allow complex view logic to be broken up into multiple views, each containing a portion of the logic for a particular type of record output.

Slide 23

This control report shows the results of changing our views to write their outputs to one file.  The total record count of all records written remains the same.  They are now consolidated into one single file, rather than 3 files.

Similar to the Trace output, the order of the records written to the extract file by two or more threads is unpredictable. 

As we’ll discuss in a later module, there is a performance impact for multiple threads writing to a single output file.  Threads ready to write a record may have to wait, and the coordination of which thread should wait consumes resources.  Therefore it can be more efficient to limit the amount of sharing of output files when possible, and combine files (through concatenation, etc.) after SAFR processing.
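The waiting described above is essentially lock contention: each thread must obtain exclusive use of the shared file before writing a record.  A simplified Python illustration of the idea (not how GVBMR95 actually serializes its writes):

  import threading

  extract_file_lock = threading.Lock()

  def write_extract_record(shared_file, record):
      # Only one thread may write to the shared extract file at a time;
      # other threads that are ready to write must wait here.
      with extract_file_lock:
          shared_file.write(record)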

Slide 24

In this execution, we have included additional views, reading the same input event files, but which produce Format Phase outputs.  The rows shown in red are added to the prior control report as additional outputs from the single scan of the input event files.  This also demonstrates how the standard extract files can be shared across multiple threads.

Slide 25

SAFR provides the ability to control the overall number of threads executed in parallel without changing the views.  The GVBMR95 Disk Threads and Tape Threads parameters specify how many threads GVBMR95 should execute in parallel.  Disk threads control the number of threads reading input Event Files on disk, as specified in the SAFR Physical File meta data; tape threads control those input Event Files which are on tape. 

The default value for these parameters is 999 meaning effectively all threads should be executed in parallel.  If one of the parameters is set to a number less than the number of input Event Files of a specific type, SAFR will generate only that many threads.  GVBMR95 will then process multiple event files serially within one of those threads.  As one event file completes processing, that thread will then examine if more event files remain to be processed and if so, it will process another Event file under the same thread.
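Conceptually, the governor caps the number of threads and queues the remaining event files to be picked up serially as threads finish their current file.  A minimal Python sketch of that scheduling idea (the names and the callback are hypothetical, not SAFR internals):

  import queue
  import threading

  def run_with_governor(event_files, process_event_file, disk_threads=999):
      work = queue.Queue()
      for event_file in event_files:
          work.put(event_file)

      def thread_main():
          while True:
              try:
                  event_file = work.get_nowait()
              except queue.Empty:
                  return                      # no event files remain for this thread
              process_event_file(event_file)  # each file is processed serially in the thread

      thread_count = min(disk_threads, len(event_files))
      threads = [threading.Thread(target=thread_main) for _ in range(thread_count)]
      for t in threads:
          t.start()
      for t in threads:
          t.join()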

Slide 26

The GVBMR95 Control reports show the results of the thread governor.  In the top example, the disk governor parameters are shown, along with the number of records read by each thread.  Because each thread processed only one event file, and all were done in parallel, the records read from the event file and for the thread are the same.

In the bottom example, the thread governor has been set to 1, meaning only one thread will be run.  In this instance the control report shows that the total records read for the thread equal the total records read from all event files, because this single thread processed each event file serially until all files were complete. 

Slide 27

The statistics above highlight the key z/OS measures often tracked from GVBMR95 runs.  These include CPU time, elapsed time, memory usage, and IO counts.

The top set of statistics are from the run with parallelism, the bottom from a run using the Thread Governor with no parallelism.  The impact of parallelism can be seen in the difference between the elapsed time for a job with parallelism and one without. Parallelism cut the elapsed time in half. 

The Extract Program, GVBMR95, is subject to all standard z/OS control features, including job classes, workload classes, priority, etc.  In other words, even with parallel processing, the z/OS job class or priority may be set to such a low level, and the other workloads in higher classes on the machine may be so significant, that effectively no parallel processing occurs even though GVBMR95 instructs z/OS to execute multiple threads.  Each of these threads receives the same job class, priority, etc. as the entire Extract Phase job.  In this way the operator and systems programmer remain in control of all SAFR jobs.

Slide 28

This module described SAFR Parallel Processing. Now that you have completed this module, you should be able to:

  • Describe SAFR’s Parallel Processing feature
  • Read a Logic Table and Trace with multiple sources
  • Set Trace parameters to filter out unneeded rows
  • Configure shared extract files across parallel threads
  • Set thread governors to conserve system resources
  • Debug a SAFR execution with multiple sources

Slide 29

Additional information about SAFR is available at the web addresses shown here. This concludes Module 19, Parallel Processing

Slide 30

This page does not have audio.

Training Module 17: Format Phase Views

The slides for the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 17, Format Phase Views

Slide 2

Upon completion of this module, you should be able to:

  • Describe uses for Format-Phase Views
  • Read a Logic Table and Trace with format-phase views
  • Explain the Function Codes used in the example
  • Debug format-phase views

Slide 3

The prior modules covered the function codes highlighted in blue.  Those to be covered in this module are highlighted in yellow.

Slide 4

The Format Phase is the final possible phase of the Performance Engine Processes.  Views that require only extraction and no sorting or aggregation are completed at the Extract Phase.

Any view that requires the following must continue to the Format Phase:

  • Summarize records
  • Use output record filtering which is applied after records are subtotaled
  • Produce sorted outputs
  • Create delimited files, and
  • Make hardcopy formatted printed reports

The Format Phase is run after the Extract Phase (GVBMR95) has completed extracting all data.

Slide 5

The GVBMR95 Extract Engine performs functions one event or transaction record at a time.  Records are evaluated for inclusion or exclusion in output files, and fields are extracted and used to build keys for joins, all one record at a time.  Except for Extract Time Summarization (which will be discussed in a later module), GVBMR95 does not produce summarized results. 

You may remember that in Module 10 we discussed summary reports.  These reports aggregate detailed records to produce summarized results.  Even detailed reports can include subtotals, which are the results of summarizing many detailed records.  To perform these functions, the Format Phase Engine, GVBMR88, is required.

Slide 6

Extracted data is passed in a special format, called the SAFR Standard Extract File, from the Extract Phase to the Format Phase.  Creating this format requires unique Logic Table Functions in the Extract Phase.  In this module we will focus on these Format Phase logic table functions and introduce the Extract File Format.

Slide 7

The Extract Engine, GVBMR95, is a parallel processing engine, yet the Format Phase engine, GVBMR88, is not; it processes only one Extract Work File.  Yet that extract file can contain the extracted records for multiple views, so each GVBMR88 execution can produce the output files for multiple views.  The final output for each view is produced by only one Format Engine execution.

In this example the extract file on the left has the extracted records for views 2, 3 and 4, while the one on the right has the records for views 8, 9 and 10.  The GVBMR88 execution on the left produces the output for view 2, 3 and 4, while the one on the right produces the output for view 8, 9 and 10.

Slide 8

As previously mentioned, GVBMR95 creates a specially formatted extract file, and there can be one or more created based on how the data needs to be processed.  The extract file to be used by each view is assigned in the view parameters screen, on the Extract Phase tab.  The Extract Engine concatenates the value “EXTR” with three digits representing the assigned Extract Work File Number, padded with leading zeros if necessary, to form the DD Name to be used.  It also concatenates this number with the value “SORT” to write the parameters specifying how the views should be sorted.

This example uses Extract Work File Number 1. The extract records will then be written to the file assigned to the DD Name EXTR001, and the sort parameters to SORT001.  These two files must be passed via the JCL as SORTIN and SYSIN files respectively to the Format Engine GVBMR88. 
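The DD Name construction is simple concatenation of a prefix and the zero-padded Extract Work File Number.  A tiny illustrative Python sketch (not SAFR code):

  def extract_dd_names(extract_work_file_number):
      suffix = f"{extract_work_file_number:03d}"   # zero-padded to three digits
      return "EXTR" + suffix, "SORT" + suffix

  # extract_dd_names(1) returns ("EXTR001", "SORT001")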

Slide 9

The Format Phase Extract record has a standard format for all views.  This specialized format allows for very efficient processing of Format Phase views.  The Extract and Format engines have clearly defined functions, such that steps are not repeated between the two engines. The extract file format prepares the data to be sorted and summarized.

The record contains the following segments: 

  • The control area contains values that describe the rest of the record, like the number of SAFR view columns on this record, and the length of the sort key. The last field in the control area is the view number.
  • The Sort Key or SK area contains the values the user has specified the output file should be sorted and/or grouped by.
  • The Data Transform or DT area contains alphabetic, alphanumeric, and numeric column data that is not used in a format-time calculation (including subtotaling) or in format-time selection logic.
  • The Calculated Transform or CT area contains column data that is used in a format-time calculation (including subtotaling) or in format-time selection logic.

The sample data is shown in HEX mode.  The first row is the display format, and the next two lines show half a byte each, the first half on the top line and the last half on the bottom line.  This makes packed and non-printable characters visible.
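A rough way to picture the record is as four concatenated areas.  The Python sketch below is only a conceptual model of the layout described above; the field widths and encodings are assumptions, not the actual SAFR format.

  from dataclasses import dataclass

  @dataclass
  class ExtractRecord:
      control_area: bytes    # record lengths, column counts, sort key length, view number
      sort_key_area: bytes   # SK: values the output is sorted and/or grouped by
      dt_area: bytes         # DT: column data not used in format-time calculations
      ct_area: bytes         # CT: column data used in calculations or format-time selection

      def to_bytes(self):
          return self.control_area + self.sort_key_area + self.dt_area + self.ct_area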

Slide 10

In this module, we will use the view shown here as the example view.  This view includes a simple lookup for record-level filtering.  Record-level filtering does not affect the format of the extracted records.

This view has 7 columns.  The first three columns are sort keys, highlighted by the yellow sort key indicator.  In other words the view output will be sorted by the values in these three columns.  The records will be sorted first by the Format value in column 1, then the Store ID value in column 2, and finally by the Gender Code in column 3.  We know this because of the #1, #2, and #3 showing the sort key position.

Slide 11

Columns 4 and 5 contain data that requires a calculation.  In this case, they are to be summed, to produce the total for a change in the sort key value.  Thus the data required for these calculations will be placed in the Calculated Transform or CT area of the extract record. 

Slide 12

Columns 6 and 7 are date fields which are not used in a calculation.  They do not have Format Phase Filtering logic, subtotals or any column calculations.  The data for these columns will be placed in the Data Transform area, or DT columns area of the extract record.

Note that the view columns can be constructed in any order needed for the final output.  The columns will be reordered into either Sort, DT or CT columns in the extract file, but then will be reordered for the final output at the end of Format Phase processing.

Slide 13

This is the logic table produced by our sample view.  The first functions of this view include record level filters which test a join to ensure only records which are found are included in the output.

Slide 14

Sort columns, which populate the Sort Area of the Extract Record, use SK functions to move selected fields and constants to the extract record.  These functions follow a familiar pattern: SKE functions move data from the Event File to the sort area, whereas SKL functions move looked-up field values, and SKC functions move constants.  The sort sequence number is shown in the Sequence Number column.

In this example the first column is a constant value SKC Sort Key Constant.  This places the constant “TRN01” in the first sort position; because this is a constant with no logic functions, all values on the output file from this view will be the same. 

Column 2 moves the Store ID field from an event file field using the SKE Sort Key, Event File Field function.

Column 3 moves the gender code, either an “M” or an “F”, from the customer file through a looked-up value using an SKL Sort Key Lookup Field function.

Note that the entire length of the Sort Key Area is 9 bytes, also indicated in the Logic Table by the SK Length comment on the NV row.

Slide 15

Calculated Transforms, CT columns, also follow the same pattern.  Constants are moved via CTC, CT Column from Constant functions, Event File Fields using CTE CT Column from Event File Field functions, and looked up values using the CTL CT Column from Looked-up Field function. 

Our example view contains a constant value with the CTC function for Column 4.  This is a record count, adding a value of 1 to each record.  Column 5 is an Event File Field value from the CTE function, the Order Amount.  It does not contain a looked up CT value, so no CTL is shown. 

Note that the final two columns, containing our Data Transform values, use DT functions to populate the extract record.  Both of these columns are the customer date of birth.

Slide 16

Note that the WR, or Write Function, also changes when writing to a standard extract file.  Instead of a WRIN function to copy the input record to the extract file, or the WRDT function to write only the DT column data area of the extract record to the output file, the WRXT function writes the entire standard Extract Record to the extract file.  The sequence number following this instruction contains the file number the extract record should be written to. 

In this example, the extract record contains the control area, Sort Data (9 bytes in length) containing “TRN01001M”, two dates in the DT area of the record, and two CT Columns containing packed data.  This record is written in this instance to Extract File 1.

Next we’ll examine the Logic Table Trace, and the extract record as it is built

Slide 17

When processing the first record, the Logic Table Trace shows the results of these logic table functions. In this example, the SKC constant value of TRN01 is stored in the first position because it is the first sorted field. The Event File Field value of 001 is stored next, and the Looked-up value of M is stored after that. 

Slide 18

If we had simply changed the Sort Key order (the values shown in yellow in the columns) such that the Gender Code had a #1 as the first sort key, Format had a #2, and Store ID a #3, the position of the values in the final output file would not change.  But the order of the rows in the file may well change after being sorted.  This is because the file would be sorted first by Gender Code, then by Format, and last by Store ID.  Thus all F values would be at the top of the file, followed by all M values.  To accomplish this, the Gender Code would be moved to the first position in the extract record for sorting.  The Format Engine would then move this value back to the 9th position in the output file. 

Slide 19

The Extract Engine formats CT data into an efficient format for the calculations in the Format Engine.  This format is a 12-byte packed field, which can hold 23 digits.  The last 8 digits are always fixed decimal places.  The value is also preceded by a binary halfword (two bytes) containing the column identifier.  Column values of all zeros are not stored in the extract record, to conserve space and processing time; because of this, sometimes no CT data is stored for a specific column.  Thus all CT columns are at the end of the extract record, allowing the record length to vary. 

In this example, the constant from the CTC function of 1.00000000 is stored immediately after the binary column identifier of 4.  The Event File Field value from the CTE function of 58.25 is stored immediately after the binary column identifier of 5. 
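A hedged Python sketch of that packing idea follows; it uses a simplified stand-in for packed decimal and assumed conventions, so the exact SAFR encoding may differ.

  import struct

  def pack_decimal(value, total_bytes=12, scale=8):
      # Scale to 8 implied decimal places, then pack two digits per byte
      # with a trailing sign nibble (C = positive, D = negative).
      scaled = int(round(value * 10**scale))
      sign = 0xC if scaled >= 0 else 0xD
      digits = f"{abs(scaled):0{total_bytes * 2 - 1}d}"        # 23 digits
      nibbles = [int(d) for d in digits] + [sign]
      return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))

  def build_ct_area(columns):
      # columns: dict of {column_number: numeric value}; zero values are skipped.
      area = b""
      for column_number, value in sorted(columns.items()):
          if value == 0:
              continue                                 # zero columns are not stored
          area += struct.pack(">H", column_number)     # binary halfword column identifier
          area += pack_decimal(value)
      return area

  # build_ct_area({4: 1.0, 5: 58.25}) models the two CT columns in this example.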

Slide 20

The DT functions move data to the space between the SK area and the CT area.  These functions were used in the earlier module explaining the Extract Only view, with the WRDT function.  The WRDT function only writes the data in the DT Area of the Extract Record. 

Data is moved to the appropriate area of the extract record no matter the order of the Logic Table functions in the Logic Table.  For example, if the first columns of the view are CT columns, the CT area of the record will be populated. 

In this example, the two dates are moved to the DT area of the extract record even though they contain only numeric values.  CT columns are reserved for columns which require calculations, subtotaling, or Format Phase selection logic.  Numeric data simply placed in the output file are placed in the DT area of the extract record.

Slide 21

The WRXT row adds information to the front of the Extract Record.  This includes the total record length, the lengths of the SK and DT areas, and a count of the CT columns.  These control fields are used by the Format Engine to process the extract record.

The last field in this area, just prior to the SK area, is a binary version of the View Number.  It is stored as binary to conserve space in the extract file, optimizing IO and storage.  The view number is multiplied by 2, 1 is added, and the result is converted to binary.  Thus in this example, view 143 * 2 = 286, plus 1 = 287, which displayed in hexadecimal is 11F.  This formula allows a header record (discussed later) to be sorted to the top of the view.
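The formula is easy to verify.  A tiny illustrative Python sketch:

  def extract_view_id(view_number):
      # Event records carry (view * 2) + 1; the header record for the
      # same view carries one less, so it sorts to the top of the view.
      return view_number * 2 + 1

  print(hex(extract_view_id(143)))       # 0x11f -> stored in binary, displayed as 11F
  print(hex(extract_view_id(143) - 1))   # 0x11e -> the header record's view ID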

Slide 22

In standard mode, the Extract Engine does not perform any summarization.  Thus as event file record 2 is read and passes record filtering, a second extract record is written.  If only one view is running, the records will be written in the same order as they are in the input file, which may have no relationship to the required sort order for the view.  If more than one view is reading and selecting records from the event file, the second record may be the same input record extracted for another view, with completely different view sort criteria.

The end result is that the order of the records in the extract file is unpredictable, particularly if views share extract files while running in parallel.

Slide 23

Standard Extract Files—which are those that are processed by the Format Engine—contain Control and Header Records. Each record contains processing information, such as control record counts, the Run Date, etc. for use by the Format Engine.

One Header Record is produced for each view writing to this standard extract file.  The View Number on the Header Record is one less than that on the extracted event records (in this example 11E rather than 11F); hence the formula used for the view ID on each extract record.  Only one Control Record is written to each Standard Extract File.  The Control Record has a view ID of 0.

Writing header and control records to extract files is done if the extract file number is less than or equal to the Extract Engine JCL STDEXTR=nnn parameter, where nnn is the file count.  For example, if 5 views write records to the EXTR002 DD Name, and the STDEXTR parameter is 2, the file would receive 5 header records and 1 control record.

Slide 24

The Extract Engine writes records in a random order.  The Extract File is sorted before being used by the Format Engine.  The Format Engine then summarizes like records by the sort key, and writes the records in order to the output file. 

The Extract File is always sorted by the combination of the View Number, in the control area of the extract record, and the Sort Key data area.  Even though the Header and Control Records are written near the end of the extract file, their lower View Numbers cause them to sort above all the records for the view, with the Control Record on top for Format Engine processing. 

The data in the Extract records is moved to the final output columns and formatted correctly, for example by applying masks to numeric data.  In this example, the CT data is moved before the DT data and decimal points are inserted.  Note also that in summary outputs, data in DT columns can be unpredictable.  If different DT values are extracted for the same sort key combination, which value will be placed on the output file cannot be guaranteed.  For example, if there are two different dates in the file, 1/1/2014 and 4/15/2014, either may appear on the final record.  Thus typically summary files contain only Sort Key and CT data. 

Slide 25

No matter what sort order is specified in the view for sort fields, SAFR always instructs the Sort Utility to sort the file in ascending order.  This is necessary because of the mixture of ascending and descending fields within the same view or across different views.  To create descending sorts, SAFR converts descending data in the extract file to its “2’s complement”, which simplifies sorting.  When sorted and then converted out of 2’s complement, the data will appear in descending order.

In this example, we changed the sort order on the Gender Code from Ascending (in the top example) to Descending (in the bottom example).  The 2’s complements of M and F are unprintable characters with hex values of “2B” and “39” respectively.  Because “2B” is less than “39”, the “M” records move to the top of the extract file.  Thus when the data is converted out of 2’s complement, the file output shows M before F, a descending sort.
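The trick is to store the complement of each byte of a descending field so that an ascending sort puts the largest original values first; converting back after the sort restores the original values.  The Python illustration below uses a per-byte complement, which reproduces the hex values in this example from the EBCDIC codes for “M” and “F”; the exact complement arithmetic SAFR applies to longer fields may differ slightly.

  def complement_bytes(data):
      # Complementing each byte reverses the collating order, so an
      # ascending sort of complemented values yields a descending sort.
      # Applying the same operation again restores the original bytes.
      return bytes(0xFF - b for b in data)

  gender_m = bytes([0xD4])   # EBCDIC 'M'
  gender_f = bytes([0xC6])   # EBCDIC 'F'
  print(complement_bytes(gender_m).hex())   # 2b - sorts first...
  print(complement_bytes(gender_f).hex())   # 39 - ...so 'M' records come out on top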

Slide 26

Multiple views can share an extract file.  Because the sort parameters include the View Number at the front of the sort criteria through the length of the longest sort key, sorting causes all the records for a view to be sorted together, and all records for a view to be placed in the sort order specified by the view.  The Extract Engine dynamically builds the sort criteria parameters for each extract file, inserting the longest sort key length for views writing to the extract file.

In this example, the extract file contains records for views 11F and 22E.  The sort parameters are now lengthened to include the additional three bytes of sort key data required by view 22E.  After sorting the file, the Control Record is followed by the Header Record for view 11F, and all the extracted data for that view in sorted order including the Gender Code, then the Header Record for view 22E followed by its extracted data, also in order by the sort key.

Slide 27

To avoid writing the sorted extract file, and then reading this file again for summarization and formatting, GVBMR88 calls the Sort Utility as a “Sort Exit.”  This allows the sort utility to pass the sorted records to GVBMR88 a record at a time after completing the sort, instead of writing them to disk, thus avoiding a pass of the extract file.  In other words, the sorted records are a virtual file that is never actually written to disk.

The Sort Utility also prints a control report.  This example shows some of the key messages, including a note that the utility is running as an exit to “MR88”, the sort parameters that were generated by the Extract Engine, and the number of records sorted.  Records are “deleted” from the virtual file by GVBMR88 as it produces the final output file.

Slide 28

The Format Engine also produces a control report at the end of its execution.  This control report shows the number of records read from the input file, the number of records written to various types of format files, and the number of records written to view-specific output files.  The total records read by GVBMR88 should equal the number of records written by the Extract Engine. 

In this example, 12 records were written to the extract file, and all 12 were read by the Format Engine.  The final output file for view 143 was 2 summarized records.

Slide 29

Not all SAFR functions are specified in the logic table.  For example, column calculations using column numbers are not specified in the logic table.  Other functions include column spacing, creating hard copy report formatting, subtotaling, and summarizing.  All these parameters are passed to GVBMR88 in the View Definition Parameter or VDP file and performed during the format phase.

This screen shot shows the VDP parameters used by GVBMR88, including column definition sizes and formats.  This report is printed at the end of the Format Phase.  The highlighted area for Column 7 contains a column calculation.  The logic for this calculation is “Column 6 divided by Column 5”

Slide 30

One last specialized Logic Table function should be mentioned.  The Format Engine does one type of join, for one format of output:  Sort Titles (descriptions of sorted key values) on hard copy reports.  These GVBMR88 look-ups, though, are not multiple-level joins.  They are simply the last step in the join process to obtain the sort title.  The Format Engine does not perform any of the LKLR, LKE or other functions to do the join.  Rather, it receives the fully formed key, which the Extract Engine has saved in the Sort Title Key Save area of the extract record.  The Format Engine then performs the lookup to find the sort title.

In this example, the KSLK function saves the lookup key of 0001 into the Sort Title Key Save area of the extract record.  GVBMR88 then uses this key to search for customer e-mail addresses to place next to customer numbers in the hard copy report. 

Slide 31

The logic table in this module contained the following functions:

  • SK functions, including SKC, SKE, and SKL, to build Sort Keys.
  • CT functions, including CTC, CTE, and CTL, to build Calculated Transformations
  • WRXT, which writes the extract record to a standard extract file
  • KSLK, which saves the Look Up Key for a Sort Title on hard copy reports

Slide 32

This module described Format Phase views. Now that you have completed this module, you should be able to:

  • Describe uses for Format-Phase Views
  • Read a Logic Table and Trace with format-phase views
  • Explain the Function Codes used in the example
  • Debug format-phase views

Slide 33

Additional information about SAFR is available at the web addresses shown here. This concludes Module 17, Format Phase Views

Slide 34

Training Module 18: Extract Time Summarization and Other Functions

The slides for the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 18, Extract Time Summarization and Other Functions

Slide 2

Upon completion of this module, you should be able to:

  • Describe uses for the Extract-Phase Record Aggregation (formerly known as Extract-Time Summarization (ETS)) feature
  • Read a Logic Table and Trace with ETS views
  • Describe how other SAFR functions affect the Logic Table
  • Explain the Function Codes used in the example
  • Debug ETS and other function views

Slide 3

The prior modules covered the function codes highlighted in blue.  Those to be covered in this module are highlighted in yellow.

Slide 4

In the prior module, we learned how the Format Engine creates summarized outputs.  Except for producing Standard Extract File formatted records, the Extract Engine typically plays no role in summarization.  The one exception is Extract Time Summarization or ETS.  With ETS, some level of summarization occurs at extract time. 

This can have significant benefits if the final summarized output file is relatively small, but the number of event records required to produce it is very large.  This reduces the IO required to write all the detailed extract records, and then for the sort utility to read all those records again.  This is a common problem where high level summaries are required for initial analysis of results, before investigating greater detail.

Slide 5

Use of ETS is specified in the View Properties pane, via the Extract-Phase Record Aggregation parameter.  The view developer specifies the use of Extract Time Summarization.  They also specify how many summarized sort keys the Extract Engine should hold in memory during extract time.  Only the sort keys for records held in this buffer at any one time are eligible for summarization.  

Specifying a large number of records may result in greater summarization during the extract phase.  However the Extract Engine allocates memory equal to the number of records multiplied by the number of bytes in each extract record (multiplied by the number of input partitions the view reads when running in parallel mode).  If large buffers are specified, or many views use ETS, the Extract Engine may require substantial amounts of memory.

Slide 6

Although ETS collapses some records, complete record aggregation is only assured in the Format Phase.  ETS may still result in duplicate records, depending on the order of the input records, because the keys required may overflow the ETS buffer.  If the total number of output keys exceeds the buffer size, the Extract Engine will dump the record with the least used key to the extract file.

In this example, the ETS record buffer is set to 3 records.  However, the total number of output records in the final file will be 4.  Because of the order of the input records, the extract file will contain two records with the same key. 
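Conceptually, the ETS buffer is a small in-memory accumulation table with a fixed number of slots; when a new key arrives and no slot is free, one buffered record is flushed to the extract file.  The Python sketch below uses a least-recently-used eviction rule as a stand-in for the “least used key” rule described above; that policy, and the class itself, are assumptions for illustration only.

  from collections import OrderedDict

  class EtsBuffer:
      def __init__(self, max_records, extract_file):
          self.max_records = max_records
          self.extract_file = extract_file   # list standing in for the extract file
          self.buffer = OrderedDict()        # sort key -> accumulated CT values

      def write_summarized(self, sort_key, ct_values):
          if sort_key in self.buffer:
              # Same key still buffered: accumulate the CT column values.
              self.buffer[sort_key] = [a + b for a, b in zip(self.buffer[sort_key], ct_values)]
              self.buffer.move_to_end(sort_key)
              return
          if len(self.buffer) >= self.max_records:
              # Buffer full: dump one buffered record to the extract file,
              # which is why duplicates of a key can still reach the file.
              old_key, old_values = self.buffer.popitem(last=False)
              self.extract_file.append((old_key, old_values))
          self.buffer[sort_key] = list(ct_values)

      def flush(self):
          # At the end of Extract Phase processing, remaining records are dumped.
          while self.buffer:
              self.extract_file.append(self.buffer.popitem(last=False))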

Slide 7

After the extract process, the Sort process, as part of the Format Phase, places all duplicate keyed records next to each other.  Thus, no matter the number of duplicates in the Extract File, all will be eliminated during Format Phase processing. 

In the example on the left the ETS Buffer is set to 3, and two records with the key AAA 111 are written to the extract file.  In the Format Phase these records are summarized together.  If the ETS Buffer had been set to 5, memory would have been allocated but never used.  If the ETS buffer had been set to 4, no memory would be wasted and no unnecessary IO would be required. 

Slide 8

Consider the following when setting the ETS buffer size:

  • The total number of summarized keys in the output file.  ETS is often used for thousands and even tens of thousands of rows.  However it is not typically used for hundreds of thousands of rows or more.
  • The sort order of the input file.  If the sort fields for the view are the same as the sort order of the input Event File, then a buffer size of 1 will result in no duplicates in the extract file.  This is because once a row is dumped to the extract file the same key will not appear again in the input file.
  • The Extract Phase memory available, which is impacted by:
  • The number of views executing in this Extract Phase
  • The memory required for reference tables being joined to by these views
  • ETS Buffer sizes specified in other views
  • Other items, such as the size of the logic table, buffer sizes for input and output files and Common Key Buffering, etc.

Slide 9

Similar to the Format Phase, ETS only acts upon CT column data for records with the same Sort Key values. However, unlike Format Phase processing with multiple column calculations possible, ETS only performs summarization. Multiplication and division of values is not possible in ETS.  Also similar to Format Phase processing, resulting DT values are unpredictable. 

In this example the ETS extract file output is only two records, one for “F” and one for “M.”  The CT packed values are accumulated to produce the summarized results.

Slide 10

Although the Standard Extract File Format is typically used to send data to the Format Phase, Logical Records can be constructed to read a specific view’s extract records for additional processing.  The ETS output records are often used with SAFR piping to reduce records processed in downstream threads.  Piping will be discussed in a later module.

Slide 11

The only change to the Logic Table when ETS is used is the WRXT Write Extract Record function is changed to a WRSU Write Summarized record function.  Although a WRXT row can only write one record to the extract file, a WRSU function may write an extract record to the ETS buffer and also a separate overflow record to the extract file if the buffer has overflowed.  At the conclusion of Extract Phase processing, all remaining records in the buffer will be dumped to the Extract File.  These additional write functions are not shown in the Logic Table trace.   The Logic Table trace only shows this function once even though no record or multiple records may have been written to the Extract File. 

Next we’ll examine additional Logic Text Keyword functions, and how they impact the Logic Table

Slide 12

SAFR Logic Text Keywords often manipulate the Constant portion of a CFEC or CFLC Compare Field to Constant.  For example the BatchDate keyword creates a constant of the execution date.  Thus the logic text “Selectif(ORDER_DATE<BATCHDATE())” that is run on January 3, 2010  creates a CFEC comparing ORDER_DATE to the constant “2010-01-03”

Many of the keywords allow math.  For example, if the BATCHDATE keyword contains a +3 inside the parentheses and is run on the same date, the constant would be “2010-01-06” rather than “2010-01-03”.

These functions create very efficient processes, as constant manipulation is not required during run time.  Only the comparison is required.
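The keyword is resolved once, when the logic table is built, into a plain constant.  A hypothetical Python sketch of that resolution idea (simplified; the actual keyword handling in the SAFR programs is more involved):

  from datetime import date, timedelta

  def resolve_batchdate(batch_date, day_offset=0):
      # BATCHDATE(+n) becomes a fixed CCYY-MM-DD constant in the logic table,
      # so no date arithmetic is needed while event records are processed.
      return (batch_date + timedelta(days=day_offset)).strftime("%Y-%m-%d")

  print(resolve_batchdate(date(2010, 1, 3)))       # 2010-01-03
  print(resolve_batchdate(date(2010, 1, 3), +3))   # 2010-01-06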

Slide 13

The Batch Date (also referred to as the “RunDate”) defaults to the system date if not specified in the JCL.  It can also be set to a specific value as a JCL parameter in GVBMR86.  The date is therefore consistent across all views in that execution of GVBMR86.

This date can also be updated in the Logic Table Build program, GVBMR90.  This is useful when VDPs are managed like source code and only created when changes are made to views.  In this case the VDP will contain a date from when the VDP was created.  These dates can be updated in the Logic Table to the current date by running GVBMR90 in the batch flow and using the GVBMR90 parameter.

Because processing schedules may provide inconsistent results in rerun situations or when processing near midnight, some installations have written a small program which accesses an enterprise processing table containing a processing date to create this JCL parameter for use by GVBMR86 or GVBMR90.  At the end of successful processing, this same program then updates the processing date table with the next day’s date.

Slide 14

In addition to BATCHDATE, which returns a full date in CCYYMMDD format, additional keywords provide access to a portion of the run date, including day, month, quarter and year.  Additionally, each of these keywords allows a numeric parameter which will add to or subtract from the current batch date value for use in logic text.

Other keywords allow for calculations and comparisons of dates.  The days, months and years “between” functions perform the appropriate date math to return the number of days, months or years between two dates.  The returned values can be used to evaluate logic text conditions or placed in columns. 

Slide 15

The Fiscal Keywords enable “moving” or “sliding” selection criteria using date criteria.  For example, if the view should select data for the current month, record filters built on the Batch Date would need to be updated each month as the BATCHDATE value changes.  The Batch Date keywords return an absolute month value, for example a 9, whereas Fiscal Month is a relative month, with “0” meaning “the current month.”  Using the Fiscal Date and Control Record, the resolved constant in the Logic Table changes without requiring any changes to the view. 

For example, to select data from the current month, use the logic text SELECTIF(ORDER_DATE = FISCALMONTH(+0)).  When this is executed with a Fiscal Date parameter of June, it results in a constant selecting June records.  Without updating the view, changing the Fiscal Date parameter to October results in selection of October records.  As the fiscal date constant is updated each month, this effectively creates “moving” criteria for the current month.  Attempting to use the BATCHMONTH keyword would instead require changing the criteria from a value of “6” for June to “10” for October. 

Also note that the Fiscal Date parameter is useful when processing on a subsequent day, perhaps past midnight after all business is closed for the prior day.  The Fiscal Date can be set to the prior day or month, recognizing that the data is from the last day of the month, while the Batch Date reflects when the process was actually run.

Slide 16

The Fiscal Keywords return dates based on the fiscal values in the control record for the environment for a view.  Each view specifies which control record should be used for its fiscal date processing.  This means that different views in the same batch run can have different fiscal dates because they are associated with different control records.  This is useful for processing views for multiple companies that have differing fiscal year ends.  By comparison, RUNDAY is the same for all views in a batch.

In this example, the view is assigned the Fiscal Date for Control Record ID 1.  Other views processed at the same time may be associated with Control Record ID 2.

Slide 17

Similar to the Run Date, the actual fiscal date can be overridden in the JCL parameters.  To do so, under the keyword “Fiscal Dates” the Control Record ID is followed by an equal sign, and the override date to be used.  Multiple dates for different control records can be listed. 

Fiscal dates can also be updated through the GVBMR90 parameter for a VDP built previously.  If no Fiscal Date parameters are passed to either program, they default to the Run Date as the Fiscal Date for any fiscal date keywords.

In this example, the Fiscal Date for Control Record ID 1 is set to 2010-12-01.  Note that the Run or Batch Date is set to June 1, 2010. 

Slide 18

Next we’ll show a logic table containing a resolved fiscal date keyword.  In this example, we use the FISCALMONTH keyword, which requires a field in date form with a CCYYMM content code.  Our selection logic is SELECTIF(ORDER_DATE_CCYYMM = FISCALMONTH(-5)).  The RunDate will be set to 2010-06-01, and Fiscal Date to 2010-12-01.  This should result in selecting records from 5 months ago relative to the Fiscal Date parameter passed to GVBMR86, or records from fiscal month 2010-07.

Slide 19

The view logic generates a CFEC function, shown at the top, with the constant of 2010-07 and trailing digits.  Because the comparison is only 6 bytes these trailing digits have no impact upon processing.  The Logic Table Trace below this shows this value in HEX mode.  In this run, all records failed this test.

Note that if they had passed this selection logic, the lookup is effective dated.  This can be seen from the LKDC, Lookup Key Date Constant function in the logic table.  Note also that this lookup uses the run date, 2010-06, not the fiscal date or the adjusted fiscal date of the Logic Text.  This may produce inconsistent results between the selection logic and the date effective join.  Care must be taken in creating views to be sure parameters are consistent.  Viewing the generated Logic Table can help spot these types of inconsistencies.

Slide 20

Like BATCHDATE, additional keywords provide access to portions of the fiscal date, including Day, Month, Period, Quarters and Year.  These parameters all operate against the Fiscal Date parameter, and can be used in logic text in the view.  Each modifies constants in the Logic Table to perform the function.  They also allow numeric parameters to perform calendar math, either forwards or backwards.

Slide 21

Other Logic Text functions also manipulate the logic table in particular ways.  The String keywords of ALL and REPEAT replicate constant parameters in the logic text into constants in the logic table comparison functions.  The ISFOUND and ISNOTFOUND functions can change the logic table Goto True and False rows for lookups.  Other keywords generate specialized logic table functions, like ISSPACES, which generates a CSL logic table function. 

Slide 22

These logic table functions are much less common, but might be seen in logic tables.  They perform functions such as declaring accumulator variables, setting those accumulators to values, using them in comparisons and extracting them, and performing math or functions against them.  Also logic table functions which work against the prior record might be seen, or range comparisons, or string searches.

Also, if the Logic Text keyword “Begins with” is used, SAFR changes the length of the field being tested to the length of the constant in the logic text.  This may result in something like a standard CFEC compare Event File field to constant function in the Logic Table with the adjusted field length.

Slide 23

In this module, we examined the WRSU logic table function, which writes summarized Standard Extract File records.  Other less common logic table functions were also introduced.

Slide 24

This module described extract time summarization and other logic text keyword functions. Now that you have completed this module, you should be able to:

  • Describe uses for the Extract-Phase Record Aggregation (formerly known as Extract-Time Summarization (ETS)) feature
  • Read a Logic Table and Trace with ETS views
  • Describe how other SAFR functions affect the Logic Table
  • Explain the Function Codes used in the example
  • Debug ETS and other function views

Slide 25

Additional information about SAFR is available at the web addresses shown here. This concludes Module 18, Extract Time Summarization and Other Functions

Slide 26

Training Module 16: Effective Dated Lookups

The slides used in the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 16, Effective Dated Lookups

Slide 2

Upon completion of this module, you should be able to:

  • Describe uses for effective-dated lookups
  • Read a Logic Table and Trace with effective-dated lookups
  • Explain the Function Codes used in the example
  • Debug lookups with effective dates

Slide 3

The prior modules covered the function codes highlighted in blue.  Those to be covered in this module are highlighted in yellow.

Slide 4

Many reporting applications perform date effective processing to:

  • Spot trends
  • Perform “What-if” analysis
  • Analyze changes in positions
  • Reconstruct historical perspectives for audit and other purposes

As you can see in this example, the number of customers by division by year is shown.  SAFR’s date effective joins provide a powerful way to perform these types of functions.  Changes in reference data can be maintained to allow for recreation of reports as of a point in time.  This can be done in individual views or a view might include multiple “as of” dates.

Slide 5

We’ll expand our example by including a new table, the Customer Name table.  This table is keyed by the Customer ID.  This Customer system tracks changes to data via effective dates, to keep track of changes to people’s names, perhaps when they get married.  To do so, it has an additional key to the table, the Effective Start Date.  Thus for the same Customer ID it could have two names, each effective for specific times.

Slide 6

When using tables structured for SAFR effective dated joins, the same primary key value can have two different sets of attributes.  The effective start date is used to distinguish between the two values, meaning when each record became effective.  A start date of all zeroes means the record has always been effective; in other words it has been effective since the earliest possible “date”. 

In this example, two different start dates are used to search the reference table for customer 5.  The first is 2012, and so it returns the record which started in 1985.  Customer 5’s name in 2012 was Sally Rogers Smith. 

The second query is for 1984, and so the earlier record, for the earliest possible date, is returned.  Customer 5’s name from the earliest possible date until June 22, 1985 was Sally Rogers.  SAFR assumes that the effective “end” date of the first record is the start date of the next record.

In many applications, exact binary matches of keys require the data value to be exactly the same, including the effective date.  This can require many duplicate records to have all possible dates covered.  SAFR doesn’t require this.  It tests for the most recent effective dated record, “falling back” from the key date provided to the last effective record.  This can significantly reduce the size of effective-dated reference files, which thus use less memory while still performing very high speed joins.

Slide 7

SAFR also provides the ability to use an effective End date as well when an explicit end date is stored on each record.  This is useful rather than assuming the effective “end” date of the first record is the start date of the next record.  This functionality allows for gaps when no effective date record may be in force.

When using end date processing, SAFR starts with the same “fall back” logic to find the most recent start-dated record.  It then performs one more test, checking to see if the date provided is also less than the end date.  If it is not, SAFR returns a not-found condition.

In this example we repeat the same search for customer 5’s name.  The 2012 search returns the same results because the end date of 9999-99-99 is greater than 2012; in fact it is the highest possible date.  This record is effective forever. 

The 1984 search returns a Not Found condition, because the effective date of 1984 is between the 1983 end date of the first record, and the 1985 start date of the next record.

This same functionality in other tools can consume a lot of CPU resource testing for valid date ranges.  SAFR effective date search is very efficient, on the order of 2 additional CPU instructions for each effective date test.
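The “fall back” search plus the optional end-date test can be sketched as follows.  This is illustrative Python over a simple list of reference records, not the SAFR search implementation; the record field names are assumptions.

  def effective_dated_lookup(records, key, as_of_date, use_end_dates=False):
      # records: reference records as dicts with "key", "start_date", "end_date"
      # and the attribute fields; dates are comparable CCYY-MM-DD strings.
      best = None
      for record in records:
          if record["key"] != key or record["start_date"] > as_of_date:
              continue
          # Fall back to the most recent record whose start date is
          # on or before the date supplied for the join.
          if best is None or record["start_date"] > best["start_date"]:
              best = record
      if best is None:
          return None                                  # not found
      if use_end_dates and not (as_of_date < best["end_date"]):
          return None                                  # date falls in a gap: not found
      return best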

Slide 8

The Effective Dates, both Start and End, are defined on the Logical Record.  There can be only one Start and one End effective date on each LR.  Both parameters are optional, but an End Date requires a Start Date be defined.  They can be located anywhere on the LR. 

In this example, the Customer Name LR has both a Start and an End date defined. 

Slide 9

The Start Date and End Date parameters should not be placed on the Lookup Path, and do not affect how the lookup path is constructed.  Fields listed in the lookup path are used for an exact binary match.  Because the Start and End dates are not exact binary matches, they are not placed on the Lookup Path.  However, remember the start and end dates are marked as effective dates on the LR, instructing SAFR to use them during processing.

In this example, only the field to be used to search the 10 byte Customer ID is defined in the Lookup Path.

Slide 10

Use of a Start Date, or Start and End Dates, affects Join Phase processing.  It is not possible to use just an end date.  The values in these fields are added to the RED Core Image file at the end of the LR-defined key data.  All effective date fields are converted to binary, and SAFR will automatically convert all dates to the same content code of CCYY-MM-DD regardless of the source data format.  This is done to reduce the amount of memory used by them and to ensure the test for effective dates tests years, then months, then days.  (Note that even though dates are converted to binary format, no exact match of dates is required.)

Note that use of a Start Date parameter is the only case where duplicate keys are allowed in a SAFR Join Reference file.  In this example, although there are two records for Customer 5, the start date ensures uniqueness in the search results.  

Like all other RED Core Image files, it must be in sorted order by the (1) key, such as the Customer ID in this example, (2) the start date, and (3) end date, if used. 
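A small Python sketch of the date normalization and sort order described above, treating dates as CCYYMMDD integers; the actual RED binary layout is not shown, and the field names are assumptions.

  def to_binary_date(ccyy_mm_dd):
      # "1985-06-22" -> 19850622: comparing the integers tests
      # years, then months, then days in a single comparison.
      return int(ccyy_mm_dd.replace("-", ""))

  def sort_core_image(records):
      # Records must be ordered by (1) key, (2) start date, and (3) end date if used.
      return sorted(records, key=lambda r: (r["key"],
                                            to_binary_date(r["start_date"]),
                                            to_binary_date(r.get("end_date", "9999-99-99"))))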

Slide 11

The Effective date used in the search key can come from three different places. 

  • A constant placed in the view
  • A constant fed to Performance Engine programs in the job stream, or
  • The value from a field on the Source Event File

Each of these will be discussed in the following slides.

Slide 12

A constant “as-of” effective date value can be hard coded in the view.  To do so, place it immediately below the path in the column parameters.  If used, the same value will be used for all lookups for that column.  The value would have to be updated manually if the view is reused, for example, to report processing results at the end of every month. 

In this example all customer names as of 2010-01-01 would be extracted in column 2. 

Use of this approach is effective for (1) unit testing views, (2) one-time views to produce a particular report for analysis, or (3) future-dated views to do what-if analysis for reorganizations.

Slide 13

A more frequently used option is to use the RUNDATE parameter.  The RUNDATE can be set in the GVBMR86 step of the Selection Job of the Performance Engine.  If the parameter is not set, GVBMR86 will default to the system date.  Thus each day a new VDP is created, the RUNDATE will be advanced to the current date. 

In our example then, although the view might have been created in 2010, if GVBMR86 is run on 2012-10-03, all customer names as of that date will be extracted by our example view. 

Often, rather than using the system date, which might change arbitrarily at midnight, small programs are created to pass GVBMR86 a “fiscal date” of some predetermined kind.  A fiscal date allows all views running at the same time to use the same date.  Also, even if a set of views needs to be re-run on another date, they can be “as of” the required date.  For example, running reports for the year end of December 31 can be done even though the system date is January 1st.

Slide 14

A field on the source Event File can also be used as an effective date.  This is similar to using it in a lookup path, although it is not placed in the path and an exact match of values is not required.  If used for the date effective join, the date used can vary record by record, unlike the two date constants described above.

In this example, the Date Became Customer field is used to find the customer’s name when they became a customer.  The report is not as of 2010, or 2012, but varies customer by customer depending upon the date stored in the customer Event file.

Each use of a look-up path containing an effective date in a view requires specifying which date should be used for that join: a constant, the run date, or a field value.  Thus a single view can use all three sources for different joins.

Next, we’ll examine the Logic Table generated by these different options.

Slide 15

The View Constant and Run Date effective joins use an LKDC Logic Table Function Code.  Remember LK functions build look-up keys prior to performing searches of core image files.  LKD functions build lookup keys containing dates for effective dates.  LKDC builds a lookup Effective Date from a Constant. 

The only difference between the View Constant and Run Date is the source of the constant.  The View Constant is the value placed in the view itself.  The Run Date constant is placed there by GVBMR86 at run time.  One cannot detect which source was used for the effective date by examining the logic table.  In this example, the View constant is 2010-01-01.  The Run Date constant is 2014-04-04.

Because all effective dates are converted to a four-byte binary field (an FM type 6 on the logic table report), which can potentially hold 10 digits, an additional “00” is appended to the end of the date constants for the join processing; this does not affect the source or target date.

Slide 16

The view using a source Event File Field for the date effective join uses an LKDE Logic Table Function Code, which builds a Lookup Key effective Date using the value from a source Event File Field. Rather than seeing a specific date in the logic table, the value to be used will be moved from a field. This is like an LKE function, but used for an effective date.

In our Event File Field example, the value of field ID 68, DATE_BECAME_CUSTOMER, will be moved to the Lookup Key Effective Date position.

Slide 17

As noted earlier, all these dates are given a four-byte binary field (“FM” Format code 6) with CCYYMMDD content (“CN” Content code 3).  The constants are put into these formats in the logic table to save CPU cycles at run time; thus they are listed with these formats as the SOURCE.  The date from the Event File field will be converted to this format as the data is read during execution.
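As an illustration of the kind of conversion described here (the formats and byte layout below are a simplification for teaching purposes, not the exact SAFR internal layout), a date in any source format can be normalized to CCYYMMDD digits and packed as a four-byte binary value, so that a simple byte comparison also orders dates by year, then month, then day.

```python
import struct
from datetime import datetime

def to_binary_ccyymmdd(value, fmt):
    """Normalize a date string in the given format to CCYYMMDD,
    then pack it as a 4-byte big-endian binary integer."""
    parsed = datetime.strptime(value, fmt)
    ccyymmdd = int(parsed.strftime("%Y%m%d"))     # e.g. 20100101
    return struct.pack(">I", ccyymmdd)            # 4 bytes that compare like the date

a = to_binary_ccyymmdd("2010-01-01", "%Y-%m-%d")  # CCYY-MM-DD source content
b = to_binary_ccyymmdd("03/15/1961", "%m/%d/%Y")  # MM/DD/CCYY source content

# Byte comparison of the packed values orders by year, then month, then day.
print(a > b)                  # True: 2010-01-01 is later than 1961-03-15
print(a.hex(), b.hex())       # the two 4-byte binary values
```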

Slide 18

Next we’ll trace records through these logic tables.  We’ll use this reference file as the target of our join, searching against customer 1, Michael Clark.  We’ll show what the results of the date effective search would be for various date combinations.

Slide 19

The Logic Table Trace for one record against all views is shown here.  Each view executes an LKE to move the Customer ID “1” to the lookup key.  Then each executes the effective date build, either an LKDC or LKDE.  The LUSM lookup function is performed, and then the customer last name is moved to the extract file through a DTL Data lookup function, and the record is then written using the WRDT Write the extract record Data Area function code. 

We’ll examine the details of each on the following slides.

Slide 20

Let’s examine the LKDC and LKDE functions more closely, and what effective date is used in each. 

At the bottom of the screen are shown the Effective Start date for the target reference file record, 1961-03-15, and its End date, 9999-99-99.  Since all three LKD effective dates, 2010 for the View Constant, 2014 for the Run Date, and 1999 for the Event File Field value, are within this date range, the results for all three views are Found conditions.

Note that the value for the LKDE is not shown in binary in the trace because the source data is not binary.  Conversion to binary takes place as part of this logic table function processing. 

Slide 21

In this example, we’ve changed the reference file record effective date to start in 2011 rather than 1961. 

  • The View Constant compares the LKDC 2010, which is less than the start date of 2011, resulting in a Not Found condition
  • The Run Date compares the LKDC 2014 to the start date of 2011, finds the reference file record is effective, and so results in a Found condition
  • The LKDE Field value contains 1999, which is less than the start date of 2011, and thus results in a Not Found condition. 

Thus views 1 and 3 use the DTC function code to move a Data Constant of spaces to the extract file. 

Slide 22

In this example, we’ve changed the reference file effective END date (not the START date) from far in the future, 9999-99-99, to a date in the past, 2000-01-01. 

  • The View Constant compares the LKDC 2010, which is greater than the end date of 2000, resulting in a Not Found condition
  • The Run Date compares the LKDC 2014 to the end date of 2000, finds the reference file record is no longer effective, and so results in a Not Found condition
  • The LKDE Field value contains 1999, which is less than the end date of 2000, so the reference file record is still effective, and thus results in a Found condition. 

Thus views 1 and 2 use the DTC function code to move a Data Constant of spaces to the extract file. 

Slide 23

In this example, we’ve added another reference file record for customer 1, reflecting a name change to CLARK-SMITH.  However, there is a gap between the two records.  One still ends in 2000, but the new one does not start until 2011.  The view still requires an “as of” date of 2010-01-01.

There is no change to the View Constant process.  The lookup key is primed with customer 1 and an LKDC effective date of 2010.  The search finds the CLARK record because the new record doesn’t start until 2011.  After finding this record, the program compares the view’s LKDC effective date constant of 2010, which is greater than the end date of 2000, resulting in a Not Found condition.

Slide 24

The Run Date view primes the key with customer 1 and an LKDC effective date of 2014.  Therefore, the binary search returns the second record with the effective start date of 2011.  The performance engine then tests the LKDC effective date of 2014 with the end date of 9999, and results in a Found condition.  The customer name of CLARK-SMITH is used in the output column.

Slide 25

The Field value view primes the key with customer 1 and a LKDE field value of 1999.  The search returns the first reference file record with its start date of 1961.  A comparison of the end date of 2000 to 1999 shows the record is still effective and thus results in a Found condition.  The view uses the last name Clark to build the output record.

Slide 26

In this example we’ve modified the Customer Name Logical Record so there is no Effective End Date field.  If no end date field is defined on the LR, the start date of the next record is the implicit end date of the prior record.

  • The View Constant searches with an LKDC constant of 2010, which is less than the 2011 of the 2nd record, but greater than the 1961 of the first record.  Thus the 1961 record is effective.  It uses the last name of “Clark” to build the output record
  • The Run Date view searches with an LKDC value of 2014 date which returns the 2nd record with a start date of 2011.  Because there is no explicit End Date, this record is assumed to be effective forever.  The view uses the “Clark-Smith” value to build the output record
  • The Field Value view with a LKDE date of 1999 has the same results as the View Constant.

Thus all three views have found conditions.
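A minimal sketch of this implicit end date behavior is shown below in Python.  The data mirrors the slide’s example, but the structures are illustrative only: when the LR has no end date field, each record is treated as effective from its start date until the start date of the next record for the same key, and the last record is effective indefinitely.

```python
# Reference records for one key, sorted by start date; no end date field defined.
records = [
    ("1961-03-15", "CLARK"),
    ("2011-01-01", "CLARK-SMITH"),
]

def lookup_no_end_date(effective_date):
    """Return the name effective on `effective_date`, treating each record's
    implicit end date as the next record's start date (last record: forever)."""
    found = None
    for start, name in records:
        if start <= effective_date:
            found = name      # latest record that has already started
        else:
            break             # this start date is the prior record's implicit end
    return found

print(lookup_no_end_date("2010-01-01"))  # CLARK        (View Constant example)
print(lookup_no_end_date("2014-04-04"))  # CLARK-SMITH  (Run Date example)
print(lookup_no_end_date("1999-12-31"))  # CLARK        (Field Value example)
```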

Slide 27

This Trace shows Event File records beyond the first record for the Event File Field view.  Note that the trace shows that the Date used in the LKDE Function Code changes with each event file record, whereas the constants used in LKDC would remain the same throughout a run.  The relationship between these dates might be summarized as:

  • The View Constant date is always the same, even across different executions of the Performance Engine, unless the view is manually updated.
  • The Run Date Constant is the same within an execution of the Performance Engine, but can change between executions depending on the system date or the parameters passed to GVBMR86.
  • The Event File Field date can vary within an execution of the Performance Engine, as the dates within the source Event File change.

Slide 28

The following are a few of the more common reasons why you may encounter an error in effective date processing, and possible approaches to each:

  • Effective dates should not be included in the lookup path.  If they are, the dates will be moved by LKE functions instead of LKDE or LKDC functions, requiring an exact match rather than an effective date “match”.
  • Because the source for an LKDE is a field specified in a view rather than on the lookup path, it is possible to use different LR fields, resulting in different output results per view when similar results were expected. 
  • Improper content code definition for LR dates, either the Join or the Event LR, such as MM-DD-YY, will result in not found conditions.
  • Views can be hardcoded with effective dates, rather than using the Run Date parameter or a Field value.
  • Because each use of a look-up path containing an effective date in a view must specify the date source, inconsistent use of constants, the run date, or a field value can produce unexpected results.
  • Improper parameters passed to GVBMR86 can set an incorrect Run Date

Slide 29

This module has introduced the following Logic Table Function Codes:

  • LKDC, builds a date effective lookup key from a constant
  • LKDE, builds a date effective lookup key from the source Event record

The following function code was not shown in an example

  • LKDL, builds a date effective lookup key from a looked up record

Slide 30:

This module described effective dated lookups. Now that you have completed this module, you should be able to:

  • Describe uses for effective-dated lookups
  • Read a Logic Table and Trace with effective-dated lookups
  • Explain the Function Codes used in the example
  • Debug lookups with effective dates

Slide 31

Additional information about SAFR is available at the web addresses shown here. This concludes Module 16, Effective Dated Lookups

Slide 32

Training Module 15: Lookups using Constants and Symbolics

The slides used in this video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 15, Lookups using Constants and Symbolics

Slide 2

Upon completion of this module, you should be able to:

  • Read a Logic Table and Trace with constants and symbolics in lookup key fields
  • Explain the Function Codes used in the example
  • Debug lookups with symbolics
  • Use selected Logic Table Trace Parameters

Slide 3

The prior modules covered the function codes highlighted in blue.  Those to be covered in this module are highlighted in yellow.

Slide 4

Most lookups use key values provided by other LRs, starting with the Source Event LR, and the values in the source and target fields are the same.  In some cases data values in the input file and those in the reference file cannot be simply matched for a join.  Yet through the use of logic, the relationship can still be programmed.  The person building the join and view must know the data to properly use these types of joins.

If in our example, we did not have a Customer Table to give us the customer code, the value to find the Gender would need to be provided through the view in some way. 

This can be accomplished in two ways: 

  • A Constant can be added to the Lookup Path when defined.  The view cannot change the constant
  • A Symbolic can be defined in the Lookup Path.  The value of the symbolic is assigned in the view

In either case, the value remains the same throughout the processing of the view.

Slide 5

We’ll use a similar example to the prior module, where we wish to output the Order ID, Customer ID, and the Customer Gender Code.  In our example, we won’t change either the input or output data, so a typical join could be used.  We’ll show how a constant could be used if either the source field or the target field contained different values.

Slide 6

Both types of joins still require the source and target logical records to be specified, even when no fields from the source are used in the search.  Constants and symbolics can be used in combination with fields to perform a join, or they can be the only value to search the target reference file.  They can also be used in any step of a join.

A Constant can be used if the same value should be used in every search of the reference file.  The value to be used in the search for a matching record is placed directly in the Look-up Path definition.  The data type format must also be specified correctly to match the key data of the target reference file.

For our example we have defined this lookup path to use a constant of “M” for the Customer Code key.  We will also use another lookup path with a constant of “F”, which is not shown.

Slide 7

A symbolic join is also defined in the lookup path.  But unlike the constant join, the value to be used in the search is not yet defined.  Rather, the symbolic is a key word that will be used in the view to assign what value should be used to search the target reference table.  In other words, the symbolic is a variable that is resolved using logic text in the view. 

Constants, symbolics and LR fields can be used together in the same join in any order or any combination.

In our example, we’ve defined a symbolic of GENCD for the Gender Code which we will assign in Logic Text.

Slide 8

Lookup paths that contain constants or symbolics are used like any other paths in views.  In both our example views we will use them in logic text.

We’ll use view 129 to demonstrate how paths with embedded constants operate.  In Column 3 of this view, our logic text will first look up to the Customer table using the Customer ID to find the Gender Code of either “M” or “F”.  Instead of using this as the key to the Gender Description table, we will use our new constant paths.

Slide 9

Symbolic views require logic text in order to assign the symbolic value.  The symbol (or variable) assigned in the path is given a value within the logic.  This is done by using the path name, followed by a semicolon, a dollar sign and the symbol name, an equal sign, and then the assigned value as a constant.  Multiple symbolics can be assigned values one right after another.

Our example view 130 shows how symbolics work.  As shown, the path name is used, followed by a semicolon, a dollar sign and the symbolic name “$GENCD”, an equal sign and a constant.  Notice that instead of requiring two paths on the prior view, one for “M” and the other for “F”, we now are able to use a single path for both conditions.

Slide 10

The Constant and Symbolic functions both populate a lookup key using a constant value rather than a value from an input file field, whether from the source Event File or a looked-up record.

A Constant value is hardcoded in the Lookup Definition, which in turn simplifies view definition with no logic text, but requires more lookup definitions.

A Symbolic allows many possible constant values assigned in Logic Text through a symbolic variable, which means fewer total components with one lookup definition, but each use requires logic text for assignment
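To make the trade-off concrete, here is a small Python sketch (not SAFR syntax; the function and variable names are invented for illustration) of the difference between the two approaches: a constant is baked into the path definition itself, while a symbolic is a named placeholder whose value each view supplies when it uses the path.

```python
# Constant path: the key value is fixed in the path definition itself.
def build_key_with_constant(path_constant):
    return path_constant                      # e.g. one path defined with "M", another with "F"

# Symbolic path: the path names a placeholder; the view assigns its value.
def build_key_with_symbolic(symbol_values, symbol_name):
    return symbol_values[symbol_name]         # e.g. the view's logic text assigns $GENCD = "F"

# Constant approach: two path definitions, no assignment needed in the view.
male_key   = build_key_with_constant("M")
female_key = build_key_with_constant("F")

# Symbolic approach: one path definition, each use assigns the value in logic text.
key_from_view_logic = build_key_with_symbolic({"GENCD": "F"}, "GENCD")

print(male_key, female_key, key_from_view_logic)   # M F F
```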

Slide 11

Constant and Symbolic joins do not affect the JLT process in any way.  Core image reference files are produced as they would be for any other join.  Constant and symbolic joins only affect the XLT, and as we will see, in very small ways.

Slide 12

This is the complete logic table for the constant view, view 129.   The view has no extract record filtering.  Thus it is composed almost entirely of column dependent logic table rows.  Column 1 requires one row, a DTE to populate the Order ID.  Column 2 requires logic table rows 5 – 10, a join to the Customer Table to obtain the Customer ID.  Column 3 requires rows 11 through 33 to populate the Customer Gender Code using our lookup paths with embedded constants.

Slide 13

This is the complete logic table for the symbolic view, view 130.   It has by and large the same structure as the constant view. One difference is the LT row numbers differ because view 130 is executed in the same Performance Engine run as view 129.  Its LT rows are therefore after those of view 129.

We’ll focus our attention in this module on Column 3, and compare and contrast the differences between these two views and their different paths.  For this logic table row comparison, we’ll change view 130 row numbers to be consistent with view 129.

Slide 14

Inside column 3 logic, both views perform the same lookup to the Customer file to determine the Gender of the Customer.  The first test is for “M”, and a second test for “F”.  Both views also have the same patterned DTL and DTC functions for populating the column based upon lookup results.  When we eliminate all these rows of the logic table for our comparison, we are down to very few rows to analyze.

Slide 15

Lookup paths containing Constants have an LKC Lookup Constant Logic Table Function.  This function code, similar to an LKE and LKL builds a lookup key value to be used when searching a core image file.  The LKC simply places a constant in the key. 

In our view 129 example, two LKCs are used, one for each join required by the COLUMN = keyword.  One assigns a value of “M” to the key, the other a value of “F”.

Symbolic paths contain LKS Lookup Symbolic Logic Table Functions.  This function performs the same operation as the LKC, placing a constant in the lookup key before searching a core image file. 

Example view 130 similarly has two LKS functions, one assigning a “M” and the other assigning an “F” before the respective LUSM functions.

Slide 16

Before examining the trace results, let’s look at the Extract Engine GVBMR95 control report to see the results of running these views.

Both views have written 12 output records.  But the Found and Not Found counts are not the same.  Remember that Found and Not Found counts are the results of LUSM functions, and join optimization might simply cause view 130 to perform fewer of these functions. 

Both views wrote their output to the same output file, DD Name OUTPUT06.  Thus records from view 129 are interspersed with records from view 130.  If we examine the file carefully, it appears that each record is essentially duplicated, which is what we would expect because the views create effectively the same output in slightly different ways.  We can also see that there appear to be two records, one from each view, which have no Customer ID or Gender Code.

Lastly, because we were looking up to the Gender Description table, we would have expected to see “MALE” and “FEMALE” on the records, not the “M” and “F” displayed.  Let’s use the Logic Table Trace to examine these potential issues.

Slide 17

In the Introduction to Trace Facilities we briefly discussed the additional parameters available to control what is traced.  We’ll use those parameters highlighted in blue to investigate our view results. 

Although multiple trace parameters can be used at one time, only one global parameter is permitted.  The global parameter is the one with no VIEW keyword on it.  Besides the one global parameter, all trace parameter rows require a VIEW parameter or they will be ignored in trace processing.

In the top example, the global parameter TRACEINPUT is used, and then two views each have individual trace parameters. 

In the bottom example, only a global parameter is used which will affect all rows in the logic table for all views. 

Note that an asterisk in the first position of the MR95PARMs marks a comment, which can be useful for saving prior trace parameters.

Slide 18

The TRACEINPUT parameter causes GVBMR95 to print the source Event File record.  The parameter is followed by the DD Name of the Event File to be traced.  In the Trace output the DD name is followed by the record number being printed.

The headings for the Trace Report have been removed in this example to make the printout clearer, and because of the report width it is shown in two parts. 

The input record is printed in two forms, as HEX characters which can allow viewing of things like packed and binary numbers, and as display characters which are easier for reading text. 

The SEQ NUM column before the Hex characters contains an index to the character positions of the printed hex record.  In this example, the first hex value of “F0F0F0F0” begins at offset “000000”.  On the next row the hex value of “00000582” begins at offset 000020 from the start of the record.  Note that a hex 20 is a decimal 32 bytes from the beginning of the record. 
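The same kind of layout can be reproduced with a few lines of Python for any record.  This sketch only illustrates how the hex offsets relate to byte positions; it is not the exact GVBMR95 report format, and the sample record contents are invented (and ASCII rather than EBCDIC).

```python
def dump_record(record: bytes, width: int = 16):
    """Print a record as hex and display characters with hex offsets,
    similar in spirit to the traced input record layout."""
    for offset in range(0, len(record), width):
        chunk = record[offset:offset + width]
        hex_part = chunk.hex().upper()
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        print(f"{offset:06X}  {hex_part:<{width * 2}}  {text_part}")

# A made-up 36-byte record: an order ID, padding, then a binary value
# placed so it starts at offset hex 20 (decimal 32) from the record start.
record = b"0000000001" + b" " * 22 + b"\x00\x00\x05\x82"
dump_record(record)
```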

Slide 19

The Trace View Parameter indicates which view should be traced.  In this example, only view 129 will be traced.  Notice that the trace output does not include any trace rows from view 130 even though it was contained in the same logic table and was processed by the Extract Engine at the same time.  This is a simple way of reducing the trace output to a single view of interest.

Slide 20

Another useful keyword is the LTFUNC parameter. 

Multiple VIEW parameters can be used at the same time. In this example, all LUSM functions for both views are shown.  Because we are looking for the same function codes in all the views in the logic table, the same thing could have been accomplished by coding the LTFUNC parameter on the Global Trace Parameter line.

Notice on this trace that for the last two records in the input file, records 11 and 12, only view 129 performed any LUSM functions.  This might explain the difference between the Found and Not Found lookup counts on the GVBMR95 Control report.

Slide 21

Multiple trace parameters can be used together with different sub parameters as well.

  • In this example the orange boxes highlight the Global parameter request for a print of the input record from the ORDER001 DD Name
  • The green boxes show the VIEW=129 parameters print only the LUSM function code rows on input records 11 and 12 and the output.
  • The blue boxes highlight the VIEW=130 parameters requesting the trace to print all LT function codes from Logic Table Rows 54 to 65, on input records 11 and 12.

The output from this last set of parameters shows that only the JOIN function code of LT Row 54 and the DTC of LT Row 65 were executed.  The LUSM and DTL function codes that also lie in this section of the Logic Table were not executed.  Thus we can confirm that, because the LUSM was not executed, the Found and Not Found counts for view 129 and view 130 should be different.
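The way these parameters narrow the output can be pictured as a filter applied to traced rows.  The following Python sketch is only a conceptual model of that filtering; the parameter names echo the ones described above, but the data structures and sample rows are invented for illustration.

```python
# Each traced row: (view_id, event_record_number, lt_row, function_code)
trace_rows = [
    (129, 11, 24, "LUSM"), (129, 11, 26, "DTL"),
    (130, 11, 54, "JOIN"), (130, 11, 65, "DTC"),
    (130, 12, 54, "JOIN"), (130, 12, 65, "DTC"),
]

# Illustrative trace parameters, one entry per VIEW= line.
params = [
    {"view": 129, "ltfunc": "LUSM", "rec_from": 11, "rec_to": 12},
    {"view": 130, "ltfrom": 54, "ltto": 65, "rec_from": 11, "rec_to": 12},
]

def traced(row):
    """Return True if any parameter line selects this row for printing."""
    view, rec, lt_row, func = row
    for p in params:
        if p["view"] != view:
            continue
        if not (p["rec_from"] <= rec <= p["rec_to"]):
            continue
        if "ltfunc" in p and func != p["ltfunc"]:
            continue
        if "ltfrom" in p and not (p["ltfrom"] <= lt_row <= p["ltto"]):
            continue
        return True
    return False

for row in filter(traced, trace_rows):
    print(row)
```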

Slide 22

When we inspected the output, we noted it only contained the letters “M” and “F” on the records, not the full reference file values of “MALE” and “FEMALE”.  If we set our trace parameters to trace only the DTL functions associated with the last column shown in this example, the trace output shows that Value 1 for the DTL function code is a 20-byte source value of “MALE” or “FEMALE” from the full reference file field.  However, the target length is only 1 byte.  The field has been truncated because the view definition was coded incorrectly.  Adjusting the view column width and re-running should correct the error.
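The truncation itself is easy to picture.  A tiny Python sketch (illustrative only) of a 20-byte looked-up value moved into a 1-byte target column:

```python
source_value = "MALE".ljust(20)   # 20-byte looked-up value from the reference file
target_width = 1                  # column width coded in the view definition

# Only as many bytes as the target column can hold are moved.
print(repr(source_value[:target_width]))   # 'M' -- the value is truncated to fit
```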

Slide 23

The following are descriptions of other trace parameters we have not examined in detail in this module.

  • The DDNAME parameter will trace only the specified input file.  While TRACEINPUT shows the input record for a DD Name, this parameter shows only executed logic table rows against that file.
  • The VPOS, VLEN, and VALUE parameters trace only when the data at the given position and length on the source record equals a specific value.  These parameters can consume a great deal of CPU resources against very large files.  Use them with care.
  • LTABEND will cause GVBMR95 to produce a dump for debugging if it executes a specific logic table row.  This may be useful when debugging user exits which we’ll discuss in a later module.
  • And MSGABEND will cause MR95 to abend if it produces a specific error number, such as an 0C7 data exception.

Slide 24

This module has introduced the following Logic Table Function Codes:

  • LKC, builds a look up key using a constant
  • LKS, builds a look up key using a symbolic

Slide 25

This module described lookups with constants and symbolics.  Now that you have completed this module, you should be able to:

  • Read a Logic Table and Trace with constants and symbolics in lookup key fields
  • Explain the Function Codes used in the example
  • Debug lookups with symbolics
  • Use selected Logic Table Trace Parameters

Slide 26

Additional information about SAFR is available at the web addresses shown here. This concludes Module 15, Lookups using Constants and Symbolics

Slide 27

Training Module 14: Multi-Step Lookups

The slides for this video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 14, Multi-Step Lookups

Slide 2

Upon completion of this module, you should be able to:

  • Describe a multi-step lookup
  • Read a Logic Table and Trace with multi-step lookups
  • Explain the Function Codes used in the example
  • Identify lookup and other patterns in a larger logic table
  • Debug a multi-step lookup not-found condition

Slide 3

The prior modules covered the function codes highlighted in blue.  Those to be covered in this module are highlighted in yellow.

Slide 4

In this module, we will extend the example used in the prior module, where we joined from the Order file to the Customer file to find Customer related data, including the Customer Gender Code, either M or F.  In this module, we’ll use the Gender Code to look up to the Gender Description table to find a description of the M or F, either Male or Female.

SAFR Lookup Definitions specify the full “path” or route from the source data to the final target data.  In the last module we used the “Customer from Order” Lookup Path.  In this module we’ll use the “Gender Description from Order” Look up Path to go through the Customer table to find the Gender Description.

Slide 5

Creating a multi-step look up requires using the “Steps” portion of the workbench.  Each step allows definition of a new target table.  All fields from prior steps can be used as sources for key values with which to search these tables. 

Care should be taken when building paths with respect to required data format conversions.  All SAFR joins use binary searches, meaning the key is matched byte for byte against a potential target record.  Joins on numeric keys can fail if the source field formats do not match the target formats.  SAFR will perform the necessary data conversions in the logic table when building keys, converting field formats if they are specified properly in the Look-up Path definition. 
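A small Python sketch of why this matters (the encodings below are simplified illustrations, not SAFR’s exact internal formats): because the search compares raw bytes, the same numeric customer ID will fail to match when the source key and the target key are held in different formats, unless one of them is converted first.

```python
import struct

customer_id = 123

zoned_key  = str(customer_id).zfill(10).encode("ascii")   # "0000000123" as display digits
binary_key = struct.pack(">I", customer_id)               # 4-byte binary 0x0000007B

# A binary search matches byte for byte, so these representations never match.
print(zoned_key == binary_key)        # False -- the lookup would be Not Found

# If the lookup path specifies both formats correctly, the key is converted first.
converted = str(struct.unpack(">I", binary_key)[0]).zfill(10).encode("ascii")
print(converted == zoned_key)         # True -- conversion makes the byte compare succeed
```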

Slide 6

Our example lookup View reads the Order logical record as the source event file.  It has three columns, containing the:

  • Order ID from the source event file
  • Customer ID from the Customer Logical Record
  • The Customer Gender Description from the Gender Description Logical Record

The lookup to the Customer Logical Record requires the Customer ID from the source event file, a single step lookup.  The Gender Description requires the Gender Code from the Customer logical record, thus creating a multi-step look up.

Slide 7

The view also contains logic, both Extract Record Filtering and column logic.

The Extract Record Filtering logic selects records for customers with a Gender type of “M” and orders where the Order Date is later than the new customer date, or orders which have no associated customer record.  Like column 2, this logic only requires a single step lookup.

The column logic, for column 3, inserts the Customer Gender description either “MALE” or “FEMALE” if found, and the value “UNKNOWN” for those orders with no associated customer record.  This logic requires a multi-step lookup.
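Expressed as ordinary Python (purely an illustration of the view’s intent, with invented field and function names), the record filter and the column 3 logic amount to the following:

```python
def customer_lookup(order, customers):
    """Single-step lookup: Order -> Customer, or None if not found."""
    return customers.get(order["customer_id"])

def select_record(order, customers):
    """Extract Record Filtering: male customers whose order date is later than
    the date they became a customer, or orders with no customer record at all."""
    customer = customer_lookup(order, customers)
    if customer is None:
        return True                                   # NOT FOUND clause
    return (customer["gender"] == "M"
            and order["order_date"] > customer["date_became_customer"])

def column3_value(order, customers, gender_descriptions):
    """Column 3: multi-step lookup Order -> Customer -> Gender Description."""
    customer = customer_lookup(order, customers)
    if customer is None:
        return "UNKNOWN"
    return gender_descriptions.get(customer["gender"], "UNKNOWN")

customers = {"1": {"gender": "M", "date_became_customer": "1999-06-01"}}
gender_descriptions = {"M": "MALE", "F": "FEMALE"}
order = {"customer_id": "1", "order_date": "2010-05-01"}

if select_record(order, customers):
    print(column3_value(order, customers, gender_descriptions))   # MALE
```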

Slide 8

GVBMR90 recognizes the view requires two reference tables because it uses the “Gender Description from Order” Look up Path.  It then generates additional Extract Only views in the JLT, creating another Core Image file by reading and extracting data from the Gender Description file, including the Gender Code 1 byte key, and the 20 byte Gender Description. 

The REH file also will contain two header records with counts and sizes of reference data for the Customer Reference table, and the Gender Description table.  Both these Core Image files will be loaded into memory during the Extract phase processing.

Note that the standard Performance Engine Process does not sort the Core Image files, but these files must be in key sequence before being loaded into memory or an error will occur.  Therefore these files should be sorted prior to beginning the Join Phase process.

Slide 9

This is the complete logic table.    Note that this single view requires 38 logic table rows (40 rows less the HD and RENX rows which are shared by all views in this logic table). Logic tables for complex views can be hundreds of rows in length.  Determining which part of a view generated the logic table rows can be important. 

Rows 4 – 14 are generated from the extract record filter logic.   These rows have no sequence numbers.  Rows 15 – 38 have column sequence numbers assigned.  They contain column logic and build column values.

We’ll examine each of these portions individually.

Slide 10

When the same look up path is used multiple times within extract record filter or a column, the lookup path steps are duplicated multiple times in the logic table.  Join optimization may skip rows, but the logic table still contains them. 

In our example, the Extract Record Filter logic requires three lookups to the Customer LR.  Each is accomplished using the same Lookup Path.  Thus each has the same Join, LKE and LUSM functions and the target Customer LR ID is 7.

  • Rows 4 – 6 test for “M” type customer Gender Codes, generating a lookup to the Customer LR.
  • Rows 8 – 10 test the dates, generating another lookup.
  • Rows 12 – 14 test for not-found customers in the OR condition.

Slide 11

Extract Filtering Lookups are often followed by CF functions to test for inclusion or exclusion of records.  In our example, lookup for both clauses 1 and 2 are followed by Compare Field functions. 

  • The test for an “M” requires a CFLC function, Compare Field Looked-up value to Constant.  The CFLC compares the looked up Customer Gender Code to the constant of “M”. If the test is true (the field equals “M”), the CFLC GOTO Row 1 continues to test on row 8 for the AND condition.  If the test is false, then it branches to 12 to begin the OR test for a Not Found condition.
  • Testing the dates requires a CFEL function, Compare Field Event file field to Looked-up field value.  The comparison type is “003”, meaning a greater-than test.  If this test proves true, the record should be selected; there is no need to test the OR Not Found condition.  Thus GOTO Row 1 jumps to the row 15 DTE function to begin building columns.  If the date test proves false, then the OR condition is tested, beginning on row 12.
  • Although not shown in this example, the CFEE and CFLL functions are similar in format to the CFEL.  A CFEE compares two source Event File fields, for equal, not equal, greater than, less than, etc.  A CFLL compares two looked-up field values.

Slide 12

The NOT FOUND keyword of the filter logic clause 3 also determines if records are selected for processing or not.  However, it does not generate a Compare Field function.  Rather the Found and Not Found rows of the LUSM cause the selection.  In other words, the true and false rows typically placed on a CF function are placed right on the LUSM (and the Join in case Join Optimization occurs).

In this example, if the LUSM test on row 14 finds a corresponding customer record in the lookup, then the record will be skipped by jumping to logic table row 40, the ES function.  If a corresponding customer record is not found (GOTO Row 2), then the column build processing will begin on logic table row 15.

If instead of the NOT FOUND keyword, FOUND had been used, the GOTO Rows 1 and 2 on the LUSM function would have been switched.

Slide 13

Being able to identify GOTO Row patterns typical of AND and OR logic can help in debugging. 

In this example, the repeated GOTO Row 2 value of 12 pointing to a CF function is indicative of an OR condition. 

The nearly sequential nature of GOTO Row 1 for clauses 1 and 2 are indicative of an AND condition because both must be satisfied before logic can jump from logic table row 11 to 15 to build the columns. 

Slide 14

Column logic can also be broken into segments by looking at the pattern of sequence numbers.  In this example, the three columns of our view require three different types of logic to be completed. 

  • Column 1, the Order ID from the source event file, only requires a DTE function.
  • Column 2, the Customer ID from the Customer Logical Record, requires a single step lookup, followed by DTL and DTC functions.
  • Column 3, the Customer Gender Description from the Gender Description Logical Record, requires a multiple step look up.  We’ll focus our attention here.

Slide 15

Even the logic for a single column can be broken down into its individual parts. 

In this example, Column 3 logic can be divided by the clauses of the logic text: 

  • The Found test to find a matching Gender Description record
  • Locating the actual Gender Description for use in the extract file
  • Or placing the value “UNKNOWN” in the column if the Gender Description is not found

Next, let’s focus on the clause A logic.

Slide 16

Patterns of single step joins can often be found as pieces of multi-step joins.  In this example the single step join from column 2 is nearly identical to the first part of our multi-step join, which also joins the Order LR to the Customer LR.  They have nearly identical JOIN, LKE and LUSM functions.  When debugging joins, the correct Lookup path must be identified.

The one significant difference is that the GOTO ROW 1 and 2 are substantially larger on the multi-step join, 6 to 16 steps away as opposed to 3 to 5 steps.  This is because column 3’s multi-step Join covers the entire join, not just the first step.  During Join Optimization, if the join has been performed before, these GOTO ROW 1 and 2 values are used immediately.

Slide 17

The second and subsequent steps of multi-step joins all begin with LKLR functions, Prepare a Lookup to an LR.  This, like the JOIN function code, identifies which core image file in memory to search.  In our example, the next step in the join goes from LR ID 7 to the Gender Description table, LR ID 63.

Often a multi-step join includes an LKL function, build a Lookup Key from a Looked Up value.  This moves a value from a looked up field to the search key for the core image file.  In this example, the value in Gender Code field ID 478 on the Customer table will be used to search the Gender Description table for a value, an “M” or an “F” depending upon the customer.

Our example target reference file only includes a single field for the key, the “M” or “F” values.  Multi key tables require multiple LKE and LKL functions, one right after the other in the logic table.  These build the keys for binary searches from left to right.
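A sketch of how a multi-field key would be assembled is shown below.  The field names and lengths are invented for illustration: each LKE or LKL value is appended left to right, and the concatenated bytes become the single key used for the binary search.

```python
def build_key(parts):
    """Concatenate (value, length) pairs left to right into one search key,
    the way successive LKE/LKL functions fill the lookup key buffer."""
    return b"".join(value.ljust(length)[:length] for value, length in parts)

# Illustrative two-field key: a region code taken from the event record (LKE-style)
# plus a gender code taken from an already looked-up customer record (LKL-style).
key = build_key([(b"EAST", 4), (b"M", 1)])
print(key)          # b'EASTM' -- searched byte for byte against the core image file
```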

Slide 18

Again repeated patterns are visible in the logic table, even within logic for one column.  Column 3 of our example performs the same look up twice. 

  • Test to see if a join is successful using the Found Keyword
  • If it is, then use the join to place the Gender Description in the column

Thus our multiple step join is repeated in the column logic.

Slide 19

Some join steps will never be executed during a run because of join optimization.  There can be other functions which will never be executed as well. 

In this example, there is a default Not Found DTC, the companion to the DTL moving the Gender Description code.  Note that it does not have the constant “UNKNOWN” assigned to it.  This DTC will never be executed because the GOTO ROW2 on both LUSM rows 30 and 33 would never be triggered; rather, jumps to row 38 would have already happened on the JOIN on row 22, or LUSM rows 24 or 27.

Having completed our analysis of the logic table, next we’ll execute this view with the Trace Parameter enabled.

Slide 20

This trace example output does not show the record number of the Event DD Name, in order to show the Value 1 and Value 2 output.

Most of the records from the input file meet the selection criteria: they are Male customers who became customers before placing their orders.  The trace for the CFLC, the Compare Field Lookup to Constant, shows the looked-up value as Value 1 and the Constant value as Value 2.  The CFLC function was an equal test; because the two are equal, the record continues evaluation. 

For the CFEL function, Compare Field Event Field to Lookup Field, Value 1 is the source Event field value, and Value 2 is the Looked-up field value.  In this case Value 1 and Value 2 are different, but the comparison is a Greater Than comparison, so the records are selected for processing.

Having passed the record filtering logic, the trace shows the DTE function for the first column.

Slide 21

The join required for Column 2 to obtain the Customer ID is the same join already performed to find the customer Gender Code for the record filter test.  Thus the LKE and LUSM functions are skipped by join optimization.  The Join function on row 16 immediately jumps to the DTL function on row 19 to populate column 2 with the Customer ID from the Customer file.

Slide 22

The multi-step join is then executed to test for a Found condition on the Customer Gender Description, by first joining to the Customer table.  Having done so successfully, the join uses the Gender Code key for the LKL function to search the Gender Description table. 

Join optimization shortens the next lookup.  The same lookup path just used to test for a Found condition is used again to actually get the Gender Description, starting on Logic Table row 28.  Thus, through optimization, the clause B logic of this column only requires the row 28 Join function.

The DTL function then moves the Gender Description of “MALE” to the extract record.

Slide 23

Let’s move on to a new Event File record.  A number of customers are Female according to the Customer file.  On these records the trace shows the Extract Record Filtering CFLC Compare Field Lookup to Constant test failing; the “F” value does not match the constant of “M”.

LT row 12 is also executed, to test whether the join is Not Found and the record thus should be included in the output; but of course the lookup was successful, and so the record is skipped. 

Slide 24

For multi-step lookups, not found conditions can occur anywhere along the lookup.  First we’ll examine a failure in the first step, the same failure noted in the prior module.  The join from the Order file to the Customer file fails for column 3 because no customer “0000000000” is found.  Instead of proceeding to the LKLR function to get ready for the second join, execution jumps to the row 38 DTC to place the “UNKNOWN” Gender Description in the output file.

Slide 25

To simulate a second step failure, let’s modify the Customer Gender code for Customer 1, from “M” to “O”.  If we do this the first lookup, Logic Table Row 24, would complete successfully.  We would then execute the LKLR, LKL and the second LUSM to search the Gender table.  The LKL function would have used a Gender code of “O”, which will not be found on the Gender table, causing a branch by the second LUSM to the same DTC used on the first step failure we just examined, resulting in the same output value.

Thus the final value in the output file may not indicate which step of a join failed.  Only by tracing through the logic table can this be determined.  The trace might highlight incorrect input data, like a Gender code of “O” or a different data format for numeric keys.  It may also highlight an incorrect LR or Join definition, or improper logic in the view. 

Slide 26

As noted earlier, incorrect number format conversion can cause look-ups to fail.  For example, source and target fields that use Binary, Packed, and Zoned Decimal formats can all have the proper field lengths for use in a lookup path.  However, if the sign characteristics are not properly converted, the correct matching record will not be found.  Therefore it is important for LRs to be defined correctly to match the actual data type.  SAFR will then do the necessary conversions between data types.

The GVBMR95 Trace output prints the binary data for LKE and other functions.  At times this data is not printable, but using the appropriate command to display hex data on output data (such as “Set Hex On” depending upon the JES configuration) will show the data for debugging.

Slide 27

The Extract Engine GVBMR95 control report again shows the results of Lookup processing.   Remember that the Found and Not Found counts are the results of LUSM functions, whether in 1st step or subsequent steps of joins, and do not include Join optimizations. 

In our example we used two different lookup paths, the single step lookup for the Customer ID, and the multi-step lookup for the Gender description.  Thus although we have only two output records with not found conditions on them, we have 4 not founds in the count, 2 for the Customer ID join failure and 2 for the Gender Description join.

Slide 28

This logic table and module has introduced the following Logic Table Function Code:

  • CFEL, Compares fields, Event File Field to Looked-up Field
  • CFLC, Compares fields, Looked-up field to Constant
  • LKLR, like the Join function, identifies join targets for multi-step joins
  • LKL, builds a look up key using data from a looked-up record

Other related function codes without examples include:

  • CFEE, Compares fields, Event File Field to Event File Field
  • CFLL, Compares fields, Looked-up field to Looked-up field
  • CNL, a Class Test Numeric on a looked up value, similar to a CNE

Slide 29

This module described multi-step lookup processes. Now that you have completed this module, you should be able to:

  • Describe a multi-step lookup
  • Read a Logic Table and Trace with multi-step lookups
  • Explain the Function Codes used in the example
  • Identify lookup and other patterns in a larger logic table
  • Debug a multi-step lookup not-found condition

Slide 30

Additional information about SAFR is available at the web addresses shown here. This concludes Module 14, Multi-Step Lookups

Slide 31

Training Module 13: Single Step Lookups

The slides used in the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 13, Single Step Lookups

Slide 2

Upon completion of this module, you should be able to:

  • Describe the Join Logic Table
  • Describe the advantages of SAFR’s lookup techniques
  • Read a Logic Table and Trace with lookups
  • Explain the Function Codes used in single step lookups
  • Debug a lookup not-found condition

Slide 3

Remember from Module 4 that SAFR allows users to combine or “join” data together from different files for selection tests or inclusion in an output file. Joins or Lookups require a Lookup Path to instruct SAFR which data from the two files to match, the source and the target. The fields in the lookup path are used in the Logic Table to build a key which is then used to search the target table for a match.  All lookups begin by using fields from the Source or Event LR or constants.

Slide 4

The prior modules covered the function codes highlighted in blue.  Those to be covered in this module are highlighted in yellow.

Slide 5

The Performance Engine flow changes when joins are performed.  GVBMR90 generates two logic tables, one for use in the Extract Phase, called the XLT for Extract Phase Logic Table. This logic table contains the views selected for processing from the SAFR meta data repository. We’ve analyzed XLT Logic Tables in prior modules.

When lookups are required, GVBMR90 also generates a Join Logic Table, called a JLT.  This logic table contains artificially generated views used to preprocess the reference data.  These views are never stored in the SAFR Metadata repository, and are regenerated every time GVBMR90 is run. 

In Join Processing, the Extract Engine is executed twice, once with the JLT and the second time with the XLT.  The Join Phase GVBMR95 reads the raw Reference Data files as its source data.  The outputs from the Join Phase are used as in-memory datasets by the Extract Phase to perform high speed joins. 

Slide 6

In this and following modules we’ll use a simple order example.  We’ll create a file that contains the Order ID, Customer E-mail Address, and Customer Gender Code. 

Available data includes

  • Customer’s orders are recorded on the Order file with an Order ID as the key. 
  • Customer data is recorded in a separate file, keyed by Customer ID. 

Each customer can make more than one Order, so the records in the Order file are more numerous than those in the Customer file.  SAFR is designed to scan the more detailed file as the source event file and look up keyed data in another file.  Thus we’ll define the Order file as the Source Event File, and the Customer file as a Reference File.  We’ll use these files to produce a very simple output.

Slide 7

SAFR loads target join data into memory in order to increase efficiency.  Memory access is thousands of times faster than disk access.  However, memory is more expensive than disk.  Loading the entire source reference data into memory, including fields that are not needed, can be difficult or expensive.  The purpose of the JLT is to create a Core Image File with only those fields needed for Extract Phase processing.  In our example, only the Customer ID, which is the key to the table, and the Customer E-mail Address and Customer Gender Code fields are needed in the core image file during Extract Phase processing.
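Conceptually, this trimming step behaves like the sketch below.  The field names, sample values, and dictionary structures are illustrative only: each reference record is reduced to its key plus the fields the views actually need, producing a much smaller in-memory table, along with header-style counts akin to what the REH carries.

```python
# Full reference records: many fields, only a few needed by the views being run.
customer_reference = [
    {"customer_id": "0000000001", "name": "MICHAEL CLARK", "address": "...",
     "email": "mclark@example.com", "gender": "M", "phone": "..."},
]

NEEDED_FIELDS = ("email", "gender")     # only what the Extract Phase views reference

# Build the core image: key plus needed fields, keyed for fast in-memory lookup.
core_image = {
    rec["customer_id"]: {f: rec[f] for f in NEEDED_FIELDS}
    for rec in customer_reference
}

# Header-style information, similar in spirit to an REH record.
header = {"record_count": len(core_image), "key_length": 10}
print(core_image)
print(header)
```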

Slide 8

When no joins are required by any view being executed, the MR90 Control Report will show an empty Join Logic Table.  When joins are required, the MR90 Control Report will contain a printed copy of the JLT. 

For our example, the JLT will contain two Extract Only type views, one to extract the reference data needed in a core image file for the Extract Phase and one to create a header record.  These Extract Only views are generated by GVBMR90 at run time only.  They are never stored in the SAFR Metadata Repository.  In this example, the JLT contains temporary views 25 and 26.

The core image view has three DTE Logic Table Functions, (1) for the Customer ID field, (2) for the Customer E-mail Address and (3) for the Gender Code. It will also have some DTC and other functions to properly format the core image file.

Slide 9

In addition to the core image files, which are formally named Reference Extract Data (RED) files, the second JLT view produces the Reference Extract Header file, or REH.  The REH is used in the Extract Phase to allocate the memory into which the Core Image Files are loaded.

Each REH record contains counts of RED records and other information like the key length and offset, and total bytes required.  These records are optionally printed after the Join Phase by the GVBMR71.

These control records are produced by an additional view for each reference file, in addition to the Extract Only view in the JLT.  This view contains some special Logic Table function codes that allow it to accumulate record counts.  It also writes only one record, no matter how many input records are in the reference file.

Slide 10

Our example look up view reads the Order logical record as the source event file.  It has three columns, containing the:

  • Order ID from the source event file
  • Customer E-mail Address and
  • Customer Gender Code, both from the Customer file

Obtaining the Customer E-mail Address and Gender requires a join to the Customer logical record.  The join requires using the Customer ID on the Order logical record to search for the corresponding customer record in the Customer reference file.

As noted, the JLT will read the Customer reference file.  One view will create the core image RED file which will primarily have the Customer ID and Customer Email Address field data in it.  Another view will produce the REH record with a count of the number of records in the Customer core image file.

In the remainder of this module, we’ll focus exclusively on the Extract Logic Table, the XLT, for this view.

Slide 11

The XLT contains the standard HD Header, RENX Read Next, NV New View functions as well as the concluding ES End of Source and EN End of Logic Table functions.

Additionally, it also contains a DTE Data from Event File field function.  This data is for column 1 of the view, indicated by the Sequence Number.  The source data starts at position 1 for a length of 10 bytes.  Under the Target section of the report, we can see that this data consumes 10 bytes in the target extract record.

Slide 12

Lookups always begin with a Join function code. The Join indicates where the Lookup Path information has been inserted into the view.  The steps required to perform the join are not coded in the view directly. 

The Sequence number following the Join indicates which column requires this join if the logic is column specific.  If the logic is general selection the sequence number will be blank.  The Join also indicates the ultimate Target LR for the join. 

The Join also includes Goto rows.  We’ll discuss these later in this module. 

In our example, the output from Column 2 is the Customer E-mail Address, which requires a join to the Customer file.  Note that the NV indicates the source Event File LR is 10, the Order LR.  The join target is LR ID 7, the Customer reference file.

Slide 13

A Join is almost always followed by one or more LKE functions.  An LKE function instructs the Extract Engine to build a Lookup Key from an Event File field. The LKE lists which LR the data should be taken from.  The Lookup Key, as opposed to the Extract record, is temporary memory containing a key used to search the core image file for a matching value. 

Note in our example that although the Join function Customer table Target LR ID is 7, the data for the key build should be taken from LR ID 10, the Order file Customer ID field.  This data is found on the source record at position 11 for a length of 10 bytes. 

Slide 14

The end of a Join typically has an LUSM.  The LUSM function does the binary search with the key against the Join’s target table in the RED Core Image file to find a matching record.  The function code stands for Lookup to Structure in Memory.

The GOTO rows for the LUSM function are Found and Not Found conditions.  Execution branches to GOTO Row 1 if a matching reference table record is found for the key built by the LKx functions.  Execution branches to GOTO Row 2 if no matching record is found in the reference structure.

In this example, if a matching Customer record is found for the supplied Customer ID, execution will continue at LT Row 8.  If no customer is found, row 10 will be executed.

Slide 15

LUSM functions are typically followed by a set of functions similar in structure to the CFxx functions.  You’ll remember that if the comparison proves true, often a move Data from the Event field DTE or similar function follows, then a GOTO  function to skip over the ELSE condition.  The else condition is often a DTC, move Data from a Constant default value.

In this lookup example, if the LUSM finds a corresponding Customer ID in the Customer table the LUSM is followed by a DTL, move Data from a Looked-up field value.  If no matching Customer ID is found, a constant of spaces will be placed in the extract file by the DTC function.
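A minimal Python sketch of this found/not-found branching is shown below.  The data and function names are illustrative, and the default follows the rule described on the next slide (spaces for an alphanumeric column).

```python
core_image = {"0000000001": {"email": "a@example.com", "gender": "M"}}

def column_value(customer_id, field, width):
    """DTL when the lookup is found, DTC default of spaces when it is not."""
    record = core_image.get(customer_id)          # stands in for the LUSM binary search
    if record is None:
        return " " * width                        # DTC: constant of spaces
    return record[field].ljust(width)[:width]     # DTL: looked-up field value

print(repr(column_value("0000000001", "email", 40)))   # found: the e-mail address
print(repr(column_value("0000000000", "email", 40)))   # not found: spaces
```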

Slide 16

The field Source start positions on DTE functions reflect where the data resides in the event file.  This is not true for the DTL functions, because the data is pulled from the RED. The data positions in the RED are not the same as the Reference Data Logical Record. The JLT process moves fields from the LR positions to the RED position, and these positions are determined by the order in which the views use the data.

In our example, the DTL function moves the 40 byte Customer Email Address on the Customer LR to the extract record from a starting position of 1 whereas on the Customer Record, it begins in Position 39. 

If the LR definition of the reference file is incorrect, the RED Key and all the data may be wrong.  At times it is necessary to inspect the RED file to see if the data exists.

Slide 17

Looked-up fields can be placed in columns in two ways.  They may be assigned directly, as shown in this view example and on this logic table.  Or they may be coded in Logic Text. 

If they are placed directly in the view editor and not in logic text, then SAFR automatically generates a default DTC in case of a not found condition.  The default value depends upon the format of the target column: alphanumeric columns receive a default value of spaces, and numeric columns a default value of zeroes.

If the field is coded in Logic Text, the user can define a default ELSE condition, which would be assigned to the DTC value.

In our example, the default value is spaces supplied by the DTC function if the lookup fails and no customer Email Address is found.

Slide 18

Lookup paths are often reused within views, which means lookup logic is often repeated. 

Our view requires two look ups, one for the Customer Email Address for column 2, and a second for the Customer Gender Code in column 3. After the first lookup, the logic table then contains very similar logic for the second look up.  Except for changes in logic table and GOTO rows and Sequence (column) numbers the second set of Join, LKE and LUSM functions are identical to the first set.

The DTL, GOTO and DTC functions are also very similar between the two lookups, with the same kinds of row and sequence number differences, although their source field and target positions and lengths also differ.

Slide 19

The first goal of SAFR is efficiency.  Developers recognized that repeating the LKx key build and the binary search LUSM function over and over again for the same key was inefficient, particularly when the lookup had just been performed and would have the same results.  SAFR has just found the Customer record with the Customer E-mail Address, and can use this same record to find the Customer Gender.  Thus SAFR has automatic Join Optimization built into every lookup. 

This is done through the Join function GOTO rows.  Note that the GOTO Row 1 and Row 2 numbers on the Join function are the same as those on the LUSM, the Found and Not Found rows.  Before the Join executes, it very efficiently tests whether this join has already been performed for this record.  If it has, the JOIN immediately branches to either the Found or Not Found row based upon the last lookup.  This avoids using CPU cycles to do unnecessary work.

This optimization is not limited to repeat joins within a view, but operates across all joins for all views being run against the same source event file.  The greater the number of views and joins, the greater the efficiencies gained.
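The optimization can be pictured as a one-entry cache per target Logical Record, consulted by every join against that LR.  The sketch below is a simplification with invented structures, not the actual implementation:

```python
core_image = {"0000000001": {"email": "a@example.com", "gender": "M"}}

# Last lookup result per target LR, shared by every join against that LR
# until a different key forces a new search.
last_lookup = {}   # target_lr_id -> (key, found_record_or_None)

def join(target_lr_id, key):
    cached = last_lookup.get(target_lr_id)
    if cached is not None and cached[0] == key:
        return cached[1]                          # reuse: skip the LKE/LUSM work entirely
    record = core_image.get(key)                  # otherwise perform the binary search
    last_lookup[target_lr_id] = (key, record)
    return record

record = join(7, "0000000001")    # first join for this event record: real search
email  = record["email"]
record = join(7, "0000000001")    # second join (Gender Code): served from the cache
gender = record["gender"]
print(email, gender)
```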

Slide 20

The last view specific logic table function is the WRDT, which Writes the Data section of the Extract record.  The record will include the DTE data for Order ID, the  DTL Customer Email Address or DTC spaces if there is not one, and the DTL Customer Gender Code or DTC spaces if that is missing.

Having examined the simple lookup logic table, let’s turn to the Logic Table Trace and examine the flow. 

Slide 21

For the first record from the Order File, rows 3 through 9 are executed in order. No rows are skipped.  (The HD and RENX functions of rows 1 and 2 are not shown in the trace). 

The Trace shows the DTE function moving the Order ID of 0000000001  from the Order Source Event File to the Extract record.

Slide 22

Because this is the first source event file record read, and the Join function on row 5 is the first join in the logic table, no join has been performed.  The Join performs a test to see if the join has been performed, and finding that it has not, the Join falls through to the LKE function to build the key for the join. No branching from the Join is performed.

Slide 23

Because this is the first join, the LKE function builds a key to search for a matching Customer record.  The trace shows the LKE value from the Customer ID field on the Order file, which is moved to the lookup key buffer prior to the search.  The LUSM function will then search for Customer ID 1.

Slide 24

The results of the LUSM binary search can be seen by noting the next row executed.  In our example row 8 is the next row in the trace; from this it is clear that the LUSM function found a corresponding customer record with customer ID 1 in the Customer Reference File.  Now any Logic Table functions that reference Customer Logical Record ID 7 (called Record ID in the Trace Report) will use this record until another LUSM is performed for the Customer Logical Record.
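
To illustrate the kind of binary search an LUSM performs, here is a minimal Python sketch over a sorted in-memory reference table; the table contents and the search_reference name are hypothetical, and this is not SAFR's actual implementation.

    import bisect

    # Sorted in-memory reference table: (Customer ID, record) pairs ordered by key.
    customer_table = [
        ("0000000001", {"email": "jane@example.com", "gender": "F"}),
        ("0000000002", {"email": "john@example.com", "gender": "M"}),
    ]
    keys = [k for k, _ in customer_table]

    def search_reference(key):
        # Binary search; return the matching record, or None for a Not Found condition.
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return customer_table[i][1]   # Found: later functions reuse this record
        return None                       # Not Found: branch to the DTC default

    print(search_reference("0000000001"))   # found
    print(search_reference("0000000000"))   # not found -> None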

Slide 25

The DTL function moves looked-up data from the Customer Email Address field on the Customer Reference File to the Extract record.  The trace report shows a portion of the value that is moved to the extract record.

The GOTO function on Logic Table Row 9 then skips the DTC function which would have placed spaces in the Extract record if no Customer Record had been found. Remember that the trace only shows executed logic table rows.  Rows skipped are not printed in the trace.

Slide 26

The second join is required for the Gender Code.  It begins with the Join function testing whether a join to Logical Record ID 7 has been performed previously.  Since the lookup for the Customer Email Address was just performed and resulted in a Found condition, the LKE and LUSM functions do not need to be executed.  Rather, the GOTO Row 1 on the Join is used, just as if the second LUSM had been performed and found Customer ID 1 again in the Customer record.  In this particular logic table, logic table rows 12 and 13 will likely never be executed during processing.

Slide 27

The second DTL function moves the Customer Gender Code from the Customer Reference File to the Extract record.  The trace report shows the “M” value that is moved to the extract record.  The GOTO function on Logic Table Row 16 then skips the DTC function, which would have placed spaces in the Extract record if no Customer record had been found.

The WRDT function writes this completed Extract Record to the Extract File.  Execution then resumes at the RENX function for the next Event Record.

Slide 28

Processing for the next four records continues following the same execution pattern as record number 1.  The only changes in the trace are the Event Record number, and the Values on the DTE, LKE, and both DTL functions.  The values from the Order and Customer files are moved to the Extract File.

Slide 29

The last two Orders in our example Source Event File do not have corresponding Customer records.  Thus on these two records, the first LUSM functions result in Not Found conditions, and branch to the DTC functions to populate the extract record Customer Email Address with the default value of spaces.  Likewise, the following Join functions detect that a Not Found occurred on the lookup, and also branch to the DTC functions to populate the extract record with a default space for the Customer Gender code.

We can manually perform the search by taking the value shown in the LKE function, “0000000000”, and searching the Customer Reference File; there is no Customer record for Customer ID “0000000000”.  There can be numerous causes for this, such as non-SAFR issues like mismatches between the Order and Customer Reference files.

If all lookups fail for a particular reference file, it is likely a structural problem.  For example, the Join LR may be wrong.  Inspecting the RED can highlight this problem.  If the data displayed in the LKE functions is incorrect, either the Event LR or the path may be incorrect.  Partially found lookups may indicate a data issue external to the SAFR configuration.

Slide 30

The Extract Engine GVBMR95 control report shows the results of processing.  Twelve records were read from the input file, and twelve records written to the output file.  For each view against each event file, it reports the total lookups found and not found.  The process required 12 total lookups, 10 of which found corresponding records, and 2 which did not.

Note that these counts are the results of LUSM functions and do not include Join optimizations.  If Join optimization had not been performed, the total number of lookups would have been 24: two lookups for each of the 12 extract records written.
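
As a quick arithmetic check of those counts, using only the figures quoted above:

    records_read     = 12
    joins_per_record = 2
    print(records_read * joins_per_record)   # 24 LUSM searches if every join searched again
    print(records_read * 1)                  # 12 searches with join optimization (10 found + 2 not found)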

Slide 31

There are many advantages to SAFR’s lookup processing, allowing it to perform millions of joins or lookups each CPU second.  The following are some reasons for this:

  • Memory-based lookups are more efficient and faster than lookups to data on disk, but they require efficient memory usage
  • Join optimization and the single-pass architecture allow reuse of lookups, including across different views
  • All reference data is loaded once, even when performing parallel processing; multiple threads share the same table in memory, as the sketch below illustrates
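
As a rough, non-SAFR illustration of that last point, the following minimal Python sketch loads a reference table once and lets two worker threads perform lookups against the same in-memory copy; the names and data are hypothetical.

    import threading

    # Loaded once; every thread reads this same in-memory reference table.
    customer_gender = {"0000000001": "F", "0000000002": "M"}

    def worker(orders, results, slot):
        # Each parallel thread does its own lookups against the shared table.
        results[slot] = [customer_gender.get(cust_id, " ") for cust_id in orders]

    results = [None, None]
    threads = [
        threading.Thread(target=worker, args=(["0000000001", "0000000000"], results, 0)),
        threading.Thread(target=worker, args=(["0000000002"], results, 1)),
    ]
    for t in threads: t.start()
    for t in threads: t.join()
    print(results)   # [['F', ' '], ['M']] -- one table, shared by both threads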

Slide 32

This logic table and module have introduced the following Logic Table Function Codes:

  • JOIN, identifies the join target and tests for already-executed joins
  • LKE, builds a lookup key using data from the source Event record
  • LUSM, performs a binary search of an in-memory reference data table using the lookup key
  • DTL, moves a looked-up input field to the output buffer

Slide 33

This module described single step lookup processes. Now that you have completed this module, you should be able to:

  • Describe the Join Logic Table
  • Describe the advantages of SAFR’s lookup techniques
  • Read a Logic Table and Trace with lookups
  • Explain the Function Codes used in single step lookups
  • Debug a lookup not-found condition

Slide 34

Additional information about SAFR is available at the web addresses shown here.  This concludes Module 13, Single Step Lookups.

Slide 35