February 12, 2013

Pentaho Data Integration - Loop (#008)


In the repository, create a new folder called "loop" with a subfolder "loop_transformations".


In the "loop" folder, create:
- job: jb_loop

In the "loop_transformations" subfolder,create the following transformations:

- tr_loop_pre_employees
- tr_loop_new_employees_load
- tr_loop_new_employees_to_file








The process logic looks as follows:


The "jb_loop" structure looks as follows:


Adjust the "Set Variables" step:
Variable name: iteration
Value: 0
Variable scope type: Valid in the current job


Double-click the "Transformation" step and point it to: "/loop/loop_transformations/tr_loop_pre_employees_load"


Edit the JavaScript step and past in the shot script


Script:
var iteration = parseInt(parent_job.getVariable("iteration"))
var new_iteration = iteration + 1
parent_job.setVariable("iteration", new_iteration);
new_iteration < 3;

"Write To Log" step is very usefull if you want to add important messages to log information. I will be seen depending on a log level.

Comment:
After three iterations the process failed


The "jb_loop" structure"


Edit the "Write To Log 2" step and add a comment on the iteration number.


Edit the "Write To Log 3" step and add a comment that the iiteration was successful.


Point "Transformation 2" to "tr_loop_new_employees_load"


Point "Transformation 3" to "tr_loop_new_employees_to_file"


The "jb_loop" structure"


In order to adjust the "tr_loop_pre_employees_load" transformation,the following elements are required:
- Get Variables
- Switch / Case
- Abort
- Table input
- Table output

Adjust the "Get Variables" step in the following way:


Adjust the "Switch / Case" step in the following way:


Abort does not need to be adjusted.

Adjust the "Table input" step in the following way:


Adjust the "Table output" step in the following way:


Click the "SQL" button to create a table in the "dwh" schema.


Check "Specify database fields".


Go to the "Database fields" tab and click the "Get fields" button


Delete the first row entry "interation_in".


Connect steps as in the screenshot:


The point of using the "Switch / Case" and "Abort" steps is to proove that after first failure looping is working and second run works fine.

"tr_loop_new_employees_load"

That transformation is simple. We want to copy data from "pre_employees" to "new_employees" both in the "dwh" schema.



Remove first row - "iteration_in"



"tr_loop_new_employees_to_file"
- "Table input"
- "Text file output"

"Table input":


"Table output"




Click "Fields" tab then "Get fields" and at the end "Minimal width" button.
"Get fields" will inherit fields from source table.
"Minimal width" will trim the strings.


Let's run the "jb_loop" job and check results





Log information:
2013/01/28 12:03:58 - tr_loop_pre_employees_load - Loading transformation from repository [tr_loop_pre_employees_load] in directory [/loop/loop_transformations]
2013/01/28 12:03:59 - tr_loop_pre_employees_load - Dispatching started for transformation [tr_loop_pre_employees_load]
2013/01/28 12:03:59 - tgt_pre_employees.0 - Connected to database [conn_oracle11g_orcl_dwh_dwh] (commit=1000)
2013/01/28 12:03:59 - get_variables_iteration.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
2013/01/28 12:03:59 - Abort.0 - ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : Row nr 1 causing abort : [ 0]
2013/01/28 12:03:59 - Abort.0 - ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : Aborting after having seen 1 rows.
2013/01/28 12:03:59 - Abort.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=1)
2013/01/28 12:03:59 - tr_loop_pre_employees_load - tr_loop_pre_employees_load
2013/01/28 12:03:59 - tr_loop_pre_employees_load - tr_loop_pre_employees_load
.
.
.
2013/01/28 12:03:59 - tr_loop_pre_employees_load - tr_loop_pre_employees_load
2013/01/28 12:03:59 - tr_loop_pre_employees_load - tr_loop_pre_employees_load
2013/01/28 12:03:59 - jb_loop - Starting entry [JavaScript]
2013/01/28 12:03:59 - jb_loop - Starting entry [write_to_log_iteration]
2013/01/28 12:03:59 - Loop - Iteration number 1
2013/01/28 12:03:59 - jb_loop - Starting entry [tr_loop_pre_employees_load]
2013/01/28 12:03:59 - tr_loop_pre_employees_load - Loading transformation from repository [tr_loop_pre_employees_load] in directory [/loop/loop_transformations]
2013/01/28 12:04:00 - tr_loop_pre_employees_load - Dispatching started for transformation [tr_loop_pre_employees_load]
2013/01/28 12:04:00 - tgt_pre_employees.0 - Connected to database [conn_oracle11g_orcl_dwh_dwh] (commit=1000)
2013/01/28 12:04:00 - get_variables_iteration.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
2013/01/28 12:04:00 - switch_case_iteration_in.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
2013/01/28 12:04:00 - src_employees.0 - Finished reading query, closing connection.
2013/01/28 12:04:00 - src_employees.0 - Finished processing (I=107, O=0, R=0, W=107, U=0, E=0)
2013/01/28 12:04:00 - tgt_pre_employees.0 - Finished processing (I=0, O=107, R=107, W=107, U=0, E=0)
2013/01/28 12:04:00 - jb_loop - Starting entry [write_to_log_process_success]
2013/01/28 12:04:00 - Loop - The looped step succeed
.
.
.
2013/01/28 12:04:01 - jb_loop - Job execution finished
2013/01/28 12:04:01 - Spoon - Job has ended.

No comments:

Post a Comment