January 27, 2013

Pentaho Data Integration – Variables (#006)


Pentaho Data Integration – How to Use a “Set Variables” step.
Pentaho Data Integration – How to Replace Variables in a SQL statement


Explore the repository and create a new folder "employees_incremental_load"



Inside "employees_incremental_load" create a "eil_transformations" subfolder.


Create three new objects:
- job: "jb_employees_incremental_load"
- transformation "tr_eil_dates"
- transformation "tr_eil_new_employees_load"

The main job naming convention:
jb_[folder_name], i.e.: "jb_employees_incremental_load"

The job will be saved in the main folder: "employees_incremental_load"
The transformations will be saved in the subfolder "eil_transformations"






The ETL process will work as follows:
The main job will include two transformations. First transformation will pass a date value (by "Set Variables" step) to the second transformation. The second transformation will receive the data value and pass it as a parameter to the SELECT statement.


The "tr_eil_dates" transformation

Add two steps to the workspace area:
- From the "Input" folder "Table input"
- From the "Job" folder "Set Variables"

Adjust the "Input table" step:


SQL:
SELECT '2007-01-01' AS date_value FROM DUAL

Edit the "Set Variables" step" and click "Get Fields"


The transformation structure:


The "tr_eil_new_employees_load" transformation:

Add two steps to the workspace area:
- From the "Input" folder "Table input"
- From the "Output" folder "Table output"

Adjust the "Table input" step:

Remember to check the "Replace variables in script" option.

SQL:
SELECT
  EMPLOYEE_ID
, FIRST_NAME
, LAST_NAME
, EMAIL
, PHONE_NUMBER
, HIRE_DATE
, JOB_ID
, SALARY
, COMMISSION_PCT
, MANAGER_ID
, DEPARTMENT_ID
FROM EMPLOYEES
WHERE HIRE_DATE < TO_DATE('${date_value_in}', 'YYYY-MM-DD')

Adjust the "Table output" step:


Click the "SQL" button to create the "new_employees" table in the "dwh" schema


Variables in the script are causing troubles.

Comment it out in order to create a table. After the table is created return to the previous form.



Right-click anywhere in the workspace/canvas and select "Transformation settings"


In the "Transformation properties" window select the "Parameters" tab and add a "date_value_in" parameter without a default value.


The transformation structure:


The "jb_employees_incremental_load" job:


From the "General" folder add three steps to the workspace area:
- "START" and two "Transformations"

Adjust the "Transformation" in the following way:


Adjust the "Transformation 2" in the following way:


The job has the following structure:


Run and review the job’s results.





No comments:

Post a Comment