August 31, 2013

AutoMate 9 vs. Windows Task Scheduler 2.0 (#021)


Source:
http://www.networkautomation.com/documents/507363d10f838452330925.pdf

Free to try (30-day trial); $1,495.00 to buy

Price Source:
http://download.cnet.com/AutoMate/3000-2084_4-10000220.html


Getting Started With AutoMate 9

Pentaho Data Integration - Fundamental Tutorial (Video) (#020)


Fundamental Tutorial 


About This Video

Comprehensive Pentaho Data Integration Tutorial
Creating a Job and Transformation
Creating a simple Multi-Dimensional model
Logging and Performance Metrics
Scheduling and Running Remotely with Carte
Using with the BI Platform and an Action Sequence
Using the PDI Console
Notification

June 2, 2013

Pentaho Data Integration - Improving Performance - Increase the Java Memory (#019)

Pentaho runs inside a Java Virtual Machine, and hence is bound by the properties of that VM.
These optimisations can apply to just about any Java application, including the Pentaho BI Server and GUI tools.

Method to increase memory allocation:

Open the file spoon.sh or Spoon.bat in a text editor. Look for a section that looks like this:

# ******************************************************************
# ** Set java runtime options                                     **
# ** Change 256m to higher values in case you run out of memory.  **
# ******************************************************************


OPT="-Xmx256m -cp $CLASSPATH -Djava.library.path=$LIBPATH -DKETTLE_HOME=$KETTLE_HOME -DKETTLE_REPOSITORY=$KETTLE_REPOSITORY -DKETTLE_USER=$KETTLE_USER -DKETTLE_PASSWORD=$KETTLE_PASSWORD -DKETTLE_PLUGIN_PACKAGES=$KETTLE_PLUGIN_PACKAGES"

Change the -Xmx parameter to alter the maximum heap size, e.g. -Xmx1024m for a 1 GB heap.
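For example, after raising the maximum heap to 1 GB the line in spoon.sh would read:

```
OPT="-Xmx1024m -cp $CLASSPATH -Djava.library.path=$LIBPATH -DKETTLE_HOME=$KETTLE_HOME -DKETTLE_REPOSITORY=$KETTLE_REPOSITORY -DKETTLE_USER=$KETTLE_USER -DKETTLE_PASSWORD=$KETTLE_PASSWORD -DKETTLE_PLUGIN_PACKAGES=$KETTLE_PLUGIN_PACKAGES"
```

Make sure the machine actually has that much free physical memory, otherwise the JVM may fail to start.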

Source: http://djugal.blogspot.ca/2011/07/increase-java-memory-for-pentaho-data.html
Posted by Jugal Dhrangadharia

Pentaho Data Integration - Run from the Windows cmd (#018)

Run a transformation from the Windows command line (cmd) or a Unix shell
Run a job from the Windows command line (cmd) or a Unix shell

Running a transformation

To run a transformation from the command line you need to use the "Pan" batch file. The script is located in the main Pentaho folder.


Create the "execute_from_cmd" folder in the "Repository explorer."


Create a dummy transformation.






Create a dummy job.





Run the job to test it.



The information needed to run the transformation can be found in the "Repository Connection" window.


To get the details edit the repository.


Information on how to run a Pentaho process can be found on the Pentaho Wiki pages:
http://wiki.pentaho.com/display/EAI/Pan+User+Documentation


Run the dummy transformation:

Pan.bat /rep:penrep_id /user:admin /pass:admin /dir:/execute_from_cmd /trans:tr_dummy /level:Detailed

/rep:   - the repository name
/user:  - the repository user name
/pass:  - the repository password
/dir:   - the repository directory
/trans: - the transformation to run
/level: - the logging level



To write the results to a log file use the "/log" option:

Pan.bat /rep:penrep_id /user:admin /pass:admin /dir:/execute_from_cmd /trans:tr_dummy /level:Detailed /log:C:\a_example_log_file.log

/rep:   - the repository name
/user:  - the repository user name
/pass:  - the repository password
/dir:   - the repository directory
/trans: - the transformation to run
/level: - the logging level
/log:   - the log file
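When the same transformation has to be run regularly, it helps to keep the repository details in shell variables. A minimal sketch for the Unix side, assuming the repository values from the example above (pan.sh is the counterpart of Pan.bat; on Linux the options are usually written with a leading dash and an equals sign, e.g. -rep=):

```shell
#!/bin/sh
# Keep the repository details in one place; the values below match
# the example above and should be adjusted for your own repository.
REPO=penrep_id
REPO_USER=admin
REPO_PASS=admin
DIR=/execute_from_cmd
TRANS=tr_dummy
LEVEL=Detailed
LOG=/tmp/${TRANS}.log

# Assemble the Pan command line; run it with: eval "$CMD"
CMD="./pan.sh -rep=$REPO -user=$REPO_USER -pass=$REPO_PASS -dir=$DIR -trans=$TRANS -level=$LEVEL -log=$LOG"
echo "$CMD"
```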




Running a job

To run a job from the command line you need to use the "Kitchen" batch file. The script is located in the main Pentaho folder.



Information on how to run a Pentaho job can be found on the Pentaho Wiki pages or on the Infocenter pages:
http://infocenter.pentaho.com/help/index.jsp?topic=%2Fpdi_user_guide%2Freference_kitchen.html


Run the dummy job:

Kitchen.bat /rep:penrep_id /user:admin /pass:admin /dir:/execute_from_cmd /job:jb_dummy /level:Basic



This time, to write the results to a log file, redirect the output:

Kitchen.bat /rep:penrep_id /user:admin /pass:admin /dir:/execute_from_cmd /job:jb_dummy /level:Basic > C:\a_example_log_file.log
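When the job is scheduled, Kitchen's exit code can be used to detect failures: Kitchen returns 0 on success and a nonzero code on errors. A sketch for the Unix side, with one assumption made so it runs standalone: KITCHEN defaults to the no-op `true` command here and should point at your real kitchen.sh in practice.

```shell
#!/bin/sh
# Run a job via Kitchen, capture its output, and branch on the exit code.
# KITCHEN defaults to "true" only so this sketch is self-contained.
KITCHEN=${KITCHEN:-true}
LOG=/tmp/jb_dummy.log

$KITCHEN -rep=penrep_id -user=admin -pass=admin \
         -dir=/execute_from_cmd -job=jb_dummy -level=Basic > "$LOG" 2>&1
STATUS=$?

if [ "$STATUS" -eq 0 ]; then
  echo "Job finished OK, log written to $LOG"
else
  echo "Job failed with exit code $STATUS, see $LOG" >&2
fi
```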




May 13, 2013

Pentaho Data Integration - Connection - Oracle 11g R2 RAC (#017)

To run Pentaho you need to install Java first. The required Java version can be found in the launch batch or shell script; both files, Spoon.bat and spoon.sh, are located in the main Pentaho folder.
The next step is to select a JDBC driver that matches your Oracle and Java versions.
 
 
 
JDBC drivers can be found on Oracle's website:
 
 
 
 
Download the driver, e.g. ojdbc6.jar, and place it in the following directory:
\data-integration\libext\JDBC
 
Naming convention:
o - Oracle
jdbc - Java database connectivity
6 - Java 6 (1.6)
 
 
 
Information about the connection string can be found on the Pentaho wiki website:
 
 
 
Use one of the examples or copy a section from the tnsnames.ora file:
 
 
 
Place the connection string in the "Database Name" field; the "Port Number" can be omitted or set to "-1".
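For reference, a RAC entry copied from tnsnames.ora typically has this shape (the host names and service name below are placeholders, not values from this post):

```
(DESCRIPTION=
  (ADDRESS_LIST=
    (LOAD_BALANCE=ON)
    (FAILOVER=ON)
    (ADDRESS=(PROTOCOL=TCP)(HOST=rac-node1)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=rac-node2)(PORT=1521)))
  (CONNECT_DATA=(SERVICE_NAME=orcl)))
```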
 
 

Press the "Test" button.
 

Pentaho Data Integration - Connection - SQL Server 2012 Cluster (#016)

To run Pentaho you need to install Java first. The required Java version can be found in the launch batch or shell script; both files, Spoon.bat and spoon.sh, are located in the main Pentaho folder.
The next step is to select a JDBC driver that matches your SQL Server and Java versions.
 

 
Download the native JDBC driver for SQL Server 2012 from Microsoft's website:




Download the installation file, e.g. sqljdbc_4.0.2206.100_enu.exe, unpack it, and place the driver in the following directory:
\data-integration\libext\JDBC
 
Naming convention:
sql - SQL Server
jdbc - Java database connectivity
4.0 - driver version 4.0 (sqljdbc.jar targets Java 5 / 1.5, sqljdbc4.jar targets Java 6 / 1.6)
 
Information about the connection string can be found on the MSDN pages:
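As a reference, the general form of the Microsoft JDBC connection URL looks like this (server name, port and database below are placeholders):

```
jdbc:sqlserver://myclustername:1433;databaseName=mydb
```

For a clustered instance, use the SQL Server virtual network name of the cluster as the server name.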



The "Database Connection" window should look like this:


 
The "Feature List" window will look like this:

 
 
Run the test.

 

May 7, 2013

Pentaho Data Integration - Align / Snap to Grid (#015)

There are two ways to organize a transformation’s steps on the canvas:
- align them with the keyboard shortcut keys,
- use the “grid” and “snap to grid” functionality.
 
Shortcut keys
 
 
Section: Keyboard Shortcuts
 
Snap to grid
 
 
Section: Snap to grid

Pentaho Data Integration - Improving Performance - Table Input and Oracle (#014)

When using the Pentaho Data Integration Table Input step to connect to Oracle via a JDBC connection, there is a connection setting that can dramatically improve performance when retrieving data: defaultRowPrefetch. Oracle JDBC drivers allow you to set the number of rows to prefetch from the server while the result set is being populated during a query. Prefetching rows into the client reduces the number of round trips to the server. The default value for this property is 10.
 
In the Table Input step, edit your connection, click on the "Options" tab and then enter your defaultRowPrefetch setting:
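On the "Options" tab the setting is entered as a plain parameter/value pair; the 200 below is only an illustrative value and should be tuned for your row width and available memory:

```
Parameter            Value
defaultRowPrefetch   200
```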
 
 
 
Posted by Wayne Johnson (Senior Sales Engineer at Pentaho)

March 1, 2013

Pentaho Data Integration - Create Tables' Structures and Copy Data (#013)


Select the "File", "New", "Transformation" option. When the transformation is open, click "Tools", "Wizard" and "Copy tables".


When the wizard is open select a source and target database connection to use.


Choose tables that need to be created and click "Next".


Choose a proper job name, point the directory to the "create_tables_structures" folder and click "Finish".


The job is created.



After the job is run, the tables will be created and the data copied to the new schema.