June 2, 2013

Pentaho Data Integration - Improving Performance - Increase the Java Memory (#019)

Pentaho runs inside a Java Virtual Machine and is therefore bound by that VM's settings, in particular its maximum heap size.
The same tuning applies to just about any Java application, including the Pentaho BI Server and the other GUI tools.

How to increase the memory allocation:

Open spoon.sh (Linux/macOS) or Spoon.bat (Windows) in a text editor. Look for a section like the following, which is taken from spoon.sh:

# ******************************************************************
# ** Set java runtime options                                     **
# ** Change 256m to higher values in case you run out of memory.  **
# ******************************************************************


OPT="-Xmx256m -cp $CLASSPATH -Djava.library.path=$LIBPATH -DKETTLE_HOME=$KETTLE_HOME -DKETTLE_REPOSITORY=$KETTLE_REPOSITORY -DKETTLE_USER=$KETTLE_USER -DKETTLE_PASSWORD=$KETTLE_PASSWORD -DKETTLE_PLUGIN_PACKAGES=$KETTLE_PLUGIN_PACKAGES"

Change the -Xmx value to set the maximum heap size, e.g. -Xmx1024m for 1 GB. Restart Spoon afterwards, because JVM options are only read at startup.
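Editing the file by hand is enough for a single machine. If the same change is needed on several installations, it can be scripted. The following is a minimal sketch that assumes a Unix shell with sed and a launcher script containing an -Xmx<size> token like the OPT line above. set_max_heap is a hypothetical helper name and is not part of PDI.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PDI): rewrite the -Xmx value in a
# launcher script such as spoon.sh, keeping the original as <script>.bak.
# Usage: set_max_heap <launcher-script> <size>
set_max_heap() {
  script=$1
  size=$2
  # Refuse to touch files that are missing or contain no -Xmx setting
  if ! grep -q 'Xmx[0-9]' "$script"; then
    echo "no -Xmx setting found in $script" >&2
    return 1
  fi
  # Replace every -Xmx<number><unit> token with the requested size;
  # -i.bak (suffix attached) works with both GNU and BSD sed
  sed -i.bak "s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx$size/g" "$script"
}
```

For example, `set_max_heap ./spoon.sh 1024m` turns -Xmx256m into -Xmx1024m and leaves the untouched copy in spoon.sh.bak.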

Source: http://djugal.blogspot.ca/2011/07/increase-java-memory-for-pentaho-data.html
Posted by Jugal Dhrangadharia

Pentaho Data Integration - Run from the Windows cmd (#018)

Run a transformation from the Windows cmd
Run a transformation from the Windows command line
Run a transformation from shell
Run a job from the Windows cmd
Run a job from the Windows command line
Run a job from shell

Running a transformation

To run a transformation from the cmd, use the "Pan" batch file (Pan.bat). The script is located in the main Pentaho folder.


Create the "execute_from_cmd" folder in the Repository Explorer.


Create a dummy transformation.






Create a dummy job.





Run the job to test it.



The information needed to run the transformation can be found in the "Repository Connection" window.


To see the details, edit the repository.


Information on how to run a transformation with Pan can be found on the Pentaho Wiki pages:
http://wiki.pentaho.com/display/EAI/Pan+User+Documentation


Run the dummy transformation:

Pan.bat /rep:penrep_id /user:admin /pass:admin /dir:/execute_from_cmd /trans:tr_dummy /level:Detailed

/rep:   - the repository to connect to
/user:  - the repository user name
/pass:  - the repository password
/dir:   - the repository directory
/trans: - the repository transformation to run
/level: - logging level (Nothing, Error, Minimal, Basic, Detailed, Debug or Rowlevel)



To write the log to a file, add the "/log" option:

Pan.bat /rep:penrep_id /user:admin /pass:admin /dir:/execute_from_cmd /trans:tr_dummy /level:Detailed /log:C:\a_example_log_file.log

/rep:   - the repository to connect to
/user:  - the repository user name
/pass:  - the repository password
/dir:   - the repository directory
/trans: - the repository transformation to run
/level: - logging level
/log:   - the log file
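To run the same transformation from a Linux or macOS shell, use pan.sh. It takes the same options written as -option=value instead of /option:value. The following minimal sketch wraps the dummy transformation in a function that appends all console output to a log file and reports a non-zero exit code. PDI_HOME and run_pan are hypothetical names, and the default path is only an assumption about where PDI is unpacked.

```shell
#!/bin/sh
# Assumption: PDI_HOME (hypothetical variable) points at the folder that
# contains pan.sh; override it before calling run_pan.
PDI_HOME=${PDI_HOME:-/opt/pentaho/data-integration}

# run_pan <log-file>: run tr_dummy from the repository and append both
# stdout and stderr to <log-file>; returns Pan's exit status.
run_pan() {
  log_file=$1
  "$PDI_HOME/pan.sh" -rep=penrep_id -user=admin -pass=admin \
    -dir=/execute_from_cmd -trans=tr_dummy -level=Detailed \
    >> "$log_file" 2>&1
  status=$?
  # Pan returns 0 on success; any other exit code means the run failed
  if [ "$status" -ne 0 ]; then
    echo "Pan failed with exit code $status (see $log_file)" >&2
  fi
  return "$status"
}
```

Because the function returns Pan's exit status, it can be chained in scripts, e.g. `run_pan /tmp/tr_dummy.log && echo "transformation finished"`.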




Running a job

To run a job from the cmd, use the "Kitchen" batch file (Kitchen.bat). The script is located in the main Pentaho folder.



Information on how to run a Pentaho job can be found on the Pentaho Wiki pages:

or the Infocenter pages:
http://infocenter.pentaho.com/help/index.jsp?topic=%2Fpdi_user_guide%2Freference_kitchen.html


Run the dummy job:

Kitchen.bat /rep:penrep_id /user:admin /pass:admin /dir:/execute_from_cmd /job:jb_dummy /level:Basic



This time, to write the results to a log file, redirect the output:

Kitchen.bat /rep:penrep_id /user:admin /pass:admin /dir:/execute_from_cmd /job:jb_dummy /level:Basic > C:\a_example_log_file.log
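Redirecting with > captures only standard output. Appending 2>&1, which cmd.exe also understands, sends error output to the same file. The following minimal Unix-shell sketch runs the dummy job with kitchen.sh, using the same -option=value form as pan.sh. It writes a new timestamped log for each run so that older logs are not overwritten. PDI_HOME, LOG_DIR and run_kitchen are hypothetical names, and the defaults are assumptions.

```shell
#!/bin/sh
# Assumptions: PDI_HOME holds kitchen.sh and LOG_DIR receives the logs
# (both hypothetical variables with illustrative defaults).
PDI_HOME=${PDI_HOME:-/opt/pentaho/data-integration}
LOG_DIR=${LOG_DIR:-/tmp}

# run_kitchen: run jb_dummy, print the log file path, return Kitchen's status
run_kitchen() {
  # A name like jb_dummy_20130602_101500.log keeps earlier runs intact
  log_file="$LOG_DIR/jb_dummy_$(date +%Y%m%d_%H%M%S).log"
  # 2>&1 also captures anything written to stderr
  "$PDI_HOME/kitchen.sh" -rep=penrep_id -user=admin -pass=admin \
    -dir=/execute_from_cmd -job=jb_dummy -level=Basic > "$log_file" 2>&1
  status=$?
  echo "$log_file"
  return "$status"
}
```

A typical call is `log=$(run_kitchen) || echo "jb_dummy failed, see $log"`. The assignment keeps the path, and its exit status is that of run_kitchen.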