Scalable Harvest – Harvest Client Operation


Quick Start

Configuration File

Harvest Client requires a connection to a message broker (RabbitMQ) to submit jobs to the Harvest server cluster. The default configuration file, <INSTALL_DIR>/conf/harvest-client.cfg, contains the following parameters:

# Message server type. Currently, only 'RabbitMQ' is supported.
mq.type = RabbitMQ

# RabbitMQ host(s). One or more host:port tuples (one tuple per line).
rmq.host = localhost:5672

# RabbitMQ user
rmq.user = harvest

# RabbitMQ password
rmq.password = harvest

You may need to update the RabbitMQ host(s), user, and password to match your RabbitMQ deployment.
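The file uses simple "key = value" lines with "#" comments. The Python sketch below reads such a file and lists the configured RabbitMQ endpoints, for example to check that they are reachable before you submit a job. It assumes that additional brokers are listed as repeated rmq.host lines, because the comment says "one tuple per line" but does not show the multi-host syntax. The function names are hypothetical and are not part of Harvest Client.

```python
from __future__ import annotations

import socket
from collections import defaultdict


def parse_client_config(text: str) -> dict[str, list[str]]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments.

    Every key maps to a list so that a key repeated on several lines
    (assumed here to be how multiple rmq.host entries are written)
    keeps all of its values in file order.
    """
    config: dict[str, list[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line; ignore it
        config[key.strip()].append(value.strip())
    return dict(config)


def rabbitmq_hosts(config: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Turn each rmq.host value ('host:port') into a (host, port) tuple."""
    hosts = []
    for entry in config.get("rmq.host", []):
        host, sep, port = entry.rpartition(":")
        if not sep:
            raise ValueError(f"expected host:port, got {entry!r}")
        hosts.append((host, int(port)))
    return hosts


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Applied to the default file shown above, rabbitmq_hosts(parse_client_config(text)) returns [("localhost", 5672)]. A successful TCP connection only shows that the port is open; it does not verify the RabbitMQ user or password.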

View Help

To print usage information, run harvest-client without any parameters:

<INSTALL_DIR>/bin/harvest-client

Usage: harvest-client <command> <options>

Commands:
  harvest         Submit new harvest job
  -V, --version   Print Harvest Client version

Options:
  -v <value>   Log verbosity: DEBUG, INFO, WARN, ERROR. Default is INFO.
  -help        Pass -help after any command to see command-specific usage information, for example,
               harvest-client harvest -help

To see help for the "harvest" command, run:

<INSTALL_DIR>/bin/harvest-client harvest -help

Usage: harvest-client harvest <options>

Submit new harvest job

Required parameters:
  -j <path>   Harvest job file

Optional parameters:
  -c <path>    Harvest Client configuration file. Default is $HARVEST_CLIENT_HOME/conf/harvest.cfg
  -overwrite   Overwrite registered products
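When you submit jobs from scripts, it can help to assemble the command line from the options listed above. The following minimal Python sketch follows the "harvest-client <command> <options>" order from the usage text and places the general -v option after the command. The helper names build_harvest_command and submit_job are hypothetical.

```python
import subprocess
from pathlib import Path

# Verbosity levels listed in the harvest-client usage text.
VERBOSITY_LEVELS = ("DEBUG", "INFO", "WARN", "ERROR")


def build_harvest_command(install_dir, job_file, config_file=None,
                          overwrite=False, verbosity=None):
    """Return the argv list for 'harvest-client harvest <options>'."""
    args = [str(Path(install_dir) / "bin" / "harvest-client"), "harvest",
            "-j", str(job_file)]                  # required: harvest job file
    if config_file is not None:
        args += ["-c", str(config_file)]          # optional: client config file
    if overwrite:
        args.append("-overwrite")                 # optional: overwrite registered products
    if verbosity is not None:
        if verbosity not in VERBOSITY_LEVELS:
            raise ValueError(f"unsupported verbosity: {verbosity}")
        args += ["-v", verbosity]
    return args


def submit_job(install_dir, job_file, **options):
    """Run harvest-client and raise CalledProcessError on a non-zero exit."""
    cmd = build_harvest_command(install_dir, job_file, **options)
    return subprocess.run(cmd, capture_output=True, text=True, check=True)
```

For example, build_harvest_command("/opt/harvest", "/tmp/job1.xml", overwrite=True) produces the same command as running /opt/harvest/bin/harvest-client harvest -j /tmp/job1.xml -overwrite. The /opt/harvest directory is only an illustrative <INSTALL_DIR>.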

Submit a Job

To submit a job to the Harvest server cluster, you need a job configuration file. An example is available in the installation directory: <INSTALL_DIR>/examples/directories.xml.

In that file, you will need to update the node name (the nodeName attribute):

<harvest nodeName="PDS_ATM">

The path to the data:

  <directories>
    <path>/data/OREX/orex_spice</path>
  </directories>

And the URL prefix that replaces the local path prefix in file references:

  <fileInfo>
    <!-- UPDATE with your own local path prefix and the base URL where your PDS4 archive is published -->
    <fileRef replacePrefix="/data" with="https://pds-atmospheres.nmsu.edu/" />
  </fileInfo>
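If you create job files for many data sets, you can generate them from a script. The Python sketch below writes only the three elements discussed above (harvest with nodeName, directories/path, and fileInfo/fileRef). The shipped directories.xml may contain other elements, so starting from that example remains the safer choice. Accepting several data paths is an assumption based on the plural directories element, and build_job_xml and write_job_file are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_job_xml(node_name, data_paths, local_prefix, url_prefix):
    """Return a minimal Harvest job document as a string."""
    # Root element: <harvest nodeName="...">
    harvest = ET.Element("harvest", {"nodeName": node_name})
    # One <path> element per data directory inside <directories>.
    directories = ET.SubElement(harvest, "directories")
    for data_path in data_paths:
        ET.SubElement(directories, "path").text = data_path
    # "with" is a Python keyword, so the attributes are passed as a dict.
    file_info = ET.SubElement(harvest, "fileInfo")
    ET.SubElement(file_info, "fileRef",
                  {"replacePrefix": local_prefix, "with": url_prefix})
    ET.indent(harvest, space="  ")  # pretty-print; requires Python 3.9+
    return ET.tostring(harvest, encoding="unicode")


def write_job_file(path, **fields):
    """Write the generated job document to path as UTF-8."""
    Path(path).write_text(build_job_xml(**fields) + "\n", encoding="utf-8")
```

For example, write_job_file("/tmp/job1.xml", node_name="PDS_ATM", data_paths=["/data/OREX/orex_spice"], local_prefix="/data", url_prefix="https://pds-atmospheres.nmsu.edu/") reproduces the values shown in the snippets above.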

Save this file as /tmp/job1.xml and run Harvest Client:

<INSTALL_DIR>/bin/harvest-client harvest -j /tmp/job1.xml

You should see output similar to this:

[INFO] Reading job from /tmp/job1.xml
[INFO] Reading configuration from /tmp/big-data-harvest-client-1.0.0-SNAPSHOT/conf/harvest-client.cfg
[INFO] Creating new job...
[INFO] Connecting to RabbitMQ
[INFO] Created job f282a012-115e-429c-b445-f5eed1d81303
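When you submit jobs from a script, you may want to record the job ID from the "Created job" line. The sketch below assumes the log format shown above. Because the documentation does not say whether these messages are written to standard output or standard error, the sketch searches both streams. The function names are hypothetical.

```python
import re
import subprocess

# Matches the UUID in a line such as "[INFO] Created job f282a012-...".
JOB_ID_RE = re.compile(
    r"Created job ([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def extract_job_id(output):
    """Return the first job ID found in harvest-client output, or None."""
    match = JOB_ID_RE.search(output)
    return match.group(1) if match else None


def submit_and_get_job_id(harvest_client, job_file):
    """Run 'harvest-client harvest -j <job_file>' and return the new job ID."""
    result = subprocess.run([harvest_client, "harvest", "-j", job_file],
                            capture_output=True, text=True, check=True)
    return extract_job_id(result.stdout + result.stderr)
```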

Next Steps

Monitor the RabbitMQ, Crawler, Harvest, and Elasticsearch servers to track the progress of your job.
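One way to watch RabbitMQ is its management HTTP API. This sketch assumes that the RabbitMQ management plugin is enabled (by default it listens on port 15672) and that the account you use has management access. The queue names that Harvest uses are not documented here, so the sketch lists every queue with its ready and unacknowledged message counts. The helper names are hypothetical.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url, user, password, timeout=10.0):
    """GET <base_url>/api/queues using HTTP basic authentication."""
    request = urllib.request.Request(f"{base_url.rstrip('/')}/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize_queues(queues):
    """Return sorted (name, ready, unacknowledged) tuples.

    Counters can be missing before RabbitMQ has collected statistics,
    so absent values are reported as 0.
    """
    return sorted(
        (q["name"], q.get("messages_ready", 0), q.get("messages_unacknowledged", 0))
        for q in queues
    )
```

For example, summarize_queues(fetch_queues("http://localhost:15672", "harvest", "harvest")) lists each queue on a local broker. If messages keep accumulating in a queue while the ready count does not drop, the servers that consume that queue may need attention.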