Harvest Client Operation
Quick Start
Configuration File
Harvest Client requires a connection to a message broker (RabbitMQ) to submit jobs to the Harvest server cluster. The default configuration file, <INSTALL_DIR>/conf/harvest-client.cfg, has the following parameters:
# Message server type. Currently, only 'RabbitMQ' is supported.
mq.type = RabbitMQ

# RabbitMQ host(s). One or more host:port tuples (one tuple per line).
rmq.host = localhost:5672

# RabbitMQ user
rmq.user = harvest

# RabbitMQ password
rmq.password = harvest
You may need to update the RabbitMQ host, user, and password to match your RabbitMQ deployment.
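Because the client reads plain "key = value" settings, a quick pre-flight check can catch a typo before a job is submitted. The Python sketch below is not part of Harvest Client and its function names are hypothetical. It assumes that lines starting with # are comments and that rmq.host may be repeated on separate lines, one host:port per line.

```python
# Hypothetical pre-flight check for a harvest-client.cfg-style file (not part of Harvest Client).
# Assumptions: '#' starts a comment line; rmq.host may repeat, one host:port per line.

DEFAULT_CFG = """\
# Message server type. Currently, only 'RabbitMQ' is supported.
mq.type = RabbitMQ
# RabbitMQ host(s). One or more host:port tuples (one tuple per line).
rmq.host = localhost:5672
# RabbitMQ user
rmq.user = harvest
# RabbitMQ password
rmq.password = harvest
"""


def parse_client_config(text: str) -> dict:
    """Parse 'key = value' lines; repeated rmq.host entries are collected into a list."""
    cfg = {"rmq.host": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"malformed line (no '='): {raw!r}")
        key, value = key.strip(), value.strip()
        if key == "rmq.host":
            cfg["rmq.host"].append(value)
        else:
            cfg[key] = value
    return cfg


def check_client_config(cfg: dict) -> None:
    """Raise ValueError if a required setting is missing or malformed."""
    if cfg.get("mq.type") != "RabbitMQ":
        raise ValueError("mq.type must be 'RabbitMQ'")
    if not cfg["rmq.host"]:
        raise ValueError("at least one rmq.host entry is required")
    for entry in cfg["rmq.host"]:
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"rmq.host must be host:port, got {entry!r}")
    for key in ("rmq.user", "rmq.password"):
        if not cfg.get(key):
            raise ValueError(f"{key} is not set")


if __name__ == "__main__":
    settings = parse_client_config(DEFAULT_CFG)
    check_client_config(settings)
    print(settings)
```

To check your own settings, pass the contents of <INSTALL_DIR>/conf/harvest-client.cfg to parse_client_config and then call check_client_config on the result.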
View Help
To print usage information, run harvest-client without any parameters:
<INSTALL_DIR>/bin/harvest-client
Usage: harvest-client <command> <options>

Commands:
   harvest          Submit new harvest job
   -V, --version    Print Harvest Client version

Options:
   -v <value>   Log verbosity: DEBUG, INFO, WARN, ERROR. Default is INFO.
   -help        Pass -help after any command to see command-specific usage information, for example,
                harvest-client harvest -help
To see help for the "harvest" command, run:
<INSTALL_DIR>/bin/harvest-client harvest -help
Usage: harvest-client harvest <options>

Submit new harvest job

Required parameters:
   -j <path>     Harvest job file

Optional parameters:
   -c <path>     Harvest Client configuration file. Default is $HARVEST_CLIENT_HOME/conf/harvest.cfg
   -overwrite    Overwrite registered products
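If you submit jobs from a script, the documented options can be assembled into an argument list. This Python sketch uses only the -j, -c and -overwrite options shown in the help text above. The function names and the /opt/harvest-client install directory are hypothetical, and submit_job assumes the client exits with a non-zero status when submission fails.

```python
# Hypothetical wrapper around the documented "harvest-client harvest" command.
# Uses only -j, -c and -overwrite from the help text; assumes a non-zero exit on failure.
import subprocess
from pathlib import Path
from typing import List, Optional


def build_harvest_command(install_dir: str, job_file: str,
                          config_file: Optional[str] = None,
                          overwrite: bool = False) -> List[str]:
    """Return the argument list for <INSTALL_DIR>/bin/harvest-client harvest -j <path> ..."""
    cmd = [str(Path(install_dir) / "bin" / "harvest-client"), "harvest", "-j", job_file]
    if config_file is not None:
        cmd += ["-c", config_file]  # optional Harvest Client configuration file
    if overwrite:
        cmd.append("-overwrite")  # overwrite registered products
    return cmd


def submit_job(install_dir: str, job_file: str, **options) -> str:
    """Run the client and return stdout and stderr combined.

    The log may go to either stream, so both are captured; check=True raises
    subprocess.CalledProcessError if the client exits with a non-zero status.
    """
    result = subprocess.run(build_harvest_command(install_dir, job_file, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Placeholder install directory; prints the command instead of running it.
    print(" ".join(build_harvest_command("/opt/harvest-client", "/tmp/job1.xml")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.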
Submit a Job
To submit a job to the Harvest server cluster, you need a job configuration file. An example is available in the installation directory: <INSTALL_DIR>/examples/directories.xml.
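You will need to update the nodeName attribute. As the comment in the file indicates, also update the directory path and the fileRef mapping (local path prefix and base URL) to match your environment: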
<harvest nodeName="PDS_ATM">
  <directories>
    <path>/data/OREX/orex_spice</path>
  </directories>
  <fileInfo>
    <!-- UPDATE with your own local path and the base URL where your PDS4 archive is published -->
    <fileRef replacePrefix="/data" with="https://pds-atmospheres.nmsu.edu/" />
  </fileInfo>
</harvest>
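When you need many job files, for example one per data set, they can be generated instead of edited by hand. This hypothetical Python helper reproduces only the elements shown in the excerpt above (harvest, directories/path, fileInfo/fileRef). The real example file may contain additional settings, so compare the output with <INSTALL_DIR>/examples/directories.xml before relying on it.

```python
# Hypothetical generator for a job file shaped like the directories.xml excerpt.
# Only the harvest, directories/path and fileInfo/fileRef elements are reproduced.
import xml.etree.ElementTree as ET
from typing import Iterable


def make_job_xml(node_name: str, paths: Iterable[str],
                 replace_prefix: str, base_url: str) -> str:
    """Return job XML with one <path> per directory and one <fileRef> mapping."""
    root = ET.Element("harvest", {"nodeName": node_name})
    directories = ET.SubElement(root, "directories")
    for p in paths:
        ET.SubElement(directories, "path").text = p  # a directory to crawl
    file_info = ET.SubElement(root, "fileInfo")
    # Local path prefix that is replaced by the URL where the archive is published.
    # 'with' is a Python keyword, so the attributes are passed as a dict.
    ET.SubElement(file_info, "fileRef", {"replacePrefix": replace_prefix, "with": base_url})
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(make_job_xml("PDS_ATM", ["/data/OREX/orex_spice"],
                       "/data", "https://pds-atmospheres.nmsu.edu/"))
```

Write the returned string to a file such as /tmp/job1.xml (for example with pathlib.Path.write_text) and submit it with the harvest command.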
Save this file as /tmp/job1.xml and run Harvest Client:
<INSTALL_DIR>/bin/harvest-client harvest -j /tmp/job1.xml
You should see output similar to this:
[INFO] Reading job from /tmp/job1.xml
[INFO] Reading configuration from /tmp/big-data-harvest-client-1.0.0-SNAPSHOT/conf/harvest-client.cfg
[INFO] Creating new job...
[INFO] Connecting to RabbitMQ
[INFO] Created job f282a012-115e-429c-b445-f5eed1d81303
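To record the job ID for later monitoring, a script can extract it from the client's output. This sketch assumes the output contains a line of the form shown above, "Created job" followed by a UUID; the wording may differ between client versions, so treat the pattern and the helper name as assumptions.

```python
# Hypothetical helper: pull the job ID out of harvest-client output.
# Assumes a line of the form "Created job <UUID>", as in the sample output.
import re
from typing import Optional

JOB_ID_RE = re.compile(
    r"Created job ([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})")


def extract_job_id(output: str) -> Optional[str]:
    """Return the job UUID, or None if no 'Created job' line is present."""
    match = JOB_ID_RE.search(output)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = ("[INFO] Connecting to RabbitMQ\n"
              "[INFO] Created job f282a012-115e-429c-b445-f5eed1d81303\n")
    print(extract_job_id(sample))  # f282a012-115e-429c-b445-f5eed1d81303
```

A None result is a useful signal that the job was not created, for example because the client could not connect to RabbitMQ.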
Next Steps
Monitor the RabbitMQ, Crawler, Harvest, and Elasticsearch servers to track the progress of your job.