Harvest Server Installation and Configuration

This document describes how to install and configure Harvest Server.

Installation

Download the latest binary release (ZIP file) and extract it to a directory such as /opt/big-data-harvest-1.0.0.

Running

Run harvest-server from the bin directory without any parameters to print usage information:

/opt/big-data-harvest-1.0.0/bin/harvest-server
Usage: harvest-server <options>

Commands:
  -c <config file>   Start Harvest server
  -V, --version      Print Harvest version

Optional parameters:
  -l <file>    Log file. Default is /tmp/harvest/harvest.log
  -v <level>   Logger verbosity: DEBUG / ALL, INFO (default), WARN, ERROR

To start the server, pass the path to a configuration file with the -c option:

/opt/big-data-harvest-1.0.0/bin/harvest-server -c /opt/big-data-harvest-1.0.0/conf/harvest-server.cfg
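
The optional parameters listed in the usage output can be combined with -c. The following is a minimal sketch, assuming the installation path used in this document and a hypothetical log file location. It starts the server with DEBUG verbosity only if the executable and the configuration file are present.

```shell
# Installation directory used in the examples in this document.
HARVEST_HOME=/opt/big-data-harvest-1.0.0
# Hypothetical log file path; any writable location can be used.
LOG_FILE=/tmp/harvest/harvest-debug.log

SERVER="$HARVEST_HOME/bin/harvest-server"
CONFIG="$HARVEST_HOME/conf/harvest-server.cfg"

if [ ! -x "$SERVER" ]; then
    echo "harvest-server not found at $SERVER" >&2
elif [ ! -f "$CONFIG" ]; then
    echo "configuration file not found at $CONFIG" >&2
else
    # -c, -l and -v are the options shown in the usage output above.
    "$SERVER" -c "$CONFIG" -l "$LOG_FILE" -v DEBUG
fi
```

If -l and -v are omitted, the server logs to /tmp/harvest/harvest.log at INFO level, as stated in the usage output.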

Configuration

An example configuration file, conf/harvest-server.cfg, is located in the installation directory. The following parameters are available.

Message Broker Parameters

Parameter       Description
mq.type         Message broker type. Currently only "RabbitMQ" is supported. More types may be added in future releases.
rmq.host        RabbitMQ "host:port" pairs (one pair per line). For example, "localhost:5672".
rmq.user        RabbitMQ user. For example, "harvest".
rmq.password    RabbitMQ password. For example, "harvest1234".

Registry (Elasticsearch) Parameters

Parameter      Description
es.url         Elasticsearch (Registry) URL. For example, "http://localhost:9200".
es.index       Elasticsearch (Registry) index name. For example, "registry".
es.authFile    Optional parameter. Elasticsearch authentication file. For example, "/etc/pds-registry/auth.cfg".

Other Parameters

Parameter                   Description
web.port                    Embedded web server port. Default value is 8005.
harvest.storeLabels         Optional parameter. Store original PDS labels (XML) as BLOBs. Default value is "true".
harvest.storeJsonLabels     Optional parameter. Store PDS labels in JSON format as BLOBs. Default value is "true".
harvest.processDataFiles    Optional parameter. Extract basic file information and calculate MD5 hashes of all data files referenced in a PDS label.
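
As an illustration, the parameters above could be combined into a configuration file as follows. This is a sketch only: it assumes a "key = value" line format with "#" comments, which this document does not confirm, so check the conf/harvest-server.cfg shipped with the release for the actual syntax. The output path is hypothetical, and the values are the examples and defaults given in the tables. harvest.processDataFiles is left out because no default is stated for it.

```shell
# Hypothetical scratch location for the sample file.
CFG="${TMPDIR:-/tmp}/harvest-server-sample.cfg"

# Assumed "key = value" syntax; values are the examples from the tables above.
cat > "$CFG" <<'EOF'
# Message broker
mq.type = RabbitMQ
rmq.host = localhost:5672
rmq.user = harvest
rmq.password = harvest1234

# Registry (Elasticsearch)
es.url = http://localhost:9200
es.index = registry
# es.authFile = /etc/pds-registry/auth.cfg

# Other
web.port = 8005
harvest.storeLabels = true
harvest.storeJsonLabels = true
EOF

# Show only the parameters that are set (drops comments and blank lines).
grep -v '^[[:space:]]*#' "$CFG" | grep '='
```

The resulting file can then be passed to harvest-server with the -c option.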