Crawler Server Installation and Configuration

This document describes how to install and configure Crawler Server.

Installation

Download the latest binary release (ZIP file) and extract it to a directory of your choice, such as /opt/big-data-crawler-1.0.0.

Running

Run crawler-server from the bin directory without any parameters to print usage information:

/opt/big-data-crawler-1.0.0/bin/crawler-server
Usage: crawler-server <options>

Commands:
  -c <config file>   Start Crawler server
  -V, --version      Print Crawler version

Optional parameters:
  -l <file>    Log file. Default is /tmp/crawler/crawler.log
  -v <level>   Logger verbosity: DEBUG / ALL, INFO (default), WARN, ERROR

To start the server, run:

/opt/big-data-crawler-1.0.0/bin/crawler-server -c /opt/big-data-crawler-1.0.0/conf/crawler-server.cfg
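
The optional parameters from the usage text can be combined with -c. The following is a minimal sketch, assuming the installation directory used above; the log file location is a hypothetical choice, not a documented default, and the script only launches the server if the binary is actually present.

```shell
#!/bin/sh
# Installation directory used in the examples above.
CRAWLER_HOME=/opt/big-data-crawler-1.0.0
# Hypothetical log location (assumption); replace with any writable path.
LOG_FILE=/var/log/crawler/crawler.log

# Assemble the command line:
#   -c selects the configuration file, -l the log file, -v the logger verbosity.
set -- "$CRAWLER_HOME/bin/crawler-server" \
    -c "$CRAWLER_HOME/conf/crawler-server.cfg" \
    -l "$LOG_FILE" \
    -v DEBUG

echo "Starting: $*"

# Launch only when the binary is installed and executable.
if [ -x "$1" ]; then
    "$@"
else
    echo "crawler-server not found at $1" >&2
fi
```

Replacing DEBUG with WARN or ERROR reduces log volume once the setup is working.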

Configuration

An example configuration file is located in the conf directory of the installation (for example, /opt/big-data-crawler-1.0.0/conf/crawler-server.cfg). The following parameters are available.

Message Broker Parameters

Parameter      Description
mq.type        Message broker type. Currently only "RabbitMQ" is supported; more types may be added in future releases.
rmq.host       RabbitMQ "host:port" pairs, one pair per line. For example, "localhost:5672".
rmq.user       RabbitMQ user. For example, "harvest".
rmq.password   RabbitMQ password. For example, "harvest1234".

Other Parameters

Parameter   Description
web.port    Embedded web server port. Default value is 8001.
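
To see how the parameters above fit together, the sketch below writes a sample configuration file to a scratch location. The parameter names and example values come from this document, but the key=value syntax and the output path are assumptions; the bundled example file remains the authoritative reference for the actual format, including how several rmq.host entries are listed.

```shell
#!/bin/sh
# Scratch location for the sample file (hypothetical path).
CFG="${TMPDIR:-/tmp}/crawler-server-sample.cfg"

# Assumed key=value syntax; names and example values are taken from
# the parameter descriptions above.
cat > "$CFG" <<'EOF'
mq.type=RabbitMQ
rmq.host=localhost:5672
rmq.user=harvest
rmq.password=harvest1234
web.port=8001
EOF

# Print the result for review before copying it into place.
cat "$CFG"
```

If the syntax matches the bundled example, the server can be pointed at such a file with the -c option.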