📦 Installation

This section describes how to install the PDS Data Upload Manager (DUM) Service.

Requirements

Prior to installing this software, ensure your system meets the following requirements:

  • Python 3.9 or above. Python 2 is not supported.

  • Terraform 1.0.11 or above. Note that Terraform is only required when deploying the server-side components; it is not required to run the client-side script. For more information on deploying the server-side components via Terraform, consult the terraform section of the documentation.

Consult your operating system instructions or system administrator to install the required packages. If you do not have system administrator access, you can use a local Python 3 installation within a virtual environment.
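
To confirm that the Python interpreter on your path meets the minimum version requirement, you can run:

python3 --version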

Installation Instructions

This section documents the installation procedure.

Installation

The easiest way to install this software is to use Pip, the Python Package Installer. If you have Python on your system, you probably already have Pip; you can run pip3 --help to check. Then run:

pip3 install pds-data-upload-manager

Note

The above command will install the latest approved release. To install a prior release, you can run:

pip3 install pds-data-upload-manager==<version>

The released versions are listed on https://pypi.org/project/pds-data-upload-manager/#history
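
For example, to install a hypothetical 1.3.0 release:

pip3 install pds-data-upload-manager==1.3.0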

If you want to use the latest unstable version, refer to the development documentation.

If you don’t want the package dependencies to interfere with your local system you can use a virtual environment for your deployment. To do so:

mkdir -p $HOME/.venv
python3 -m venv $HOME/.venv/pds-data-upload-manager
source $HOME/.venv/pds-data-upload-manager/bin/activate
pip3 install pds-data-upload-manager

At this point, the PDS DUM client script is available under $HOME/.venv/pds-data-upload-manager/bin/pds-ingress-client.
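
You can quickly confirm the installation by invoking the script with its --help option, either via the full path or after activating the virtual environment as shown above:

$HOME/.venv/pds-data-upload-manager/bin/pds-ingress-client --help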

Client Configuration

The PDS DUM client script utilizes an INI file for its configuration. While there is a default configuration file bundled with the service, to properly communicate with the server side components within AWS, end-users must provide their own INI configuration with the correct endpoints and user credentials filled in.

The following may be used as a template for a new INI configuration file:

[AWS]
profile = <AWS_Profile_Name>

[API_GATEWAY]
url_template = https://{id}.execute-api.{region}.amazonaws.com/{stage}/{resource}
id           = <API_Gateway_ID>
region       = us-west-2
stage        = <API_Gateway_Stage_Name>
resource     = request

[COGNITO]
client_id    = <Cognito_Client_ID>
username     = <Cognito_Username>
password     = <Cognito_Password>
region       = us-west-2

[OTHER]
log_level = INFO
log_format = "%(levelname)s %(threadName)s %(name)s:%(funcName)s %(message)s"
log_group_name = "<Cloudwatch_Log_Group_Name>"

Bracketed fields within the template correspond to values which need to be filled in by an end-user prior to using the pds-ingress-client script to transfer files to PDS. The remaining fields should be left as-is.

To obtain the correct values for <AWS_Profile_Name>, <API_Gateway_ID>, <API_Gateway_Stage_Name>, <Cognito_Client_ID>, and <Cloudwatch_Log_Group_Name>, contact a PDS Operator.

To obtain values for <Cognito_Username> and <Cognito_Password>, consult the section on User Registration within this document.
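
As an optional sanity check (this is not part of the client itself), a short Python sketch such as the following can confirm that the INI file parses and that no bracketed placeholders from the template remain unfilled. The file name pds-dum-client.ini is only an example; substitute the path to your own configuration:

import configparser

# Example path to the INI configuration file; substitute your own
CONFIG_PATH = "pds-dum-client.ini"

# Disable interpolation so the %(...)s tokens in log_format are read literally
config = configparser.ConfigParser(interpolation=None)
if not config.read(CONFIG_PATH):
    raise SystemExit(f"Could not read {CONFIG_PATH}")

# Flag any values that still contain a <...> placeholder from the template
for section in config.sections():
    for key, value in config.items(section):
        if "<" in value and ">" in value:
            print(f"[{section}] {key} still needs to be filled in")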

Running the Client script

Once the pds-data-upload-manager has been installed, you can run pds-ingress-client --help to get a usage message and ensure the client-side service is properly installed. You can also consult the usage documentation for more details.
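
For example, from within the virtual environment:

pds-ingress-client --help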

Upgrading the Service

To check for and install an upgrade to the service, run the following command in your virtual environment:

pip install --upgrade pds-data-upload-manager

Note

An update to an existing virtualenv installation of the PDS DUM Service may fail if the underlying minimum required Python version has changed. If so, a new virtual environment should be created using the required version of Python, after which the latest version of the Service may be installed into it. Consult the installation instructions above on how to create a new virtual environment.
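
For example, a sketch of recreating the environment with a newer interpreter (this assumes python3.11 is the required version and is installed on your system; substitute as appropriate):

rm -rf $HOME/.venv/pds-data-upload-manager
python3.11 -m venv $HOME/.venv/pds-data-upload-manager
source $HOME/.venv/pds-data-upload-manager/bin/activate
pip3 install pds-data-upload-manager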

Configuring the Server-side Bucket Map

Once the server-side components of DUM have been deployed to AWS (see the terraform section), the routing of ingested files to S3 buckets is controlled via a “Bucket Map” configuration file, which is bundled with the “nucleus-dum-ingress-service” Lambda function.

The file uses a simple YAML format. An example bucket map is shown below:

MAP:
  NODES:
    ATM:
      default: pds-nucleus-dum
    ENG:
      default: pds-nucleus-dum
    GEO:
      default: pds-nucleus-dum
    IMG:
      default: pds-nucleus-dum
    NAIF:
      default: pds-nucleus-dum
    PPI:
      default: pds-nucleus-dum
    RMS:
      default: pds-nucleus-dum
    RS:
      default: pds-nucleus-dum
    SBN:
      gbo.ast.catalina.survey: pds-nucleus-staging
      default: pds-nucleus-dum

Within the mapping are separate entries for each PDS Node that could make an ingress request via the client script. Within each Node section are one or more key/value mappings, where each key corresponds to an expected path prefix of a file requested for ingest, and each value is the name of the S3 bucket to which the file should be uploaded.

In the above example, we can see that a default mapping is configured for all nodes that instructs the ingress lambda function to route all files to the pds-nucleus-dum bucket. This is the mapping that will be used when no other mapping for a path prefix exists.

Within the SBN section, a mapping from the gbo.ast.catalina.survey path prefix to the pds-nucleus-staging bucket is also defined. This means that any file paths requested for ingress that begin with gbo.ast.catalina.survey will be routed to the pds-nucleus-staging bucket during upload.
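
The routing behavior can be illustrated with a short Python sketch. This is only an illustration of the prefix lookup, not the actual ingress Lambda code; it assumes the example map above has been saved locally as bucket-map.yaml and that the PyYAML package is installed:

import yaml

def resolve_bucket(bucket_map, node, file_path):
    """Return the destination S3 bucket for a file requested by the given PDS node."""
    node_map = bucket_map["MAP"]["NODES"][node]
    # Check each configured path prefix; fall back to the node's default bucket
    for prefix, bucket in node_map.items():
        if prefix != "default" and file_path.startswith(prefix):
            return bucket
    return node_map["default"]

with open("bucket-map.yaml") as infile:
    bucket_map = yaml.safe_load(infile)

# Matches the gbo.ast.catalina.survey prefix, so routed to pds-nucleus-staging
print(resolve_bucket(bucket_map, "SBN", "gbo.ast.catalina.survey/data/image_001.xml"))

# No prefix match, so the default mapping routes it to pds-nucleus-dum
print(resolve_bucket(bucket_map, "SBN", "some.other.collection/data/image_001.xml"))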

Note

The --prefix argument of the pds-ingress-client script can be instrumental to ensure that paths requested for ingress have a prefix that matches one of the mappings expected by the bucket config. Consult the usage page for the pds-ingress-client for more details on using the --prefix argument.
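
For illustration only, a hypothetical invocation is shown below; it assumes --prefix trims the supplied local directory from each ingress path so that the remaining path begins with gbo.ast.catalina.survey, and it omits any other required arguments described on the usage page:

pds-ingress-client --prefix /home/user/staging /home/user/staging/gbo.ast.catalina.survey/data/*.xml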

Should there ever be a need to modify the bucket map used with a deployment of the DUM service, changes can be made to the file directly from within the AWS Console Lambda Code Source editor window. Be sure that the function is redeployed after any updates are made to the bucket map to ensure they take effect for subsequent ingress requests.

Adding users to the AWS Cognito User Pool

Before the client-side script can be used to request ingest of files to PDS Cloud, a valid user account must exist in the AWS Cognito User Pool deployed with the rest of the DUM Server side components. Credentials for the user must then be provided in the INI config used with the pds-ingress-client script.

Currently, there are only two ways to configure new users within the User Pool: