Operation

The Registry is an Apache Solr adaptation that includes the Solr Admin interface along with the pre-installed collections that come with the PDS Registry and Search. Once it is deployed, operations include modifying the configuration and using the PDS and PDAP protocols to query data and metadata that have been extracted from PDS4 data products. The sections below cover the available tools, the Registry Solr collections, common operations, configuration, search protocols, and common errors.


Tools

Solr Admin Interface

The Solr Administration Interface is available for advanced configuration and debugging of the Solr instance. The page can be viewed at http://localhost:8983/solr or, if you have set up the proxy pass as directed in the installation guide, at https://<hostname>:8983/services/registry. The Apache Solr documentation describes how to use this interface in detail.

From the query page, you can run queries against the endpoints supported by the services. See registry/collections/*/solrconfig.xml for the available endpoints. Details on the supported query parameters can be found on the Common Query Parameters page of the Solr wiki.
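The same queries can be issued from the command line. The sketch below only builds and prints a query URL; it assumes the collection exposes Solr's standard /select request handler (check solrconfig.xml to confirm), and SOLR_BASE and query_url are hypothetical names introduced for illustration. The q, rows, and wt parameters are among Solr's common query parameters.

```shell
#!/bin/sh
# Assumptions (not stated in this guide): SOLR_BASE is a hypothetical variable
# name, and the collection exposes Solr's standard /select request handler.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# Print a query URL for a collection ($1), a query string ($2), and a
# maximum number of rows ($3). The query string must already be URL-safe.
query_url() {
  printf '%s/%s/select?q=%s&rows=%s&wt=json\n' "$SOLR_BASE" "$1" "$2" "$3"
}

# Print the URL only. To run the query against a live instance:
#   curl "$(query_url data '*:*' 5)"
query_url data '*:*' 5
```

Printing the URL first makes it easy to paste into a browser or the Admin interface before scripting anything around it.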

Registry Manager Tool

This utility currently only wraps the Solr Post Tool to post and delete data from the PDS Registry collections. Future improvements to this tool may allow for more extensive interaction with the Registry from the command line.

Current functionality includes:

  • Post Harvest-generated Solr Documents to the searchable 'data' collection
  • Delete individual Registry packages (aka Harvest runs) from Registry collections
  • Delete all documents from all collections.

Usage

$ ./registry-mgr -h

Usage: ./registry-mgr [OPTIONS] [SOLR-DOCS]

Options:
  -h                              Print Help
  -host <host>                    Solr server host (default: localhost)
  -port <port>                    Solr server port (default: 8983)
  -delete-pkg <package_id>        Delete a specific Harvest ingestion package
  -delete-all                     Delete all data from all Registry collections

Optional Parameters:
  SOLR-DOCS     solr-docs directory or an xml file
        

Registry Solr Collections

Registry Collection

Endpoint: http://localhost:8983/solr/registry/ (replace host/port as needed)

Description: The Registry collection is used as the base blob store for all PDS product labels. It includes some basic metadata about the product label, like the MD5 checksum, Registry Package ID (the unique identifier for each Harvest run), etc., with plans to expand this collection to include more important archival metadata.

XPath Collection

Endpoint: http://localhost:8983/solr/xpath/ (replace host/port as needed)

Description: The XPath collection provides a quick look at the label metadata ingested into the Registry collection. It is a convenient way to inspect the data in your archive, as well as a means of informing how you may want to augment your 'data' collection.

Data Collection

Endpoint: http://localhost:8983/solr/data/ (replace host/port as needed)

Description: The Data collection is intended to be the search-friendly metadata store. It distills the "kitchen sink" of metadata found in the labels into a refined set of search fields that enables a more accurate, usable search capability. This collection will eventually be used by the PDS Search API efforts to provide an integrated approach to metadata search across the PDS.
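A quick way to compare the three collections is to ask each one how many documents it holds. The sketch below prints one URL per collection; it assumes Solr's standard /select request handler is enabled on each, and SOLR_BASE and count_urls are hypothetical names. With rows=0 the response carries no documents, only the total in its numFound field.

```shell
#!/bin/sh
# Assumptions: SOLR_BASE is a hypothetical variable name, and each
# collection exposes Solr's standard /select request handler.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# Print, for each of the three collections, a URL that matches every document
# but returns none (rows=0); the response's numFound field is the total.
count_urls() {
  for collection in registry xpath data; do
    printf '%s/%s/select?q=*:*&rows=0\n' "$SOLR_BASE" "$collection"
  done
}

count_urls
```

To run the checks against a live instance, feed each printed URL to curl, for example: count_urls | while read -r url; do curl "$url"; done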


Common Operations

Below are some common operations for interacting with the Registry and Search. A tool to simplify these actions is planned; in the meantime, the raw commands below can be used to modify data in your Registry and Search. For additional information on interacting with the Registry, see the Solr Collections API documentation and the other Solr documentation on working with documents.

Add Data To Registry and Search

Run the Harvest Tool to populate the Registry with data.

When you run the Harvest Tool, basic metadata to capture archive information will be ingested into the Registry.

After you have run the Harvest Tool, you should see one or more files such as solr_doc_0.xml and solr_doc_1.xml under /path/to/registry/../registry-data/solr-docs (or, on Windows, C:\path\to\registry\..\registry-data\solr-docs).

These files are needed to populate the data Solr collection, which applies a more refined, custom indexing strategy to PDS product metadata so that it is more easily searchable and can be used by the PDS Search APIs. To ingest these files into Solr, execute the following command from either /path/to/harvest/bin or /path/to/registry/bin:

# Post Harvest solr-docs to http://localhost:8983 by default
% ./registry-mgr /path/to/registry/../registry-data/solr-docs

# Post Harvest solr-docs to a Solr instance on a different host and port
% ./registry-mgr -host example.com -port 9000 /path/to/registry/../registry-data/solr-docs

        

Note: registry-mgr only works in a POSIX-compatible environment. Windows users without one can instead run a curl command against each solr-docs/solr_doc_*.xml file, for example:

$ curl "http://localhost:8983/solr/data/update/" --data-binary @C:\path\to\solr-docs\solr_doc_0.xml
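Posting a whole solr-docs directory by hand amounts to one curl call per file. The following is a minimal sketch of that loop for any setup where a POSIX shell and curl are available; post_docs, UPDATE_URL, DOCS_DIR, and DRY_RUN are hypothetical names. By default it only prints the commands it would run, so the output can be reviewed (or copied to another machine) before anything is sent to Solr.

```shell
#!/bin/sh
# Assumptions: UPDATE_URL, DOCS_DIR, and DRY_RUN are hypothetical names.
# The update URL is the one used for the data collection in this guide.
UPDATE_URL="${UPDATE_URL:-http://localhost:8983/solr/data/update/}"
DRY_RUN="${DRY_RUN:-1}"

# Post every solr_doc_*.xml file found in directory $1.
post_docs() {
  for doc in "$1"/solr_doc_*.xml; do
    # Skip the unexpanded pattern when the directory has no matching files.
    [ -e "$doc" ] || continue
    if [ "$DRY_RUN" = 1 ]; then
      # Dry run: show the command instead of executing it.
      printf 'curl %s --data-binary @%s\n' "$UPDATE_URL" "$doc"
    else
      curl "$UPDATE_URL" --data-binary "@$doc" || return 1
    fi
  done
  return 0
}

post_docs "${DOCS_DIR:-/path/to/registry/../registry-data/solr-docs}"
```

Set DRY_RUN=0 to actually post the files once the printed commands look right.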
        

Delete Data From Registry and Search

Once you have data in the Registry and/or Search, you can delete data from the Registry collections using the following:

Using Registry Manager Tool:

# Delete a specific Registry Package (aka Harvest run) from all collections
./registry-mgr -delete-pkg 86037219-aa7d-4a70-a239-4501e0c675a4

# Delete all data from all collections
./registry-mgr -delete-all  
	      

Using curl (update host/port as needed):

# Delete a specific Registry Package (aka Harvest run) from all collections
$ curl 'http://localhost:8983/solr/registry/update/' --data-binary "<delete><query>package_id:86037219-aa7d-4a70-a239-4501e0c675a4</query></delete>"
$ curl 'http://localhost:8983/solr/xpath/update/' --data-binary "<delete><query>package_id:86037219-aa7d-4a70-a239-4501e0c675a4</query></delete>"
$ curl 'http://localhost:8983/solr/data/update/' --data-binary "<delete><query>package_id:86037219-aa7d-4a70-a239-4501e0c675a4</query></delete>"

# Delete all data from all collections
$ curl 'http://localhost:8983/solr/registry/update/' --data-binary "<delete><query>*:*</query></delete>"
$ curl 'http://localhost:8983/solr/xpath/update/' --data-binary "<delete><query>*:*</query></delete>"
$ curl 'http://localhost:8983/solr/data/update/' --data-binary "<delete><query>*:*</query></delete>"
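The three per-collection commands for a package differ only in the collection name, so they can be folded into a loop. This is a sketch rather than a replacement for registry-mgr; delete_body, delete_package, SOLR_BASE, and DRY_RUN are hypothetical names, and the default mode only prints the curl commands so they can be checked before any data is removed.

```shell
#!/bin/sh
# Assumptions: SOLR_BASE and DRY_RUN are hypothetical variable names.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"
DRY_RUN="${DRY_RUN:-1}"

# Build the delete-by-query XML body for one Registry package ID ($1).
delete_body() {
  printf '<delete><query>package_id:%s</query></delete>' "$1"
}

# Delete package $1 from the registry, xpath, and data collections.
delete_package() {
  for collection in registry xpath data; do
    if [ "$DRY_RUN" = 1 ]; then
      # Dry run: show the command instead of executing it.
      printf "curl '%s/%s/update/' --data-binary '%s'\n" \
        "$SOLR_BASE" "$collection" "$(delete_body "$1")"
    else
      curl "$SOLR_BASE/$collection/update/" \
        --data-binary "$(delete_body "$1")" || return 1
    fi
  done
}

delete_package 86037219-aa7d-4a70-a239-4501e0c675a4
```

Set DRY_RUN=0 to execute the deletes; the loop stops at the first collection for which curl fails.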
	      

Re-Ingest Data To Registry and Search

  • Delete the data you would like to remove from Registry and Search.
  • If you removed data from both the Registry and Search, you will need to re-run Harvest and then re-ingest the data into the Search index.
  • If you only removed data from the Search index, you can then just re-ingest the data into the Search index.
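The last case above can be sketched as two steps: delete the package from the Search index, then post the existing solr-docs again. The sketch assumes the Search index corresponds to the data collection; run, reingest_search, and DRY_RUN are hypothetical names. Note that registry-mgr posts everything in the solr-docs directory, not only the documents for the deleted package.

```shell
#!/bin/sh
# Assumptions: DRY_RUN is a hypothetical variable name, and the Search
# index is taken to be the data collection.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or only print it (without shell quoting) when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Remove package $1 from the data collection, then re-post the Harvest
# output found in directory $2.
reingest_search() {
  run curl 'http://localhost:8983/solr/data/update/' \
    --data-binary "<delete><query>package_id:$1</query></delete>" || return 1
  run ./registry-mgr "$2"
}

reingest_search 86037219-aa7d-4a70-a239-4501e0c675a4 \
  /path/to/registry/../registry-data/solr-docs
```

If data was also removed from the Registry, re-run the Harvest Tool before the posting step so that the solr-docs are regenerated.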

Configuration

The Registry comes preconfigured to support common PDS search terms and facets. It can also be tailored to support Discipline Node-specific search requirements. See the Apache Solr website and wiki for more information on configuring Apache Solr. The sub-sections that follow detail the steps for some common configuration changes.

Add a Field to the Schema

In order to add a field to the schema, modify the $REGISTRY_HOME/conf/schema.xml file. The following example will add the field my_field:

<field name="my_field" type="text_general" indexed="true" stored="false"
       multiValued="false" />
        

See the Schema XML page of the Solr wiki for more information on updating the Solr schema.


Search Protocols

The Registry provides two protocols for searching the contents of the service.

PDS Search Protocol

See the PDS Search Protocol document for more information on search terms and result formatting provided by this protocol. If viewing this document from the registry-mgr-legacy package, view the PDS Search Protocol document from the Engineering Node site.

PDAP Protocol

See the PDAP Search Protocol document for more information on search terms and result formatting provided by this protocol. If viewing this document from the registry-mgr-legacy package, view the PDAP Search Protocol document from the Engineering Node site.


Common Errors

TBD