Class DataLoader

java.lang.Object
gov.nasa.pds.registry.common.es.dao.DataLoader

public class DataLoader extends Object
Loads data from an NJSON (new-line-delimited JSON) file into Elasticsearch. An NJSON file has two lines per record: the first line holds the primary key and the second holds the data record. This is the standard file format used by the Elasticsearch bulk load API. Data are loaded in batches.
Author:
karpenko
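As an illustration of the two-lines-per-record layout described above, the following minimal sketch builds a small NJSON line list in memory. The class `NjsonExample`, its methods, the `index` bulk action, and the sample identifiers and field names are all assumptions made for illustration; they are not part of this class or of the registry code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the registry API. */
final class NjsonExample {

    /** Appends one record as two lines: the primary-key line, then the data line. */
    static void addRecord(List<String> out, String id, String docJson) {
        // Line 1: bulk action carrying the primary key.
        // The "index" action is an assumption; the id is not JSON-escaped here.
        out.add("{\"index\":{\"_id\":\"" + id + "\"}}");
        // Line 2: the data record itself, as single-line JSON.
        out.add(docJson);
    }

    /** Builds two made-up records (four lines in total). */
    static List<String> sample() {
        List<String> data = new ArrayList<>();
        addRecord(data, "urn:nasa:pds:example:data:file1::1.0", "{\"title\":\"File 1\"}");
        addRecord(data, "urn:nasa:pds:example:data:file2::1.0", "{\"title\":\"File 2\"}");
        return data;
    }
}
```

A list shaped like the one returned by `sample()` matches the "2 lines per record" form that the `data` parameter of `loadBatch` is documented to take.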
  • Constructor Details

    • DataLoader

      public DataLoader(ConnectionFactory conFactory) throws Exception
      Creates a data loader that uses the given connection factory.
      Parameters:
      conFactory - instance of class gov.nasa.pds.registry.common.ConnectionFactory
      Throws:
      Exception - an exception
  • Method Details

    • setBatchSize

      public void setBatchSize(int size)
      Sets the data batch size.
      Parameters:
      size - batch size
    • loadFile

      public void loadFile(File file) throws Exception
      Load data from an NJSON (new-line-delimited JSON) file into Elasticsearch.
      Parameters:
      file - NJSON (new-line-delimited JSON) file to load
      Throws:
      Exception - an exception
    • loadZippedFile

      public void loadZippedFile(File zipFile, String fileName) throws Exception
      Load data from a zipped NJSON (new-line-delimited JSON) file into Elasticsearch.
      Parameters:
      zipFile - Zip file with an NJSON data file.
      fileName - NJSON data file name in the Zip file.
      Throws:
      Exception - an exception
    • loadBatch

      public int loadBatch(List<String> data, Set<String> errorLidvids) throws Exception
      Load data into Elasticsearch
      Parameters:
      data - NJSON data (2 lines per record).
      errorLidvids - output parameter. If not null, the LIDVIDs of records that failed to load are added to this set.
      Returns:
      Number of loaded documents
      Throws:
      Exception - an exception
    • loadBatch

      public int loadBatch(List<String> data, Set<String> errorLidvids, int retries) throws Exception
      Load data into Elasticsearch
      Parameters:
      data - NJSON data (2 lines per record).
      errorLidvids - output parameter. If not null, the LIDVIDs of records that failed to load are added to this set.
      retries - number of times to retry the request if an exception is thrown.
      Returns:
      Number of loaded documents
      Throws:
      Exception - an exception
    • loadBatch

      public int loadBatch(List<String> data) throws Exception
      Load data into Elasticsearch
      Parameters:
      data - NJSON data (2 lines per record).
      Returns:
      Number of loaded documents
      Throws:
      Exception - an exception
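The batch and retry behaviour documented above can be sketched as follows. This is not the DataLoader source: `BatchingSketch`, `BatchSink`, `split`, and `loadAll` are hypothetical names, and the sketch assumes that the batch size counts records (two lines each) and that a retry simply repeats the failed call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of record-aligned batching with retries. */
final class BatchingSketch {

    /** Stand-in for a loader; has the same shape as loadBatch(List<String>). */
    interface BatchSink {
        int loadBatch(List<String> data) throws Exception;
    }

    /** Splits NJSON lines into batches of at most batchSize records (2 lines each). */
    static List<List<String>> split(List<String> lines, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        if (lines.size() % 2 != 0) throw new IllegalArgumentException("expected 2 lines per record");
        List<List<String>> batches = new ArrayList<>();
        int step = batchSize * 2; // never cut between a key line and its data line
        for (int i = 0; i < lines.size(); i += step) {
            batches.add(new ArrayList<>(lines.subList(i, Math.min(i + step, lines.size()))));
        }
        return batches;
    }

    /** Sends each batch to the sink; a failed call is repeated up to `retries` more times. */
    static int loadAll(List<String> lines, int batchSize, int retries, BatchSink sink)
            throws Exception {
        int total = 0;
        for (List<String> batch : split(lines, batchSize)) {
            int attempt = 0;
            while (true) {
                try {
                    total += sink.loadBatch(batch);
                    break;
                } catch (Exception ex) {
                    // Give up once the retry budget is spent and rethrow the last error.
                    if (attempt++ >= retries) throw ex;
                }
            }
        }
        return total;
    }
}
```

Given a DataLoader instance named `loader`, the method reference `loader::loadBatch` should fit the `BatchSink` shape, since the single-argument `loadBatch(List<String>)` overload takes the lines and returns the number of loaded documents.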