User Guide
This guide explains the key concepts and components of Peppi to help you understand how to effectively search and access PDS data.
Key Concepts
PDS Products
The Planetary Data System organizes data into products. There are different types of products:
- Observational Products
The actual science data - images, spectra, measurements, etc. This is usually what you want.
- Collections
Groups of related products (e.g., all images from a particular Mars rover camera for a specific mission phase).
- Bundles
Groups of collections (e.g., all data from a complete mission).
- Context Products
Metadata describing targets (planets, moons), instruments, missions, and spacecraft. These help you search for the data you want.
Understanding PDS Identifiers
Every product in the PDS has a unique identifier called a LID (Logical Identifier). It looks like:
urn:nasa:pds:mission.instrument:collection:product
For example:
urn:nasa:pds:context:target:planet.mars
Appending a version ID after a double colon turns a LID into a LIDVID (LID + Version ID), which identifies one specific version of a product:
urn:nasa:pds:mission.instrument:collection:product::1.0
You usually don’t need to know the exact LID - Peppi lets you search by name (like “Mars”) and it finds the LID for you.
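Because the version ID is separated from the LID by a double colon, taking a LIDVID apart needs only standard string handling. The sketch below is illustrative and not part of Peppi; split_lidvid is a hypothetical helper name.

```python
from typing import Optional, Tuple


def split_lidvid(identifier: str) -> Tuple[str, Optional[str]]:
    """Split a LIDVID into (lid, version); version is None for a bare LID."""
    lid, separator, version = identifier.partition("::")
    return lid, (version if separator else None)


lid, version = split_lidvid("urn:nasa:pds:context:target:planet.mars::1.0")
print(lid)      # urn:nasa:pds:context:target:planet.mars
print(version)  # 1.0
```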
Core Components
PDSRegistryClient
The PDSRegistryClient connects your Python code to the PDS API:
import pds.peppi as pep
client = pep.PDSRegistryClient()
By default, it connects to NASA’s production PDS server (https://pds.nasa.gov/api/search/1).
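If you need to connect to a different server (e.g., for testing), pass its URL as base_url. The call below spells out the default URL; replace it with the address of your server:
client = pep.PDSRegistryClient(base_url="https://pds.nasa.gov/api/search/1")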
Products
The Products class is your main tool for searching. It uses a “fluent” interface where you chain methods together:
products = pep.Products(client) \
    .has_target("Mars") \
    .has_instrument_host("urn:nasa:pds:context:instrument_host:spacecraft.msl") \
    .observationals()
Each method returns the Products object, so you can keep adding filters.
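To see why chaining works, here is a minimal sketch of the general fluent pattern. ToyQuery and its internals are hypothetical and only mimic the method names used in this guide; this is not how Peppi itself is implemented.

```python
class ToyQuery:
    """Hypothetical stand-in for a fluent query builder (not Peppi's code)."""

    def __init__(self):
        self.clauses = []

    def has_target(self, name):
        self.clauses.append(f"target={name}")
        return self  # returning self is what allows the next call to chain

    def observationals(self):
        self.clauses.append("type=observational")
        return self


query = ToyQuery().has_target("Mars").observationals()
print(query.clauses)  # ['target=Mars', 'type=observational']
```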
Context
The Context gives you access to searchable catalogs of targets and spacecraft:
context = pep.Context()
# Search for a target (with fuzzy matching!)
jupiter = context.TARGETS.search("jupiter")[0]
print(jupiter.name) # "Jupiter"
print(jupiter.lid) # "urn:nasa:pds:context:target:planet.jupiter"
# Search for a spacecraft
curiosity = context.INSTRUMENT_HOSTS.search("curiosity")[0]
print(curiosity.name) # "Mars Science Laboratory"
The search is typo-tolerant, so context.TARGETS.search("jupyter") will still find Jupiter!
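As a rough illustration of typo-tolerant lookup, the sketch below uses difflib from the Python standard library as a stand-in. It is an assumption made for demonstration only: the matching algorithm Peppi actually uses may differ, and TARGET_NAMES and fuzzy_find are hypothetical names.

```python
from difflib import get_close_matches

# Hypothetical catalog; a real one would come from the PDS context products.
TARGET_NAMES = ["Jupiter", "Mars", "Saturn", "Venus"]


def fuzzy_find(query, names, limit=3):
    """Return up to `limit` names that approximately match `query`."""
    lowered = {name.lower(): name for name in names}
    matches = get_close_matches(query.lower(), list(lowered), n=limit, cutoff=0.6)
    return [lowered[m] for m in matches]


print(fuzzy_find("jupyter", TARGET_NAMES))  # ['Jupiter']
```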
Building Queries
The Query Builder Pattern
Peppi uses a “query builder” pattern. You start with Products(client) and add filters:
query = pep.Products(client) # Start with all products
query = query.has_target("Mars") # Filter by target
query = query.observationals() # Filter by product type
# Now execute by iterating
for product in query:
    print(product.id)
Or chain it all together:
products = pep.Products(client).has_target("Mars").observationals()
Lazy Evaluation
Important: Queries don’t execute until you iterate over the results or convert to a DataFrame.
This means you can build up complex queries step by step:
# No API call happens yet
query = pep.Products(client).has_target("Mars")
# Still no API call
query = query.observationals()
# NOW the API is called, as we start iterating
for product in query:
    print(product.id)
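The general idea behind lazy evaluation can be sketched with a small class that only records filters and defers the expensive call to __iter__. LazyQuery, where, and fake_fetch are hypothetical names for illustration; Peppi's internals are not shown here.

```python
class LazyQuery:
    """Hypothetical sketch of lazy evaluation (not Peppi's implementation)."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable that performs the expensive request
        self._filters = []

    def where(self, clause):
        self._filters.append(clause)  # only records the filter
        return self

    def __iter__(self):
        # The request is made here, when iteration starts.
        return iter(self._fetch(self._filters))


calls = []


def fake_fetch(filters):
    calls.append(list(filters))  # record each simulated API call
    return ["product-1", "product-2"]


query = LazyQuery(fake_fetch).where("target=Mars").where("type=observational")
print(len(calls))  # 0: nothing has been fetched yet
results = list(query)
print(len(calls))  # 1: iterating triggered the fetch
```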
Automatic Pagination
When searching, the PDS API returns results in pages (typically 100 at a time). Peppi automatically handles this for you:
products = pep.Products(client).has_target("Mars").observationals()
# This will automatically fetch multiple pages as needed
for product in products:
    print(product.id)
You don’t need to worry about pagination - just iterate and Peppi handles the rest!
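For readers curious about what automatic pagination involves, here is a minimal sketch using a generator. It runs against a simulated backend; iterate_pages and fake_fetch_page are hypothetical names, and the real PDS API parameters and Peppi's actual paging logic may differ.

```python
def iterate_pages(fetch_page, page_size=100):
    """Yield items one at a time, requesting a new page only when needed."""
    start = 0
    while True:
        page = fetch_page(start, page_size)
        yield from page
        if len(page) < page_size:  # a short (or empty) page is the last one
            return
        start += page_size


# Simulated backend holding 250 records; a real client would issue HTTP requests.
RECORDS = [f"product-{i}" for i in range(250)]
requests_made = []


def fake_fetch_page(start, limit):
    requests_made.append(start)
    return RECORDS[start:start + limit]


items = list(iterate_pages(fake_fetch_page))
print(len(items), requests_made)  # 250 [0, 100, 200]
```

Because the generator only requests the next page when the caller asks for more items, stopping the loop early also stops further requests.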
Available Filters
Here are the main filtering methods you can use:
By Target
Filter by celestial body (planet, moon, asteroid, comet):
# By name (Peppi finds the LID for you)
.has_target("Mars")
# Or by LID if you know it
.has_target("urn:nasa:pds:context:target:planet.mars")
By Time
Filter by when data was collected:
from datetime import datetime
date1 = datetime(2020, 1, 1)
date2 = datetime(2020, 12, 31)
# Data collected before a date
.before(date1)
# Data collected after a date
.after(date1)
# Combine for a range
.after(date1).before(date2)
By Mission/Spacecraft
Filter by instrument host (spacecraft or rover):
# By LID
.has_instrument_host("urn:nasa:pds:context:instrument_host:spacecraft.msl")
# Or find it with Context
context = pep.Context()
curiosity = context.INSTRUMENT_HOSTS.search("curiosity")[0]
products = pep.Products(client).has_instrument_host(curiosity.lid)
By Instrument
Filter by the specific instrument that collected the data:
.has_instrument("urn:nasa:pds:context:instrument:instrument_lid")
By Investigation
Filter by mission or investigation:
.has_investigation("urn:nasa:pds:context:investigation:mission.msl")
By Collection
Get products from a specific collection:
.of_collection("urn:nasa:pds:mission.instrument:collection::1.0")
By Processing Level
Filter by how processed the data is:
# Available levels: "telemetry", "raw", "partially-processed", "calibrated", "derived"
.has_processing_level("calibrated")
Processing levels explained:
- telemetry: Raw transmission from spacecraft
- raw: Unprocessed instrument data
- partially-processed: Some processing applied
- calibrated: Converted to physical units with corrections applied
- derived: Higher-level products created from processed data
By Product Type
Filter by the class of product:
.observationals() # Actual science data
.collections() # Collection products
.bundles() # Bundle products
.contexts() # Context products (targets, instruments, etc.)
# For collections, you can specify the type
.collections(collection_type="data")
Custom Filters
For advanced use cases, you can write custom query clauses using the PDS API syntax:
# Filter by any PDS4 property
.filter('pds:Identification_Area.pds:title like "*Mars*"')
See the PDS API documentation for the query syntax.
Working with Results
Iterating Over Products
The simplest way to work with results:
products = pep.Products(client).has_target("Mars").observationals()
for product in products:
    print(product.id)
    print(product.properties) # Dictionary of all metadata
    break # Remove this to see all results
Limiting Results
To avoid iterating over thousands of products while testing:
for i, product in enumerate(products):
    print(product.id)
    if i >= 9: # Stop after 10 products
        break
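Another way to cap a loop is itertools.islice from the standard library, which stops after a fixed number of items and accepts any iterable. The sketch below uses a stand-in generator (numbered_products is hypothetical); passing a Products query in its place should behave the same way, assuming it is an ordinary iterable as the examples in this guide suggest.

```python
from itertools import islice


def numbered_products():
    """Stand-in for a large result set; yields identifiers indefinitely."""
    n = 0
    while True:
        yield f"product-{n}"
        n += 1


# Take only the first 10 items; the generator is never run past that point.
first_ten = list(islice(numbered_products(), 10))
print(len(first_ten))  # 10
print(first_ten[-1])   # product-9
```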
Converting to DataFrame
For data analysis, convert results to a pandas DataFrame:
import pds.peppi as pep
products = pep.Products(client).has_target("Mars").observationals()
# Convert to DataFrame (automatically handles all pages)
df = products.as_dataframe()
print(df.head())
print(df.columns)
# Limit rows for testing
df = products.as_dataframe(max_rows=100)
Accessing Metadata
Each product has a properties dictionary containing all PDS4 metadata:
for product in products:
    # Access specific properties
    title = product.properties.get('pds:Identification_Area.pds:title', ['N/A'])[0]
    start_time = product.properties.get('pds:Time_Coordinates.pds:start_date_time', ['N/A'])[0]
    print(f"Title: {title}")
    print(f"Start Time: {start_time}")
    break
Note
Most properties are returned as lists (even single values) for consistency. Use [0] to get the first value.
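Indexing with [0] raises an IndexError if a property is present but holds an empty list. A small helper can guard against that case. The sketch below is not part of Peppi: first_value is a hypothetical name, and the sample dictionary uses invented values shaped like the metadata described above.

```python
def first_value(properties, key, default="N/A"):
    """Return the first element of a list-valued property, or `default`."""
    value = properties.get(key)
    if value is None or value == []:
        return default
    if isinstance(value, (list, tuple)):
        return value[0]
    return value  # tolerate the occasional scalar value


# Sample dictionary shaped like product metadata (values invented).
sample = {
    "pds:Identification_Area.pds:title": ["Example Image"],
    "pds:Citation_Information.pds:doi": [],
}
print(first_value(sample, "pds:Identification_Area.pds:title"))         # Example Image
print(first_value(sample, "pds:Citation_Information.pds:doi"))          # N/A
print(first_value(sample, "pds:Time_Coordinates.pds:start_date_time"))  # N/A
```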
Reducing Returned Fields
For better performance, especially with large result sets, you can limit which fields are returned:
products = pep.Products(client) \
    .has_target("Mars") \
    .observationals() \
    .fields(['lid', 'title', 'pds:Time_Coordinates.pds:start_date_time'])
# Now products will only include the specified fields
Resetting a Query
If you’re iterating and want to start over or build a new query:
products = pep.Products(client).has_target("Mars").observationals()
# Iterate through some results
for i, product in enumerate(products):
    if i >= 10:
        break
# Reset to use the same query again
products.reset()
# Or build a new query
products = pep.Products(client).has_target("Jupiter").observationals()
Combining Filters
You can combine multiple filters to narrow your search:
from datetime import datetime
import pds.peppi as pep
client = pep.PDSRegistryClient()
context = pep.Context()
# Find Curiosity rover
curiosity = context.INSTRUMENT_HOSTS.search("curiosity")[0]
# Complex query: Mars data from Curiosity, in 2020, calibrated
products = pep.Products(client) \
    .has_target("Mars") \
    .has_instrument_host(curiosity.lid) \
    .after(datetime(2020, 1, 1)) \
    .before(datetime(2020, 12, 31)) \
    .has_processing_level("calibrated") \
    .observationals()
df = products.as_dataframe(max_rows=10)
print(df)
Understanding Data Organization
PDS data is organized hierarchically:
Bundle (e.g., entire mission)
└── Collection (e.g., one instrument's data)
    └── Observational Products (e.g., individual images)
When you search for observational products, you’re searching at the most detailed level.
To understand what collection or bundle a product belongs to, look at its properties:
product.properties.get('ops:Provenance.ops:parent_collection_identifier')
product.properties.get('ops:Provenance.ops:parent_bundle_identifier')
Tips and Best Practices
Start Broad, Then Narrow
When exploring new data:
1. Start with a broad search to see what’s available
2. Look at the results to understand the data structure
3. Add more specific filters based on what you learned
# Start broad
products = pep.Products(client).has_target("Mars").observationals()
# Look at first few results
for i, p in enumerate(products):
    print(p.properties.keys()) # See what metadata is available
    if i >= 2:
        break
# Now refine based on what you learned.
# A new query is built here, so resetting the old one is not needed.
from datetime import datetime
products = pep.Products(client) \
    .has_target("Mars") \
    .has_processing_level("calibrated") \
    .after(datetime(2020, 1, 1)) \
    .observationals()
Use Context for Discovery
Don’t know the exact name of a target or spacecraft? Use Context:
context = pep.Context()
# List targets matching "mars" (returns the top 10 matches)
mars_related = context.TARGETS.search("mars")
for target in mars_related:
    print(f"{target.name}: {target.lid}")
# Find spacecraft (handles typos!)
spacecraft = context.INSTRUMENT_HOSTS.search("curiousity") # Typo on purpose!
print(spacecraft[0].name) # Still finds "Mars Science Laboratory"
Test with Small Result Sets
Always test with limited results before processing large datasets:
# Use max_rows when creating DataFrames
df = products.as_dataframe(max_rows=10)
# Or break out of loops early
for i, product in enumerate(products):
    # Your processing here
    if i >= 9:
        break
Common PDS Metadata Fields
Here are some commonly used metadata fields:
# Identification
'lid' # Product identifier
'title' # Human-readable title
'pds:Identification_Area.pds:title' # Full title
'pds:Identification_Area.pds:logical_identifier' # LID
# Time
'pds:Time_Coordinates.pds:start_date_time' # When observation started
'pds:Time_Coordinates.pds:stop_date_time' # When observation ended
# References
'ref_lid_target' # Target (planet, moon, etc.)
'ref_lid_instrument' # Instrument used
'ref_lid_instrument_host' # Spacecraft/rover
'ref_lid_investigation' # Mission/investigation
# Processing
'pds:Primary_Result_Summary.pds:processing_level' # Processing level
# Provenance
'ops:Provenance.ops:parent_collection_identifier' # Parent collection
'ops:Provenance.ops:parent_bundle_identifier' # Parent bundle
# Citation
'pds:Citation_Information.pds:doi' # DOI for citing
Next Steps
Now that you understand the key concepts, check out:
- Cookbook: Ready-to-use recipes for common tasks
- Library Reference: Complete API documentation
- PDS Search API Documentation: Understanding the underlying API