Title: | Crawler for Navigating THREDDS Catalogs |
---|---|
Description: | Provides a crawler for programmatically navigating THREDDS Data Server (<https://www.unidata.ucar.edu/software/tds/>) catalogs and accessing dataset metadata and resources. |
Authors: | Ben Tupper [aut], Emmanuel Blondel [aut, cre], Bigelow Laboratory for Ocean Sciences [cph] |
Maintainer: | Emmanuel Blondel <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1-4 |
Built: | 2024-11-12 05:41:47 UTC |
Source: | https://github.com/bigelowlab/thredds |
Build an xpath string, optionally using a user-specified namespace prefix.
build_xpath(x, prefix = "d1", select = ".//")
x | character, one or more path segments |
prefix | character, by default "d1", prepended to each of the segments |
select | character, by default ".//", which searches anywhere in the current node |
xpath descriptor
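As an illustration of the behavior described above, here is a minimal sketch that composes an xpath from path segments. `sketch_xpath` is a hypothetical stand-in written for this example, not the package's implementation; it assumes the prefix is attached to each segment with ":" and the segments are then joined with "/".

```r
# Hypothetical re-implementation for illustration only; the real
# build_xpath() may handle edge cases differently.
sketch_xpath <- function(x, prefix = "d1", select = ".//") {
  # Attach the namespace prefix to each segment unless it is NULL, NA or empty
  use_prefix <- !is.null(prefix) && length(prefix) > 0 &&
    !is.na(prefix[1]) && nzchar(prefix[1])
  if (use_prefix) x <- paste(prefix[1], x, sep = ":")
  # Join the segments into one path and prepend the selector
  paste0(select, paste(x, collapse = "/"))
}

sketch_xpath(c("dataset", "catalogRef"))  # ".//d1:dataset/d1:catalogRef"
sketch_xpath("service", prefix = NULL)    # ".//service"
```

The prefixed form mirrors the default `xpath` arguments shown for the CatalogNode methods, such as `build_xpath(c("dataset", "catalogRef"), prefix = self$prefix)`.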
A catalog representation that subclasses from ThreddsNode
thredds::ThreddsNode
-> CatalogNode
list_services()
list available services
CatalogNode$list_services( xpath = build_xpath("service", prefix = self$prefix), form = "list" )
xpath
character, the xpath specifications
form
character, either "list" or "table"
list of zero or more character vectors
list_catalogs()
list available catalogRefs
CatalogNode$list_catalogs( xpath = build_xpath(c("dataset", "catalogRef"), prefix = self$prefix), form = "list" )
xpath
character, the xpath descriptor
form
character, either "list" or "table"
a list with zero or more character vectors
list_datasets()
list available datasets
CatalogNode$list_datasets( xpath = build_xpath(c("dataset", "dataset"), prefix = self$prefix), form = "list" )
xpath
character, the xpath descriptor
form
character, either "list" or "table"
a list with zero or more character vectors
get_catalogs()
Retrieve a list of one or more child catalogs
CatalogNode$get_catalogs( index, xpath = build_xpath(c("dataset", "catalogRef"), prefix = self$prefix) )
index
integer index (1,...,nChild), indices or name(s)
xpath
character xpath representation
a list of Catalog class objects, possibly NULL
get_datasets()
Retrieve a list of one or more dataset children
CatalogNode$get_datasets( index, xpath = build_xpath(c("dataset", "dataset"), prefix = self$prefix) )
index
the integer index (1,...,nChild), indices or name(s)
xpath
character xpath representation
a list of Dataset objects or NULL
get_dataset_names()
Retrieve zero or more dataset child names. If a child is unnamed, then we substitute "title", "ID", "urlPath", or "href", in that order of availability.
CatalogNode$get_dataset_names( xpath = build_xpath(c("dataset", "dataset"), prefix = self$prefix) )
xpath
character xpath representation
index
the integer index (1,...,nChild), indices or name(s)
character vector of zero or more names
get_catalog_names()
Retrieve zero or more catalog child names. If a child is unnamed, then we substitute "title", "ID", "urlPath", or "href", in that order of availability.
CatalogNode$get_catalog_names( xpath = build_xpath(c("dataset", "catalogRef"), prefix = self$prefix) )
xpath
character xpath representation
index
the integer index (1,...,nChild), indices or name(s)
character vector of zero or more names
parse_catalog_node()
Parse a catalog node
CatalogNode$parse_catalog_node(x)
x
xml_node
Catalog class object
parse_dataset_node()
Parse a dataset node
CatalogNode$parse_dataset_node(x)
x
xml_node
Dataset class object
print()
print method
CatalogNode$print(prefix = "")
prefix
character, to be printed before each line of output (like spaces)
...
other arguments for superclass
clone()
The objects of this class are cloneable with this method.
CatalogNode$clone(deep = FALSE)
deep
Whether to make a deep clone.
library(thredds)
top_uri <- 'https://oceandata.sci.gsfc.nasa.gov/opendap/catalog.xml'
Top <- thredds::CatalogNode$new(top_uri)

#to browse catalogue
#Top$browse()

#go down in 'MODISA' catalog
L3 <- Top$get_catalogs("MODISA")[["MODISA"]]$get_catalogs()[[1]]

#see what's available for 2009
catalog2009 <- L3$get_catalogs("2009")[[1]]

#get catalog for 2009-01-20
doy <- format(as.Date("2009-01-20"), "%m%d")
catalog20 <- catalog2009$get_catalogs(doy)[[doy]]

#get dataset node
chl <- catalog20$get_datasets("AQUA_MODIS.20090120.L3m.DAY.CHL.chlor_a.4km.nc")[[1]]

#retrieve the relative URL, and add it to the base URL for the service.
#Somewhat awkwardly, the relative URL comes prepended with a path separator, so we
#use straight up `paste0` to append to the base_uri.
#if(require("ncdf4")){
#  base_uri <- "https://oceandata.sci.gsfc.nasa.gov:443/opendap"
#  uri <- paste0(base_uri, chl[["AQUA_MODIS.20090120.L3m.DAY.CHL.chlor_a.4km.nc"]]$url)
#  NC <- ncdf4::nc_open(uri)
#}
A direct Dataset representation that subclasses from ThreddsNode
thredds::ThreddsNode
-> DatasetNode
name
character, often the filename
dataSize
numeric, size in bytes
date
character, modification date
new()
initialize an instance of DatasetNode
DatasetNode$new(x, ...)
x
url or xml2::xml_node
...
arguments for superclass initialization
GET()
Overrides the GET method of the superclass. GET is not permitted for a dataset.
DatasetNode$GET()
NULL
get_url()
Retrieve the relative URL for a dataset.
DatasetNode$get_url( service = c("dap", "opendap", "wms")[1], sep = c("/", "")[2], ... )
service
character, the service to use (default 'dap', equivalent to 'opendap'). Ignored if 'urlPath' or 'href' is among the node's attributes.
sep
character, typically "/" or "" (default), used for joining base_url to the relative url
...
other arguments for DatasetNode$list_access
character
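The role of `sep` can be pictured with a small sketch. `join_url` is a hypothetical helper written for this example and is not part of the thredds API, and the relative paths are made up; the sketch only assumes that the base URL and the relative URL are pasted together with `sep` between them.

```r
# Hypothetical helper for illustration; not part of the thredds API.
join_url <- function(base_url, relative_url, sep = "") {
  paste(base_url, relative_url, sep = sep)
}

base_uri <- "https://oceandata.sci.gsfc.nasa.gov:443/opendap"

# A made-up relative path that already begins with "/":
# sep = "" avoids a doubled slash
join_url(base_uri, "/example/path/file.nc")

# A made-up relative path without a leading "/": sep = "/" supplies it
join_url(base_uri, "example/path/file.nc", sep = "/")
```

Both calls yield the same joined URL, which is why an empty separator suits relative URLs that arrive with a leading path separator.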
list_access()
list access methods
DatasetNode$list_access(xpath = build_xpath("access", prefix = self$prefix))
xpath
character, xpath descriptor
named list of character vectors or NULL
print()
print method
DatasetNode$print(prefix = "")
prefix
character, to be printed before each line of output (like spaces)
...
other arguments for superclass
clone()
The objects of this class are cloneable with this method.
DatasetNode$clone(deep = FALSE)
deep
Whether to make a deep clone.
For examples see CatalogNode
Retrieve a catalog
get_catalog(uri, ...)
uri | the URI of the catalog |
... | further arguments for parse_node |
ThreddsNodeRefClass or subclass or NULL
Retrieve the namespaces for a resource
get_xml_ns(uri)
uri | the URI of the catalog |
the output of xml_ns
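To show what a namespace listing looks like and why the default xpath prefix is "d1", here is a sketch that works on a small inline document with the xml2 package rather than a live server. The catalog text is made up for this example; it assumes only that the document declares the THREDDS InvCatalog namespace as its default namespace.

```r
library(xml2)

# A small inline catalog (made up for this example) that declares the
# THREDDS InvCatalog namespace as its default namespace.
txt <- '<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="top">
    <dataset name="a.nc" urlPath="x/a.nc"/>
    <dataset name="b.nc" urlPath="x/b.nc"/>
  </dataset>
</catalog>'
doc <- read_xml(txt)

# xml2 assigns default namespaces the prefixes d1, d2, ...
ns <- xml_ns(doc)
names(ns)

# Select datasets nested inside a dataset using the prefixed xpath
kids <- xml_find_all(doc, ".//d1:dataset/d1:dataset", ns)
xml_attr(kids, "name")
```

Without the prefix, an xpath such as `.//dataset` would match nothing in this document, because every element belongs to the default namespace.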
Determine if a vector of names match the greplargs
grepl_it(x, greplargs = NULL)
x | a vector of names |
greplargs | NULL, vector or list |
logical vector
Test if an object inherits from xml2::xml_node
is_xmlNode(x, classname = "xml_node")
x | object to test |
classname | character, the class name to test against, by default 'xml_node' |
logical
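The test can be pictured as a plain class-inheritance check. The sketch below uses a hypothetical stand-in, `sketch_is_xml_node`, and a mock object that carries the class vector of an xml2 document, so it runs without xml2 installed; whether the package performs exactly this check is an assumption.

```r
# Hypothetical stand-in for illustration; assumes the check is a plain
# class-inheritance test.
sketch_is_xml_node <- function(x, classname = "xml_node") {
  inherits(x, classname)
}

# A mock object carrying the class vector of an xml2 document
mock <- structure(list(), class = c("xml_document", "xml_node"))

sketch_is_xml_node(mock)          # TRUE
sketch_is_xml_node("not a node")  # FALSE
```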
Convert a node to an object inheriting from ThreddsNode
parse_node(node, url = NULL, verbose = FALSE, encoding = "UTF-8", ...)
node | xml2::xml_node or an httr::response object |
url | character, optional url if a catalog or direct dataset |
verbose | logical, by default FALSE |
encoding | character, by default UTF-8 |
... | further arguments for instantiation of classes (such as ns = "foo") |
ThreddsNode class object or subclass
A Service representation that subclasses from ThreddsNode
thredds::ThreddsNode
-> ServiceNode
name
character
serviceType
character
base
character base url
new()
initialize an instance of ServiceNode
ServiceNode$new(x, ...)
x
url or xml2::xml_node
...
arguments for superclass initialization
print()
print method
ServiceNode$print(prefix = "")
prefix
character, to be printed before each line of output (like spaces)
...
other arguments for superclass
clone()
The objects of this class are cloneable with this method.
ServiceNode$clone(deep = FALSE)
deep
Whether to make a deep clone.
For examples see CatalogNode
A limited crawler for programmatically navigating THREDDS catalogs.
R6 base class from which all other classes inherit
url
character - possibly wrong but usually right!
node
xml2::xml_node
verbose
logical
prefix
xpath namespace prefix, NA or NULL or character() to ignore
tries
numeric, number of request attempts before failing
encoding
character, by default 'UTF-8'
base_url
character, the base URL for the service
new()
initialize an instance of ThreddsNode
ThreddsNode$new( x, verbose = FALSE, n_tries = 3, prefix = NULL, ns_strip = FALSE, encoding = "UTF-8", base_url = "" )
x
url or xml2::xml_node
verbose
logical, TRUE to be noisy (default FALSE)
n_tries
numeric, defaults to 3
prefix
character, the namespace to examine (default NULL, inherited when initialized)
ns_strip
logical, if TRUE then strip namespace (default FALSE)
encoding
character, by default 'UTF-8'
base_url
character, the base URL for the service
print()
print method
ThreddsNode$print(prefix = "", ...)
prefix
character, to be printed before each line of output (like spaces)
...
other arguments (ignored for now)
GET()
Retrieve a node of the contents at this node's URL
ThreddsNode$GET()
ThreddsNode or subclass or NULL
browse()
Browse the URL if possible
ThreddsNode$browse()
children_names()
Retrieve a vector of unique child names
ThreddsNode$children_names(...)
...
further arguments for xml_children_names
a vector of zero or more child names
clone()
The objects of this class are cloneable with this method.
ThreddsNode$clone(deep = FALSE)
deep
Whether to make a deep clone.
Abstract class. For examples see CatalogNode
Get the names of children
xml_children_names(x, unique_only = TRUE)
x | xml2::xml_node |
unique_only | logical, if TRUE remove duplicates |
zero or more child names.
Retrieve an ID value for a node from its attributes.
xml_id(x, atts = c("name", "title", "ID", "urlPath", "href"))
x | xml node or a named character vector as per |
atts | character, ordered vector of attribute names to use as an ID value. The vector is stepped through in order: if an attribute is missing or is an empty character, advance to the next; otherwise return that value |
character identifier, possibly an empty character (character())
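The fallback order described for `atts` can be sketched on a plain named character vector. `sketch_id` is a hypothetical helper written for this example, not the package function, and the attribute values are made up; it assumes that an attribute counts as usable only when it is present, not NA, and not the empty string.

```r
# Hypothetical helper for illustration; mirrors the documented fallback
# order but is not the package implementation.
sketch_id <- function(attrs, atts = c("name", "title", "ID", "urlPath", "href")) {
  for (a in atts) {
    if (a %in% names(attrs)) {
      value <- attrs[[a]]
      # Return the first attribute that is present and non-empty
      if (!is.na(value) && nzchar(value)) return(value)
    }
  }
  # Nothing usable was found: return an empty character vector
  character()
}

sketch_id(c(name = "", title = "Sea surface temperature", href = "sst.xml"))
sketch_id(c(urlPath = "x/a.nc"))
sketch_id(c(other = "ignored"))
```

The first call skips the empty `name` and falls through to `title`, the second finds only `urlPath`, and the third returns `character()` because none of the listed attributes is present.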
Convert xml2::xml_node to character
xmlString(x)
x | xmlNode |
character