python-awips/docs/source/dev.rst
2018-09-05 15:52:38 -06:00

Development Guide
=================
The Data Access Framework allows developers to retrieve different types
of data without having dependencies on those types of data. It provides
a single, unified data type that can be customized by individual
implementing plug-ins to provide full functionality pertinent to each
data type.
Writing a New Factory
---------------------
Factories will most often be written in a dataplugin, but should always
be written in a common plug-in. This will allow for clean dependencies
from both CAVE and EDEX.
A new plug-in's data access class must implement IDataFactory. For ease
of use, abstract classes have been created to combine similar methods.
Data factories do not have to implement both types of data (grid and
geometry). They can if they choose, but if they choose not to, they
should do the following:
::
throw new UnsupportedOutputTypeException(request.getDatatype(), "grid");
This lets the code know that grid type is not supported for this data
factory. Depending on where the data is coming from, helpers have been
written to make writing a new data type factory easier. For example,
PluginDataObjects can use AbstractDataPluginFactory as a start and not
have to create everything from scratch.
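The pattern of rejecting an unsupported output type can be illustrated with a short Python sketch. The real factories are written in Java; the names ``UnsupportedOutputTypeError`` and ``StationGeometryFactory`` below are hypothetical stand-ins, not part of the framework.

```python
class UnsupportedOutputTypeError(Exception):
    """Hypothetical stand-in for the Java UnsupportedOutputTypeException."""

    def __init__(self, datatype, output_type):
        super().__init__(f"{datatype} does not support {output_type} data")
        self.datatype = datatype
        self.output_type = output_type


class StationGeometryFactory:
    """Hypothetical factory that serves geometry data only."""

    datatype = "station"

    def get_geometry_data(self, request):
        # A real factory would query its data source here.
        return [{"location": name} for name in request.get("locations", [])]

    def get_grid_data(self, request):
        # Grid output is not supported, so say so explicitly.
        raise UnsupportedOutputTypeError(self.datatype, "grid")


factory = StationGeometryFactory()
points = factory.get_geometry_data({"locations": ["KDEN", "KOMA"]})

try:
    factory.get_grid_data({})
    grid_error = None
except UnsupportedOutputTypeError as err:
    grid_error = str(err)
```

Raising a dedicated exception, rather than returning an empty result, lets the caller tell "no data matched" apart from "this factory never produces grids".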
Each data type is allowed to implement retrieval in any manner that is
felt necessary. The power of the framework means that the code
retrieving data does not have to know anything of the underlying
retrieval methods, only that it is getting data in a certain manner. To
see some examples of ways to retrieve data, reference
**SatelliteGridFactory** and **RadarGridFactory**.
Methods required for implementation:
**public DataTime[] getAvailableTimes(IDataRequest request)**
- This method returns an array of DataTime objects corresponding to
what times are available for the data being retrieved, based on the
parameters and identifiers being passed in.
**public DataTime[] getAvailableTimes(IDataRequest request, BinOffset
binOffset)**
- This method returns available times as above, only with a bin offset
applied.
Note: Both of the preceding methods can throw a TimeAgnosticDataException
if times do not apply to the data type.
**public IGridData[] getGridData(IDataRequest request,
DataTime... times)**
- This method returns an array of IGridData objects for the request
and the given times. One or more times may be passed.
**public IGridData[] getGridData(IDataRequest request, TimeRange
range)**
- Similar to the preceding method, this returns IGridData objects based
on a range of times.
**public IGeometryData[] getGeometryData(IDataRequest request,
DataTime... times)**
- This method returns IGeometryData objects based on a request and
times.
**public IGeometryData[] getGeometryData(IDataRequest request, TimeRange
range)**
- Like the preceding method, this method returns IGeometryData objects
based on a range of times.
**public String[] getAvailableLocationNames(IDataRequest request)**
- This method returns location names that match the request. If this
does not apply to the data type, an IncompatibleRequestException
should be thrown.
Registering the Factory with the Framework
------------------------------------------
The following needs to be added in a spring file in the plug-in that
contains the new factory:
::
<bean id="radarGridFactory"
class="com.raytheon.uf.common.dataplugin.radar.dataaccess.RadarGridFactory" />
<bean factory-bean="dataAccessRegistry" factory-method="register">
<constructor-arg value="radar"/>
<constructor-arg ref="radarGridFactory"/>
</bean>
This takes the RadarGridFactory and registers it with the registry and
allows it to be used any time the code makes a request for the data type
“radar.”
Retrieving Data Using the Factory
---------------------------------
To support a wider range of uses, there are multiple interfaces into
the Data Access Layer. Currently, there is a Python implementation and a
Java implementation, which have very similar method calls and work in a
similar manner. Plug-ins that want to use the data access framework to
retrieve data should include **com.raytheon.uf.common.dataaccess** as a
Required Bundle in their MANIFEST.MF.
To retrieve data using the Python interface:
::
from awips.dataaccess import DataAccessLayer
req = DataAccessLayer.newDataRequest()
req.setDatatype("grid")
req.setParameters("T")
req.setLevels("2FHAG")
req.addIdentifier("info.datasetId", "GFS40")
times = DataAccessLayer.getAvailableTimes(req)
data = DataAccessLayer.getGridData(req, times)
To retrieve data using the Java interface:
::
IDataRequest req = DataAccessLayer.newDataRequest();
req.setDatatype("grid");
req.setParameters("T");
req.setLevels("2FHAG");
req.addIdentifier("info.datasetId", "GFS40");
DataTime[] times = DataAccessLayer.getAvailableTimes(req);
IGridData[] data = DataAccessLayer.getGridData(req, times);
**newDataRequest()**
- This creates a new data request. Most often this is a
DefaultDataRequest, but the indirection leaves room for future
implementations as well.
**setDatatype(String)**
- This is the data type being retrieved. It is the value that was
registered when creating the new factory ("radar" in the example under
**Registering the Factory with the Framework** above).
**setParameters(String...)**
- This can differ depending on data type. It is most often used as a
main difference between products.
**setLevels(String...)**
- This is often used to identify the same products on different
mathematical angles, heights, levels, etc.
**addIdentifier(String, String)**
- This differs based on data type, but is often used for more
fine-tuned querying.
Both interfaces return a similar set of data, which can be manipulated
in their respective languages. See DataAccessLayer.py and
DataAccessLayer.java for more methods that can be called to retrieve
data and different parts of the data. Because each data type has
different parameters, levels, and identifiers, it is best to see the
actual data type for the available options. If it is undocumented, then
the best way to identify what parameters are to be used is to reference
the code.
Development Background
----------------------
In support of Hazard Services, Raytheon Technical Services is building a
generic data access framework that can be called via Java or Python. The
data access framework code can be found within the AWIPS Baseline in
::
com.raytheon.uf.common.dataaccess
As of 2016, plugins have been written for grid, radar, satellite, Hydro
(SHEF), point data (METAR, SYNOP, Profiler, ACARS, AIREP, PIREP), maps
data, and other data types. The Factories for each can be found in the
following packages (you may need to look at the development baseline to
see these):
::
com.raytheon.uf.common.dataplugin.grid.dataaccess
com.raytheon.uf.common.dataplugin.radar.dataaccess
com.raytheon.uf.common.dataplugin.satellite.dataaccess
com.raytheon.uf.common.dataplugin.binlightning.dataaccess
com.raytheon.uf.common.dataplugin.sfc.dataaccess
com.raytheon.uf.common.dataplugin.sfcobs.dataaccess
com.raytheon.uf.common.dataplugin.acars.dataaccess
com.raytheon.uf.common.dataplugin.ffmp.dataaccess
com.raytheon.uf.common.dataplugin.bufrua.dataaccess
com.raytheon.uf.common.dataplugin.profiler.dataaccess
com.raytheon.uf.common.dataplugin.modelsounding.dataaccess
com.raytheon.uf.common.dataplugin.ldadmesonet.dataaccess
com.raytheon.uf.common.dataplugin.gfe.dataaccess
com.raytheon.uf.common.hydro.dataaccess
com.raytheon.uf.common.pointdata.dataaccess
com.raytheon.uf.common.dataplugin.maps.dataaccess
Additional data types may be added in the future. To determine which
data types are supported, display the "type hierarchy" associated with the
classes
**AbstractGridDataPluginFactory**,
**AbstractGeometryDatabaseFactory**, and
**AbstractGeometryTimeAgnosticDatabaseFactory**.
The following content was taken, with slight modifications, from the
attached design review document.
Design/Implementation
---------------------
The Data Access Framework is designed to provide a consistent interface
for requesting and using geospatial data within CAVE or EDEX. Examples
of geospatial data are grids, satellite, radar, metars, maps, river gage
heights, FFMP basin data, airmets, etc. To allow for convenient use of
geospatial data, the framework will support two types of requests: grids
and geometries (points, polygons, etc). The framework will also hide
implementation details of specific data types from users, making it
easier to use data without worrying about how the data objects are
structured or retrieved.
A suggested mapping of some current data types to one of the two
supported data requests is listed below. This list is not definitive and
can be expanded. If a developer can dream up an interpretation of the
data in the other supported request type, that support can be added.
Grids
- Grib
- Satellite
- Radar
- GFE
Geometries
- Map (states, counties, zones, etc)
- Hydro DB (IHFS)
- Obs (metar)
- FFMP
- Hazard
- Warning
- CCFP
- Airmet
The framework is designed around the concept of each data type plugin
contributing the necessary code for the framework to support its data.
For example, the satellite plugin provides a factory class for
interacting with the framework and registers itself as being compatible
with the Data Access Framework. This concept is similar to how EDEX in
AWIPS expects a plugin developer to provide a decoder class and
record class and register them, but then automatically manages the rest
of the ingest process including routing, storing, and alerting on new
data. This style of plugin architecture effectively enables the
framework to expand its capabilities to more data types without having
to alter the framework code itself. This will enable software developers
to incrementally add support for more data types as time allows, and
allow the framework to expand to new data types as they become
available.
The Data Access Framework will not break any existing functionality or
APIs, and there are no plans to retrofit existing code to use the new
API at this time. Ideally, code will be retrofitted in the future to
improve maintainability. The plugin-specific code that hooks into
the framework will make use of existing APIs such as **IDataStore** and
**IServerRequest** to complete the requests.
The Data Access Framework can be understood as three parts:
- How users of the framework retrieve and use the data
- How plugin developers contribute support for new data types
- How the framework works when it receives a request
How users of the framework retrieve and use the data
----------------------------------------------------
When a user of the framework wishes to request data, they must
instantiate a request object and set some of the values on that request.
Two request interfaces will be supported; for detailed methods see the
"Request interfaces" section below.
**IDataRequest**
**IGridRequest** extends **IDataRequest**
**IGeometryRequest** extends **IDataRequest**
For the request interfaces, default implementations of
**DefaultGridRequest** and **DefaultGeometryRequest** will be provided
to handle most cases. However, the use of interfaces allows for custom
special cases in the future. If necessary, the developer of a plugin can
write their own custom request implementation to handle a special case.
After the request object has been prepared, the user will pass it to the
Data Access Layer to receive a data object in return. See the "Data
Interfaces" section below for the methods of the Data Access Layer. The
Data Access Layer will return one of two data interfaces.
**IData**
**IGridData** extends **IData**
**IGeometryData** extends **IData**
For the data interfaces, the use of interfaces effectively hides the
implementation details of specific data types from the user of the
framework. For example, the user receives an **IGridData** and knows the
data time, grid geometry, parameter, and level, but does not know that
the data is actually a **GFEGridData** vs **D2DGridData** vs
**SatelliteGridData**. This enables users of the framework to write
generic code that can support multiple data types.
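A small Python sketch can make this concrete. It is not the framework's code: ``GridData``, ``RowGrid``, and ``FlatGrid`` are hypothetical classes that only assume an **IGridData**-like interface exposing a parameter, a level, and the raw values.

```python
from abc import ABC, abstractmethod


class GridData(ABC):
    """Hypothetical Python analogue of an IGridData-style interface."""

    def __init__(self, parameter, level):
        self._parameter = parameter
        self._level = level

    def get_parameter(self):
        return self._parameter

    def get_level(self):
        return self._level

    @abstractmethod
    def get_raw_data(self):
        """Return the grid values as a list of rows."""


class RowGrid(GridData):
    """Stores its values directly as rows."""

    def __init__(self, parameter, level, rows):
        super().__init__(parameter, level)
        self._rows = rows

    def get_raw_data(self):
        return self._rows


class FlatGrid(GridData):
    """Stores its values as one flat list plus a row width."""

    def __init__(self, parameter, level, values, width):
        super().__init__(parameter, level)
        self._values = values
        self._width = width

    def get_raw_data(self):
        step = self._width
        return [self._values[i:i + step]
                for i in range(0, len(self._values), step)]


def describe(grid):
    """Generic code: depends only on the interface, not the concrete class."""
    rows = grid.get_raw_data()
    return f"{grid.get_parameter()} at {grid.get_level()}: {len(rows)}x{len(rows[0])}"


grids = [
    RowGrid("T", "2FHAG", [[1, 2], [3, 4]]),
    FlatGrid("T", "SFC", [1, 2, 3, 4, 5, 6], 3),
]
summaries = [describe(grid) for grid in grids]
```

``describe`` never checks which class it received, so a new grid type can be added without touching it.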
For Python users of the framework, the interfaces will be very similar
with a few key distinctions. Geometries will be represented by Python
geometries from the open source Shapely project. For grids, the Python
**IGridData** will have a method for requesting the raw data as a numpy
array, and the Data Access Layer will have methods for requesting the
latitude coordinates and the longitude coordinates of grids as numpy
arrays. The Python requests and data objects will be pure Python and not
JEP PyJObjects that wrap Java objects. A future goal of the Data Access
Framework is to provide support to Python local apps and therefore
enable requests of data outside of CAVE and EDEX to go through the same
familiar interfaces. This goal is out of scope for this project, but
because the request and returned data objects are pure Python, adding
this support in the future will not be a huge undertaking.
How plugin developers contribute support for new datatypes
----------------------------------------------------------
When a developer wishes to add support for another data type to the
framework, they must implement one or both of the factory interfaces
within a common plugin. Two factory interfaces will be supported; for
detailed methods see below.
**IDataFactory**
**IGridFactory** extends **IDataFactory**
**IGeometryFactory** extends **IDataFactory**
For some data types, it may be desired to add support for both types of
requests. For example, the developer of grid data may want to provide
support for both grid requests and geometry requests. In this case the
developer would write two separate classes where one implements
**IGridFactory** and the other implements **IGeometryFactory**.
Furthermore, factories could be stacked on top of one another by having
factory implementations call into the Data Access Layer.
For example, a custom factory keyed to "derived" could be written for
derived parameters, and the implementation of that factory may then call
into the Data Access Layer to retrieve “grid” data. In this example the
raw data would be retrieved through the **GridDataFactory** while the
derived factory then applies the calculations before returning the data.
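The stacking idea can be sketched in Python as follows. Everything here is an illustrative assumption: the ``GridFactory`` and ``DerivedFactory`` classes, the dictionary-based requests, the ``get_data`` dispatcher standing in for the Data Access Layer, and the kelvin-to-Celsius derivation.

```python
class GridFactory:
    """Hypothetical raw-data factory keyed to "grid"."""

    def get_data(self, request):
        # Pretend these temperatures (in kelvin) came from the data store.
        return [273.15, 283.15, 293.15]


class DerivedFactory:
    """Hypothetical factory keyed to "derived" that builds on "grid" data."""

    def __init__(self, data_access):
        # data_access is the entry point of the (stand-in) Data Access Layer.
        self._data_access = data_access

    def get_data(self, request):
        # Re-enter the data access layer to fetch the raw values...
        raw_request = dict(request, datatype="grid")
        kelvin = self._data_access(raw_request)
        # ...then apply the derivation before returning (kelvin to Celsius).
        return [round(value - 273.15, 2) for value in kelvin]


factories = {}


def get_data(request):
    """Minimal stand-in for the Data Access Layer: look up and delegate."""
    return factories[request["datatype"]].get_data(request)


factories["grid"] = GridFactory()
factories["derived"] = DerivedFactory(get_data)

celsius = get_data({"datatype": "derived", "parameter": "T"})
```

The derived factory knows nothing about how the grid factory obtains its values; it only issues a second request through the same entry point.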
Implementations do not need to support all methods on the interfaces or
all values on the request objects. For example, a developer writing the
**MapGeometryFactory** does not need to support **getAvailableTimes()**
because map data such as US counties is time agnostic. In this case the
method should throw **UnsupportedOperationException** and the javadoc
will indicate this.
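From the caller's side, a time-agnostic factory might be handled as in the following Python sketch. It uses Python's built-in ``NotImplementedError`` as a rough analogue of Java's **UnsupportedOperationException**; the ``MapGeometryFactory`` class and ``fetch`` helper here are hypothetical and return made-up values.

```python
class MapGeometryFactory:
    """Hypothetical time-agnostic factory, loosely modelled on map data."""

    def get_available_times(self, request):
        # Map data has no time dimension, so this method is unsupported.
        raise NotImplementedError("map data is time agnostic")

    def get_data(self, request, times=()):
        # Times are ignored; the same shapes are returned for any request.
        return ["Colorado", "Wyoming"]


def fetch(factory, request):
    """Ask for times first, and fall back to a request without times."""
    try:
        times = factory.get_available_times(request)
    except NotImplementedError:
        times = ()
    return factory.get_data(request, times)


states = fetch(MapGeometryFactory(), {"datatype": "maps"})
```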
As another example, the developer writing **ObsGeometryFactory** can
ignore the Level field of the **IDataRequest**, because METAR data has
no distinct levels: it is all at the surface. It is up to
the factory writer to determine which methods and fields to support and
which to ignore, but the factory writer should always code the factory
with the user requesting data in mind. If a user of the framework could
reasonably expect certain behavior from the framework based on the
request, the factory writer should implement support for that behavior.
Abstract factories will be provided and can be extended to reduce the
amount of code a factory developer has to write to complete some common
actions that will be used by multiple factories. The factory should be
capable of working within either CAVE or EDEX, therefore all of its
server specific actions (e.g. database queries) should go through the
Request/Handler API by using **IServerRequests**. CAVE can then send the
**IServerRequests** to EDEX with **ThriftClient** while EDEX can use the
**ServerRequestRouter** to process the **IServerRequests**, making the
code compatible regardless of which JVM it is running inside.
Once the factory code is written, it must be registered with the
framework as an available factory. This will be done through Spring XML
in a common plugin, with the XML file inside the res/spring folder of
the plugin. Registering the factory will identify the datatype name that
must match what users would use as the datatype on the **IDataRequest**,
e.g. the word "satellite". Registering the factory also indicates to the
framework what request types are supported, i.e. grid vs geometry or
both.
An example of the spring xml for a satellite factory is provided below:
::
<bean id="satelliteFactory"
class="com.raytheon.uf.common.dataplugin.satellite.SatelliteFactory" />
<bean id="satelliteFactoryRegistered" factory-bean="dataFactoryRegistry" factory-method="register">
<constructor-arg value="satellite" />
<constructor-arg value="com.raytheon.uf.common.dataaccess.grid.IGridRequest" />
<constructor-arg value="satelliteFactory" />
</bean>
How the framework works when it receives a request
--------------------------------------------------
**IDataRequest** requires a datatype to be set on every request. The
framework will have a registry of existing factories for each data type
(grid and geometry). When a Data Access Layer method is called, it
first looks up the factory in the registry that corresponds to
the datatype on the **IDataRequest**. If no corresponding factory is
found, it will throw an exception with a useful error message that
indicates there is no current support for that datatype request. If a
factory is found, it will delegate the processing of the request to the
factory. The factory will receive the request and process it, returning
the result back to the Data Access Layer which then returns it to the
caller.
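The lookup-then-delegate flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the framework's implementation: ``FactoryRegistry``, ``NoFactoryError``, ``EchoFactory``, and the dictionary-based request are all hypothetical.

```python
class NoFactoryError(Exception):
    """Hypothetical error raised when no factory matches a datatype."""


class FactoryRegistry:
    """Maps a datatype name to the factory registered for it."""

    def __init__(self):
        self._factories = {}

    def register(self, datatype, factory):
        self._factories[datatype] = factory

    def lookup(self, datatype):
        try:
            return self._factories[datatype]
        except KeyError:
            raise NoFactoryError(
                f"no factory registered for datatype '{datatype}'"
            ) from None


class EchoFactory:
    """Hypothetical factory that just reports what it was asked for."""

    def get_data(self, request):
        return f"data for {request['parameter']}"


registry = FactoryRegistry()
registry.register("satellite", EchoFactory())


def get_data(request):
    # Step 1: find the factory that matches the request's datatype.
    factory = registry.lookup(request["datatype"])
    # Step 2: delegate the processing and hand the result to the caller.
    return factory.get_data(request)


result = get_data({"datatype": "satellite", "parameter": "IR"})

try:
    get_data({"datatype": "unknown", "parameter": "T"})
    message = None
except NoFactoryError as err:
    message = str(err)
```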
By going through the Data Access Layer, the user is able to retrieve the
data and use it without understanding which factory was used, how the
factory retrieved the data, or what implementation of data was returned.
This effectively frees the framework and users of the framework from any
dependencies on any particular data types. Since these dependencies are
avoided, the specific **IDataFactory** and **IData** implementations can
be altered in the future if necessary and the code making use of the
framework will not need to be changed as long as the interfaces continue
to be met.
Essentially, the Data Access Framework is a service that provides data
in a consistent way, with the service capabilities being expanded by
plugin developers who write support for more data types. Note that the
framework itself is useless without plugins contributing and registering
**IDataFactories**. Once the framework is coded, developers will need to
be tasked to add the factories necessary to support the needed data
types.
Request interfaces
------------------
Requests and returned data interfaces will exist in both Java and
Python. The Java interfaces are listed below and the Python interfaces
will match the Java interfaces except where noted. Factories will only
be written in Java.
**IDataRequest**
- **void setDatatype(String datatype)** - the datatype name and
also the key to which factory will be used. Frequently the plugin
name, such as radar, satellite, gfe, or ffmp
- **void addIdentifier(String key, Object value)** - an identifier the
factory can use to determine which data to return, e.g. for grib data
key "modelName" and value "GFS40"
- **void setParameters(String... params)**
- **void setLevels(Level... levels)**
- **String getDatatype()**
- **Map getIdentifiers()**
- **String[] getParameters()**
- **Level[] getLevels()**
- Python Differences
- **Levels** will be represented as **Strings**
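To show how these accessors fit together on the Python side, here is a minimal sketch of a request object. ``SimpleDataRequest`` is a hypothetical class written for illustration; it mirrors the method names listed above and keeps levels as plain strings, but it is not the framework's own request implementation.

```python
class SimpleDataRequest:
    """Hypothetical request object mirroring the IDataRequest accessors."""

    def __init__(self):
        self._datatype = None
        self._identifiers = {}
        self._parameters = []
        self._levels = []

    def setDatatype(self, datatype):
        # The datatype is also the key used to pick a factory.
        self._datatype = datatype

    def addIdentifier(self, key, value):
        self._identifiers[key] = value

    def setParameters(self, *params):
        self._parameters = list(params)

    def setLevels(self, *levels):
        # Levels are represented as plain strings in Python.
        self._levels = [str(level) for level in levels]

    def getDatatype(self):
        return self._datatype

    def getIdentifiers(self):
        return dict(self._identifiers)

    def getParameters(self):
        return list(self._parameters)

    def getLevels(self):
        return list(self._levels)


req = SimpleDataRequest()
req.setDatatype("grid")
req.setParameters("T", "DpT")
req.setLevels("2FHAG")
req.addIdentifier("info.datasetId", "GFS40")
```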
**IGridRequest extends IDataRequest**
- **void setStorageRequest(Request request)** - a datastorage request
that allows for slab, line, and point requests for faster performance
and less data retrieval
- **Request getStorageRequest()**
- Python Differences
- No support for storage requests
**IGeometryRequest extends IDataRequest**
- **void setEnvelope(Envelope env)** - a bounding box envelope to limit
the data that is searched through and returned. Not all factories may
support this.
- **void setLocationNames(String... locationNames)** - a convenience for
requesting data by names such as ICAOs, airports, stationIDs, etc
- **Envelope getEnvelope()**
- **String[] getLocationNames()**
- Python Differences
- Envelope methods will use a **shapely.geometry.Polygon** instead of
**Envelopes** (shapely has no concept of envelopes and considers them
as rectangular polygons)
Data Interfaces
~~~~~~~~~~~~~~~
**IData**
- **Object getAttribute(String key)** - **getAttribute** provides a way
to get at attributes of the data that the interface does not provide,
allowing the user to get more info about the data without adding
dependencies on the specific data type plugin
- **DataTime getDataTime()** - some data may return null (e.g. maps)
- **Level getLevel()** - some data may return null
- Python Differences
- **Levels** will be represented by **Strings**
**IGridData extends IData**
- **String getParameter()**
- **GridGeometry2D getGridGeometry()**
- **Unit getUnit()** - some data may return null
- **DataDestination populateData(DataDestination destination)** - How
the user gets the raw data by passing in a **DataDestination** such
as **FloatArrayWrapper** or **ByteBufferWrapper**. This allows the
user to specify the way the raw data of the grid should be structured
in memory.
- **DataDestination populateData(DataDestination destination, Unit
unit)** - Same as the above method but also attempts to convert the
raw data to the specified unit when populating the
**DataDestination**.
- Python Differences
- **Units** will be represented by **Strings**
- **populateData()** methods will not exist, instead there will be
a **getRawData()** method that returns a numpy array in the native
type of the data
**IGeometryData extends IData**
- **Geometry getGeometry()**
- **Set getParameters()** - Gets the list of parameters included in
this data
- **String getString(String param)** - Gets the value of the parameter
as a String
- **Number getNumber(String param)** - Gets the value of the parameter
as a Number
- **Unit getUnit(String param)** - Gets the unit of the parameter,
may be null
- **Type getType(String param)** - Returns an enum of the raw type of
the parameter, such as Float, Int, or String
- **String getLocationName()** - Returns the location name of the piece
of data, typically to correlate if the request was made with
locationNames. May be null.
- Python Differences
- **Geometry** will be **shapely.geometry.Geometry**
- **getNumber()** will return the Python native number of the data
- **Units** will be represented by **Strings**
- **getType()** will return the Python type object
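The parameter accessors on the Python side could behave roughly as in this sketch. ``SimpleGeometryData`` is a hypothetical container, the station values are made up, and the geometry itself is left out so that the example needs no third-party packages.

```python
class SimpleGeometryData:
    """Hypothetical container mirroring a few IGeometryData accessors."""

    def __init__(self, location_name, values, units=None):
        self._location_name = location_name
        self._values = dict(values)
        self._units = dict(units or {})

    def getParameters(self):
        return set(self._values)

    def getString(self, param):
        return str(self._values[param])

    def getNumber(self, param):
        # Returns the native Python number stored for the parameter.
        return self._values[param]

    def getUnit(self, param):
        # Units are plain strings here and may be missing (None).
        return self._units.get(param)

    def getType(self, param):
        # Returns the Python type object of the stored value.
        return type(self._values[param])

    def getLocationName(self):
        return self._location_name


ob = SimpleGeometryData(
    "KDEN",
    {"temperature": 21.5, "stationName": "Denver"},
    units={"temperature": "C"},
)
```

For instance, ``ob.getType("temperature")`` is ``float`` while ``ob.getUnit("stationName")`` is ``None``, matching the note above that a unit may be null.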
**DataAccessLayer** (in implementation, these methods delegate
processing to factories)
- **DataTime[] getAvailableTimes(IDataRequest request)**
- **DataTime[] getAvailableTimes(IDataRequest request, BinOffset
binOffset)**
- **IData[] getData(IDataRequest request, DataTime... times)**
- **IData[] getData(IDataRequest request, TimeRange timeRange)**
- **GridGeometry2D getGridGeometry(IGridRequest request)**
- **String[] getAvailableLocationNames(IGeometryRequest request)**
- Python Differences
- No support for **BinOffset**
- **getGridGeometry(IGridRequest)** will be replaced by
**getLatCoords(IGridRequest)** and **getLonCoords(IGridRequest)**
that will return numpy arrays of the lat or lon of every grid
cell
Factory Interfaces (Java only)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**IDataFactory**
- **DataTime[] getAvailableTimes(R request)** - queries the
database and returns the times that match the request. Some factories
may not support this (e.g. maps).
- **DataTime[] getAvailableTimes(R request, BinOffset binOffset)** -
queries the database with a bin offset and returns the times that
match the request. Some factories may not support this.
- **D[] getData(R request, DataTime... times)** - Gets the data that
matches the request at the specified times.
- **D[] getData(R request, TimeRange timeRange)** - Gets the data that
matches the request and is within the time range.
**IGridDataFactory extends IDataFactory**
- **GridGeometry2D** **getGeometry(IGridRequest request)** - Returns
the grid geometry of the data that matches the request BEFORE making
the request. Useful for then making slab or line requests for subsets
of the data. Does not support moving grids, but moving grids don't
make subset requests either.
**IGeometryDataFactory extends IDataFactory**
- **String[] getAvailableLocationNames(IGeometryRequest request)** - Convenience
method to retrieve available location names that match a request. Not
all factories may support this.