PostGIS is a spatial database. Oracle Spatial and SQL Server 2008 are also spatial databases. But what does that mean; what is it that makes an ordinary database a spatial database?
An ordinary database has strings, numbers, and dates. A spatial database adds one or more additional types for representing geographic features. The basic geographic types are:
This basic model is defined in the Open Geospatial Consortium (OGC) “Simple Features for SQL” (SFSQL) specification. All the particular spatial database implementations we will be discussing will share this model.
An ordinary database provides “access methods” – commonly known as indexes – to allow fast and random access to subsets of data. Indexing for standard types (numbers, strings, dates) is usually done with B-tree indexes. A B-tree partitions the data using the natural sort order to put the data into a hierarchical tree.
The natural sort order of numbers, strings, and dates is simple to determine – every value is less than, greater than or equal to every other value. But because polygons can overlap, can be contained in one another, and are arrayed in a two-dimensional (or more) space, a B-tree cannot be used to efficiently index them. Real spatial databases provide a “spatial index” that instead answers the question “which objects are within this particular bounding box?”.
A bounding box is the smallest size rectangle capable of containing a given feature.
Bounding boxes are used because answering the question “is A inside B?” is very computationally intensive for polygons but very fast in the case of rectangles. Even the most complex polygons and linestrings can be represented by a simple bounding box.
Indexes have to perform quickly in order to be useful. So instead of providing exact results, as B-trees do, spatial indexes provide approximate results. The question “what lines are inside this polygon?” will be instead interpreted by a spatial index as “what lines have bounding boxes that are contained inside this polygon’s bounding box?”
The actual spatial indexes implemented by various databases vary widely. The most common implementation is the R-tree (used in PostGIS), but there are also implementations of Quadtrees, and grid-based indexes in shipping spatial databases.
An ordinary database will provide functions for manipulating the data during a query: concatenating strings, performing hash operations on strings, doing mathematics on numbers, extracting information from dates, etc. A true spatial database must provide a complete set of functions for analyzing the components of geometries, determining the relationships between geometries, and manipulating geometries.
The functions fall into a few basic categories:
The list of possible functions is very large, but a common set of functions is defined by the OGC SFSQL and implemented (along with additional useful functions) by PostGIS.
PostGIS turns PostgreSQL into a spatial database by adding the three key features discussed above: spatial types, spatial indexes and spatial functions. Because PostGIS is built on PostgreSQL, it automatically inherits important “enterprise” features from PostgreSQL: ACID transaction guarantees, reliability, crash recovery, hot backup, replication, full SQL92 support and more.
The shapefile (and other file formats) have been the standard way of storing and interacting with spatial data since GIS software was first written. However, these “flat” files have the following disadvantages:
Most users of PostGIS are setting up systems where multiple applications will be expected to access the data, so having a standard SQL access method simplifies deployment and development. Some users are working with large data sets; with files, they might be segmented into multiple files, but in a database they can be stored as a single large table.
In summation, the combination of support for multiple users, complex ad hoc queries, and performance on large data sets are what sets spatial databases apart from file-based systems.
The first version of PostGIS was written by Refractions Research in the spring of 2001. The 0.1 version released in May 2001 had objects, indexes and functions, but only a very few functions. The result was a database suitable for storage and retrieval, but no analysis.
As functions were added through 2001, it became clear that an organizing principle was needed, and the organization was found in the “Simple Features for SQL” (SFSQL) specification from the Open Geospatial Consortium. The specification provided a guideline for function naming, and for what functions were required to make the database complete.
By mid-2001, the number of functions was high enough to allow for some simple analysis and spatial joins, and the first external application was available to provide visualization of data in the database – Mapserver. Over the next years, the number of functions grew, but was limited by an important consideration. Many of the most interesting functions (ST_Intersects(), ST_Buffer(), ST_Union()) were very difficult to code in generality – writing them from scratch could be the work of years.
Fortunately a second project, the “Geometry Engine, Open Source” or GEOS, came along at about the right time. GEOS is a library that provides the algorithms necessary to implement the difficult functions in the SFSQL specification. By linking in GEOS, PostGIS was able to provide complete support for SFSQL by version 0.8.
As more and more data was put into PostGIS, another issue arose: the representation used to store the geometry in tables was relatively inefficient. For small objects like points and short lines, the metadata in the representation had as much as a 300% overhead. For performance reasons, it was necessary to put the representation on a diet: the metadata header was shrunk, extra dimensions were made optional, and the overhead greatly reduced. In PostGIS 1.0, this new, faster, lightweight representation became the default.
Recent updates of PostGIS have worked on expanding standards compliance, adding support for curve-based geometries and function signatures specified in the ISO SQL/MM standard. Performance has also been a continuing focus, and PostGIS 1.4 has significantly improved the speed of geometry testing routines (ST_Intersects(), ST_Contains()) and aggregates (ST_Union(), ST_Collect()).
A common question from people familiar with open source databases is, “why wasn’t PostGIS built on MySQL?”. The short answer is, “because that would have a lot harder.” The longer answer is this:
PostgreSQL has a built-in “type extension” mechanism, allowing you to add new types of all kinds relatively easily. It also has a generic index structure (GIST) allowing you to build indexes on more-or-less anything. As such, it’s a very easy development path to add new types, includes spatial types, to PostgreSQL. In the proprietary world, only Illustra (now Informix Universal Server) allows such easy extension, which is no coincidence, since Illustra is a proprietary re-working of the original PostgreSQL code base from the 1980’s.
Because the development path for adding types to PostgreSQL was so straightforward, it made sense to start there. When MySQL released basic spatial types in version 4.1, the PostGIS team took a look at their code, and the exercise reinforced the original decision to use PostgreSQL. The MySQL code was spread over the whole database code base, because the spatial objects had to be hacked on top of the string handling code as a special case. PostGIS 0.1 took under a month. Doing a “MyGIS” 0.1 would have taken a lot longer, and as such, might never have seen the light of day.
For a complete list of case studies, see the PostGIS case studies page.
IGN is the national mapping agency of France, and uses PostGIS to store the high resolution topographic map of the country, “BDUni”. BDUni has more than 100 million features, and is maintained by a staff of over 100 field staff who verify observations and add new mapping to the database daily. The IGN installation uses the database transactional system to ensure consistency during update processes, and a warm standby system to maintain uptime in the event of a system failure.
GlobeXplorer is a web-based service providing online access to petabytes of global satellite and aerial imagery. GlobeXplorer uses PostGIS to manage the metadata associated with the imagery catalogue, so queries for imagery first search the PostGIS catalogue to find the location of the relevant images, then pull the images from storage and return them to the client. In building their system, GlobeXplorer tried other spatial databases but eventually settled on PostGIS because of the great combination of price and performance it offers.
PostGIS has become a widely used spatial database, and the number of third-party programs that support storing and retrieving data using it has increased as well. The programs that support PostGIS include both open source and proprietary software on both server and desktop systems.
The following table shows a list of some of the software that leverages PostGIS:
| Open/Free | Closed/Proprietary |
|---|---|
|
|