OpenGeo

Introduction to PostGIS

Table Of Contents

Previous topic

Section 15: Validity and Equality

Next topic

Section 17: Advanced Functions

Section 16: Advanced Performance

Second Guessing the Query Planner

We discussed Query Plan analysis in a previous section (see Section 13: Query Plan Analysis). A number of configuration parameters are available to allow you to control the paths the query planner will consider. The most useful are the following:

enable_indexscan
Setting this parameter to ‘off’ will disable all index scans. This can be used to determine the effect an existing index has on queries without dropping the index.
enable_seqscan
It is not possible to disable sequential scans entirely, but setting this parameter to ‘off’ will ensure that they are only used when no other approach is possible.

These parameters should not be turned off in the database configuration file (postgresql.conf). Rather they should be changed on a per-session basis to allow you to investigate the performance differences between different query plans.

You can query the current status of these parameters...

SHOW enable_seqscan;
SHOW enable_indexscan;

...and you can set them as well.

SET enable_seqscan TO 'off';
SET enable_indexscan TO 'on';

Using these parameters, along with the query planning parameters [1] such as random_page_cost and seq_page_cost described earlier (see Section 12: Tuning PostgreSQL for Spatial), gives you great flexibility when investigating the query planner and seeing how these cost estimates influence the chosen path.

Take some time to look into some of the previous exercises (see Section 11: Basic Exercises) and see if you can improve your performance and understand why and when different routes are chosen.

Denormalization

A common trick in spatial database optimisation is to denormalize data. This involves splitting a single large table into several smaller tables based on some combination of attributes which could be spatial or aspatial. This is particularly useful for rendering data where different classes of the same feature will be styled differently. Take the example of the jacksonco_streets table. This table contains all the roads for Jackson County, from major Interstates to unsealed back roads. Fortunately, there is an attribute type, available to distinguish the different road classifications. In this case, the back roads have a type value of 6.

SELECT type, count(*) FROM jacksonco_streets GROUP BY type ORDER BY type;
_images/denormalize01.png

Consider the process of styling a map. Major roads should usually be styled using thicker, more visible lines than the smaller, less important roads. Major roads should also be rendered separately so that the those roads always appear on top. They will also be visible at smaller scales than local roads. Forcing the renderer to traverse each and every record in a large dataset to find a small subset is not as efficient as physically separating the data into separate tables, even after applying an index to the filter column.

Try splitting the jacksonco_streets table into two tables containing major roads and minor roads. Here are a few commands available that may be helpful to you:

  • SELECT INTO [3]
  • CREATE LIKE [4]
  • CREATE TABLE AS [5]

Don’t forget about indices, primary keys and the geometry_columns table. Refer to Section 5: Geometries and Section 8: Spatial Indexing as necessary.

Data Partitioning

One issue with data denormalization is that logically similar data is no longer accessible from a single location. Take the example of street tables denormalized by street class; highways, freeways and expressways will be stored in one table; arterial and local roads in another; paths, 4x4 tracks and cycleways in a third. Not try finding a street based on it’s name; or apply an update to the data. Denormalization makes most tasks that need to be performed against the entire dataset more difficult. Data Partitioning takes the denormalization of data one step farther by providing both the normalized and denormalized tables that operate on a single copy of the data.

Partitioning involves splitting one table into several similarly structured child tables, using PostgreSQL table inheritance. Constraints are defined on each child table to ensure that every possible record belongs in one, and only one child table.

Have a look at the partitioning documentation [6], and try your hand at setting up master and child tables to separate local from major roads.

Footnotes

[1]http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html
[2]http://www.census.gov/geo/www/tiger/tgrshp2008/tgrshp2008.html
[3]http://www.postgresql.org/docs/8.4/interactive/sql-selectinto.html
[4]http://www.postgresql.org/docs/8.4/interactive/sql-createtable.html
[5]http://www.postgresql.erg/docs/8.4/interactive/sql-createtableas.html
[6]http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html