We’ve already seen that simply adding an index to a table won’t always improve your query time. One tricky example from the previous workbook is this:
SELECT DISTINCT county.name FROM county, ocean
WHERE ST_Intersects(county.the_geom, ocean.the_geom);
You may remember this being slow, but both the county and ocean tables are fairly small tables, with two and 174 rows respectively. Where spatial data differs from simpler data is that the number of rows is a poor proxy to the complexity of the data. Point data is small and simple and can be compared fairly quickly. Polygonal data can range from a few vertices outlining a rectangle to millions of vertices defining the coastline of a continent.
This is the case with the ocean table, where two polygons define the entire coastline of California. From the picture above you can see that the large polygon has a bounding box that overlaps with all but one of the county polygons. This will result in very poor index selectivity, rendering the index ineffective.
One solution to this problem is to break the oceans polygons into many smaller, simpler polygons. This can be done by
Decompose into linework, associate to original record, remove redundancies, simplify lines, aggregate and polygonize.
Triggers and checks.
Taking an arbitrary (valid) linestring, can you generate a linestring comprised of point of the original taken at equal intervals along it’s length?
generate_sequence(integer, integer, integer)
Ocean and County boundaries do not coincide. County boundaries are higher resolution. Can we adjust the Ocean boundary to match?