Problems with RGeo intersection functions

1.5k views Asked by At

I'm getting strange/incorrect results from RGeo polygon intersection functions (Ruby 2.3.0, RGeo 0.5.3)

EXAMPLE 1:

I have two polygons which I believe share a boundary but do not share any internal space (i.e. they touch but do not overlap):

wkt_1 = "POLYGON ((-8226874.27782158 4962626.76394919, -8223358.174520462 4961756.817075645, -8223358.174520462 4960289.557693501, -8224471.369428394 4960289.557693501, -8226874.27782158 4962253.674727506, -8226874.27782158 4962626.76394919))"
wkt_2 = "POLYGON ((-8224757.546680832 4960523.476563589, -8225269.1002275925 4959296.105368667, -8226993.791361805 4959219.668340384, -8226420.900079966 4961883.087589158, -8224757.546680832 4960523.476563589))"
poly_1 = RGeo::Geos.factory(:srid => 3857).parse_wkt(wkt_1)
poly_2 = RGeo::Geos.factory(:srid => 3857).parse_wkt(wkt_2)

When we check the intersection between them, it returns a line, as expected for geometries sharing only a boundary:

poly_1.intersection poly_2
=> #<RGeo::Geos::CAPILineStringImpl:0x3fc0249af168 "LINESTRING (-8224757.546680832 4960523.476563589, -8225598.074380083 4961210.51680879)">

However, when running the following checks, we get the opposite of what's expected:

poly_1.overlaps? poly_2
=> true
poly_1.touches? poly_2
=> false

EXAMPLE 2:

We take two legitimately overlapping polygons:

wkt_3 = "POLYGON ((-8243237.0 4970203.0, -8243237.0 4968735.0, -8242123.0 4968735.0, -8242123.0 4970203.0, -8243237.0 4970203.0))"
wkt_4 = "POLYGON ((-8244765.0 4966076.0, -8244765.0 4964608.0, -8243652.0 4964608.0, -8243652.0 4966076.0, -8242680.0 4969362.0, -8244765.0 4966076.0))"
poly_3 = RGeo::Geos.factory(:srid => 3857).parse_wkt(wkt_3)
poly_4 = RGeo::Geos.factory(:srid => 3857).parse_wkt(wkt_4)

enter image description here

And calculate the differences in both directions:

diff_a = poly_3.difference poly_4
=> #<RGeo::Geos::CAPIPolygonImpl:0x3fe3fca26028 "POLYGON ((-8243077.837796713 4968735.0, -8243237.0 4968735.0, -8243237.0 4970203.0, -8242123.0 4970203.0, -8242123.0 4968735.0, -8242865.466828971 4968735.0, -8242680.0 4969362.0, -8243077.837796713 4968735.0))">
diff_b = poly_4.difference poly_3
=> #<RGeo::Geos::CAPIPolygonImpl:0x3fe3fd1dda28 "POLYGON ((-8242865.466828971 4968735.0, -8243652.0 4966076.0, -8243652.0 4964608.0, -8244765.0 4964608.0, -8244765.0 4966076.0, -8243077.837796713 4968735.0, -8242865.466828971 4968735.0))">

Now we check the remainder polygons against their subtractors:

diff_b.touches? poly_3
=> true
diff_b.overlaps? poly_3
=> false

This is good.

diff_a.touches? poly_4
=> false
diff_a.overlaps? poly_4
=> true

This is IMPOSSIBLE- There's no way the remainder of a polygon subtracted from another should be able to overlap that polygon!

And why is it only happening in one direction?

EXAMPLE 3:

The weirdness continues. Let's now get the intersection of poly_3 and poly_4

intersection_a = poly_3.intersection poly_4
=> #<RGeo::Geos::CAPIPolygonImpl:0x3fd1084ece88 "POLYGON ((-8242865.724520766 4968734.582337743, -8243078.32501591 4968734.582337743, -8242680.062418439 4969362.301390027, -8242865.724520766 4968734.582337743))">

Now since this is what should have been subtracted from poly_3 to give us diff_a, diff_a should therefore intersect with intersection_a in exactly the same way as that it intersects with poly_4 (the subtractor).

Except it doesn't:

diff_a.touches? poly_4
=> false
diff_a.touches? intersection_a
=> true
diff_a.intersection poly_4
=> #<RGeo::Geos::CAPILineStringImpl:0x3fe3f98fb854 "LINESTRING (-8242680.0 4969362.0, -8243077.837796713 4968735.0)">
diff_a.intersection intersection_a
=> #<RGeo::Geos::CAPIMultiLineStringImpl:0x3fe3fca157b4 "MULTILINESTRING ((-8242865.466828971 4968735.0, -8242680.0 4969362.0), (-8242680.0 4969362.0, -8243077.837796713 4968735.0))">

Worse, neither of those two intersection results make sense. It should be a single, 2-segmented line, which neither of those are.

2

There are 2 answers

1
Eric Duminil On BEST ANSWER

Short answer

Unfortunately, it seems that you cannot expect reliable and correct output from touches? and overlaps? when using Float coordinates.

It doesn't depend on RGeo or GEOS versions (or for that matter, JTS, the project on which GEOS is based).

If you need information about the position of two polygons, you could use Geometry#distance and check that it is smaller than a given epsilon.

Geometry#intersection is a bit more reliable than touches? and overlaps?, but it isn't guaranteed to work with every example either.


Is it a precision problem?

Theory

touches? is by definition very sensitive : polygons have to share a point or a line on their border, but shouldn't have any common interior points.

  • If they're too far away from each other, they don't have any point in common, and touches? is false.
  • If they're too close, their intersection is a polygon, and touches? is false.

Sensitivity analysis

How sensitive is it exactly?

Let's consider two polygons, one just above the other :

enter image description here

POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))
POLYGON((0 1, 0 2, 1 2, 1 1, 0 1))

We could move the lower border of the upper polygon by an epsilon, and see which influence it has :

require 'rgeo'

epsilon = 1E-15

deltas = [-epsilon, 0, epsilon]

deltas.each do |delta|
  puts "--------------------------------"
  puts "Delta : #{delta}\n\n"
  simple1 = 'POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'
  simple2 = "POLYGON((0 #{1+delta}, 0 2, 1 2, 1 #{1+delta}, 0 #{1+delta}))"

  puts "Polygon #1 : #{simple1}\n"
  puts "Polygon #2 : #{simple2}\n\n"

  poly_1 = RGeo::Geos.factory(:srid => 4326).parse_wkt(simple1)
  poly_2 = RGeo::Geos.factory(:srid => 4326).parse_wkt(simple2)

  puts "Intersection : #{poly_1.intersection poly_2}"
  puts

  puts "Distance between polygons :"
  puts format('%.30f°', poly_1.distance(poly_2))
  puts

  puts "Overlaps? : #{poly_1.overlaps? poly_2}"
  puts "Touches? : #{poly_1.touches? poly_2}"
end

It outputs :

--------------------------------
Delta : -1.0e-15

Polygon #1 : POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))
Polygon #2 : POLYGON((0 0.999999999999999, 0 2, 1 2, 1 0.999999999999999, 0 0.999999999999999))

Intersection : POLYGON ((0.0 0.999999999999999, 0.0 1.0, 1.0 1.0, 1.0 0.999999999999999, 0.0 0.999999999999999))

Distance between polygons :
0.000000000000000000000000000000°

Overlaps? : true
Touches? : false
--------------------------------
Delta : 0

Polygon #1 : POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))
Polygon #2 : POLYGON((0 1, 0 2, 1 2, 1 1, 0 1))

Intersection : LINESTRING (0.0 1.0, 1.0 1.0)

Distance between polygons :
0.000000000000000000000000000000°

Overlaps? : false
Touches? : true
--------------------------------
Delta : 1.0e-15

Polygon #1 : POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))
Polygon #2 : POLYGON((0 1.000000000000001, 0 2, 1 2, 1 1.000000000000001, 0 1.000000000000001))

Intersection : GEOMETRYCOLLECTION EMPTY

Distance between polygons :
0.000000000000001110223024625157°

Overlaps? : false
Touches? : false

Those are well behaved polygons, with

  • sides parallel to the axes
  • same size
  • a shared side over the whole length

A difference of 1E-15 of a degree (about 1 Ångström) is enough to completely change the results, and both overlaps? and touches? switch between true and false.

The intersection is either empty, a line or a polygon, but at least the results seem consistent between intersection, overlaps? and touches?.

In your case, having more complex polygons with tilted sides makes the problem harder. When calculating intersection of two segments, it's hard to maintain 1-Ångström accuracy!

Is it a problem with RGeo?

RGeo uses GEOS, which is a C++ port of JTS (Java Topology Suite).

In order to check that the problem isn't specific to RGeo or GEOS, I calculated Example 1 with JTS 1.14 :

WKTReader wktReader = new WKTReader();
String wkt1 = "POLYGON ((-8226874.27782158 4962626.76394919, -8223358.174520462 4961756.817075645, -8223358.174520462 4960289.557693501, -8224471.369428394 4960289.557693501, -8226874.27782158 4962253.674727506, -8226874.27782158 4962626.76394919))";
String wkt2 = "POLYGON ((-8224757.546680832 4960523.476563589, -8225269.1002275925 4959296.105368667, -8226993.791361805 4959219.668340384, -8226420.900079966 4961883.087589158, -8224757.546680832 4960523.476563589))";
Polygon poly1 = (Polygon) wktReader.read(wkt1);
Polygon poly2 = (Polygon) wktReader.read(wkt2);
System.out.println("Intersection : " + poly1.intersection(poly2));
System.out.println("Overlaps?    : " + poly1.overlaps(poly2));
System.out.println("Intersects?  : " + poly1.intersects(poly2));
System.out.println("Touches?     : " + poly1.touches(poly2));
showMatrixWith(poly1, poly2);

It outputs :

Intersection : LINESTRING (-8224757.546680832 4960523.476563589, -8225598.074380083 4961210.51680879)
Overlaps?    : true
Intersects?  : true
Touches?     : false

212
101
212

The intersection is the exact same as in your example, intersects? and touches? output the same wrong results as RGeo.

Why are results inconsistent?

Why do intersection and touches? return inconsistend results?

DE-9IM

touches?, intersects?, overlaps? and other predicates are derived from the Dimensionally Extended nine-Intersection Model (DE-9IM). It's a matrix describing the dimensions of intersection between interiors, boundaries and exteriors of geometries.

This matrix is calculated on line 72 of src/operation/relate/RelateComputer.cpp in GEOS :

IntersectionMatrix* RelateComputer::computeIM()

The algorithm seems to require exact noding, which isn't the case in any of your examples.

All the tests that I could find for JTS used integers coordinates, even a test called "complex polygons touching" :

  # line 477 in jts-1.14/testxml/general/TestFunctionAA.xml
  <desc>mAmA - complex polygons touching</desc>
  <a>
    MULTIPOLYGON(
      (
        (100 200, 100 180, 120 180, 120 200, 100 200)),
      (
        (60 240, 60 140, 220 140, 220 160, 160 160, 160 180, 200 180, 200 200, 160 200,
        160 220, 220 220, 220 240, 60 240),
        (80 220, 80 160, 140 160, 140 220, 80 220)),
      (
        (280 220, 240 180, 260 160, 300 200, 280 220)))
  </a>
  <b>
    MULTIPOLYGON(
      (
        (80 220, 80 160, 140 160, 140 220, 80 220),
        (100 200, 100 180, 120 180, 120 200, 100 200)),
      (
        (220 240, 220 220, 160 220, 160 200, 220 200, 220 180, 160 180, 160 160, 220 160,
        220 140, 320 140, 320 240, 220 240),
        (240 220, 240 160, 300 160, 300 220, 240 220)))
  </b>

There's not a single test case where predicates are calculated for floating-point coordinates.

Note that even though wkt_3 and wkt_4 have rounded coordinates in your example, calculating their differences create polygons with inexact coordinates : x1 of diff_a is -8243077.837796713.

Geometry#intersection

Geometry#intersection is calculated on line 670 of src/operation/overlay/OverlayOp.cpp in GEOS :

void OverlayOp::computeOverlay(OverlayOp::OpCode opCode)

Comments in the code seems to indicate that exact noding isn't required, and there are multiple if statements to check if the results appear correct.

This method seems to be more robust than RelateComputer::computeIM.

Using GeometryPrecisionReducer?

With GeometryPrecisionReducer and a PrecisionModel, it becomes possible to specify a grid of allowable points for all Geometries.

GeometryPrecisionReducer is implemented in GEOS, but isn't available in RGeo. Tests in JTS with your first example have shown that it doesn't solve your problem anyway : inexact coordinates are snapped to the closest point on the grid. Every point is moved a bit to the north/south/east or west, which changes the slopes of every side.

It also changes the boundaries and their intersections. Depending on the PrecisionModel, the intersection for your first example could be empty, a line or a polygon.

1
Jadh4v On

In the first example, you have provided some floating point numbers for two polygons. What makes you think that they touch and do not overlap? How can you tell? In a generic case, you cannot say in absolute terms (without using tolerances) if two line segments overlap exactly or not. You can provide lines that have rounded coordinates, in which case an argument can be made for the same.

Use of floating-point numbers is a form of discretization of the real number space. This causes inconsistent results when using different algorithms or methods to compute the same result. For example, consider an intersection of three lines in 2D space such that they are supposed to intersect at the same point. Now compute the intersection point, independently, three times using a different pair of lines each time. You are likely to get similar but unequal answers each time.

I haven't used RGeo, but is there a way to adjust tolerances when computing intersection geometries? Try reducing the tolerance value (make that threshold smaller). This will allow the function to compute the geometry without merging points that are too close to each other.