Dremio SQL injection vulnerability

I'd like to query S3 storage containing Parquet files from my Spring Java app through Dremio. The queries are dynamic and take user-supplied parameters. I use the Apache Arrow Flight SQL JDBC driver and run queries through a JdbcTemplate instantiated with a DataSource built from the following properties:

  driver-class-name: org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver
  url: jdbc:arrow-flight-sql://localhost:32010/?useEncryption=false
  username: user
  password: pwd

I build the SQL string by formatting the user-supplied value into it:

"SELECT * FROM \"my-s3-storage\".table t WHERE t.description = '%s';".formatted(userInput)

It works, but it is obviously a wide-open SQL injection hole. If I try to use a prepared statement instead:

String sql = "SELECT * FROM \"my-s3-storage\".table t WHERE t.description = ?";
jdbcTemplate.query(sql, ps -> ps.setString(1, userInput), rs -> {
  //handling the result set
});

I get the following error:

cfjd.org.apache.arrow.flight.FlightRuntimeException: Cannot convert RexNode to equivalent Dremio expression. RexNode Class: org.apache.calcite.rex.RexDynamicParam, RexNode Digest: ?0

I'm finding conflicting information on the web. Some forum posts claim that Dremio doesn't support prepared statements, but all of those comments are several years old*. Moreover, the official Dremio site has an article recommending the use of prepared statements.

As far as I know, Dremio uses ANSI SQL under the hood, which I believe supports prepared statements. Or does that depend on the database engine rather than the dialect? Can anyone confirm that this is still not possible with Dremio? If so, I'll stop pursuing it.

If that's the case, I will escape the unsafe characters, use a dictionary for encoding and decoding user-supplied characters, etc. If you have other advice or experience mitigating SQL injection without prepared statements, I would appreciate that as well!

Thank you!

*latest update I found: https://community.dremio.com/t/sql-parameterization-support/1733/5

There is 1 answer

Bylaw On

I'll post my findings as an answer in case they're useful to someone in the same boat:

For lack of any other ideas, I went down the encode/decode road.

My initial idea was to use my own dictionary, but I figured that hexadecimal encoding should be sufficient. Luckily, Dremio SQL has a FROM_HEX function, which returns a BINARY value for a given hexadecimal string.

With that I can (so far) safely build dynamic queries: any user-supplied input is turned into a hex string on the application side and converted back at execution time, with FROM_HEX acting as a kind of wrapper around the value.
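For anyone who wants to see the shape of it, here is a minimal sketch of the hex-wrapping idea, not my exact production code. The class and method names (HexLiteral, toHex, buildQuery) are made up for illustration. Wrapping the result in CONVERT_FROM(..., 'UTF8') to get from BINARY back to a string is an assumption on my part about how to make the comparison against a VARCHAR column work, so check it against your Dremio version. The point is that the only characters from the user that ever reach the SQL text are 0-9 and A-F.

```java
import java.nio.charset.StandardCharsets;

final class HexLiteral {

    // Encodes the UTF-8 bytes of the input as uppercase hex digits.
    // The output can only contain 0-9 and A-F, so it cannot close the
    // surrounding SQL string literal.
    static String toHex(String userInput) {
        byte[] bytes = userInput.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.toUpperCase(Character.forDigit((b >> 4) & 0xF, 16)));
            sb.append(Character.toUpperCase(Character.forDigit(b & 0xF, 16)));
        }
        return sb.toString();
    }

    // Builds the query from the question with the user value hex-wrapped.
    // FROM_HEX is the Dremio function mentioned above; CONVERT_FROM with
    // 'UTF8' is an assumed way to turn the BINARY result back into text.
    static String buildQuery(String userInput) {
        return "SELECT * FROM \"my-s3-storage\".table t "
             + "WHERE t.description = CONVERT_FROM(FROM_HEX('"
             + toHex(userInput)
             + "'), 'UTF8')";
    }
}
```

The resulting string can then be passed to jdbcTemplate.query(...) without a PreparedStatementSetter, since there are no ? placeholders left in it.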

This still isn't a fully reassuring solution, but as far as I can see there isn't a better option at the moment. I've heard Dremio will implement prepared statements (quite a shock that they haven't already). Until then, let's hope for the best!