I have written the below code that writes a result set, obtained via a prepared statement, to a CSV file. We are using Impala DB with the JDBC driver.
private void writeToCSV(File resultFile, ResultSet resultSet) throws IOException, SQLException {
    LOG.info("Result set fetch size : {}", resultSet.getFetchSize());
    try (BufferedWriter csvWriter = new BufferedWriter(new FileWriter(resultFile))) {
        ResultSetMetaData metaData = resultSet.getMetaData();
        int columnCount = metaData.getColumnCount();
        // Write headers
        for (int i = 1; i <= columnCount; i++) {
            csvWriter.write(metaData.getColumnName(i));
            if (i < columnCount) {
                csvWriter.write(",");
            }
        }
        csvWriter.write("\n");
        // Write data
        // Note: values are written unescaped; commas or quotes inside a value would break the CSV
        int batchSize = 500; // flush interval; adjust based on your needs
        int currentCount = 0;
        while (resultSet.next()) {
            for (int i = 1; i <= columnCount; i++) {
                String value = resultSet.getString(i);
                // getString returns null for SQL NULL, and Writer.write(null) throws an NPE
                csvWriter.write(value != null ? value : "");
                if (i < columnCount) {
                    csvWriter.write(",");
                }
            }
            csvWriter.write("\n");
            currentCount++;
            // Flush the writer buffer to disk in chunks
            if (currentCount % batchSize == 0) {
                csvWriter.flush();
            }
        }
        // Flush any remaining data
        csvWriter.flush();
    }
}
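For context, this is roughly how the method is invoked (a minimal sketch; the JDBC URL, query, and file path are placeholders, not our real values):

try (Connection conn = DriverManager.getConnection(impalaJdbcUrl);             // placeholder URL
     PreparedStatement stmt = conn.prepareStatement("SELECT * FROM my_table"); // placeholder query
     ResultSet rs = stmt.executeQuery()) {
    writeToCSV(new File("/tmp/result.csv"), rs);                               // placeholder path
}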
I want to understand: since the query returns a million records, will the ResultSet load all of them from the Impala DB at once, or in batches? Also, if it loads the next batch, do the previous records remain in memory, or are they released?
Do I need to set the fetch size explicitly? resultSet.getFetchSize() always returns zero.
I tried executing this with 1 million records and saw a huge spike in memory. If the ResultSet fetched in batches, we should not have seen such a huge spike.
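For the fetch size question, the only thing I found is the standard JDBC hint, set on the statement before executing the query (a minimal sketch; I do not know whether the Impala driver actually honors it, which is part of what I am asking):

stmt.setFetchSize(1000);            // hint to the driver: stream ~1000 rows per fetch from the server
ResultSet rs = stmt.executeQuery(); // yet getFetchSize() on this ResultSet still returns 0 for us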