How to obtain an InputStream when opening an IgfsPath (open returns HadoopIgfsSecondaryFileSystemPositionedReadable)?

Usually, when working with Hadoop and Flink, opening a file on a distributed file system returns a source object (the counterpart of a sink) that extends java.io.InputStream.

However, in Apache Ignite, IgfsSecondaryFileSystem, and more specifically IgniteHadoopIgfsSecondaryFileSystem, returns an object of type HadoopIgfsSecondaryFileSystemPositionedReadable from its "open" method (which takes an IgfsPath).

HadoopIgfsSecondaryFileSystemPositionedReadable offers a "read" method, but it requires the caller to know where the data to be read is located, such as the position in the input stream:

/**
 * Read up to the specified number of bytes, from a given position within a file, and return the number of bytes
 * read.
 *
 * @param pos Position in the input stream to seek.
 * @param buf Buffer into which data is read.
 * @param off Offset in the buffer from which stream data should be written.
 * @param len The number of bytes to read.
 * @return Total number of bytes read into the buffer, or -1 if there is no more data (EOF).
 * @throws IOException In case of any exception.
 */
public int read(long pos, byte[] buf, int off, int len) throws IOException;

How can I determine these details before calling the read method?
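Going by the Javadoc quoted above, `pos` is just the byte offset in the file, so a sequential reader can start at 0 and advance by the number of bytes each call returns. Below is a minimal sketch of that idea. `PositionedReadable`, `PositionedReadableInputStream` and `ByteArrayPositionedReadable` are hypothetical names used for illustration; the interface only mirrors the quoted signature and is not the Ignite class itself.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in that mirrors the positional read signature quoted above. */
interface PositionedReadable {
    int read(long pos, byte[] buf, int off, int len) throws IOException;
}

/** Sequential InputStream view over a positional reader; it tracks the offset itself. */
final class PositionedReadableInputStream extends InputStream {
    private final PositionedReadable src;
    private long pos; // next file offset to read from; starts at 0

    PositionedReadableInputStream(PositionedReadable src) {
        this.src = src;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        // Assumption: a non-positive count for a 1-byte request means no more data.
        return n <= 0 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = src.read(pos, buf, off, len);
        if (n > 0) {
            pos += n; // advance by the number of bytes actually read
        }
        return n;
    }
}

/** In-memory implementation, only here to exercise the adapter. */
final class ByteArrayPositionedReadable implements PositionedReadable {
    private final byte[] data;

    ByteArrayPositionedReadable(byte[] data) {
        this.data = data;
    }

    @Override
    public int read(long pos, byte[] buf, int off, int len) {
        if (pos >= data.length) {
            return -1; // EOF, as in the quoted contract
        }
        int n = (int) Math.min(len, data.length - pos);
        System.arraycopy(data, (int) pos, buf, off, n);
        return n;
    }
}
```

Wrapping any object with such a `read(pos, buf, off, len)` method this way gives ordinary `InputStream` semantics; the only state the caller has to keep is the running offset.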

I am quite new to these frameworks. Is there perhaps a different way to obtain an InputStream from an IgfsPath that points to a file stored in a Hadoop file system?

I am trying to achieve what is described here: https://apacheignite-fs.readme.io/docs/secondary-file-system

Thanks in advance for any hints!

There is 1 answer

Denis Mekhanikov

The IgfsSecondaryFileSystem interface is not meant to be used directly. Instead, you can configure your Hadoop cluster as a secondary file system for read-through and write-through operations.

IgfsSecondaryFileSystem should only be specified in the configuration, as the FileSystemConfiguration#secondaryFileSystem property.

Use the IgniteFileSystem interface instead. You can get an instance of it by calling the Ignite#fileSystem(...) method. To acquire an InputStream for an IGFS path, use the IgniteFileSystem#open(...) method.
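A rough sketch of that approach, assuming an Ignite release that still ships IGFS and has ignite-core on the classpath. The config path, the IGFS name "igfs" and the file path are hypothetical placeholders, and the file is assumed to be UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteFileSystem;
import org.apache.ignite.Ignition;
import org.apache.ignite.igfs.IgfsPath;

public class IgfsReadExample {
    public static void main(String[] args) throws IOException {
        // Hypothetical Spring XML config: it is assumed to define an IGFS instance
        // named "igfs" whose FileSystemConfiguration sets secondaryFileSystem.
        try (Ignite ignite = Ignition.start("config/igfs-config.xml")) {
            IgniteFileSystem fs = ignite.fileSystem("igfs");

            // Hypothetical path; with read-through, the file may live in Hadoop.
            IgfsPath path = new IgfsPath("/data/input.txt");

            // open(...) returns an IgfsInputStream, which is a java.io.InputStream.
            try (InputStream in = fs.open(path);
                 BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }
}
```

With this setup the application code never touches IgfsSecondaryFileSystem; the secondary file system is reached only through the IGFS configuration.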