PDF to ByteArray Conversion


We have written Java code in which we are trying to convert a PDF to a byte array.

But the problem is that when we try to convert and print the converted output, we get only 8 to 10 characters. Why is that so? When I convert the whole PDF, it has to be a large number of characters.

Here is my code:

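import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// The class name is only a placeholder so that the snippet compiles.
public class PdfToBytes
{
    public static void main(String[] args) throws IOException
    {
        FileInputStream in = new FileInputStream(new File("C:\\test\\P12.pdf"));
        FileOutputStream out = new FileOutputStream(new File("C:\\test\\pdfoutput.xml"));

        // Read the whole file into memory, 1024 bytes at a time.
        byte[] buffer = new byte[1024];
        ByteArrayOutputStream bs = new ByteArrayOutputStream();
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1)
        {
            bs.write(buffer, 0, bytesRead);
        }
        System.out.println(in);
        byte[] bytes = bs.toByteArray();

        // This is the print statement that shows only 8 to 10 characters.
        System.out.println(bs.toString());
        out.write(bytes);
    }
}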

There are 2 answers

Answer by user207421

> We have written a java code where we are trying to convert PDF to Bytearray.

No you haven't. You have written code that reads a file, without conversion, into a byte array. This is a bitwise copy operation, not a conversion.
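To make the "bitwise copy" point concrete, here is a minimal sketch using only the standard library. The class and method names (ReadBytesDemo, readAll) are made up for illustration, and the path is whatever you pass on the command line. The byte array it returns holds exactly the bytes stored on disk, nothing decoded or converted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative class name; not part of the question's code.
class ReadBytesDemo {
    // Returns the file's bytes unchanged. No decoding or conversion happens here.
    static byte[] readAll(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java ReadBytesDemo <file>");
            return;
        }
        byte[] bytes = readAll(Paths.get(args[0]));
        // The array length equals the file size in bytes.
        System.out.println("Read " + bytes.length + " bytes");
    }
}
```

Printing bytes.length instead of the contents is usually the quickest way to confirm that the whole file was read.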

> But the problem is when we try to convert

There is no conversion here, other than the almost certainly invalid conversion of the ByteArrayOutputStream to a String.

> and try to print the converted output we get only 8 to 10 characters only

You get junk. Binary junk. You get the original, unconverted, PDF, with all its binary characters, probably including lots of CR and BS characters. It isn't a valid operation. Solution: don't do it.
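If the goal is just to look at the bytes, a hex dump is a safer choice than turning them into a String. The sketch below is an illustration under assumptions: the class and method names are invented, and the sample bytes are a made-up stand-in for the start of a PDF. It also shows that decoding arbitrary binary data as UTF-8 and encoding it again does not give back the original bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative class name; not part of the question's code.
class BinaryPrintDemo {
    // Formats at most 'limit' bytes as two-digit hex values separated by spaces.
    static String toHex(byte[] bytes, int limit) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(limit, bytes.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so that negative byte values print as 80..ff.
            sb.append(String.format("%02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // True only if decoding the bytes as UTF-8 and re-encoding reproduces them exactly.
    static boolean survivesStringRoundTrip(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Made-up sample: ASCII "%PDF-" followed by four non-ASCII bytes.
        byte[] sample = {0x25, 0x50, 0x44, 0x46, 0x2D,
                         (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        System.out.println(toHex(sample, 16));
        System.out.println("round trip intact: " + survivesStringRoundTrip(sample));
    }
}
```

The non-ASCII bytes in the sample are not valid UTF-8, so the decoder substitutes replacement characters and the round trip fails. That is the same reason bs.toString() is not a meaningful view of a PDF.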

> why is it so?

Because you haven't converted anything.

> when i convert the whole pdf it has to be a large no of characters

No doubt, but you haven't converted anything yet. If you want to see the text, use a PDF viewer, or write some code that uses a library like iText.

You have not yet begun to fight.

Answer by Joop Eggen

A PDF is binary data. So a toString will probably just output the so-called PDF signature: %PDF- plus the version, followed by some intentionally non-ASCII characters.
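As a rough sketch of what that signature looks like in code, the following checks whether a byte array starts with %PDF- and extracts the first line as ASCII. The class name, method names and the sample bytes are all invented for illustration; a real file would be read from disk first.

```java
import java.nio.charset.StandardCharsets;

// Illustrative class name; not part of the question's code.
class PdfHeaderDemo {
    // True if the data begins with the ASCII bytes "%PDF-".
    static boolean looksLikePdf(byte[] bytes) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (bytes.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (bytes[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns everything before the first CR or LF, decoded as ASCII.
    static String headerLine(byte[] bytes) {
        int end = 0;
        while (end < bytes.length && bytes[end] != '\n' && bytes[end] != '\r') {
            end++;
        }
        return new String(bytes, 0, end, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Made-up sample: a header line, then a comment line with non-ASCII bytes.
        byte[] sample = {'%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
                         '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};
        System.out.println("looks like PDF: " + looksLikePdf(sample));
        System.out.println("header line: " + headerLine(sample));
    }
}
```

Only that first line is readable text; what follows is binary and should not be printed as characters.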

Treating the bytes as XML (the output file is named pdfoutput.xml) is even less likely to work.

For reading a PDF there are libraries, for instance iText.

BTW in.close() would be a good idea too.
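One way to get the closing for free is try-with-resources, which closes both streams even if an exception is thrown. This is a minimal sketch, not the only way to do it; the class and method names are invented, and the file names come from the command line.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Illustrative class name; not part of the question's code.
class StreamCopyDemo {
    // Copies source to target byte for byte and returns the number of bytes copied.
    static long copy(File source, File target) throws IOException {
        long total = 0;
        // Both streams are closed automatically when the try block exits.
        try (FileInputStream in = new FileInputStream(source);
             FileOutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
                total += bytesRead;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java StreamCopyDemo <source> <target>");
            return;
        }
        long copied = copy(new File(args[0]), new File(args[1]));
        System.out.println("Copied " + copied + " bytes");
    }
}
```

Closing the output stream matters as much as closing the input: it releases the file handle so other programs can open the copy.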