I am developing an app that takes a PDF, extracts all of its images, and stores them in an ArrayList of bitmaps. These bitmaps can then be edited and saved as a new PDF. When I save them as a PDF, with or without editing, the output is about ten times the size of the original PDF, and although I process each page on a different thread, it still runs very slowly.

For example, a PDF of 28 MB takes around 4 minutes to turn back into a PDF. There are 20 images in the PDF, and the output PDF is over 200 MB.

I am using the PDFBox library for Android by Tom Roush (Tom Roush PDFBox Repo).

This is the createPdf() method:

public void createPdf() {
        document = new PDDocument();

        // Start one AsyncTask per image; each task adds one page to the shared document
        for (Bitmap image : images) {
            PDFPages page = new PDFPages();
            page.execute(image);
        }
    }
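
Note that on current Android versions `AsyncTask.execute()` queues tasks on a single background thread, so the loop above does not actually run pages in parallel, and a `PDDocument` is generally not safe to modify from several threads at once anyway. A minimal sketch of an alternative, one background job that adds every page in order and saves once at the end, is shown below in plain Java. `SequentialPdfJob`, `PageBuilder`, and `buildAll` are hypothetical names for illustration, and the page-building and saving steps are stand-ins for the PDFBox calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SequentialPdfJob {

    /** Hypothetical stand-in for the per-image work (encode the image, add a page). */
    interface PageBuilder<T> {
        void addPage(T image) throws Exception;
    }

    /** Runs all page work on one worker thread, then saves once. Returns the page count. */
    static <T> int buildAll(List<T> images, PageBuilder<T> builder, Runnable saveOnce) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> job = worker.submit(() -> {
                int count = 0;
                for (T image : images) {
                    builder.addPage(image); // pages are added in list order
                    count++;
                }
                saveOnce.run(); // save exactly once, after the last page
                return count;
            });
            return job.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while building the PDF", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Page building failed", e.getCause());
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        int pages = buildAll(Arrays.asList("img1", "img2"), log::add, () -> log.add("save"));
        System.out.println(pages + " pages, steps: " + log);
    }
}
```

In an app the caller would not block on `get()` from the UI thread; the result would be posted back through a callback or handler instead.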

The AsyncTask PDFPages class is as follows:

public class PDFPages extends AsyncTask<Bitmap,Integer,Void>
    {

        @Override
        protected Void doInBackground(Bitmap... bitmaps) {

            try {
                Bitmap image = bitmaps[0];
                PDPage page = new PDPage();
                document.addPage(page);
                // Define a content stream for adding to the PDF
                PDPageContentStream contentStream = new PDPageContentStream(document, page);

                PDImageXObject ximage = LosslessFactory.createFromImage(document, image);

                // Defining and calculating position and scaling variables
                float w = image.getWidth();
                float h = image.getHeight();

                float x_pos = page.getCropBox().getWidth();
                float y_pos = page.getCropBox().getHeight();


                if (w > h) {
                    h = h * (x_pos / w);
                    w = x_pos;
                } else {
                    w = w * (y_pos / h);
                    h = y_pos;
                }

                float x_adjusted = (x_pos - w) / 2;
                float y_adjusted = (y_pos - h) / 2;

                contentStream.drawImage(ximage, x_adjusted, y_adjusted, w, h);

                // Make sure that the content stream is closed:
                contentStream.close();
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
            return null;
        }

        @Override
        protected void onPostExecute(Void aVoid) {
            super.onPostExecute(aVoid);

            countPages = countPages + 1;

            if(countPages == images.size()) {
                try {
                    // Save the final pdf document to a file
                    final String path = myDir.getAbsolutePath() + "/Created.pdf";

                    document.save(path);
                    document.close();

                    Toast.makeText(process.this, "PDF successfully written to :" + path, Toast.LENGTH_SHORT).show();
                } catch (Exception e) {
                    e.printStackTrace();
                }

                progressBar.setVisibility(View.INVISIBLE);
                saving.setVisibility(View.INVISIBLE);
                anim.cancel();

            }

        }
    }
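
The position and scaling step can be isolated as a small pure function, which makes it easy to check on its own. The sketch below takes the smaller of the two page-to-image ratios (rather than picking the axis from the image orientation), so the image cannot overflow the page in either dimension. `PageFit` and `fitInside` are hypothetical names, and the 612 x 792 point page in `main` assumes the default `PDPage` is US Letter.

```java
final class PageFit {

    /** Returns {x, y, width, height} for an image centred inside the page box. */
    static float[] fitInside(float imgW, float imgH, float pageW, float pageH) {
        // The smaller ratio is the largest scale that fits both dimensions
        float scale = Math.min(pageW / imgW, pageH / imgH);
        float w = imgW * scale;
        float h = imgH * scale;
        // Centre the scaled image on the page
        float x = (pageW - w) / 2f;
        float y = (pageH - h) / 2f;
        return new float[] { x, y, w, h };
    }

    public static void main(String[] args) {
        // Assumed page size: 612 x 792 points (US Letter)
        float[] box = fitInside(1000f, 1000f, 612f, 792f);
        System.out.println("x=" + box[0] + " y=" + box[1] + " w=" + box[2] + " h=" + box[3]);
    }
}
```

For a square image on such a page, this yields a 612 point square with a 90 point margin above and below, whereas scaling a square image to the page height would make it wider than the page.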

The method used for extracting the images from the PDF is as follows:


public void createImages()
    {
        try {
            //Loading the pdf file
            PDDocument document = PDDocument.load(file);
            // Get the page tree of the document
            PDPageTree pages = document.getDocumentCatalog().getPages();

            myDir = new File(root.getAbsolutePath(), "PDF/" + pdfName);
            if (!myDir.exists()) {
                myDir.mkdirs();
            }

            // i counts the image XObjects found
            i = 0;

            // PDPageTree is Iterable<PDPage>, so no raw Iterator or cast is needed
            for (PDPage page : pages)
            {
                PDResources resources = page.getResources();

                // Code suggested by Tom Roush in reply to my issue about the missing resources.getImages() method
                for (COSName name : resources.getXObjectNames())
                {
                    PDXObject xobj = resources.getXObject(name);
                    if (xobj instanceof PDImageXObject)
                    {
                        bit = ((PDImageXObject)xobj).getImage();
                        //Image acquired.
                        if(bit != null) {
                            images.add(bit);
                        }
                        i=i+1;
                    }
                }
            }
            if(i == 0)
            {
                Intent intent=new Intent(process.this,MainActivity.class);
                intent.putExtra("images",i);
                startActivity(intent);
            }
            document.close();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        Log.i("helll","Completed CreateImages()");
    }

images is the ArrayList of bitmaps.

The input PDF was created by CamScanner (the app) from 20 images taken with the device's camera. It is 27.45 MB, and the output PDF is 264.10 MB.
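
As a quick sanity check on these figures, the sketch below works out the per-page sizes and the growth factor from the numbers above: roughly 1.37 MB per page going in and 13.2 MB per page coming out, a factor of about 9.6. `SizeCheck` and its methods are hypothetical names. A jump of this size would be consistent with camera images that were stored with lossy compression in the input being re-encoded losslessly by `LosslessFactory`, though that is an assumption about the input file rather than something verified here.

```java
final class SizeCheck {

    /** Average size per page in MB. */
    static double perPageMb(double totalMb, int pages) {
        return totalMb / pages;
    }

    /** How many times larger the output is than the input. */
    static double growthFactor(double inputMb, double outputMb) {
        return outputMb / inputMb;
    }

    public static void main(String[] args) {
        double inputMb = 27.45;   // size of the input PDF, from the post
        double outputMb = 264.10; // size of the output PDF, from the post
        int pages = 20;           // one camera image per page

        System.out.println("Input per page (MB):  " + perPageMb(inputMb, pages));
        System.out.println("Output per page (MB): " + perPageMb(outputMb, pages));
        System.out.println("Growth factor:        " + growthFactor(inputMb, outputMb));
    }
}
```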

I will upload the PDFs shortly. I can't upload them right now because I am away from my workspace and depend entirely on my phone's internet connection (and yes, I live in a third-world country). I will upload the PDFs to my Google Drive and edit in the links as soon as I have a decent connection.

I am looking for a way to reduce both the time it takes to write the output PDF and the size of that PDF.
