How can we extract the text content from a PDF file? We are using PDFBox to extract the text, but the output also contains the header and footer, which we do not need. I am using the following Java code:
// `document` is the already loaded PDDocument; `pageCount` is the 1-based page to extract.
try {
    PDFTextStripper stripper = new PDFTextStripper();
    // Restrict the extraction to a single page.
    stripper.setStartPage(pageCount);
    stripper.setEndPage(pageCount);
    String pageText = stripper.getText(document);
    System.out.println(pageText);
} catch (IOException e) {
    // Creating the stripper and reading the text can both fail with an IOException.
    e.printStackTrace();
}
 
                        
You have tagged this as an itext/itextpdf question, yet you are using PDFBox. That's confusing.
You also claim that your PDF file has headers and footers. This would imply that your PDF is a Tagged PDF and that the header and the footer are marked as artifacts. If that is the case, then you should take advantage of the Tagged nature of the PDF and extract the content as is done in the ParseTaggedPdf example.
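A minimal sketch of that approach, assuming iText 5 (the com.itextpdf.text packages) is on the classpath. The class name and the file names tagged.pdf and tagged.xml are placeholders, not part of the original example:

```java
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.TaggedPdfReaderTool;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ParseTaggedPdfSketch {
    public static void main(String[] args) throws IOException {
        String src = "tagged.pdf";   // placeholder: path to your (hopefully Tagged) PDF
        String dest = "tagged.xml";  // placeholder: where the extracted structure is written

        PdfReader reader = new PdfReader(src);
        try (OutputStream out = new FileOutputStream(dest)) {
            // Walks the structure tree of the document and writes its content as XML.
            new TaggedPdfReaderTool().convertToXml(reader, out);
        } finally {
            reader.close();
        }
    }
}
```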
If this doesn't result in anything, you clearly don't have a Tagged PDF in which case there are no headers and footers in your document from a technical point of view. You may see headers and footers with your human eyes, but that doesn't mean that a machine sees these headers and footers. To a machine, it's just text like any other text in the page.
The ExtractPageContentArea example shows how we can define a rectangle that excludes the header and the footer when parsing for the content.
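A minimal sketch along the lines of that example, assuming iText 5 is on the classpath. The class name and the file name input.pdf are placeholders, and the rectangle values are the ones discussed in this answer, so they have to be adapted to your own document:

```java
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.FilteredTextRenderListener;
import com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import com.itextpdf.text.pdf.parser.RegionTextRenderFilter;
import com.itextpdf.text.pdf.parser.RenderFilter;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

import java.io.IOException;

public class ExtractPageContentAreaSketch {
    public static void main(String[] args) throws IOException {
        PdfReader reader = new PdfReader("input.pdf"); // placeholder path
        try {
            // Lower-left (70, 80) and upper-right (490, 580) corner of the body area.
            Rectangle rect = new Rectangle(70, 80, 490, 580);
            RenderFilter filter = new RegionTextRenderFilter(rect);
            for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                // Use a fresh strategy per page so text from earlier pages is not carried over.
                TextExtractionStrategy strategy = new FilteredTextRenderListener(
                        new LocationTextExtractionStrategy(), filter);
                System.out.println(PdfTextExtractor.getTextFromPage(reader, page, strategy));
            }
        } finally {
            reader.close();
        }
    }
}
```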
In this case, we have examined the document manually and noticed that the actual text is always added inside the rectangle
new Rectangle(70, 80, 490, 580). The header is added above Y coordinate 580 and the footer below Y coordinate 80. By using the RegionTextRenderFilter, we can extract the content while excluding anything that doesn't overlap with the rectangle we have defined.
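To make the overlap test concrete, here is a library-free sketch of the idea. It is not iText code: the TextChunk class, the coordinates of the three chunks and the keepInside helper are all made up for illustration, and only the rectangle corresponds to the one above.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

public class RegionFilterSketch {

    /** Hypothetical stand-in for a piece of text and the baseline it is drawn on. */
    static final class TextChunk {
        final String text;
        final double x1, y1, x2, y2;

        TextChunk(String text, double x1, double y1, double x2, double y2) {
            this.text = text;
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }
    }

    /** Keeps only the chunks whose baseline overlaps the given region. */
    static List<String> keepInside(Rectangle2D region, List<TextChunk> chunks) {
        List<String> kept = new ArrayList<>();
        for (TextChunk c : chunks) {
            if (region.intersectsLine(c.x1, c.y1, c.x2, c.y2)) {
                kept.add(c.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The rectangle above is given as two corners (70, 80) and (490, 580);
        // Rectangle2D expects x, y, width and height instead.
        Rectangle2D body = new Rectangle2D.Double(70, 80, 490 - 70, 580 - 80);

        List<TextChunk> page = new ArrayList<>();
        page.add(new TextChunk("Page header", 100, 600, 200, 600)); // above Y = 580
        page.add(new TextChunk("Body text", 100, 400, 300, 400));   // inside the rectangle
        page.add(new TextChunk("Page footer", 100, 40, 200, 40));   // below Y = 80

        System.out.println(keepInside(body, page)); // prints [Body text]
    }
}
```

The header and footer are dropped purely because of where they sit on the page, which is why the rectangle has to be determined by inspecting the document first.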