How to scrape data from MEPCO duplicate bill checking site in PHP?

Question


Asked by Ahtisham Ali on 19 April 2023 at 17:06 · 247 views

I'm trying to scrape data from the MEPCO bill site in PHP. Specifically, I want to extract the bill details and save them to a database.
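For the "save them to a database" part of this goal, here is a minimal sketch using PDO prepared statements. The in-memory SQLite DSN (which needs the pdo_sqlite extension) and the `bills` table are hypothetical placeholders. Substitute your own database and schema.

```php
<?php
// Sketch: store extracted bill fields with PDO prepared statements.
// Assumptions: an in-memory SQLite database (requires pdo_sqlite) and a
// hypothetical `bills` table; replace the DSN and schema with your own.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE bills (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    amount TEXT NOT NULL,
    due_date TEXT NOT NULL
)');

// Values as they might come out of the scraper (hard-coded for illustration).
$bill = ['amount' => '200', 'due_date' => '2023-05-01'];

// Placeholders keep scraped text from ever being interpreted as SQL.
$stmt = $pdo->prepare('INSERT INTO bills (amount, due_date) VALUES (:amount, :due_date)');
$stmt->execute([':amount' => $bill['amount'], ':due_date' => $bill['due_date']]);

// Read the row back to confirm the insert.
$row = $pdo->query('SELECT amount, due_date FROM bills')->fetch(PDO::FETCH_ASSOC);
print_r($row);
```

Binding values through placeholders, rather than concatenating them into the SQL string, matters here because the data comes from a third-party page you don't control.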

Here's an example of the HTML structure I want to scrape:

<html>
  <body>
    <div id="bill-details">
      <h2>Electricity Bill Details</h2>
      <p>Payable Amount: $200</p>
      <p>Due Date: 2023-05-01</p>
      <p>Description: This is your electricity bill for the month of April 2023.</p>
    </div>
  </body>
</html>

I want to extract the payable amount and due date from this HTML. Here's the code I have tried so far:

$html = '<html>...'; // the HTML from the example above
preg_match('/<h2>(.*)<\/h2>/', $html, $billHeading);
preg_match('/<p>Payable Amount: (.*)<\/p>/', $html, $payableAmount);
preg_match('/<p>Due Date: (.*)<\/p>/', $html, $dueDate);
echo "Bill Heading: ".$billHeading[1];
echo "Payable Amount: ".$payableAmount[1];
echo "Due Date: ".$dueDate[1];

However, this code is not working as expected. It is not extracting the correct payable amount and due date. Can someone please help me correct this code or suggest a better way to extract the data from HTML using PHP?
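One failure mode worth ruling out is a greedy `(.*)`. The live page's markup isn't shown, so this is an assumption. If the real HTML is minified and several `<p>` elements share one line, a greedy `(.*)` runs to the last `</p>` on that line. The sketch compares greedy and lazy `(.*?)` captures on a hypothetical one-line sample. It also checks `preg_match()`'s return value, so a failed match yields `null` instead of an undefined-index warning. `firstMatch()` is a hypothetical helper, not a PHP built-in.

```php
<?php
// Minified sample: all markup on one line (an assumption about the live page).
// Single quotes keep "$200" literal.
$html = '<div id="bill-details"><h2>Electricity Bill Details</h2>'
      . '<p>Payable Amount: $200</p><p>Due Date: 2023-05-01</p></div>';

// Hypothetical helper: first capture group, or null when the pattern doesn't match.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[1] : null;
}

// Greedy (.*) backtracks only as far as the LAST </p> on the line.
$greedy = firstMatch('/<p>Payable Amount: (.*)<\/p>/', $html);
// Lazy (.*?) stops at the FIRST </p> after the label.
$lazy = firstMatch('/<p>Payable Amount: (.*?)<\/p>/', $html);
$due  = firstMatch('/<p>Due Date: (.*?)<\/p>/', $html);

echo "Greedy: '" . ($greedy ?? 'not found') . "'\n";
echo "Lazy:   '" . ($lazy ?? 'not found') . "'\n";
echo "Due:    '" . ($due ?? 'not found') . "'\n";
```

On that sample, the greedy capture is `$200</p><p>Due Date: 2023-05-01`, while the lazy capture is just `$200`.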


**Tonsil** · Answer 1 · 2023-04-19T19:59:09+00:00

Your example seems to work, as far as I can tell. This is what I ran:

<?php

$html = <<<HTML
<html>
  <body>
    <div id="bill-details">
      <h2>Electricity Bill Details</h2>
      <p>Payable Amount: $200</p>
      <p>Due Date: 2023-05-01</p>
      <p>Description: This is your electricity bill for the month of April 2023.</p>
    </div>
  </body>
</html>
HTML;

preg_match('/<h2>(.*)<\/h2>/', $html, $billHeading);
preg_match('/<p>Payable Amount: (.*)<\/p>/', $html, $payableAmount);
preg_match('/<p>Due Date: (.*)<\/p>/', $html, $dueDate);
echo "Bill Heading: '".$billHeading[1] . "'\n";
echo "Payable Amount: '".$payableAmount[1] ."'\n";
echo "Due Date: '".$dueDate[1] ."'\n";

And that generates this result:

Bill Heading: 'Electricity Bill Details'
Payable Amount: '$200'
Due Date: '2023-05-01'

Without knowing exactly what isn't working, it's hard to say what the problem is. As for improvements, one of the comments on the question suggested using a dedicated DOM parsing library, and I agree. If you must rely on regexes, make the patterns as specific as possible. For example, if the date is always in that format, match it with something like (\d{4}-\d{2}-\d{2}).
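Here is a minimal sketch of the DOM-parsing approach recommended above. It swaps in PHP's bundled DOM extension (`DOMDocument`, `DOMXPath`) rather than a third-party library. Narrow patterns, such as the date pattern, are applied only to each element's text. The sketch assumes the live page uses the same `id="bill-details"` structure as the question's sample, which hasn't been checked against the MEPCO site.

```php
<?php
// Nowdoc (<<<'HTML') so "$200" is never treated as a variable.
$html = <<<'HTML'
<html>
  <body>
    <div id="bill-details">
      <h2>Electricity Bill Details</h2>
      <p>Payable Amount: $200</p>
      <p>Due Date: 2023-05-01</p>
      <p>Description: This is your electricity bill for the month of April 2023.</p>
    </div>
  </body>
</html>
HTML;

// Real-world pages are often not valid HTML; collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$bill = ['heading' => null, 'amount' => null, 'due_date' => null];

// Select by structure (the div's id), not by raw text position.
$h2 = $xpath->query('//div[@id="bill-details"]/h2')->item(0);
if ($h2 !== null) {
    $bill['heading'] = trim($h2->textContent);
}

foreach ($xpath->query('//div[@id="bill-details"]/p') as $p) {
    $text = trim($p->textContent);
    // Anchored, narrow patterns: only digits can end up in the captures.
    if (preg_match('/^Payable Amount:\s*\$?([\d,]+(?:\.\d+)?)$/', $text, $m)) {
        $bill['amount'] = $m[1];
    } elseif (preg_match('/^Due Date:\s*(\d{4}-\d{2}-\d{2})$/', $text, $m)) {
        $bill['due_date'] = $m[1];
    }
}

print_r($bill);
```

Because each pattern runs against a single element's text rather than the whole document, it can't spill across tags the way a greedy match over raw HTML can. The amount pattern drops the currency symbol so the value can be stored as a number. Adjust it if the real page uses a different currency format.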

TechQA.
