I downloaded the ClueWeb09_English_Sample.warc file from this page, then wrote the contents of the WARC file to a text file using the code given on the following web page. I want to parse the text file to get at the content of the pages stored in it, but I do not know how to parse it. Is there any way to parse a WARC file without first converting it to text?
I want to parse the following text:
WARC/0.18
WARC-Type: warcinfo
WARC-Date: 2009-04-119T12:48:17-0400
WARC-Record-ID: d4360e52-06c3-41c8-bb13-62db3a622ca7
Content-Type: application/warc-fields
Content-Length: 218
software: Nutch 1.0-dev (modified for clueweb09)
isPartOf: clueweb09-
description: clueweb09 crawl with WARC output
format: WARC file version 0.18
conformsTo: http://www.archive.org/documents/WarcFileFormat-0.18.html
WARC/0.18
WARC-Type: response
WARC-Date: 2009-03-67T15:35:34-0700
WARC-Identified-Payload-Type:
WARC-TREC-ID: clueweb09-en0040-54-00000
WARC-Target-URI: http://www.smartwebby.com/DreamweaverTemplates/templates/business_general_template59.asp
WARC-Warcinfo-ID: d4360e52-06c3-41c8-bb13-62db3a622ca7
WARC-Record-ID: <urn:uuid:721f9a28-6b9a-44c1-bccd-8c7accb514cd>
Content-Type: application/http;msgtype=response
Content-Length: 21064
HTTP/1.1 200 OK
Content-Type: text/html
X-Powered-By: ASP.NET
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
Cache-control: private
Date: Fri, 30 Jan 2009 18:08:20 GMT
Connection: close
Set-Cookie: COOtempname=; path=/
Content-Length: 20807
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">
<html><!-- InstanceBegin template="/Templates/dreamweaver_template.dwt.asp" codeOutsideHTMLIsLocked="false" --><head><!-- InstanceBeginEditable name="doctitle" -->
<title>Template 59 [Business/General] - Sharp Business Template</title><!-- InstanceEndEditable --><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"><link rel="stylesheet" type="text/css" href="/styles.css"><link rel="stylesheet" type="text/css" href="/templates.css"><!-- InstanceBeginEditable name="pagemetas" -->
<META content="Category: Business Templates, General Web Templates. Sharp Simple design with warm colors and neat navigation" name="Description">
<META content="Template 59 Business/General Business Templates General Web Templates Sharp Business Template Sharp Simple design with warm colors neat navigation" name="Keywords"><!-- InstanceEndEditable --></head>
<body><div id="header"><a href="http://www.smartwebby.com/"><img src='/images/new/smartwebby_logo.gif' width='206' height='40' alt='Best Web Design, Global Web Designers in Chennai, India' hspace="20" vspace="8" border="0"></a></div>
<div id="hnav" align="right"><div class="bnav"><form action="http://www.smartwebby.com/site_search.asp" id="cse-search-box" class="gsearch"><div><input type="hidden" name="cx" value="partner-pub-7253144749736841:opeytdpqnvq"><input type="hidden" name="cof" value="FORID:9"><input type="hidden" name="ie" value="ISO-8859-1"><input type="text" name="q" size="15"> <input type="submit" name="sa" value="Search"></div></form></div><a href="/clientarea/default.asp" class="bnav"><img src="/images/client_login.jpg" width="40" height="40" border="0" alt="Client Login" vspace="5"></a><a href="https://www.smartwebby.com/view_cart.asp" class="bnav"><img src="/images/view_cart.jpg" width="40" height="40" border="0" alt="View your Shopping Cart" vspace="5"></a><a href="/DreamweaverTemplates/faqs.asp" class="bnav"><img src="/images/help_faqs.jpg" width="40" height="40" border="0" alt="Help & Frequently Asked Questions" vspace="5"></a><a href="mailto:[email protected]" class="bnav"><img src="/images/email_us.jpg" width="40" height="40" border="0" alt="Email Us" vspace="5"></a></div><div id='htabs'><div id='tabbg'><a href='/' class='tab ml'>Home</a> <a href='/services.asp' class='tab ml'>Services</a> <a href='/portfolio.asp' class='tab ml'>Portfolio</a> <a href='/rates.asp' class='tab ml'>Web Design Pricing</a> <span class='tabOn ml'><a href='/DreamweaverTemplates/'>Dreamweaver Templates</a> </span><a href='/web_applications.asp' class='tab ml'>Web Applications</a> <a href='/resources.asp' class='tab ml'>Free Tutorials</a> <a href='/about_us.asp' class='tab ml'>About</a> <a href='/contactus.asp' class='tab ml'>Contact Us</a> <span class='tab nopad'></span></div></div><div id='tablinks'><a href='/DreamweaverTemplates/about.asp'>About our Dreamweaver Templates</a> | <a href='/DreamweaverTemplates/buy_templates.asp'>How to Buy</a> | <a href='/DreamweaverTemplates/'>Catalog</a> | <a href='/DreamweaverTemplates/terms_of_use.asp'>Terms of Use</a> | <a 
href='/DreamweaverTemplates/customization_guide.asp'>Customization Help</a> | <a href='/DreamweaverTemplates/TemplateCustomizationService.asp'>Customization Service</a> | <a href='/DreamweaverTemplates/FreeDreamweaverTemplates.asp'>Free Dreamweaver Templates</a> | <a href='/DreamweaverTemplates/terms_of_use_free.asp'>Terms of Use (Free)</a></div><div id='hmenu'><div class='mrow'><a href='/DreamweaverTemplates/BeautyFashionTemplates.asp'>Beauty Templates</a> | <a href='/DreamweaverTemplates/BusinessTemplates.asp' class='onpage'>Business Templates</a> | <a href='/DreamweaverTemplates/ChurchChristianTemplates.asp'>Christian Templates</a> | <a href='/DreamweaverTemplates/CSSTemplates.asp'>CSS Templates</a> | <a href='/DreamweaverTemplates/EducationTemplates.asp'>Education Templates</a> | <a href='/DreamweaverTemplates/FamilyPersonalTemplates.asp'>Family Templates</a> | <a href='/DreamweaverTemplates/FlashTemplates.asp'>Flash Templates</a> | <a href='/DreamweaverTemplates/FreeDreamweaverTemplates.asp'>Free Dreamweaver Templates</a></div><div class='mrow'><a href='/DreamweaverTemplates/FoodTemplates.asp'>Food Templates</a> | <a href='/DreamweaverTemplates/GeneralWebTemplates.asp'>General Templates</a> | <a href='/DreamweaverTemplates/GovernmentMilitaryTemplates.asp'>Government Templates</a> | <a href='/DreamweaverTemplates/HealthMedicalTemplates.asp'>Health/Medical Templates</a> | <a href='/DreamweaverTemplates/HiTechTemplates.asp'>Hi-Tech Templates</a> | <a href='/DreamweaverTemplates/KidsChildcareTemplates.asp'>Kids/Childcare Templates</a> | <a href='/DreamweaverTemplates/LowCostTemplates.asp'>Low-cost Budget Templates</a></div><div class='mrow'><a href='/DreamweaverTemplates/PersonalWebTemplates.asp'>Personal Templates</a> | <a href='/DreamweaverTemplates/PetsTemplates.asp'>Pets Templates</a> | <a href='/DreamweaverTemplates/PhotographyTemplates.asp'>Photography Templates</a> | <a href='/DreamweaverTemplates/ProfessionalsTemplates.asp'>Profession Templates</a> | <a 
href='/DreamweaverTemplates/RealEstateTemplates.asp'>Real Estate Templates</a> | <a href='/DreamweaverTemplates/SportsTemplates.asp'>Sports Templates</a> | <a href='/DreamweaverTemplates/TelecomTemplates.asp'>Telecom Templates</a> | <a href='/DreamweaverTemplates/TravelTemplates.asp'>Travel Templates</a></div></div><div id='hlinks'><a href='/services.asp'>Professional Web Design Services</a> <a href='/design_packages.asp'>Web Design Packages</a> <a href='/professional_logo_designing.asp#packages'>Logo Design Packages</a> <a href='/DreamweaverTemplates/'>Dreamweaver Web Templates</a> <a href='/web_site_design/default.asp'>Web Design Guide</a> <a href='/web_site_design/web_design_tools.asp'>Best Web Design Software</a></div>
<div id="content" class="text"><!-- InstanceBeginEditable name="tempinfo" -->
<h1>Template 59 - Business/General</h1><!-- InstanceEndEditable -->
<div class="picboxL" align="center"><div class="red">Template HTML View Screenshot - 1024px screen width</div><img src="/images/spacer.gif" alt="" width="1" height="15" hspace="0" vspace="0"><br><!-- InstanceBeginEditable name="1024view" --><img src="/images/dreamweaver_templates/HTML_1024_view/temp59_business_general.gif" width="400" height="290" alt="Template 59 [Business/General] - 1024px screen width view"><!-- InstanceEndEditable --><br><a href="javascript:;" onClick="OpenTemplate('temp59_business_general')"><img src="/images/800res.gif" width="184" height="27" border="0" alt="View for 800x600 Resolution"></a> <a href="javascript:;" onClick="OpenTemplate2('temp59_business_general')" class="link"><img src="/images/1024res.gif" width="184" height="27" border="0" alt="View for 1024x768 Resolution"></a></div>
<div class="whiteboxR"><div align="center"><strong> Preview sample web page : </strong> <a href="javascript:;" onClick="OpenTemplate('temp59_business_general')" class="link">For 800x600 Resolution</a> | <a href="javascript:;" onClick="OpenTemplate2('temp59_business_general')" class="link">For 1024x768 Resolution</a></div><div class="curveboxT"><div class="curveboxL"><div class="curveboxR"><div class="curveboxTR"><img src="/images/home/box_t.jpg" alt="" width="51" height="56" align="bottom"></div><div class="curveboxC"><!-- InstanceBeginEditable name="features" --><span class="red">Key Features of this Sharp Business Template
<div id="certs" align="center"><img src='/images/icons/w3c_html_valid.gif' alt='W3C Certified: Valid HTML 4.01 Transitional' width='52' height='26'> <img src='/images/icons/top_browsers_tested.gif' alt='Cross Browser Compatible: Tested in IE 5+, Firefox 1+, Opera 7+, Netscape 6+, Safari 3' width='52' height='26'> <img src='/images/icons/drop_down_menus.gif' alt='Javascript Drop-Down Menus' width='52' height='26'> <img src='/images/icons/stretch_layout.gif' alt='Stretch Layout to fit all screen resolutions' width='52' height='26'> <img src='/images/icons/text_links_nav.gif' alt='Text Links Navigation' width='52' height='26'> </div></span>
<ul>
<li>Sharp Simple design with warm colors and neat
navigation</li>
<li>Easy-to-edit Drop-down Menus & Text Links </li>
<li>All <b>16</b> linked HTML pages included</li>
<li>Cross Browser Compatible : <span class='bluelk'>Tested for Internet Explorer 5+, Netscape 6+, Opera 7+, Firefox 1.0+, Safari 3</span></li>
<li>Designed to stretch and fit all resolutions (800 x 600 and higher screen resolutions)</li>
</ul>
<!-- InstanceEndEditable -->
Buy Now for Only <strong>$39.95</strong>! <a href="/addtocart.asp?pid=198"><img src="/images/add2cart.gif" alt="Add to Cart" width="148" height="38" border="0" align="middle"></a><br><a href="#software" class="link">Software Required</a> <a href="javascript:;" onClick="OpenHelp('software_req')" class="super">[?]</a> <a href="#zip" class="link">Source Files Included</a> <a href="javascript:;" onClick="OpenHelp('source_files')" class="super">[?]</a> <a href='/DreamweaverTemplates/BusinessTemplates.asp' class='link'>More Business Templates</a></div><div class="curveboxB"><div class="curveboxBR"><img src="/images/home/box_l.jpg" width="51" height="30" alt=""></div></div></div></div></div></div>
<div class="bluebox"><div class="bluesub">Why buy our <a href="/DreamweaverTemplates/" target="_blank" class="bluesub">High-Quality Professional Dreamweaver Templates</a>?</div>
<ul class="bullet"><li>Save time & money! Choose from a variety of website designs to find the perfect ready-to-use Adobe Dreamweaver & Fireworks Template for your site. </li><li>Our Dreamweaver Templates are Cross Browser Compatible, Optimized for low load-time and W3C Standard Compliant (valid CSS & HTML code).</li><li>Each dreamweaver template download comes with an easy-to follow Customization Guide that will help to get your web site up within a couple of days!</li> <li> Fully automated purchase process - Buy and download your Dreamweaver Template instantly on your credit card purchase approval!<script type="text/javascript" language="JavaScript" src="/DreamweaverTemplates/scripts.js"></script></li></ul></div><p align="center" class="red">Template HTML View - Actual Size Screenshot for 800px screen width</p><div align="center"><!-- InstanceBeginEditable name="800view" --><img src="/images/dreamweaver_templates/HTML_view/temp59_business_general.gif" width="790" height="579" alt="Dreamweaver Template 59 [Business/General] - Actual Size Screenshot for 800px screen width"><!-- InstanceEndEditable --></div><div align="center"><br><strong>Template 59 [Business/General] HTML sample web page Screenshot</strong></div><span class="red">Please Note:</span> The above image has been optimized for lower GIF file size, hence some parts of it may look blurred or distorted. View the <a href="javascript:;" onClick="OpenTemplate('temp59_business_general')">template sample web page</a> to look at the actual optimized template graphics without any distortion. <!-- InstanceBeginEditable name="pagedesc" -->In the sample page, mouseover the top horizontal text links to view the drop-down menus effect.<!-- InstanceEndEditable --><br><br><div class="bluebox"><!-- InstanceBeginEditable name="software_zip" --><a name="software"></a><span class="bluesub">Software Required for the customization of Template 59 [Business/General]:</span>
<ul class="bullet">
<li>Adobe Dreamweaver (MX 2004 or above)</li>
<li>Adobe Fireworks (MX 2004 or above)<a name="zip"></a></li>
</ul>
<span class="bluesub">The Zip Download for Template 59 [Business/General] includes the following files:</span>
<ul class="bullet">
<li> The web site design layout source (Fireworks .PNG file)</li>
<li>The Dreamweaver web template (Dreamweaver .DWT file)</li>
<li>All <strong>16</strong> HTML pages shown as links in the template <a href="javascript:;" onClick="OpenTemplate('temp59_business_general')">sample web page</a> (Dreamweaver .HTM files)</li>
<li>The external cascading style sheet (.CSS file)</li>
<li>JavaScript files for DHTML effects like drop-down menus, slideshows, etc. (.JS files) </li>
<li>All graphics "as is" viewable in the template <a href="javascript:;" onClick="OpenTemplate('temp59_business_general')">sample web page</a> (optimized .GIF & .JPG files) </li>
<li>Our easy-to-follow Customization Guide and the End User License Agreement (EULA)</li>
<li>A font folder that includes the fonts used in the template Fireworks layout</li>
</ul>
<!-- InstanceEndEditable --></div>
<div class="picboxDF" align="center"><p class="red">Template Adobe Dreamweaver View - index.htm page</p><!-- InstanceBeginEditable name="DWview" --><img src="/images/dreamweaver_templates/dreamweaver_view/temp59_business_general.gif" width="400" height="290" alt="Template 59 [Business/General] - Adobe Dreamweaver View"><!-- InstanceEndEditable --><p>Template 59 [Business/General] Dreamweaver Screenshot </p></div>
<div class="picboxDF" align="center"><p class="red">Template Adobe Fireworks View - template.png source file </p><!-- InstanceBeginEditable name="FWview" --><img src="/images/dreamweaver_templates/fireworks_view/temp59_business_general.gif" width="400" height="290" alt="Template 59 [Business/General] - Adobe Fireworks View"><!-- InstanceEndEditable --><p>Template 59 [Business/General] Fireworks Screenshot </p></div><h4 align="center" class="clear100">SmartWebby.com Dreamweaver Templates - Categories</h4>
<div align="center" class="grey"><a href="/DreamweaverTemplates/BeautyFashionTemplates.asp" class="grey">Beauty Templates</a> | <a href="/DreamweaverTemplates/BusinessTemplates.asp" class="grey">Business Templates</a> [Pg: <a href="/DreamweaverTemplates/BusinessTemplates.asp" class="grey">1</a>, <a href="/DreamweaverTemplates/DreamweaverBusinessTemplates.asp" class="grey">2</a>, <a href="/DreamweaverTemplates/business_templates.asp" class="grey">3</a>, <a href="/DreamweaverTemplates/dreamweaver_business_templates.asp" class="grey">4</a>] | <a href="/DreamweaverTemplates/ChurchChristianTemplates.asp" class="grey">Christian Templates</a> | <a href="/DreamweaverTemplates/CSSTemplates.asp" class="grey">CSS Templates (tableless)</a> [Pg: <a href="/DreamweaverTemplates/CSSTemplates.asp" class="grey">1</a>, <a href="/DreamweaverTemplates/DreamweaverCSSTemplates.asp" class="grey">2</a>] | <a href="/DreamweaverTemplates/EducationTemplates.asp" class="grey">Education Templates</a>
<br>
<a href="/DreamweaverTemplates/FamilyPersonalTemplates.asp" class="grey">Family/Personal Templates</a> [Pg: <a href="/DreamweaverTemplates/FamilyPersonalTemplates.asp" class="grey">1</a>, <a href="/DreamweaverTemplates/FamilyTemplates.asp" class="grey">2</a>] | <a href="/DreamweaverTemplates/FlashTemplates.asp" class="grey">Flash Templates</a> | <a href="/DreamweaverTemplates/FoodTemplates.asp" class="grey">Food Templates</a> | <a href="/DreamweaverTemplates/FreeDreamweaverTemplates.asp" class="grey">Free Dreamweaver Templates</a> | <a href="/DreamweaverTemplates/GeneralWebTemplates.asp" class="grey">General Templates</a> | <a href="/DreamweaverTemplates/GovernmentMilitaryTemplates.asp" class="grey">Government Templates</a>
<br>
<a href="/DreamweaverTemplates/HealthMedicalTemplates.asp" class="grey">Health/Medical Templates</a> | <a href="/DreamweaverTemplates/HiTechTemplates.asp" class="grey">Hi-Tech Templates
</a> | <a href="/DreamweaverTemplates/KidsChildcareTemplates.asp" class="grey">Kids Templates</a> | <a href="/DreamweaverTemplates/LowCostTemplates.asp" class="grey">Low-cost/Budget Templates</a> [Pg: <a href="/DreamweaverTemplates/LowCostTemplates.asp" class="grey">1</a>, <a href="/DreamweaverTemplates/LowCostBudgetTemplates.asp" class="grey">2</a>] | <a href="/DreamweaverTemplates/PersonalWebTemplates.asp" class="grey">Personal Web Templates</a> | <a href="/DreamweaverTemplates/PetsTemplates.asp" class="grey">Pets/Animals Templates</a>
<br>
<a href="/DreamweaverTemplates/PhotographyTemplates.asp" class="grey">Photography Templates</a> | <a href="/DreamweaverTemplates/ProfessionalsTemplates.asp" class="grey">Professionals Templates</a> | <a href="/DreamweaverTemplates/RealEstateTemplates.asp" class="grey">Real Estate Templates</a> | <a href="/DreamweaverTemplates/SportsTemplates.asp" class="grey">Sports Templates</a> | <a href="/DreamweaverTemplates/TelecomTemplates.asp" class="grey">Telecom Templates</a> | <a href="/DreamweaverTemplates/TravelTemplates.asp" class="grey">Travel Templates</a></div>
</div><div id="clearfloats"></div><div id="fmenu">
<div class='mrow'><strong><a href='/services.asp'>Services</a></strong> > <a href='/web_services/design.asp'>CSS Web Design</a> | <a href='/professional_logo_designing.asp'>Professional Logo Design</a> | <a href='/web_services/web_programming.asp'>ASP.net, ASP & PHP Programming</a> | <a href='/web_services/flash_animation_programming.asp'>Flash Animation & Programming</a> | <a href='/affordable_web_hosting_plans.asp'>Reliable Web Hosting</a> | <a href='/website_maintenance_packages.asp'>Website Maintenance</a></div>
<div class='mrow'><strong><a href='/portfolio.asp'>Portfolio</a></strong> > <a href='/design_portfolio.asp'>Web Design Portfolio</a> | <a href='/programming_portfolio.asp'>Web Programming Portfolio</a> | <a href='/logo_design_portfolio.asp'>Print & Logo Design Portfolio</a> | <a href='/flash_portfolio.asp'>Flash Animation Portfolio</a> | <a href='/outsource_portfolio.asp'>Outsource Clients Portfolio</a> | <a href='/client_quotes.asp'>Client Testimonials</a></div>
<div class='mrow'><strong><a href='/rates.asp'>Web Design Pricing</a></strong> > <a href='/rates.asp'>Design Rates</a> | <a href='/design_packages.asp'>Custom Web Design Pricing</a> | <a href='/professional_logo_designing.asp#packages'>Logo Design Pricing</a> | <a href='/professional_logo_designing.asp#bcl'>Business Card & Letterhead Pricing</a> | <a href='/affordable_web_hosting_plans.asp#windows'>Web Hosting Plans</a> | <a href='/website_maintenance_packages.asp#packages'>Website Maintenance Plans</a></div>
<div class='mrow'><strong><a href='/web_applications.asp'>Web Applications</a></strong> > <a href='/web_products/flash_survey/default.asp'>Smart Survey</a> | <a href='/web_products/flash_poll/multi_poll.asp'>Smart Multi Poll</a> | <a href='/web_products/flash_poll/default.asp'>Smart Poll</a> | <a href='/web_products/flash_guestbook/default.asp'>Smart Guest Book (ASP</a>/<a href='/web_products/PHP/flash_guestbook/default.asp'>PHP</a>) | <a href='/web_products/instant_quote/default.asp'>Smart Quote</a> | <a href='/web_site_design/free_web_tools.asp'>Free Web Applications</a> | <a href='/custom_flash_web_applications.asp'>Custom Flash Applications</a></div>
<div class='mrow'><strong><a href='/resources.asp'>Free Tutorials</a></strong> > <a href='/web_site_design/default.asp'>Web Design Tutorials</a> | <a href='/Flash/default.asp'>Flash Tutorials</a> | <a href='/web_site_design/dreamweaver_template.asp'>Dreamweaver Tutorials</a> | <a href='/web_site_design/dreamweaver_template.asp'>Fireworks Tutorials</a> | <a href='/website_promotion/default.asp'>SEO & Promotion Tutorials</a> | <a href='/DHTML/default.asp'>Javascript Tutorials</a> | <a href='/PHP/default.asp'>PHP MySQL Tutorials</a></div>
</div>
<div id="footer"><div id="footerT" align="center" class="bluedk">Copyright © 2001-2008 Jandus Technologies - <a href="http://www.smartwebby.com/">www.smartwebby.com</a> - All Rights Reserved. <a href="/privacy_policy.asp">Privacy Policy</a> | <a href="/site_map.asp">Site Map</a> | <a href="#">Page Top</a> <div id="footerR"><img src="/images/new/jandus_technologies_logo.gif" width="127" height="48" alt="Jandus Technologies logo"></div></div>
<div id="footerB"> <img src="/images/new/w3c_css.gif" alt="Valid CSS!" width="53" height="22" align="middle" border="0"> <img src="/images/new/w3c_html.gif" alt="Valid HTML 4.01 Transitional" width="53" height="22" align="middle" border="0"></div></div><script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script><script type="text/javascript">
var pageTracker = _gat._getTracker("UA-536043-1");
pageTracker._trackPageview();
</script></body><!-- InstanceEnd --></html>
The code that I am looking for is written in this class and project.
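For reference, here is a minimal sketch of the kind of record-by-record reading I have in mind, working on the .warc file directly instead of a text dump. It assumes the file is uncompressed, that each record starts with a `WARC/` version line followed by `Name: value` headers and a blank line, and that the `Content-Length` header gives the exact size of the record body in bytes. The function names `read_warc_records`, `split_http_response` and `print_pages` are my own, not from any library, and I have not checked how well these assumptions hold for the whole ClueWeb09 collection.

```python
import io


def read_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC byte stream.

    Assumption: Content-Length is the exact byte length of the record body.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # skip the blank lines that separate records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the WARC header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        length = int(headers.get("Content-Length", 0))
        yield headers, stream.read(length)


def split_http_response(payload):
    """Split a 'response' record body into (HTTP header bytes, page bytes)."""
    # Accept either CRLF or bare LF line endings; use whichever blank line comes first.
    hits = [(payload.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(pos, sep) for pos, sep in hits if pos != -1]
    if not hits:
        return payload, b""
    pos, sep = min(hits)
    return payload[:pos], payload[pos + len(sep):]


def print_pages(path):
    """Print the TREC id, target URI and page size of every response record."""
    with open(path, "rb") as f:  # binary mode: page encodings vary
        for headers, body in read_warc_records(f):
            if headers.get("WARC-Type") != "response":
                continue  # e.g. the leading warcinfo record
            _http_head, html = split_http_response(body)
            print(headers.get("WARC-TREC-ID"), headers.get("WARC-Target-URI"), len(html))
```

Calling `print_pages("ClueWeb09_English_Sample.warc")` would then list one line per stored page. The page content is kept as bytes because each page may declare its own charset (the sample above uses iso-8859-1), so decoding is left to whatever HTML parser comes next.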