I need help creating database tables from html tables. Right now I'm just manually doing it. I have a html doc that has all the data but I don't know what the best way to extract the data.
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>ATOMS Definition for Type tom.service.soc.SocRecord</title>
<style type="text/css">
body
{
line-height: 1.6em;
font-family: "Lucida Sans Unicode", "Lucida Grande", Sans-Serif;
font-size: 14px;
margin: 45px;
}
#box-table-a
{
font-family: "Lucida Sans Unicode", "Lucida Grande", Sans-Serif;
font-size: 12px;
margin: 5%;
width: 90%;
text-align: left;
border-collapse: collapse;
}
#box-table-a th
{
font-size: 13px;
font-weight: normal;
padding: 8px;
background: #b9c9fe;
border-top: 4px solid #aabcfe;
border-bottom: 1px solid #fff;
color: #039;
}
#box-table-a td
{
padding: 8px;
background: #e8edff;
border-bottom: 1px solid #fff;
color: #669;
border-top: 1px solid transparent;
}
#box-table-a tr:hover td
{
background: #d0dafd;
color: #339;
}
</style>
</head>
<body>
<table id="box-table-a" summary="Definition for tom.service.soc.SocRecord">
<thead>
<tr><th colspan="2">tom.service.soc.SocRecord</th></tr>
</thead>
<tbody>
<tr>
<td>Version</td>
<td>1</td>
</tr>
<tr>
<td>Description</td>
<td>[type is UNCLASSIFIED] Temporary dummy test object for SOC</td>
</tr>
</tbody>
</table>
<table id="box-table-a" summary="Fields Definition for Type tom.service.soc.SocRecord">
<thead>
<tr>
<th scope="col">Index</th>
<th scope="col">Name</th>
<th scope="col">Type</th>
<th scope="col">Range</th>
<th scope="col">Default</th>
<th scope="col" width="50%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>socID</td>
<td>String</td>
<td>
-
</td>
<td>""</td>
<td>
[ ] The UUID of the tracked object -- String for transmission purposes
</td>
</tr>
<tr>
<td>2</td>
<td>satID</td>
<td><a href="../../../../../tom/state/vcm/SatNumberType.html">SatNumberType</a></td>
<td>
</td>
<td></td>
<td>
[ ] The ID of the tracked object -- copy of the satelliteId in the VCM
</td>
</tr>
</tbody>
</table>
</body></html>
Here is the html. I would like help making a create postgres database table script like this from the html. And if it has a href then that links to another table.
CREATE TABLE soc.SocRecord(
socId TEXT, --[ ] The UUID of the tracked object -- String for transmission purposes
satId UUID, --[ ] The ID of the tracked object -- copy of the satId in the VCM
commonName TEXT, --[ ] The name of the tracked object -- may be blank -
--This field is optional in the current version of the message, check the set attribute before use.);
Edit Tried out some more variants and found that using
zipresulted in no errors and returned the sql script.I've updated the code bellow as well.
So your approach could constitute something like:
read
.htmlfile parse fortableparse fortheaderand so onI find idea pretty interesting, so I tried it out in python.
here you could write another script to read all
.htmlfiles in a folder and execute against that instead.The result of the above is probably not going to be exactly what you want, but it gets you started.