MSHTML PasteHTML() produces &nbsp;


We use the standard TWebBrowser component in Delphi, which uses mshtml.dll internally. Additionally, we set a registry value to ensure that pages render with the newer rendering engine (Web-Browser-Control-Specifying-the-IE-Version, MSDN: FEATURE_BROWSER_EMULATION). We use IE 10 rendering, but we get the same results with IE 8 through IE 11.
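For readers unfamiliar with the registry setting mentioned above, here is a minimal sketch of how it could be written from Delphi. The unit name `BrowserEmulation` and the procedure `SetBrowserEmulation` are hypothetical names chosen for illustration. The sketch assumes a per-user setting under HKEY_CURRENT_USER and that it is applied before the first TWebBrowser instance is created.

```delphi
unit BrowserEmulation;

interface

// AMode is the FEATURE_BROWSER_EMULATION value,
// e.g. 10000 for IE 10 standards mode or 7000 for IE 7.
procedure SetBrowserEmulation(AMode: Integer);

implementation

uses
  System.SysUtils, System.Win.Registry, Winapi.Windows;

procedure SetBrowserEmulation(AMode: Integer);
const
  // Per-user feature control key documented for the WebBrowser control.
  KeyPath = 'Software\Microsoft\Internet Explorer\Main\FeatureControl\' +
            'FEATURE_BROWSER_EMULATION';
var
  Reg: TRegistry;
begin
  Reg := TRegistry.Create(KEY_WRITE);
  try
    Reg.RootKey := HKEY_CURRENT_USER;
    // Create the key if it does not exist yet.
    if Reg.OpenKey(KeyPath, True) then
    try
      // The value name is the host executable's file name.
      Reg.WriteInteger(ExtractFileName(ParamStr(0)), AMode);
    finally
      Reg.CloseKey;
    end;
  finally
    Reg.Free;
  end;
end;

end.
```

Calling `SetBrowserEmulation(10000)` early in application startup would request IE 10 rendering for the current executable only.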

With the default rendering mode of MSHTML (IE7) everything works correctly, but we need the newer rendering mode for its additional rendering features.

We use the design mode of the control to enable the user to make changes to the documents:

var
  mDocument: IHTMLDocument2;
begin
  mDocument := ((ASender as TWebBrowser).Document as IHTMLDocument2);
  mDocument.designMode := 'on';

Now we have the following problem: when we use IHTMLTxtRange.pasteHTML(...) to insert HTML code, some of the spaces are replaced by &nbsp;:

procedure TForm1.BT_PasteHtmlClick(Sender: TObject);
var
  mDoc2: IHTMLDocument2;
  mOvSel: IHTMLSelectionObject;
  mRange: IHTMLTxtRange;
  mHtml: string;
begin
  /// Reproducible bug with pasteHTML:
  /// empty cells and wrong line breaks.
  mDoc2 := WB_Test.Document as IHTMLDocument2;

  mOvSel := mDoc2.selection as IHTMLSelectionObject;
  mRange := mOvSel.CreateRange() as IHTMLTxtRange;

  mHtml := '<TABLE width="100%" border="1" cellspacing="0" cellpadding="0">  <TBODY>  <TR>    <TD>Falsche Zellen werden erstellt, wo nur diese eine sein sollte!</TD></TR></TBODY></TABLE>' + sLineBreak +
           '<p>Falsche Umbrueche '  + sLineBreak + 
           'wo keine sein sollten  durch CRLF im Html-Code!</p>' + sLineBreak;
  mRange.pasteHTML(mHtml);
end;

Looking at the inserted code, the spaces between the TABLE, TBODY, TR and TD tags have been converted to &nbsp;, and the CRLF inside the paragraph has become a <BR>. The resulting (wrong) HTML is:

<TABLE width="100%" border="1" cellspacing="0" cellpadding="0">&nbsp; 
  <TBODY>&nbsp; 
  <TR>&nbsp;&nbsp;&nbsp; 
    <TD>Falsche Zellen werden erstellt, wo nur diese eine sein 
  sollte!</TD></TR></TBODY></TABLE><BR>
<P>Falsche Umbrueche <BR>wo keine sein sollten&nbsp; durch CRLF im 
Html-Code!</P>

EDIT: We start with the following HTML:

<html>
  <body>
  </body>
</html>

and after inserting we get:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv="Content-Type" content="text/html; charset=windows-1252">
<META name="GENERATOR" content="MSHTML 10.00.9200.16540"></HEAD>
<BODY> 
<TABLE border="1" cellspacing="0" cellpadding="0">
  <TBODY>
  <TR>
    <TD>Tabelle mit<BR>einem Text!</TD></TR></TBODY></TABLE><BR>
<P>Falsche Umbrüche durch zu viele&nbsp; Leerzeichen</P></BODY></HTML>

Answer by Stijn Sanders (best answer):

This may be by design. According to the HTML specification, any run of whitespace in HTML source is treated as a single space (except inside <pre> elements). To preserve the extra word separation when you type two or more spaces in design mode, IE inserts &nbsp; entities instead.
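If that explanation holds, one possible workaround is to normalize the whitespace yourself before calling pasteHTML, so that no run of two or more spaces and no CRLF reaches MSHTML. The following is only a sketch under that assumption: the unit `HtmlWhitespace` and the function `NormalizeHtmlWhitespace` are hypothetical names, and the approach has not been verified against every IE version.

```delphi
unit HtmlWhitespace;

interface

function NormalizeHtmlWhitespace(const AHtml: string): string;

implementation

uses
  System.SysUtils, System.RegularExpressions;

function NormalizeHtmlWhitespace(const AHtml: string): string;
begin
  // Collapse every run of whitespace (spaces, tabs, CR, LF) into one space.
  Result := TRegEx.Replace(AHtml, '\s+', ' ');
  // Remove whitespace that sits only between two tags, e.g. "<TR> <TD>".
  // Caution: this also removes a separating space between adjacent
  // inline elements such as "</b> <i>".
  Result := TRegEx.Replace(Result, '>\s+<', '><');
  Result := Trim(Result);
end;

end.
```

In the question's handler this would be used as `mRange.pasteHTML(NormalizeHtmlWhitespace(mHtml));`. Note that the function also collapses whitespace inside <pre> elements, so it is only suitable for fragments where that whitespace is not significant.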