Replacing &quot; with " using common methods does not work in a JavaScript Azure Function

Goal

Transform HTML extracted from Telligent (an extranet platform) to plain text and send it to Slack.

Setup

A Telligent webhook is triggered when an event occurs. An Azure Logic App receives the event JSON, whose body value contains HTML. A JavaScript Azure Function inside the Azure Logic App pipeline transforms that value to plain text. The final step in the pipeline posts the plain text to Slack.

Example of incoming data to the Azure Function

"body": "<p>&quot; &#39;</p><div style=\"clear:both;\"></div>"

Transformation method

This is the basic code in the Azure Function. I have left out parts that seem irrelevant to this question but can provide the entire script if that is necessary.

module.exports = function (context, data) {
  var html = data.body;

  // Change HTML to plain text
  var text = JSON.stringify(html.body);
  var noHtml = text.replace(/<(?:.|\n)*?>/gm, '');
  var noHtmlEncodeSingleQuote = noHtml.replace(/&#39;/g, "'");
  var noHtmlEncodeDoubleQuote = noHtmlEncodeSingleQuote.replace(/&quot;/g, "REPLACEMENT");

  // Compile body for Slack
  var readyString = "Slack text: " + noHtmlEncodeDoubleQuote;

  // Response of the function to be used later
  context.res = {
    body: readyString
  };

  context.done();
};

Results

The single quote is replaced successfully and resolves accurately when posted in Slack.

The following replacement methods for the double quote throw a Status: 500 Internal Server Error within the Azure Function.

Unsuccessful replacement methods

"\""
'"'
&quot;
"'"'"
"["]"
"(")"

Putting these replacement methods in their own var also throws the same error. E.g.:

var replace = "\""
...
var noHtmlEncodeDoubleQuote = noHtmlEncodeSingleQuote.replace(/&quot;/g, replace);

The code appears to be correct because when I replace &quot; with something like abc, the replacement is successful.

Thank you

Please forgive my JavaScript as I am not a programmer and am seeking to streamline a process for my job. However I am grateful for any advice both about the code or my entire approach.

There is 1 answer

Answer by Matt Johnson-Pint (best answer)

Generally, you don't want to try to parse HTML with regular expressions or string replacement. There are just too many things that can go wrong. See this now-famous Stack Overflow answer. (It was even made into a T-shirt.)
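
As a small illustration of how this can go wrong, here is a sketch that runs the tag-stripping pattern from the question against two inputs. The sample strings and the stripTags name are made up for this example; the second input has a ">" inside an attribute value, which is legal HTML.

```javascript
// The tag-stripping pattern from the question, wrapped in a helper
// (stripTags is a name chosen for this illustration)
var stripTags = function (html) {
  return html.replace(/<(?:.|\n)*?>/gm, '');
};

// Simple markup: the pattern removes both tags as intended
var simple = stripTags('<p>Hello</p>');

// A ">" inside an attribute value ends the lazy match too early,
// so part of the attribute leaks into the "plain text"
var broken = stripTags('<p title="a > b">Hello</p>');

console.log(JSON.stringify(simple)); // "Hello"
console.log(JSON.stringify(broken)); // " b\">Hello"
```

A real HTML parser knows that the ">" is inside a quoted attribute and does not treat it as the end of the tag.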

Instead, you should use a technique that is built for this purpose. If you were in a web browser, you could use the techniques described in the answers to this question. But in Azure Functions, your JavaScript doesn't run in a browser; it runs in a Node.js environment. Therefore, you will need to use a library such as Cheerio or htmlparser2 (there are others).

Here is an example using Cheerio:

var cheerio = require('cheerio');
var text = cheerio.load(html.body).text();
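
To show where that call could sit in the function from the question, here is a rough sketch. It assumes the HTML string lives at data.body.body (as the question's code implies) and that cheerio has been installed from npm into the function app. The makeHandler helper is hypothetical; it only exists so the handler can be tried locally with any converter before cheerio is wired in.

```javascript
// Hypothetical helper: builds the function handler around any
// HTML-to-text converter passed in as htmlToText
function makeHandler(htmlToText) {
  return function (context, data) {
    // Assumption: the HTML string sits at data.body.body,
    // as in the question's code
    var text = htmlToText(data.body.body);

    // Same response shape as the original function
    context.res = { body: 'Slack text: ' + text };
    context.done();
  };
}

// In the deployed function, with cheerio installed from npm:
//   var cheerio = require('cheerio');
//   module.exports = makeHandler(function (html) {
//     return cheerio.load(html).text();
//   });
```

Because the parser decodes entities such as &quot; and &#39; while extracting the text, the separate replace calls should no longer be needed.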

Also, regarding this part:

... as I am not a programmer ...

Yes you are. You are clearly programming right now. Anyone who writes code is a programmer. There is no club or secret handshake. We all start out exactly like this. Good job asking questions, and good luck on your journey!