HTML Parsing and Sanitization Issue in React Quill Editor for Next.js (Pages Router, JavaScript)


I'm working on an application that allows hockey clubs to connect with each other by providing club listing pages with individual announcement pages. The announcement pages contain a create and update form integrated with the React Quill Editor in a Next.js application.

My issue is with HTML parsing and sanitization. Even with parsing and sanitization in place, most content copied from web pages or Microsoft Word is not parsed or sanitized properly. Even when I clear the formatting of the pasted text, HTML tags still end up in the posted announcement.

The content is stored in a MongoDB database as HTML. Here is an example of an announcement that has no parsing issues, as it appears in the database:

{"_id":{"$oid":"654432e404a050404c98192c"},"clubRepId":"65277560edd9c91689669d14","clubListingId":"65383f5f21db9527b5e4d8c3","announcementTitle":"New Jerseys","announcementBody":"<h3>Jerseys for the 2023 Season have arrived!</h3><p>Please come to the first practice with your payment.</p><p>Coach Greg will be at the signup table at the beginning of practice distributing jerseys.</p><p><br></p><p>The coaches are excited for the season to come!</p><p>We look forward to seeing everyone at practice.</p>","announcementDate":{"$date":{"$numberLong":"1698968292584"}},"__v":{"$numberInt":"0"}}

I'm including the relevant code/demonstrations of issues below to provide context. The main issue is that pasted content is not being sanitized correctly. I've tried to use DOMPurify and Quill Editor modules to handle this, but I'm still facing problems.
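For context, here is a minimal sketch of how pasted formatting can be restricted at the editor level. It is not the code from the files linked below. It assumes react-quill on top of Quill 1.x, where the `formats` prop acts as an allowlist and the clipboard module accepts a `matchVisual` option; the names `quillFormats` and `quillModules` are hypothetical.

```javascript
// Hypothetical names; assumes react-quill on top of Quill 1.x.
// Only these formats are allowed in the editor. Formatting outside this
// list (fonts, colors, backgrounds, and so on) should be dropped on paste.
const quillFormats = ["header", "bold", "italic", "underline", "list", "link"];

const quillModules = {
  toolbar: [
    [{ header: [1, 2, 3, false] }],
    ["bold", "italic", "underline"],
    [{ list: "ordered" }, { list: "bullet" }],
    ["link"],
    ["clean"],
  ],
  clipboard: {
    // Quill 1.x option: do not insert extra blank lines to imitate the
    // visual spacing of the pasted source.
    matchVisual: false,
  },
};

// Usage inside the editor component (JSX, shown as a comment so this
// block stays plain JavaScript):
// <ReactQuill value={value} onChange={setValue}
//             formats={quillFormats} modules={quillModules} />
```

Restricting formats in the editor does not replace sanitizing the HTML before it is stored or rendered; it only narrows what the editor will accept in the first place.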

I would appreciate any help in resolving this issue. If you have suggestions on how to improve my HTML parsing and sanitization process, or if you see any issues in the code provided, please let me know. Thank you!

Here is a demonstration of what currently happens with the code below.

What I am expecting is that when the user pastes content from a web page or Microsoft Word, no HTML tags appear in the posted announcement.
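To make that expectation concrete, here is a minimal sketch of a cleanup step for pasted HTML. It is not the code from the linked files. The function name `cleanPastedHtml` and the tag allowlist are assumptions for illustration. It is regex-based, so it only strips paste debris (Word comments, `<o:p>` tags, `style` and `class` attributes) and is not a security sanitizer; the result should still go through DOMPurify before being stored or rendered.

```javascript
// Hypothetical allowlist, chosen to match the kind of markup in the
// stored example (headings, paragraphs, line breaks, basic emphasis).
const ALLOWED_TAGS = new Set([
  "p", "br", "h1", "h2", "h3", "strong", "em", "u", "ul", "ol", "li", "a",
]);

// Hypothetical helper: removes paste debris and keeps a small tag set.
// Caveat: attribute values that contain ">" are not handled.
function cleanPastedHtml(html) {
  return html
    // HTML comments, including Word blocks such as <!--[if gte mso 9]>...<![endif]-->
    .replace(/<!--[\s\S]*?-->/g, "")
    // Word markers such as <![if !supportLists]> and <![endif]>
    .replace(/<!\[[^\]]*\]>/g, "")
    // <style>, <script> and <xml> blocks, together with their content
    .replace(/<(style|script|xml)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Every remaining tag: drop it if it is not allowed (its text stays),
    // otherwise re-emit it without attributes.
    .replace(/<\/?([a-zA-Z][\w:-]*)[^>]*>/g, (tag, name) => {
      const lower = name.toLowerCase();
      if (!ALLOWED_TAGS.has(lower)) return "";
      if (tag.startsWith("</")) return `</${lower}>`;
      if (lower === "a") {
        // Keep only double-quoted http(s) links; anything else loses its href.
        const href = /\bhref\s*=\s*"(https?:\/\/[^"]*)"/i.exec(tag);
        return href ? `<a href="${href[1]}">` : "<a>";
      }
      return `<${lower}>`;
    });
}

// Example: typical Word paste debris around a short paragraph.
const pasted =
  '<p class="MsoNormal" style="margin:0">' +
  '<span style="font-family:Calibri">Hello <b>team</b></span><o:p></o:p></p>';
console.log(cleanPastedHtml(pasted));
```

With this allowlist, `<b>` and `<span>` are removed while their text is kept. Markup that is already clean, like the stored announcement above, passes through unchanged.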

components/AnnouncementContentParse.js

https://codefile.io/f/Tp7mDBPwM3

components/QuillEditor.js

https://codefile.io/f/24BoYbO5i3

pages/clubListings/announcements/[id].js

https://codefile.io/f/ES6v1dRZt5

pages/edit/announcement.js

https://codefile.io/f/jHv5TvUlDe

pages/create/announcement.js

https://codefile.io/f/YbyfPFXU5q
