I want to insert text in html using python


I am a Python developer, still learning, and I need some help with scraping. Below is the HTML code; it contains different tags such as em, p and span.

The paragraphs use two different classes, named text_obisnuit and text_obisnuit2.

html1="""<p class="text_obisnuit2">Best 3 developers.</p>
        <p class="text_obisnuit">There are best three types of web developers in world.</p>
        <p class="text_obisnuit2"><em>A javascript web developer.</em></p>
        <p class="text_obisnuit"><em>A nodeJS web developer.</em></p>
        <p class="text_obisnuit"><em>A python web developer <span class="text_obisnuit2">Django developer</span></em></p>
"""

I am trying to translate the text inside these tags and insert the translated version back in place. The translation itself works, but I have a problem with the tags.

Here is my code. When I replaced the text using this method, the em tag was removed and only the text was inserted.

from bs4 import BeautifulSoup
import translators as ts

soup1 = BeautifulSoup(html1, 'html.parser')

articles = soup1.find_all('p', {'class': "text_obisnuit"})
for a in articles:
    original_text = a.text
    translated_output = ts.google(original_text, from_language='en', to_language='ro')
    # Assigning to .string replaces every child of <p>, including <em>
    a.string = translated_output.lower()
    print(a.string)

After running the above method, the output was:

OUTPUT>>

<p class =" text_obisnuit2 "> Cei mai buni 3 dezvoltatori. </p>
<p class = "text_obisnuit"> Există cele mai bune trei tipuri de dezvoltatori web din lume. </p>
<p class = "text_obisnuit2"> Un dezvoltator web javascript. </p>
<p class = "text_obisnuit"> Un dezvoltator web nodeJS. </p>
<p class = "text_obisnuit"> Un dezvoltator web Python <span class = "text_obisnuit2"> Dezvoltator Django </span> </p>

As you can see, the em tag has been removed from the output. I do not want that; I want the same HTML structure after translation.

I also tried this method, but it only translated the text inside the em tags, not the rest of the HTML text.
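The loss of the em tag can be reproduced without the translator. Below is a minimal sketch, assuming only that bs4 is installed; the literal "replaced" stands in for the translated text. It shows that assigning to the .string of the p tag discards its nested children, while assigning to the innermost tag keeps the wrapper.

```python
from bs4 import BeautifulSoup

html = '<p class="text_obisnuit"><em>A nodeJS web developer.</em></p>'

# Assigning to the <p> tag's .string replaces ALL of its children,
# so the nested <em> is discarded along with the old text.
soup = BeautifulSoup(html, "html.parser")
soup.p.string = "replaced"
lost_em = str(soup)

# Assigning to the innermost tag that holds the text keeps the wrapper.
soup = BeautifulSoup(html, "html.parser")
soup.p.em.string = "replaced"
kept_em = str(soup)

print(lost_em)  # <p class="text_obisnuit">replaced</p>
print(kept_em)  # <p class="text_obisnuit"><em>replaced</em></p>
```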

articles = soup1.find_all('em')
for item in articles:
    original_text = item.text.strip()
    translated_output = ts.google(original_text, from_language='en', to_language='ro')
    item.string = translated_output

The OUTPUT I want should be:

OUTPUT>>

<p class =" text_obisnuit2 "> Cei mai buni 3 dezvoltatori. </p>
<p class = "text_obisnuit"> Există cele mai bune trei tipuri de dezvoltatori web din lume. </p>
<p class = "text_obisnuit2"><em> Un dezvoltator web javascript. </em></p>
<p class = "text_obisnuit"><em> Un dezvoltator web nodeJS. </em></p>
<p class = "text_obisnuit"><em> Un dezvoltator web Python <span class = "text_obisnuit2"> Dezvoltator Django </span></em> </p>

Can anyone guide me, please?


There is 1 answer

Jack Fleeting On

The problem is that in your html, the text elements are sometimes direct children of <p> and sometimes buried two or three layers below. Try this on your original html and see if it works:

for item in articles:
    targets = item.find_all()
    if len(targets) == 0:
        # No nested tags: the text is a direct child of <p>
        item.string = ts.google(item.string, from_language='en', to_language='ro')
    else:
        # EDIT: the next line was dropped:
        for target in targets:
            # .string is None when a tag has more than one child
            if target.string:
                target.string = ts.google(target.string, from_language='en', to_language='ro')
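One case the loop above does not reach is the last paragraph: its em tag holds both text and a span, so its .string is None and "A python web developer" stays untranslated. A possible alternative, sketched here as an assumption rather than as part of the original answer, is to walk the text nodes themselves and replace each one, which leaves every tag where it was. The names translate_text_nodes and fake_translate are hypothetical; fake_translate just upper-cases the text so the sketch runs offline, and in real use you would pass a function wrapping ts.google(text, from_language='en', to_language='ro').

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def translate_text_nodes(html, translate):
    """Translate every non-blank text node in place, leaving all tags untouched."""
    soup = BeautifulSoup(html, "html.parser")
    # Collect the nodes first so the tree is not modified while it is searched.
    text_nodes = [node for node in soup.find_all(string=True) if node.strip()]
    for node in text_nodes:
        node.replace_with(NavigableString(translate(str(node))))
    return str(soup)


def fake_translate(text):
    # Hypothetical stand-in for the real translator call, used so the
    # example runs without network access.
    return text.upper()


html1 = ('<p class="text_obisnuit"><em>A python web developer '
         '<span class="text_obisnuit2">Django developer</span></em></p>')

result = translate_text_nodes(html1, fake_translate)
print(result)
```

Note that each text node is translated on its own, so a sentence split across tags reaches the translator in pieces, and a real translator may also trim the whitespace between pieces. Whether that is acceptable depends on the text being translated.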