Python Telegram Bot and Pandas - reply_text shows "\n" instead of a new line

I have a simple Telegram bot built with the python-telegram-bot package, and I use pandas to import a CSV file:

df = pd.read_csv('data.csv')

key text
00 text1 \n text2 \n text3
. some \n text
. other \n text
99 text4 \n text5 \n text6

Then I search the dataframe using the user's message:

answer = df[df['text'].str.contains(update.message.text, case=False)]

and the bot sends a reply message to the user:

await update.message.reply_text(answer)

but the output shows the literal "\n" characters:

text1 \n text2 \n text3

and I want to show the text:

text1
text2
text3

I'm struggling with this problem. Before switching to a dataframe I used TinyDB and everything worked fine. How can I resolve this?

Thanks

I have tried changing the dtype of the column to string, exporting the CSV to a list, and changing the encoding of the file.

There is 1 answer

duckzio (best answer)

I tried to reproduce your case, and it worked for me when the data is defined directly in Python:

import pandas as pd

# Here the data is defined inline as JSON-like records instead of being read from a CSV file
data = [
    {"key": "01", "text": "text1 \n text2 \n text3"},
    {"key": "01", "text": "some \n text"},
    {"key": "02", "text": "other \n text"},
    {"key": "99", "text": "text4 \n text5 \n text6"},
]

df = pd.DataFrame(data)

# ...

answer = df[df["text"].str.contains("some", case=False)]
# answer = df[df["text"].str.contains(update.message.text, case=False)]

if not answer.empty:
    print(answer.values[0][1].encode())  # check raw text
    # out: b'some \n text'
    print(type(answer.values[0][1]))  # check type
    # out: <class 'str'>
    await update.message.reply_text(answer.values[0][1])

With csv data:

key,text
01,text1 \n text2 \n text3
01,some \n text
02,other \n text
99,text4 \n text5 \n text6
101,this is emoji \n ✅ \n \U0001F600

# ...
df = pd.read_csv("data.csv")

answer = df[df["text"].str.contains("some", case=False)]

if not answer.empty:
    print(answer.values[0][1].encode())  # check raw text
    # out: b'some \\n text'
    print(type(answer.values[0][1]))  # check type
    # out: <class 'str'>
    print(answer.values[0][1])  # print text
    # out: some \n text
    print(answer.values[0][1].replace("\\n", "\n"))  # replace text
    # out: some 
    #       text

    # ...

Output:

b'some \\n text'
<class 'str'>
some \n text
some 
 text

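When the data is read from a CSV file, each \n in the file is two literal characters (a backslash followed by n), which shows up as \\n in the bytes output above. Replacing that literal sequence with a real newline gives the result we need: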

print(answer.values[0][1].replace("\\n", "\n"))  # replace text
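The same replacement can also be applied to the whole column once, right after loading, so every later lookup already contains real newlines. The following is only a minimal sketch under assumptions: the CSV is read from an in-memory string instead of data.csv so the example is self-contained, and the variable names csv_text and first are illustrative.

```python
import io

import pandas as pd

# Stand-in for data.csv: each field holds a literal backslash followed by "n".
csv_text = "key,text\n01,some \\n text\n02,other \\n text\n"

# dtype=str keeps the leading zero in keys such as "01".
df = pd.read_csv(io.StringIO(csv_text), dtype={"key": str})

# regex=False treats "\\n" as a plain two-character substring, not a pattern.
df["text"] = df["text"].str.replace("\\n", "\n", regex=False)

# Look up a row as before; the stored text now contains a real newline.
first = df.loc[df["text"].str.contains("some", case=False), "text"].iloc[0]
print(first)
```

With the column converted up front, the value can be passed to reply_text without any further processing.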

To handle Unicode escapes such as \U0001F600 as well, we need to find each escape sequence in the text and replace it with the character it represents, using the unicode_escape codec.

A simple way to do this is with a regular expression:

import re

def replace_unicode_escape(text):
    def replace(match):
        return match.group(0).encode().decode("unicode_escape")

    text = re.sub(r"\\n", "\n", text)
    return re.sub(r"\\U[0-9a-fA-F]{8}", replace, text)
# ...

if not answer.empty:
    # ...

    print(replace_unicode_escape(answer.values[0][1]))  # replace text
    # out: this is emoji
    #       ✅
    #       😀

    # ...

Output:

b'this is emoji \\n \xe2\x9c\x85 \\n \\U0001F600'
<class 'str'>
this is emoji \n ✅ \n \U0001F600
this is emoji 
 ✅
 😀
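The reason for decoding only the matched escape sequences, rather than the whole string, is that the unicode_escape codec treats the input bytes as Latin-1, so characters that are already non-ASCII (such as ✅) would be mangled. A small sketch that shows the difference follows; the helper name unescape and the set of escapes it handles (\n, \t, \\, \uXXXX, \UXXXXXXXX) are assumptions chosen for illustration, not part of the original answer.

```python
import re

# Matches escape sequences that appear as literal text in the string:
# \n, \t, \\, \uXXXX and \UXXXXXXXX.
_ESCAPE = re.compile(r"\\(?:[nt\\]|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})")


def unescape(text: str) -> str:
    """Decode only the matched escape sequences and leave everything else as-is."""
    return _ESCAPE.sub(
        lambda m: m.group(0).encode("ascii").decode("unicode_escape"), text
    )


# Same content as the last CSV row: literal backslash escapes plus a real emoji.
raw = "this is emoji \\n \u2705 \\n \\U0001F600"

# Decoding the whole string corrupts the emoji that is already present.
naive = raw.encode().decode("unicode_escape")

# Decoding only the matches keeps it intact.
fixed = unescape(raw)
print(fixed)
```

Here naive no longer contains ✅, while fixed keeps it and turns the escapes into a newline and 😀.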