How to create a PDF containing Persian (Farsi) text with reportlab, rtl and bidi in Python


I've been trying to create a PDF file from content that can be English, Persian, digits or a combination of them.

There are some problems with Persian text such as: "این یک متن فارسی است" ("This is a Persian text")

1- The text must be written from right to left.

2- Characters take different shapes depending on their position in the word (that is, a character changes its shape according to its surrounding characters).

3- Because the sentence is read from right to left, the normal textwrap doesn't work correctly.
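
To make points 1 and 2 concrete, here is a small standard-library sketch (it is not part of the original question). It only inspects Unicode data: Persian letters carry the right-to-left bidi class `AL`, and the contextual shapes of a letter exist as separate "presentation form" code points. The helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return (Unicode name, bidi class) for a single character."""
    return unicodedata.name(ch), unicodedata.bidirectional(ch)

# Point 1: Persian/Arabic letters have bidi class 'AL' (right-to-left),
# while Latin letters have class 'L' (left-to-right).
print(describe('\u0628'))   # the letter BEH
print(describe('a'))

# Point 2: one logical letter has several contextual shapes, stored as
# separate "presentation form" code points (isolated, final, initial, medial).
beh_forms = ['\uFE8F', '\uFE90', '\uFE91', '\uFE92']
for form in beh_forms:
    print(unicodedata.name(form))

# Every shape normalizes back to the same logical letter.
logical = {unicodedata.normalize('NFKC', form) for form in beh_forms}
```

A PDF library that draws code points one by one without shaping will show the isolated form of every letter, which is why the answers below reshape the text first.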

4

There are 4 answers

5
r.aj On

I used reportlab to create the PDF, but unfortunately reportlab doesn't support the Arabic and Persian alphabets, so I used the 'rtl' library by Vahid Mardani and the 'pybidi' library by Meir Kriheli to make the text look right in the resulting PDF.

First we need to add a font that supports Persian to reportlab:

  1. On Ubuntu 14.04:

    copy Bahij-Nazanin-Regular.ttf into
    /usr/local/lib/python3.4/dist-packages/reportlab/fonts folder

  2. Add the font and styles to reportlab:

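    from reportlab.lib.enums import TA_RIGHT
    from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont

    # Register the TTF file under the name 'Persian' so styles can refer to it
    pdfmetrics.registerFont(TTFont('Persian', 'Bahij-Nazanin-Regular.ttf'))
    styles = getSampleStyleSheet()
    # Right-aligned paragraph style that uses the Persian font
    styles.add(ParagraphStyle(name='Right', alignment=TA_RIGHT, fontName='Persian', fontSize=10))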
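
As an alternative to copying the font into reportlab's own folder, `TTFont` can also be given a full path to the file. The sketch below is only an illustration of how such a path might be located: `find_font` and the directories in `FONT_DIRS` are hypothetical and should be adjusted to wherever the font actually lives.

```python
from pathlib import Path

def find_font(filename, search_dirs):
    """Return the first existing path to `filename`, or None if not found."""
    for directory in search_dirs:
        candidate = Path(directory).expanduser() / filename
        if candidate.is_file():
            return candidate
    return None

# Hypothetical locations; adjust them to wherever the font actually lives.
FONT_DIRS = ['fonts', '~/.fonts', '/usr/share/fonts/truetype']

font_path = find_font('Bahij-Nazanin-Regular.ttf', FONT_DIRS)
if font_path is None:
    print('font not found, copy it into one of:', FONT_DIRS)
else:
    # With reportlab installed, the path could then be registered like this:
    #   pdfmetrics.registerFont(TTFont('Persian', str(font_path)))
    print('using font at', font_path)
```

Keeping the font next to the project this way avoids editing the site-packages directory, which is replaced whenever reportlab is upgraded.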
    

In the next step we need to reshape the Persian letters into their correct contextual forms and make each word run from right to left:

    from bidi.algorithm import get_display
    from rtl import reshaper
    import textwrap

    def get_farsi_text(text):
        if reshaper.has_arabic_letters(text):
            words = text.split()
            reshaped_words = []
            for word in words:
                if reshaper.has_arabic_letters(word):
                    # join the letters of the word into their contextual shapes
                    reshaped_text = reshaper.reshape(word)
                    # reorder the characters for right-to-left display
                    bidi_text = get_display(reshaped_text)
                    reshaped_words.append(bidi_text)
                else:
                    reshaped_words.append(word)
            # reverse the word order so the sentence reads from right to left
            reshaped_words.reverse()
            return ' '.join(reshaped_words)
        return text
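
To see what the reversal steps in `get_farsi_text` amount to, here is a dependency-free sketch. It is a simplification and not what `rtl` or `pybidi` do internally: `is_rtl_word` and `to_visual_order` are made-up names, letter shaping is skipped entirely, and "right-to-left" is approximated by the Unicode bidi classes `R` and `AL`.

```python
import unicodedata

def is_rtl_word(word):
    """True if any character in the word has a right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in ('R', 'AL') for ch in word)

def to_visual_order(text):
    """Reorder logical text so a left-to-right renderer shows it right to left.

    Simplified: characters of RTL words are reversed, other words are kept,
    and the word order is flipped only if the text contains an RTL word.
    """
    words = text.split()
    if not any(is_rtl_word(w) for w in words):
        return text
    visual = [w[::-1] if is_rtl_word(w) else w for w in words]
    visual.reverse()
    return ' '.join(visual)

print(to_visual_order('PDF با reportlab'))
```

Plain string reversal misplaces combining marks and zero-width non-joiners, which is one reason to rely on a real bidi implementation for actual output.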

To add a bullet or wrap the text, we can use the following function:

    def get_farsi_bulleted_text(text, wrap_length=None):
        farsi_text = get_farsi_text(text)
        if wrap_length:
            line_list = textwrap.wrap(farsi_text, wrap_length)
            line_list.reverse()
            line_list[0] = '{} •'.format(line_list[0])
            farsi_text = '<br/>'.join(line_list)
            return '<font>%s</font>' % farsi_text
        return '<font>%s &#x02022;</font>' % farsi_text
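
One alternative ordering, sketched here as an assumption rather than as this answer's method: wrap the logical (not yet reordered) text first and convert each line afterwards, so the list of lines never has to be reversed. `wrap_then_reorder` is a hypothetical helper, and `reverse_words` is only a stand-in for a real conversion such as `get_farsi_text`.

```python
import textwrap

def reverse_words(line):
    """Stand-in for a real RTL conversion: just flip the word order."""
    return ' '.join(reversed(line.split()))

def wrap_then_reorder(text, width, convert=reverse_words):
    """Wrap logical text into lines, then convert each line on its own."""
    lines = textwrap.wrap(text, width)
    return [convert(line) for line in lines]

# Lines stay in reading order (first line first); only the words inside
# each line are reordered.
for line in wrap_then_reorder('one two three four five six', width=13):
    print(line)
```

Note that `textwrap` counts characters, not rendered glyph widths, so `width` is only an approximation of the line length in a proportional font.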

To test the code we can write:

    from reportlab.lib.pagesizes import letter
    from reportlab.platypus import SimpleDocTemplate, Paragraph
    from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle

    doc = SimpleDocTemplate("farsi_wrap.pdf", pagesize=letter, rightMargin=72, leftMargin=72,
                            topMargin=72, bottomMargin=18)
    Story = []

    text = 'شاید هنوز اندروید نوقا برای تمام گوشی‌های اندرویدی عرضه نشده باشد، ولی اگر صاحب یکی از گوشی‌های نکسوس یا پیک' \
   'سل باشید احتمالا تا الان زمان نسبتا زیادی را با آخرین نسخه‌ی اندروید سپری کرده‌اید. اگر در کار با اندروید نوقا' \
   ' دچار مشکل شده‌اید، با دیجی‌کالا مگ همراه باشید تا با هم برخی از رایج‌ترین مشکلات گزارش شده و راه حل آن‌ها را' \
   ' بررسی کنیم. البته از بسیاری از این روش‌ها در سایر نسخه‌های اندروید هم می‌توانید استفاده کنید. اندروید برخلاف iOS ' \
   'روی گستره‌ی وسیعی از گوشی‌ها با پوسته‌ها و اپلیکیشن‌های اضافی متنوع نصب می‌شود. بنابراین تجویز یک نسخه‌ی مشترک برا' \
   'ی حل مشکلات آن کار چندان ساده‌ای نیست. با این حال برخی روش‌های عمومی وجود دارد که بهتر است پیش از هر چیز آن‌ها را' \
   ' بیازمایید.'
    tw = get_farsi_bulleted_text(text, wrap_length=120)
    p = Paragraph(tw, styles['Right'])
    Story.append(p)
    doc.build(Story)
0
S_M On

In case anyone wants to generate pdfs from html templates using Django, this is how it can be done:

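    import pdfkit
    from django.http import HttpResponse
    from django.template.loader import get_template

    def pdf_view(request):
        template = get_template("app_name/template.html")
        # some_variable stands for whatever data the template needs;
        # recent Django versions expect a plain dict here, not a Context
        html = template.render({'something': some_variable})
        # passing False as the output path makes pdfkit return the PDF bytes
        pdf = pdfkit.from_string(html, False)
        response = HttpResponse(pdf, content_type='application/pdf')
        response['Content-Disposition'] = 'attachment; filename=output.pdf'
        return response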
0
Hassan Fadaie Ghotbie On

Pass the multibyte (Farsi, Arabic) string as a parameter to the TypeScript function below, and pass the returned string to pdfmake or any other PDF generator:

    function farsiNew(farsistr: string): string {
        // pdfmake displays the text mirrored by default, so the word order
        // of every line is reversed before it is handed over
        const point = 19; // wrap threshold, in words
        let allText = '';
        for (const line of farsistr.split('\n')) {
            const words = line.split(' ');
            if (words.length < point) {
                allText += words.reverse().join(' ') + '\n';
                continue;
            }
            // long line: emit chunks of point + 1 words, each in reversed order
            for (let s = 0; s < words.length; s += point + 1) {
                const chunk = words.slice(s, s + point + 1).filter(w => w);
                allText += chunk.reverse().map(w => ' ' + w).join('') + '\n';
            }
        }
        return allText;
    }
1
r.aj On

After working for a while with Reportlab, we had some problems with organizing and formatting it. It took a lot of time and was kind of complicated. So we decided to work with pdfkit and jinja2. This way we can format and organize in html and CSS and we don't need to reformat Persian text too.

First we can design an HTML template file like the one below:

    <!DOCTYPE html>
    <html lang="fa-IR">
    <head>
        <meta charset="UTF-8">
        <title></title>
    </head>
    <body>
        <p dir="rtl">سوابق کاری</p>
        <ul dir="rtl">
            {% for experience in experiences %}
            <li><a href="{{ experience.url }}">{{ experience.title }}</a></li>
            {% endfor %}
        </ul>
    </body>
    </html>

Then we use the jinja2 library to render our data into the template, and pdfkit to create a PDF from the rendered result:

    from jinja2 import Template
    from pdfkit import pdfkit

    sample_data = [{'url': 'http://www.google.com/', 'title': 'گوگل'},
                   {'url': 'http://www.yahoo.com/fa/', 'title': 'یاهو'},
                   {'url': 'http://www.amazon.com/', 'title': 'آمازون'}]

    with open('template.html', 'r', encoding='utf-8') as template_file:
        template_str = template_file.read()
        template = Template(template_str)
        resume_str = template.render({'experiences': sample_data})

        options = {'encoding': "UTF-8", 'quiet': ''}
        bytes_array = pdfkit.PDFKit(resume_str, 'string', options=options).to_pdf()
        with open('result.pdf', 'wb') as output:
            output.write(bytes_array)