How do I translate a string from one language to another in PHP without making any external network requests?


I have this:

$English_string = 'Hello. I am a robot.';

Now I want this:

$Swedish_string = 'Hej. Jag är en robot.';

I imagine the code to be like this:

$Swedish_string = translate_me($English_string, 'en', 'sv'); // text, from, to

This translate_me function must not make any kind of network request; it has to be local on my machine.

Many years ago, I tried doing various things with dictionaries, spelling correction and so on in PHP, but it was a big mess and I never quite got it right. Nowadays, the information online is either badly outdated or all about commercial, third-party, external APIs.

I don't expect it to adjust the grammar or anything. It should just translate word by word wherever the word exists in some dictionary database. It is often very useful to be able to tell "roughly" what a foreign-language text is saying, even if the result doesn't make perfect sense and is certainly not "professional translator"-grade.
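
A minimal sketch of that word-by-word lookup, assuming the mbstring extension is installed and that the dictionary is a plain PHP array mapping lowercase source words to target words. The function name `translate_words` and the `$en_sv` word list are hypothetical illustrations, not an existing PHP API. Letters are matched with the PCRE Unicode class `\p{L}` so words like "är" stay intact; everything between words (spaces, punctuation) passes through unchanged, and an initial capital is carried over to the translation.

```php
<?php
// Offline word-by-word ("direct") translation sketch.
// Assumes mbstring is available; the dictionary below is hypothetical sample data.

function translate_words(string $text, array $dictionary): string
{
    // Split into runs of letters and the text between them, keeping both,
    // so punctuation and whitespace pass through untouched.
    $tokens = preg_split('/(\p{L}+)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    if ($tokens === false) {
        return $text; // e.g. invalid UTF-8 input: give the text back unchanged
    }

    $out = '';
    foreach ($tokens as $token) {
        $key = mb_strtolower($token, 'UTF-8');
        if (!isset($dictionary[$key])) {
            $out .= $token; // unknown word or punctuation: keep as-is
            continue;
        }
        $translated = $dictionary[$key];
        // If the source word starts with a capital letter, capitalize the translation too.
        $first = mb_substr($token, 0, 1, 'UTF-8');
        if ($first !== mb_strtolower($first, 'UTF-8')) {
            $translated = mb_strtoupper(mb_substr($translated, 0, 1, 'UTF-8'), 'UTF-8')
                . mb_substr($translated, 1, null, 'UTF-8');
        }
        $out .= $translated;
    }
    return $out;
}

// Hypothetical English -> Swedish word list.
$en_sv = ['hello' => 'hej', 'i' => 'jag', 'am' => 'är', 'a' => 'en', 'robot' => 'robot'];

$English_string = 'Hello. I am a robot.';
$Swedish_string = translate_words($English_string, $en_sv);
echo $Swedish_string, PHP_EOL; // prints "Hej. Jag är en robot."
```

The example sentence only comes out right because every word has a single obvious equivalent. A one-to-one lookup cannot choose between Swedish "en" and "ett" or reorder words, which is exactly the rough-but-readable quality described above.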

Ideally, this function would also let me pass null as the "from" parameter so that it attempts to auto-detect/guess the source language, in cases where I'm not sure in advance.
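
One simple way to guess the source language offline is to count how many words of the text appear in each language's word list and pick the language with the most hits. The sketch below assumes you already have such word lists locally; `guess_language` and `$word_lists` are hypothetical names, and this hit-counting heuristic is just one option (it is not a statistical language identifier and will be unreliable on very short texts).

```php
<?php
// Guess the language of $text by dictionary hit counting.
// $wordLists maps a language code => list of known lowercase words (hypothetical data).
// Returns the code with the most matching words, or null if no word matched.
// On a tie, the language listed first wins.
function guess_language(string $text, array $wordLists): ?string
{
    preg_match_all('/\p{L}+/u', mb_strtolower($text, 'UTF-8'), $matches);
    $words = $matches[0];

    $best = null;
    $bestHits = 0;
    foreach ($wordLists as $lang => $list) {
        $set = array_flip($list); // word => index, for O(1) isset() lookups
        $hits = 0;
        foreach ($words as $word) {
            if (isset($set[$word])) {
                $hits++;
            }
        }
        if ($hits > $bestHits) {
            $best = $lang;
            $bestHits = $hits;
        }
    }
    return $best;
}

$word_lists = [
    'en' => ['hello', 'i', 'am', 'a', 'robot'],
    'sv' => ['hej', 'jag', 'är', 'en', 'robot'],
];

echo guess_language('Hej. Jag är en robot.', $word_lists), PHP_EOL; // prints "sv"
```

The guessed code can then be used to choose which dictionary to translate from when "from" is null.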

I assume that no such function is readily available in PHP. I suppose it would be reasonable for me to write the logic that loops through each word and replaces it if it's found in the dictionary, but this is perhaps the main problem:

What dictionary? The whole mess with ASPELL/PSPELL/etc. is very confusing, to say the least. I've also spent a lot of time hunting for free dictionaries online to compile my own separate database of words, but that is an enormous amount of work, and it seems like this should already be a "solved problem" and that I'm doing unnecessary work.
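
Whatever free word list you end up with, it usually helps to convert it once into a single simple local format. The sketch below assumes a hypothetical tab-separated file with one "source<TAB>target" pair per line and `#` comment lines; `load_tsv_dictionary` is an illustrative name, not an existing library function. It returns an array keyed by lowercase source word, keeping the first translation when a word appears more than once.

```php
<?php
// Load a bilingual word list from a local tab-separated file (assumed format:
// "source<TAB>target" per line, '#' starts a comment line). No network access.
function load_tsv_dictionary(string $path): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open dictionary file: $path");
    }

    $dict = [];
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank and comment lines
        }
        $parts = explode("\t", $line, 2);
        if (count($parts) < 2) {
            continue; // skip malformed lines without a tab
        }
        $source = mb_strtolower(trim($parts[0]), 'UTF-8');
        if ($source !== '' && !isset($dict[$source])) {
            $dict[$source] = trim($parts[1]); // first translation wins
        }
    }
    fclose($handle);
    return $dict;
}
```

For large word lists, re-parsing the file on every request may be slow. One option is to cache the resulting array once, for example by writing it out with `var_export()` into a PHP file, or to store it in a local SQLite table; both stay entirely on your machine.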

I actually feel slightly embarrassed asking about translating text in this "primitive" manner in 2020. You wouldn't expect it to still be a hurdle at this point. Yet, as far as I can tell, there is no obvious built-in or readily available solution. Every option I find wants you to send all your data off your machine, which is simply impossible: the API limits would be exhausted very quickly, even if I were willing to violate my users' privacy like that.

What would you recommend that I do? Is there a useful built-in PHP mechanism/dictionary format for this which is high-quality and updated? Or must I really spend years compiling my own gigantic database table from free dictionary database dumps that I hunt down myself?

Answer by terales:

What you are asking for is called "direct machine translation", an approach the localization community abandoned in favor of methods that give better translation quality.

For a quick overview of machine translation, see: https://vas3k.com/blog/machine_translation/?hn=1

Most ready-made databases and trained models are closed, because machine translation is the core business of many companies that sell it.

You can explore Apache Joshua and its prebuilt language packs, if you're okay with needing gigabytes of data just for translation.

You can also check whether PROMT Master NMT 21 (a paid, offline, pretrained translation engine with a UI) offers an API for local use.