I am trying to write a Java app that will run on a linux server but that will process files generated on legacy Windows machines using cp-1252 as the character set. Is there anyway to encode these files as utf-8 instead of the cp-1252 it is generated as?
Encoding cp-1252 as utf-8?
38.7k views Asked by IAmYourFaja AtThere are 2 answers
Joni
On
If the file names as well as content is a problem, the easiest way to solve the problem is setting the locale on the Linux machine to something based on ISO-8859-1 rather than UTF-8. You can use locale -a to list available locales. For example if you have en_US.iso88591 you could use:
export LANG=en_US.iso88591
This way Java will use ISO-8859-1 for file names, which is probably good enough. To run the Java program you still have to set the file.encoding system property:
java -Dfile.encoding=cp1252 -cp foo.jar:bar.jar blablabla
If no ISO-8859-1 locale is available you can generate one with localedef. Installing it requires root access though. In fact, you could generate a locale that uses CP-1252, if it is available on your system. For example:
sudo localedef -f CP1252 -i en_US en_US.cp1252
export LANG=en_US.cp1252
This way Java should use CP1252 by default for all I/O, including file names.
Expanded further here: http://jonisalonen.com/2012/java-and-file-names-with-invalid-characters/
Related Questions in JAVA
- I need the BIRT.war that is compatible with Java 17 and Tomcat 10
- Creating global Class holder
- No method found for class java.lang.String in Kafka
- Issue edit a jtable with a pictures
- getting error when trying to launch kotlin jar file that use supabase "java.lang.NoClassDefFoundError"
- Does the && (logical AND) operator have a higher precedence than || (logical OR) operator in Java?
- Mixed color rendering in a JTable
- HTTPS configuration in Spring Boot, server returning timeout
- How to use Layout to create textfields which dont increase in size?
- Function for making the code wait in javafx
- How to create beans of the same class for multiple template parameters in Spring
- How could you print a specific String from an array with the values of an array from a double array on the same line, using iteration to print all?
- org.telegram.telegrambots.meta.exceptions.TelegramApiException: Bot token and username can't be empty
- Accessing Secret Variables in Classic Pipelines through Java app in Azure DevOps
- Postgres && statement Error in Mybatis Mapper?
Related Questions in LINUX
- Is there some way to use printf to print a horizontal list of decrementing hex digits in NASM assembly on Linux
- Why does Hugo generate different taxonomy-related HTML on different OS's?
- Writes in io_uring do not advance the file offset
- Why `set -o pipefail` gives different output even though the pipe is not failing
- what really controls the permissions: UID or eUID?
- Compiling eBPF program in Docker fails due to missing '__u64' type
- Docker container unable to make HTTPS requests to external API
- Whow to use callback_query_handler in Python 3.10
- Create kea runtime directory at startup in Yocto image
- Problem on CPU scheduling algorithms in OS
- How to copy files into the singularity sandbox?
- Android kernel error: undefined reference to `get_hw_version_platform'
- Is there a need for BPF Linux namespace?
- Error when trying to execute a binary compiled in a Kali Linux machine on an Ubuntu system
- Issue with launching application after updating ElectronJs to version 28.0.0 on Windows and Linux
Related Questions in UTF-8
- Can't we make a better variable-length character encoding with just using the 1 bit extra in the 7 bit ASCII?
- UTF-8 issue with excel
- UTF-8 string has too many bytes using SBCL and babel on Windows 64 bits
- How to convert from Java ASCII properties to UTF8 (Java 9) properties
- How to read a file that contains both ANSI and UTF-8 encoded characters
- BSONError in MongoDB Compass
- Create HMAC SHA-1 in JS with byte array
- pdftk unicode works in preview but not adobe acrobat
- xml file from ISO-8859-2 to UTF-8 in python
- How to store metadata for a UTF-8 text file cross-platform?
- Encoding problem on MySQL: Why some non-ASCII characters get encoded on more than 4 bytes?
- How to get character position in a text file encode in UTF-8 in C?
- Unicode character ſ is matched as itself and as 's.'
- VS Code integrated terminal UTF-8 input problem
- pdftk generated pdf does not render correct utf-8
Related Questions in CHARACTER-ENCODING
- Can't we make a better variable-length character encoding with just using the 1 bit extra in the 7 bit ASCII?
- Cpanel filter encoding utf-8?
- bagaimana cara menginstall steghide lewat mac
- Encoding problem on MySQL: Why some non-ASCII characters get encoded on more than 4 bytes?
- Matching multi-language (latin extended) characters in lua
- Handle mixed charsets in the same json file
- MIPS Aiken to Binary
- I am not sure why I need to Encode path parameter TWICE to make the rest call with special chars to work?
- having character encoding problem on my blog content in php application
- Visual C++ - how can I turn a unicode character into char or string?
- Cypresss Unable to Load UTF-16 Website on Brower Launch
- How to set encoding?
- HL7 encoding characters in non-ASCII strings
- How to fix these two warnings about implicit string cast during charset conversion?
- Python PyODBC and SQL Server encoding issue
Related Questions in CP1252
- Calculate difference in encoding WITHOUT actually writing to a file?
- write csv with encoding cp1252 with python 3.12
- Convert Character Between UTF-8 to ANSI (windows 1252) on VSCode?
- Converting Special Characters in UTF-8 file to cp1252 file in Python
- Show æ, ø, å on a docusaurus page (.md and .mdx files)
- Why might my BufferedWriter output a UTF-8 text file when I'm specifying that the encoding is Cp1252?
- How to convert "\x81" to "ü" with codepage 1252 (windows 1252)
- Preserving special characters when writing to a CSV - What encoding to use?
- Find code page by code point and equivalent character
- Convert non UTF-8 ASCII literals in otherwise UTF-8 text to their respective character
- Get list of files in directory return encode error for persian files name
- Convert character Code Set from 1252 to 1256
- how to Insert utf-8 encoded csv into latin-1 database
- problem string special charset hextobin php
- Python utf-8 conversion to cp1252
Popular Questions
- How do I undo the most recent local commits in Git?
- How can I remove a specific item from an array in JavaScript?
- How do I delete a Git branch locally and remotely?
- Find all files containing a specific text (string) on Linux?
- How do I revert a Git repository to a previous commit?
- How do I create an HTML button that acts like a link?
- How do I check out a remote Git branch?
- How do I force "git pull" to overwrite local files?
- How do I list all files of a directory?
- How to check whether a string contains a substring in JavaScript?
- How do I redirect to another webpage?
- How can I iterate over rows in a Pandas DataFrame?
- How do I convert a String to an int in Java?
- Does Python have a string 'contains' substring method?
- How do I check if a string contains a specific word?
Trending Questions
- UIImageView Frame Doesn't Reflect Constraints
- Is it possible to use adb commands to click on a view by finding its ID?
- How to create a new web character symbol recognizable by html/javascript?
- Why isn't my CSS3 animation smooth in Google Chrome (but very smooth on other browsers)?
- Heap Gives Page Fault
- Connect ffmpeg to Visual Studio 2008
- Both Object- and ValueAnimator jumps when Duration is set above API LvL 24
- How to avoid default initialization of objects in std::vector?
- second argument of the command line arguments in a format other than char** argv or char* argv[]
- How to improve efficiency of algorithm which generates next lexicographic permutation?
- Navigating to the another actvity app getting crash in android
- How to read the particular message format in android and store in sqlite database?
- Resetting inventory status after order is cancelled
- Efficiently compute powers of X in SSE/AVX
- Insert into an external database using ajax and php : POST 500 (Internal Server Error)
You can read and write text data in any encoding that you wish. Here's a quick code example:
If this still 'chokes' on read, see if you can verify that the the original encoding is what you think it is. In this case I've specified windows-1252, which is the java string for cp-1252.