How can I set the encoding of shell-command-on-region output?

I have a small elisp script which applies Perl::Tidy on region or whole file. For reference, here's the script (borrowed from EmacsWiki):

(defun perltidy-command (start end)
  "The perltidy command we pass markers to."
  (shell-command-on-region start
                           end
                           "perltidy"
                           t
                           t
                           (get-buffer-create "*Perltidy Output*")))

(defun perltidy-dwim (arg)
  "Perltidy the region, or the entire buffer if no region is active."
  (interactive "P")
  (let ((point (point)) (start) (end))
    (if (and mark-active transient-mark-mode)
        (setq start (region-beginning)
              end (region-end))
      (setq start (point-min)
            end (point-max)))
    (perltidy-command start end)
    (goto-char point)))

(global-set-key "\C-ct" 'perltidy-dwim)

I'm using the current Emacs 23.1 for Windows (EmacsW32). The problem I'm having is that if I apply that script to a UTF-8 encoded file ("U(Unix)" in the status bar), the output comes back Latin-1 encoded, i.e. with two or more characters in place of each non-ASCII source character.

Is there any way I can fix that?

EDIT: The problem seems to be solved by using (set-terminal-coding-system 'utf-8-unix) in my init.el. If anyone has other solutions, go ahead and write them!

There are 2 answers

Answer by Chris Zheng

The following is from the shell-command-on-region documentation:

To specify a coding system for converting non-ASCII characters
in the input and output to the shell command, use C-x RET c
before this command.  By default, the input (from the current buffer)
is encoded using coding-system specified by `process-coding-system-alist',
falling back to `default-process-coding-system' if no match for COMMAND
is found in `process-coding-system-alist'.

When the command is executed, Emacs first looks up a coding system in process-coding-system-alist; if no entry matches, it falls back to default-process-coding-system.

If you want to change the encoding, you can add an entry to process-coding-system-alist. Each entry in that alist maps a regexp, matched against the name of the program being run, to a coding system, or to a cons of (DECODING . ENCODING) coding systems.

Or, if process-coding-system-alist is nil (you haven't set it and nothing matches), you can assign your encoding to default-process-coding-system,

for example:

(setq default-process-coding-system '(utf-8 . utf-8))

(The car is used to decode output coming from the process as utf-8; the cdr encodes text sent to the process as utf-8.)

Or

(setq default-process-coding-system '(undecided-unix . iso-latin-1-unix))
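(Here output from the process is decoded with auto-detection, while text sent to the process is encoded as Latin-1.)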

I also wrote a post about this if you want details.

Answer by tripleee

Quoting the documentation for shell-command-on-region (C-h f shell-command-on-region RET):

To specify a coding system for converting non-ASCII characters in the input and output to the shell command, use C-x RET c before this command. By default, the input (from the current buffer) is encoded in the same coding system that will be used to save the file, `buffer-file-coding-system'. If the output is going to replace the region, then it is decoded from that same coding system.

The noninteractive arguments are START, END, COMMAND, OUTPUT-BUFFER, REPLACE, ERROR-BUFFER, and DISPLAY-ERROR-BUFFER. Noninteractive callers can specify coding systems by binding `coding-system-for-read' and `coding-system-for-write'.

In other words, you'd do something like

(let ((coding-system-for-read 'utf-8-unix))
  (shell-command-on-region ...))

This is untested; I'm not sure whether coding-system-for-read, coding-system-for-write, or both are what you need in your case. I guess you could also use the OUTPUT-BUFFER argument and direct the output to a buffer whose coding system is set to what you need it to be.
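For instance, here is an untested sketch adapting perltidy-command from the question; it assumes perltidy both reads and writes UTF-8:

(defun perltidy-command (start end)
  "The perltidy command we pass markers to."
  ;; Encode the region sent to perltidy as UTF-8, and decode
  ;; whatever it prints back as UTF-8.
  (let ((coding-system-for-write 'utf-8-unix)
        (coding-system-for-read 'utf-8-unix))
    (shell-command-on-region start end "perltidy" t t
                             (get-buffer-create "*Perltidy Output*"))))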

Another option might be to tweak the locale in the perltidy invocation, but again, without more information about what you are using now, and with no means to experiment on a system similar to yours, I can only hint.
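On a Unix-like system that could look like the untested sketch below; the locale name is an assumption, and an environment override like this may well not work under EmacsW32 on Windows:

;; Run perltidy under a UTF-8 locale (Unix-like shells only).
(shell-command-on-region start end
                         "LC_ALL=en_US.UTF-8 perltidy"
                         t t
                         (get-buffer-create "*Perltidy Output*"))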