(' findstr /b "URL=" "%~1" ') not working with ö,ä,ü in path or filename


I want to search for "URL=" in a file. As I am quite a noob at such things, I collected some code snippets from Stack Overflow, ... ;-)

http://www.dostips.com/forum/viewtopic.php?f=3&t=2836&start=30

Get list of passed arguments in Windows batch script (.bat)

How to receive even the strangest command line parameters?

My problem: if the path or file name contains the German "ö/ä/ü" or letters/signs from other languages, then

D:\...\fähren

is treated like this

D:\...\f"hren

and findstr says "cannot open file". Here is part of my .bat:

rem %cmdcmdline%
...
:file   rem url from .url file  - the file contains URL=http.... .htm
for /f "delims=" %%a in ('findstr /b "URL=" "%~1"') do set URL="%%a"
echo. %URL% | FIND /I "URL=">Nul || (set URL=""&goto startit)

rem delete all until URL
set URL="%URL:*URL=%
rem delete =
set URL="%URL:~2%

the .bat is called from within windows like this

HKEY_CLASSES_ROOT\InternetShortcut\shell\chrome\command "D:\sources\chrome\portable\chrome\chrome.exe" "%1"

rem %cmdcmdline% at the beginning of .bat looks OK

D:\4all\reisen\istanbul\verkehr\fähren>rem C:\Windows\system32\cmd.exe /c ""C:\Users\gigoelri\AppData\Local\Temp\333A.tmp\chrome_pause.bat" D:\sources\chrome\portable\chrome\chrome.exe D:\4all\reisen\istanbul\verkehr\fähren\Bosp_eminönü_2h_14h30_12tl_SehirHatlari.url "

the output of the for loop looks like this:

D:\4all\reisen\istanbul\verkehr\fähren>for /F "delims=" %a in ('findstr /b "URL=" "D:\4all\reisen\istanbul\verkehr\fähren\Bosp_eminönü_2h_14h30_12tl_SehirHatlari.url"') do set URL="%a"
FINDSTR: D:\4all\reisen\istanbul\verkehr\f"hren\Bosp_emin"n?_2h_14h30_12tl_SehirHatlari.url kann nicht geöffnet werden.(cannot be opened)

D:\4all\reisen\istanbul\verkehr\fähren>echo.    | FIND /I "URL="  1>Nul  || (set URL=""  & goto startit )

Codepage of my cmd window:

D:\sources\firefox\_install>chcp
Aktive Codepage: 850.

Mofi 4: Trying Mofi's method 4. Result: the file can't be found.

rem C:\Windows\system32\cmd.exe /c ""C:\Users\gigoelri\AppData\Local\Temp\F54D.tmp\firefox_pause.bat" D:\sources\firefox\portable\firefox\firefox.exe D:\4all\reisen\istanbul\verkehr\fähren\Bosp_eminönü_2h_14h30_12tl_SehirHatlari.url "
...
D:\4all\reisen\istanbul\verkehr\fähren>for /F "usebackq tokens=1* delims==" %a in ("D:\4all\reisen\istanbul\verkehr\fähren\Bosp_eminönü_2h_14h30_12tl_SehirHatlari.url") do (if /I "%a" == "URL" (
set "URL=%b"
 goto startit
) )
Die Datei "D:\4all\reisen\istanbul\verkehr\fähren\Bosp_eminönü_2h_14h30_12tl_SehirHatlari.url" 
kann nicht gefunden werden. (cannot be found)

The reason this time seems to be that the file name contains Turkish letters like "Ş" instead of "S".

Edit 20150629:

The system is Windows 7 and drive D: is NTFS.

%~s1 doesn't work either:

D:\4all\reisen\istanbul\verkehr\fähren>for /F "usebackq tokens=1* delims==" %a in ("D:\4all\reisen\istanbul\verkehr\FHREN~1\Bosp_eminönü_2h_14h30_12tl_SehirHatlari.url") do (if /I "%a" == "URL" (
set "URL=%b"
 goto startit
) )
Die Datei "D:\4all\reisen\istanbul\verkehr\FHREN~1\Bosp_eminönü_2h_14h30_12tl_SehirHatlari.url" kann nicht gefunden werden. (cannot be found)

The file name in Explorer is: Bosp_eminönü_2h_14h30_12tl_ŞehirHatları.url. The URL file was created by dragging and dropping the following URL from Chrome: http://en.sehirhatlari.com.tr/en/timetable/short-bosphorus-tour-363.html

The dir command in %windir%\system32\cmd.exe displays neither the Ş nor the ı at the end correctly.

And the .exe seems to be called already with the wrong name:

Edit 20150630a:

I convert the .bat to an .exe using Bat_To_Exe_Converter_(x64).exe. I do this because, for example, the registry entries can then stay unchanged and the .exe can be pinned without extra effort.

And you are right: if Windows calls the .bat directly, everything is OK.

HKEY_CLASSES_ROOT\IE.AssocFile.URL\Shell\firefox\command
"D:\sources\firefox\_install\firefox.bat" "%1"

Can it be that Windows passes parameters differently depending on whether it calls a .bat or an .exe?

!!!! @Mofi: Thank you for your extended 1a support !!!!

It does not seem to be a problem of the "bat to exe converter". Please have a look at the rem statement in the first line. It differs quite a bit from the .exe screenshot posted under Edit 20150629. There is an additional statement "C:\Users\gigoelri\AppData\Local\Temp\F411.tmp\firefox_pause.bat", the double quotes are placed differently, and the URL is spelled differently at the end: ...ı.url"" instead of ...i.url "

1 answer

Answer by Mofi:

1. About quoting values assigned to variables

A very common mistake made is using:

set variable="value with spaces"

This assigns "value with spaces", including both double quotes and everything else up to the end of the line such as trailing spaces, to variable.

The correct positioning of first double quote is:

set "variable=value with spaces"

This assigns just value with spaces to variable, regardless of trailing spaces or tabs on this line.

For more details see my answer on Why is no string output with 'echo %var%' after using 'set var = text' on command line?

2. Testing for assignment done in FOR loop

for /f "delims=" %%a in ('findstr /b "URL=" "%~1"') do set URL="%%a"
echo. %URL% | FIND /I "URL=">Nul || (set URL=""&goto chrome)

This method of testing whether the FOR loop assigned a value is much more complex than necessary.

Much easier to read and faster to execute would be:

@echo off
set "URL="
for /F "delims=" %%a in ('%SystemRoot%\System32\findstr.exe /b "URL=" "%~1" 2^>nul') do set "URL=%%a"

if "%URL%"=="" goto Chrome

rem Remove URL= from string value.
set "URL=%URL:~4%"

echo URL found: %URL%
goto :EOF

:Chrome
echo No URL found.

Removing URL= independent of its letter case is now much easier, because the double quotes are no longer part of the string value assigned to variable URL once the assignment is quoted correctly.

3. Code page on GUI and in console windows

In German-speaking countries, the code page used by the GUI for non-Unicode strings is Windows-1252.

But console windows use OEM code page 850 by default in German-speaking countries.

Comparing the tables of the two code pages shows that the German umlauts have different byte values in them, which explains what you see.
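The byte values can be checked with a short Python sketch. Python is used here only for illustration and is not part of the batch solution; the helper names `byte_values` and `reinterpret` are made up for this example. It encodes the three umlauts in both code pages and then reads OEM 850 bytes as if they were Windows-1252.

```python
# Code page names as known to Python's codec registry.
ANSI = "cp1252"   # Windows-1252, used by the GUI for non-Unicode strings
OEM = "cp850"     # OEM 850, default in German console windows


def byte_values(text):
    """Return {character: (ANSI byte, OEM byte)} for each character in text."""
    return {ch: (ch.encode(ANSI)[0], ch.encode(OEM)[0]) for ch in text}


def reinterpret(text, written_as, read_as):
    """Encode text in one code page and decode the bytes in another.

    Bytes that are undefined in the target code page become U+FFFD.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    for ch, (ansi, oem) in byte_values("äöü").items():
        print(f"{ch}: Windows-1252 0x{ansi:02X}, OEM 850 0x{oem:02X}")
    # OEM 850 bytes misread as Windows-1252 turn the umlauts into other characters.
    print(reinterpret("fähren", OEM, ANSI))
```

In Windows-1252, bytes 0x84 and 0x94 are typographic double quotation marks and 0x81 is unassigned. That would fit the f"hren and emin"n? in the FINDSTR message, although this is only a reading of the posted output and was not verified on the questioner's system.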

The default code page of console windows can be seen by opening a command prompt window and running either chcp or mode without any parameter. In both cases the active code page is output in the console window.

The command chcp means change code page and can therefore be used to switch the code page of the active command prompt.

What you have to do in the batch file depends on which encoding is used for the file name string passed as parameter to the batch file.


Edit after questioner provided additional information about how batch file is called.

4. Method without using FINDSTR

findstr is not needed for this task. Usage of findstr just makes the batch file slower and more complex than necessary.

Therefore I suggest a much easier batch solution for this task:

@echo off
for /F "usebackq tokens=1* delims==" %%a in ("%~1") do (
    if /I "%%a"=="URL" (
        set "URL=%%b"
        goto FoundURL
    )
)
echo No URL found.
goto :EOF

:FoundURL
echo URL found: %URL%

The *.url file is now parsed directly by the command line interpreter with for instead of using findstr.

Run for /? in a command prompt window for help on this command.

By default, for with parameter /F treats a string in double quotes as the string to parse. But for this task a file whose full path is specified in double quotes must be parsed. Therefore usebackq is used to change this behavior, so that the file name with path in double quotes is interpreted as the name of a file to parse.

Next this batch file is only interested in the line:

URL=https://stackoverflow.com/

So delims== is used to split each line into strings using the equal sign as delimiter.

Wanted are the string left of the first equal sign and everything right of the first equal sign, which of course could also contain one or more equal signs. tokens=1* gives exactly that split. The string left of the first equal sign is token 1, which is assigned to loop variable a, while everything after the first equal sign is token 2, which is assigned to loop variable b.

A case-insensitive comparison of the string left of the equal sign with the string URL checks whether the line of interest has been found in the file. If so, token 2, the URL string, is assigned to environment variable URL and the loop is exited with a jump to a label, as there is no need to parse the remaining lines of the file.
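For readers who find the for /F options hard to picture, the same parsing idea can be sketched in Python. This is only an illustration of the logic, not an exact emulation of for /F: for example, for /F treats consecutive delimiters as one and skips empty lines, which this sketch ignores. The function name `find_url` and the sample file content are hypothetical.

```python
def find_url(lines):
    """Return the text after the first '=' on the first line whose key is URL.

    Mirrors the idea of tokens=1* delims== plus the case-insensitive IF:
    token 1 is the text left of the first '=', token 2 is everything after it.
    """
    for line in lines:
        key, sep, rest = line.rstrip("\r\n").partition("=")
        if sep and key.upper() == "URL":
            return rest          # stop at the first match, like goto FoundURL
    return None                  # loop finished normally: no URL line


if __name__ == "__main__":
    # Hypothetical content of a *.url file, not taken from the question.
    sample = [
        "[InternetShortcut]",
        "url=https://example.com/page?id=1&lang=en",
        "IconIndex=0",
    ]
    print(find_url(sample))
```

Splitting only at the first equal sign is what keeps equal signs inside the URL's query string intact.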

If the for loop finishes normally, the *.url file contains no line starting with URL= in any letter case. The result is then an appropriate information message before the batch file is exited with goto :EOF (EOF, end of file, is a predefined label that always exists nowadays).

Otherwise the found URL is output before also exiting this demo batch file.

This batch file called within a command prompt window with

D:\4all\reisen\istanbul\verkehr\fähren\Bosp_eminönü_2h_14h30_12tl_SehirHatlari.url

or from Windows Explorer has no problem opening the file with the German umlauts and parsing it.


Questioner asked:

Can it be that windows passes parameters differently depending on whether it calls .bat or .exe?

For file and directory names this is true.

"%1" in a file association is a placeholder for an argument, usually the name of a file or directory.

There are three ways for Windows to pass a directory or file name to an application:

  1. In short format using 8.3 format for all directories in path and file name itself. 8.3 means only up to 8 characters for directory / file name and only up to 3 characters for the file extension with a very limited set of characters. This format is used by Windows if the application (.com or .exe) to start is a 16-bit application according to header of the application to start.

  2. In long format using ANSI characters only, i.e. 1 byte per character with a null byte at end as termination. This format is used by Windows if the application is a 32-bit or 64-bit application according to header with no support for Unicode. Directory and file names with a Unicode character in string are converted to system locale code page for non Unicode aware applications. The system locale for non Unicode aware applications can be set by the user in the Windows Region and Language settings.

  3. In long format using Unicode characters, i.e. with 2 bytes per character if the application to start is Unicode aware according to its header.
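The effect of converting a Unicode file name to a single-byte code page, as in the second case, can be sketched in Python. The helper names `to_ansi` and `survives` are made up for this example, and the assumption that the questioner's system locale is Windows-1252 is inferred from the German setup, not confirmed. Python replaces unmappable characters with ?, whereas Windows may instead pick a similar-looking character (Ş to S, ı to i), which would match the SehirHatlari spelling in the question.

```python
# File name as shown in Windows Explorer according to the question.
NAME = "Bosp_eminönü_2h_14h30_12tl_ŞehirHatları.url"


def to_ansi(name, code_page):
    """Encode a Unicode name in a single-byte code page, '?' for unmappable characters."""
    return name.encode(code_page, errors="replace")


def survives(name, code_page):
    """True if the name can be converted to the code page and back unchanged."""
    return to_ansi(name, code_page).decode(code_page) == name


if __name__ == "__main__":
    print(to_ansi(NAME, "cp1252"))
    # Windows-1252 (Western European) versus Windows-1254 (Turkish)
    print(survives(NAME, "cp1252"), survives(NAME, "cp1254"))
```

Either way the converted name no longer identifies the file on disk, so any command that receives it reports that the file cannot be found.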

ANSI strings use an array of type char in C/C++ coded applications for Windows while an array of type wchar_t is used for Unicode strings. Details for C/C++ programmers for Windows can be found

"%L" can be used instead of "%1" for a file association in HKEY_CLASSES_ROOT in Windows registry if Windows should pass a file or directory name always in long format and never in short format to an application. This is sometimes needed if an application is a hybrid like a C/C++ console application compiled with DJGPP which is a 16-bit application, but supports nevertheless long ANSI encoded file names because of special startup code.

But back to the question: Yes, of course, Windows passes file and directory names differently to a batch file or an executable depending on header of the executable, i.e. which type of application it is and which type of strings it supports.

It looks like the bat to exe converter used here creates a 64-bit console application which is Unicode aware. So this application must correctly convert the Unicode string to an ANSI string, using the system locale of the user account, when passing file and directory names and other arguments to the command that finally runs the embedded batch file. And it looks like this converter does not get this Unicode to ANSI conversion, or the creation of the command line that runs the batch file, 100% correct.