i am using Powershell to execute an insert statement and insert a string into a database table. The text I want to insert I get from an HTTP-Request to Confluence's REST API and containts Czech characters. Following code snipet does the job of inserting the data to the database
$DAOControllerClass | Add-member -MemberType ScriptMethod -Name Get-DataBaseConnection -Value {
[OutputType([System.Data.OracleClient.OracleConnection])]
$username = $this.username
$password = $this.password
$data_source = $this.data_source
log("Executing Get-DataBaseConnection")
$connection_string = "User Id=$username;Password=$password;Data Source=$data_source"
$con = New-Object System.Data.OracleClient.OracleConnection($connection_string)
try {
$con.Open()
} catch {
throw "Could not open database connection"
}
log("Connectiong opened")
return $con
}
$DAOControllerClass | Add-Member -MemberType ScriptMethod -Name Update-CNFLPageIntoOldWorld -Value {
param(
[Parameter(Mandatory=$true)][String[]]$values
)
log("Executing Update-CNFLPageIntoBaseLayer")
try{
$con = $this.'Get-DataBaseConnection'()
$command = $con.CreateCommand()
$command.Connection = $con
$command.CommandText = [IO.File]::ReadAllText(".\Database queries\Data dictionary - Core layer queries\Update_cnfl_page_old_world.sql")
$null = $command.Parameters.Add("cnfl_page_id", $values[0])
$null = $command.Parameters.Add("label", $values[1])
$null = $command.Parameters.Add("business_pojem_html", $values[2])
$null = $command.Parameters.Add("popis_html",$values[3]) # The issue is with $values[3]
$null = $command.ExecuteNonQuery()
log("The cnfl page with the id: " + $values[0] + " got updated in the table confluence_page_old_world")
} catch {
throw (“Database Exception: " + $con.ConnectionString + ": " + $_.Exception.ToString())
} finally{
if ($con.State -eq ‘Open’) {
$con.close()
$command.Dispose()
}
}
}
Now the text that I get passed as parameter when downloaded from a Confluence page is as follows: "Reportingové statusy a příchody/odchody klientů."
When I print this text in Powershell, everything looks fine. All letters are represented as they should be. When I debug this code and see what text is assigned to the $command.CommandText then it also looks fine.
But when I see the result in the Database it looks like following:
So all the letters are fine except that the acutes are split away from its base letter. I tried different encodings in powershell, I tried to change the NLS settings in the Database. I also tried to write to a .txt file, encode it with utf-8, with unicode and with ISO/IEC 8859-2 only to read it from the file but this also did not work.
The only thing that works is when I hardcode the text into Powershell like this:
$null = $command.Parameters.Add("popis_html","Reportingové statusy a příchody/odchody klientů.")
Then I get the expected result. So it seems to me that when passing the string as an argument some conversion or encoding is happing but I have no clue what it could be because the letters actually get represented, it is just about the acutes. I have following Encoding settings in Powershell
IsSingleByte : True
BodyName : iso-8859-2
EncodingName : Central European (Windows)
HeaderName : windows-1250
WebName : windows-1250
WindowsCodePage : 1250
IsBrowserDisplay : True
IsBrowserSave : True
IsMailNewsDisplay : True
IsMailNewsSave : True
EncoderFallback : System.Text.InternalEncoderBestFitFallback
DecoderFallback : System.Text.InternalDecoderBestFitFallback
IsReadOnly : True
CodePage : 1250
And following nls_session_parameters
NLS_LANGUAGE CZECH
NLS_TERRITORY CZECH REPUBLIC
NLS_CURRENCY Kč
NLS_ISO_CURRENCY CZECH REPUBLIC
NLS_NUMERIC_CHARACTERS ,.
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD.MM.RR
NLS_DATE_LANGUAGE CZECH
NLS_SORT CZECH
NLS_TIME_FORMAT HH24:MI:SSXFF
NLS_TIMESTAMP_FORMAT DD.MM.RR HH24:MI:SSXFF
NLS_TIME_TZ_FORMAT HH24:MI:SSXFF TZR
NLS_TIMESTAMP_TZ_FORMAT DD.MM.RR HH24:MI:SSXFF TZR
NLS_DUAL_CURRENCY Kč
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
I tried setting the Powershell Encoding to Unicode with this line
$OutputEncoding = [System.Text.Encoding]::Unicode
The result in the database was the same. What else could I try? Thank you!
Perhaps the problem is the following (I cannot personally verify this):
It sounds like what you're receiving from the Confluence REST API are strings in decomposed Unicode normalization form (NFD), in which accented characters are represented by two code points: the base letter (e.g.,
e
), followed by a combining diacritic (e.g.,́
, the combining acute accent,U+0301
)And it looks like Oracle perhaps has trouble with this decomposed normal form and supports only a composed form (NFC), where accented letters have a direct representation as a single code point (e.g.,
é
, the Latin small letter with acute,U+00E9
),Therefore, you can try to convert the strings to the composed form (NFC) using the
String.Normalize()
method: