Swift: removingPercentEncoding does not work with a gb2312 string

The server returns a gb2312 string that has been processed by the urlencode function:

%D7%CF%BD%FB%B3%C7%C4%A7%D6%E4_%CE%DE%CF%DE%D0%A1%CB%B5%CD%F8_www.55x.cn.rar

How can I decode it back to the original gb2312 string, which should be:

紫禁城魔咒_无限小说网_www.55x.cn.rar

There are 3 answers

1
OOPer On BEST ANSWER

Percent-encoding text in an encoding other than UTF-8 is no longer recommended on the web, and the built-in removingPercentEncoding always assumes UTF-8, so you may need to implement the conversion yourself.
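
To illustrate the problem, here is a minimal sketch of what happens when the built-in property meets these bytes. The variable names are only for illustration. After unescaping, the bytes 0xD7 0xCF do not form a valid UTF-8 sequence, so the property should give up and return nil rather than a garbled string.

```swift
import Foundation

// The first three characters of the question's string: gb2312 bytes, percent-encoded.
let encoded = "%D7%CF%BD%FB%B3%C7"

// removingPercentEncoding interprets the unescaped bytes as UTF-8.
// 0xD7 0xCF is not valid UTF-8, so the result is expected to be nil.
let decoded = encoded.removingPercentEncoding
print(decoded ?? "nil")
```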

A manual conversion might look something like this:

import Foundation

extension String.Encoding {
    // GB 18030 is a superset of GB2312, so it can decode GB2312 bytes.
    static let gb_18030_2000 = String.Encoding(rawValue: CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(CFStringEncodings.GB_18030_2000.rawValue)))
}

extension String {
    func bytesByRemovingPercentEncoding(using encoding: String.Encoding) -> Data {
        struct My {
            // Group 1 matches a percent escape such as %D7; group 2 matches any other single character.
            static let regex = try! NSRegularExpression(pattern: "(%[0-9A-F]{2})|(.)", options: [.caseInsensitive, .dotMatchesLineSeparators])
        }
        var bytes = Data()
        let nsSelf = self as NSString
        for match in My.regex.matches(in: self, range: NSRange(0..<self.utf16.count)) {
            if match.range(at: 1).location != NSNotFound {
                // Percent escape: skip the "%" and convert the two hex digits to one raw byte.
                let hexString = nsSelf.substring(with: NSMakeRange(match.range(at: 1).location+1, 2))
                bytes.append(UInt8(hexString, radix: 16)!)
            } else {
                // Literal character: encode it in the target encoding, falling back to "?".
                let singleChar = nsSelf.substring(with: match.range(at: 2))
                bytes.append(singleChar.data(using: encoding) ?? "?".data(using: .ascii)!)
            }
        }
        return bytes
    }
    func removingPercentEncoding(using encoding: String.Encoding) -> String? {
        // Interpret the collected raw bytes in the requested encoding.
        return String(data: bytesByRemovingPercentEncoding(using: encoding), encoding: encoding)
    }
}

let origStr = "%D7%CF%BD%FB%B3%C7%C4%A7%D6%E4_%CE%DE%CF%DE%D0%A1%CB%B5%CD%F8_www.55x.cn.rar"
print(origStr.removingPercentEncoding(using: .gb_18030_2000) as Any) //->Optional("紫禁城魔咒_无限小说网_www.55x.cn.rar")
0
duncanc4 On

NSString does include this functionality, but only in a deprecated method, replacingPercentEscapes(using:):

https://developer.apple.com/documentation/foundation/nsstring/1407783-replacingpercentescapes
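
A minimal sketch of how that deprecated method might be called from Swift, assuming an Apple platform where the CoreFoundation encoding constants are available. The compiler will emit a deprecation warning, and the variable names here are only for illustration.

```swift
import Foundation

// Assumption: GB 18030 (a superset of gb2312) via the CoreFoundation constant.
let gb18030 = CFStringConvertEncodingToNSStringEncoding(
    CFStringEncoding(CFStringEncodings.GB_18030_2000.rawValue))

let encoded = "%D7%CF%BD%FB%B3%C7%C4%A7%D6%E4_%CE%DE%CF%DE%D0%A1%CB%B5%CD%F8_www.55x.cn.rar"

// Deprecated NSString API that accepts an explicit NSStringEncoding value (UInt).
let decoded = (encoded as NSString).replacingPercentEscapes(using: gb18030)
print(decoded ?? "nil")
```

Because the method is deprecated, it may be safer to treat this as a quick workaround than as a long-term solution.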

0
Nick X On

OOPer's answer is great. I recently ran into this issue too and found this post. I came up with a function for the reverse operation; I hope it helps someone else.

    extension String {
        // Percent-encodes the bytes of this string in the given encoding.
        // Relies on the gb_18030_2000 constant defined in OOPer's answer.
        func urlencode(using encoding: String.Encoding = .gb_18030_2000) -> String? {
            var allowed = CharacterSet.urlQueryAllowed
            // I need to filter the `&` char as well. Change it for your needs.
            allowed.remove(charactersIn: "&")

            // Return nil if the string cannot be represented in the target encoding.
            guard let data = self.data(using: encoding) else { return nil }
            return data.reduce(into: "") {
                let scalar = UnicodeScalar($1)
                if $1 <= 127, allowed.contains(scalar) {
                    // Allowed ASCII byte: keep it as a literal character.
                    $0 += String(Character(scalar))
                } else {
                    // Every other byte becomes a %XX escape.
                    $0 += String(format:"%%%02X", $1)
                }
            }
        }
    }