肇鑫的技术博客

Mastery comes from diligence and is lost through idleness

Best Practice of URL Related Operations with Strings

What is a URL?

A Uniform Resource Locator (URL), colloquially termed a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it.

https://en.wikipedia.org/wiki/URL
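As a small illustration of that definition, the sketch below splits the Wikipedia URL quoted above into the part that names the retrieval mechanism (the scheme) and the parts that name the location (host and path). It only uses Foundation's URLComponents; the variable names are mine.

```swift
import Foundation

// Split the Wikipedia URL quoted above into its parts.
let components = URLComponents(string: "https://en.wikipedia.org/wiki/URL")

let scheme = components?.scheme  // the mechanism used to retrieve the resource
let host = components?.host      // the machine on the network
let path = components?.path      // the resource on that machine

print(scheme ?? "nil", host ?? "nil", path ?? "nil")
```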

URL shown in address bar of a browser

In a browser like Safari or Chrome, a URL is shown in a human-friendly way. For example, the Chinese version of the URL page on Wikipedia appears in the address bar as https://zh.wikipedia.org/wiki/统一资源定位符.

https://zh.wikipedia.org/wiki/统一资源定位符

URL handled by social media

However, human-friendly URLs like this are not standard, and social media platforms such as Weibo and Twitter do not accept them. Here are the results of sharing https://zh.wikipedia.org/wiki/统一资源定位符 on these two platforms.

Weibo
url shared on weibo

Twitter
url shared on twitter

Neither of them shows the URL correctly, because Chinese characters are not allowed to appear in a URL directly.

URL with percent-encoding

In fact, those characters are converted by a method called percent-encoding. Here is the percent-encoded URL: https://zh.wikipedia.org/wiki/%E7%BB%9F%E4%B8%80%E8%B5%84%E6%BA%90%E5%AE%9A%E4%BD%8D%E7%AC%A6. The percent-encoded URL can be used on social media platforms as well as in browsers.
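To make the conversion concrete, here is a minimal sketch that encodes the page title by hand with Foundation's addingPercentEncoding(withAllowedCharacters:) and decodes it again. The variable names are mine, and encoding only the last path segment (rather than the whole URL) is a choice made for this example.

```swift
import Foundation

let title = "统一资源定位符"

// Encode every character that is not allowed in a URL path.
let encodedTitle = title.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed) ?? title
let shareableURL = "https://zh.wikipedia.org/wiki/" + encodedTitle
print(shareableURL)

// Decoding restores the human-friendly form.
let readableTitle = encodedTitle.removingPercentEncoding ?? encodedTitle
print(readableTitle == title)
```

Each Chinese character becomes three %XX groups because it takes three bytes in UTF-8.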

Detecting URLs from Strings

We use NSDataDetector to get the URLs from a String.

import Foundation

let urlString = "This is a URL: https://zh.wikipedia.org/wiki/统一资源定位符"
// The .link checking type finds URLs in natural-language text.
let dataDetector = try! NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue)
// NSRange is measured in UTF-16 code units, hence the NSString length.
dataDetector.matches(in: urlString, range: NSRange(location: 0, length: (urlString as NSString).length))
    .forEach {
        print($0.range) // {15, 37}
        print(urlString[Range($0.range, in: urlString)!]) // https://zh.wikipedia.org/wiki/统一资源定位符
        print($0.url!.absoluteString) // https://zh.wikipedia.org/wiki/%E7%BB%9F%E4%B8%80%E8%B5%84%E6%BA%90%E5%AE%9A%E4%BD%8D%E7%AC%A6
    }

The benefit of using NSDataDetector is that we get the percent-encoded URL automatically.
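The detector reports its matches as NSRange values, which count UTF-16 code units rather than Swift Characters. The following sketch shows the difference and the round trip between NSRange and Range<String.Index>. The sample text and the example.com address are placeholders of mine, not taken from a real match.

```swift
import Foundation

// "\u{1F44D}" (thumbs up) is one Character but two UTF-16 code units.
let text = "\u{1F44D} https://example.com"

let characterCount = text.count               // counts Characters
let utf16Length = (text as NSString).length   // counts UTF-16 code units

// An NSRange covering the whole string, without counting by hand.
let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

// Round trip: Range<String.Index> -> NSRange -> Range<String.Index>.
let linkRange = text.range(of: "https://example.com")
let nsLinkRange = linkRange.map { NSRange($0, in: text) }
let restoredRange = nsLinkRange.flatMap { Range($0, in: text) }
let link = restoredRange.map { String(text[$0]) }

print(characterCount, utf16Length, fullRange.length, link ?? "not found")
```

Because the two counts differ as soon as the text contains such characters, an NSRange location should not be used as a Character offset.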

Bugs of NSDataDetector with link type

The above would be enough for working with Strings and URLs. However, things do not always go that smoothly. In practice, I found some bugs in NSDataDetector that can lead to fatal results.

let s = """
// no issue
1. iOS版:https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
2. iOS:https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
3. iOS https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

// issue
4. iOS:https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
5. iOS·https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
6. iOSˆhttps://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
7. iOSøhttps://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
8. iOS_https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

// unknown scheme issue
9. iOShttps://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
"""

let dataDetector = try! NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue)

dataDetector.matches(in: s, range: NSRange(location: 0, length: (s as NSString).length))
     .enumerated().forEach { index, match in
        print(index + 1)
        print(match.url!.absoluteString)
        print(match.url!.scheme!)
        print(s[Range(match.range, in: s)!])
        print()
}

The result is

1
https://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
https
https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

2
https://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
https
https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

3
https://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
https
https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

4
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

5
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

6
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

7
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

8
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

9
iOShttps://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
iOShttps
iOShttps://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

In results 4 to 8, the matched range dropped the https:// part of the original string, and the detected URL fell back to the http scheme. In the last result, the scheme of the URL (iOShttps) was not what we expected.
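One way to spot the first symptom is to compare the scheme reported by the URL with the beginning of the matched text. The sketch below does this for values copied from results 1 and 4. The helper schemeIsPreserved is hypothetical and exists only for this illustration.

```swift
import Foundation

// Hypothetical helper: true when the matched text starts with the same
// scheme that the resulting URL reports.
func schemeIsPreserved(matchedText: String, url: URL) -> Bool {
    guard let scheme = url.scheme else { return false }
    return matchedText.lowercased().hasPrefix(scheme.lowercased() + "://")
}

// Values copied from results 1 and 4 above.
let encodedRest = "itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8"

let result1 = URL(string: "https://" + encodedRest).map {
    schemeIsPreserved(matchedText: "https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8", url: $0)
}
let result4 = URL(string: "http://" + encodedRest).map {
    schemeIsPreserved(matchedText: "itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8", url: $0)
}

print(result1 ?? false, result4 ?? false)
```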

Workaround

NSDataDetector is an API provided by Apple. We could file a bug and wait until Apple fixes it. Meanwhile, we should write a workaround and keep our apps working.

For issues 4 to 8, we should check whether a scheme is missing right before the matched range. If it is, we should recalculate the NSRange and the URL, because the original ones are not accurate.

For the last issue, we treat it as a typing mistake and leave it alone.
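Before the full extension, the look-back step for issues 4 to 8 can be sketched in isolation. The helper widenedRange is hypothetical, and the truncated range is produced with a plain substring search that stands in for a detector match, so the snippet does not depend on NSDataDetector.

```swift
import Foundation

// Hypothetical helper: if `scheme` sits directly in front of `range`,
// return a range widened to include it; otherwise return nil.
func widenedRange(_ range: NSRange, toInclude scheme: String, in text: String) -> NSRange? {
    let nsText = text as NSString
    let schemeLength = (scheme as NSString).length
    let location = range.location - schemeLength

    // Stay inside the string; NSRange offsets are UTF-16 based.
    guard location >= 0, range.location + range.length <= nsText.length else { return nil }

    let preceding = nsText.substring(with: NSRange(location: location, length: schemeLength))
    guard preceding == scheme else { return nil }

    return NSRange(location: location, length: schemeLength + range.length)
}

// Stand-in for a detector match that lost its scheme, as in issue 4.
let sample = "4. iOS:https://itunes.apple.com/cn/app/id1366583897?mt=8"
let truncated = (sample as NSString).range(of: "itunes.apple.com/cn/app/id1366583897?mt=8")

let fixed = widenedRange(truncated, toInclude: "https://", in: sample)
let fixedText = fixed.map { (sample as NSString).substring(with: $0) }
print(fixedText ?? "no scheme found")
```

The substring is taken through NSString on purpose, so that the UTF-16 based location is never mixed with Character offsets.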

import Foundation

extension NSTextCheckingResult {
    // FIXME: - Workaround for Apple API Issue
    /// Returns the corrected range and URL for an http(s) link match,
    /// or nil when the match is not a usable http(s) link.
    public func extendedResultForHttp(of str: String) -> (NSRange, URL)? {
        guard resultType == .link, let url = self.url else {
            return nil
        }
        
        guard url.scheme?.lowercased().hasPrefix("http") ?? false else {
            return nil
        }
        
        // NSRange counts UTF-16 code units, so look backwards through NSString
        // instead of offsetting String indices by Character counts.
        let nsStr = str as NSString
        
        // check bug with http:// or https://
        for originalScheme in ["http://", "https://"] {
            let schemeLength = (originalScheme as NSString).length
            let location = range.location - schemeLength
            
            guard location >= 0,
                  nsStr.substring(with: NSRange(location: location, length: schemeLength)) == originalScheme,
                  let urlString = urlStringWithOriginalScheme(originalScheme),
                  let fixedURL = URL(string: urlString) else {
                continue
            }
            
            return (NSRange(location: location, length: schemeLength + range.length), fixedURL)
        }
        
        // check bug with other protocols
        let otherSchemeLength = ("x://" as NSString).length
        let location = range.location - otherSchemeLength
        
        if location >= 0 {
            // Only the characters right before the match are inspected.
            let schemeStr = nsStr.substring(with: NSRange(location: location, length: otherSchemeLength))
            let schemeRange = NSRange(location: 0, length: (schemeStr as NSString).length)
            let regularExpress = try! NSRegularExpression(pattern: "^[a-zA-Z]+://", options: .anchorsMatchLines)
            
            if regularExpress.firstMatch(in: schemeStr, range: schemeRange) != nil {
                return nil
            }
        }
        
        // good result
        return (range, url)
    }
    
    private func urlStringWithOriginalScheme(_ originalScheme: String) -> String? {
        guard let url = self.url, let scheme = url.scheme else {
            return nil
        }
        
        // Swap the detected scheme for the one found in the original string.
        return originalScheme + String(url.absoluteString.dropFirst((scheme + "://").count))
    }
}

Others

Swift 5 String补遗