Zhaoxin's Tech Blog (肇鑫的技术博客)

Mastery comes from diligence and is lost through idleness.

Best Practices for URL-Related String Operations

What is a URL?

A Uniform Resource Locator (URL), colloquially termed a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it.

https://en.wikipedia.org/wiki/URL

URLs shown in a browser's address bar

In a browser such as Safari or Chrome, a URL is displayed in a human-friendly way. For example, the address of the Chinese version of Wikipedia's URL article, https://zh.wikipedia.org/wiki/统一资源定位符, appears in the address bar like this:

https://zh.wikipedia.org/wiki/统一资源定位符

URLs handled by social media

However, such human-friendly URLs are not standard, and social media platforms such as Weibo and Twitter do not accept them. Here is what happens when https://zh.wikipedia.org/wiki/统一资源定位符 is shared on those two platforms.

Weibo
[Screenshot: the URL shared on Weibo]

Twitter
[Screenshot: the URL shared on Twitter]

Neither platform displays the URL correctly, because Chinese characters are not allowed in a URL directly.

URL with percent-encoding

In fact, those characters have to be converted with a method called percent-encoding. Here is the percent-encoded URL: https://zh.wikipedia.org/wiki/%E7%BB%9F%E4%B8%80%E8%B5%84%E6%BA%90%E5%AE%9A%E4%BD%8D%E7%AC%A6. A percent-encoded URL works on social media platforms as well as in browsers.
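As a minimal sketch using only Foundation, the percent-encoded form can be produced with `addingPercentEncoding(withAllowedCharacters:)` and reversed with `removingPercentEncoding`. Splitting the address into `base` and `path` by hand is an assumption made for illustration: only the path needs encoding here, because the scheme and host are already ASCII.

```swift
import Foundation

// Assumed split of the Wikipedia address: ASCII scheme + host, non-ASCII path.
let base = "https://zh.wikipedia.org"
let path = "/wiki/统一资源定位符"

// .urlPathAllowed leaves "/" and other legal path characters untouched and
// percent-encodes the UTF-8 bytes of everything else.
let encodedPath = path.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)!
let encoded = base + encodedPath
print(encoded)

// Decoding reverses the transformation.
let decoded = encoded.removingPercentEncoding!
print(decoded)
```

Each Chinese character is three UTF-8 bytes, so every character becomes three `%XX` groups in the encoded path.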

Detecting URLs from Strings

We use NSDataDetector to get the URLs from a String.

import Foundation

let urlString = "This is a URL: https://zh.wikipedia.org/wiki/统一资源定位符"
// Detect links only. NSRange counts UTF-16 code units, hence NSString.length.
let dataDetector = try! NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue)
dataDetector.matches(in: urlString, range: NSRange(location: 0, length: (urlString as NSString).length))
    .forEach {
        print($0.range) // {15, 37}
        // Slice the original text with the match range.
        print(urlString[Range($0.range, in: urlString)!]) // https://zh.wikipedia.org/wiki/统一资源定位符
        // The detected URL is already percent-encoded.
        print($0.url!.absoluteString) // https://zh.wikipedia.org/wiki/%E7%BB%9F%E4%B8%80%E8%B5%84%E6%BA%90%E5%AE%9A%E4%BD%8D%E7%AC%A6
    }

The benefit of using NSDataDetector is that the match's url property gives us the percent-encoded URL automatically.
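The `{15, 37}` printed above is an NSRange, measured in UTF-16 code units rather than in Swift Characters. A small sketch of converting between NSRange and `Range<String.Index>` with Foundation's `NSRange(_:in:)` and `Range(_:in:)` initializers; the emoji string at the end is an added illustration (not from the original test) of how the two counts diverge.

```swift
import Foundation

let text = "This is a URL: https://zh.wikipedia.org/wiki/统一资源定位符"
let urlPart = "https://zh.wikipedia.org/wiki/统一资源定位符"

// Swift range of the URL part, expressed in String.Index values.
let swiftRange = text.range(of: urlPart)!
// The equivalent NSRange, expressed in UTF-16 code units.
let nsRange = NSRange(swiftRange, in: text)
print(nsRange.location, nsRange.length) // 15 37

// Convert back to Range<String.Index> and slice the original text.
let roundTrip = Range(nsRange, in: text)!
print(text[roundTrip])

// An emoji is one Character but two UTF-16 code units,
// so Character counts and NSRange lengths stop agreeing.
let withEmoji = "👋 " + urlPart
print(withEmoji.count, withEmoji.utf16.count) // 39 40
```

Doing the conversion with these initializers, instead of offsetting `String.Index` by an NSRange's integers, keeps the code correct for characters outside the Basic Multilingual Plane.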

Bugs in NSDataDetector with the link type

That would be all we need for Strings and URLs, but things do not always go that smoothly. In practice, I found some bugs in NSDataDetector that can lead to serious problems.

import Foundation

let s = """
// no issue
1. iOS版:https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
2. iOS:https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
3. iOS https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

// issue
4. iOS:https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
5. iOS·https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
6. iOSˆhttps://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
7. iOSøhttps://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
8. iOS_https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

// unknown scheme issue
9. iOShttps://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
"""

let dataDetector = try! NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue)

dataDetector.matches(in: s, range: NSRange(location: 0, length: (s as NSString).length))
    .enumerated().forEach { index, match in
        print(index + 1)                    // case number
        print(match.url!.absoluteString)    // detected URL
        print(match.url!.scheme!)           // detected scheme
        print(s[Range(match.range, in: s)!]) // text covered by the match range
        print()
    }

The output is:

1
https://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
https
https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

2
https://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
https
https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

3
https://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
https
https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

4
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

5
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

6
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

7
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

8
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

9
iOShttps://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
iOShttps
iOShttps://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8

In cases 4 to 8, the match range drops the https:// part of the original string, and the detected URL falls back to the http scheme. In case 9, the scheme of the URL is iOShttps instead of https.

Workaround

NSDataDetector is an API provided by Apple. We can file a bug and wait for Apple to fix it. In the meantime, we need a workaround to keep our apps working.

For cases 4 to 8, we check whether a scheme was dropped just before the match. If so, we recalculate the NSRange and the URL, because the original ones are inaccurate.
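The check can be sketched in isolation. The helper `rangeIncludingDroppedScheme(_:in:)` is hypothetical, and the range NSDataDetector reports for case 5 is simulated with `NSString.range(of:)` so the sketch runs without the detector. The important detail is that NSRange offsets are UTF-16 code units, so the text is sliced through NSString rather than by offsetting `String.Index` with Character counts.

```swift
import Foundation

/// Hypothetical helper: if `text` contains "https://" or "http://" immediately
/// before `range`, return the range widened to include that scheme;
/// otherwise return `range` unchanged.
func rangeIncludingDroppedScheme(_ range: NSRange, in text: String) -> NSRange {
    let ns = NSString(string: text)
    for scheme in ["https://", "http://"] {
        let length = NSString(string: scheme).length
        let start = range.location - length
        // Compare the UTF-16 slice right before the match with the scheme.
        if start >= 0,
           ns.substring(with: NSRange(location: start, length: length)) == scheme {
            return NSRange(location: start, length: range.length + length)
        }
    }
    return range
}

// Simulated detector output for case 5: the range starts after "https://".
let line = "5. iOS·https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8"
let reported = NSString(string: line).range(of: "itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8")
let fixed = rangeIncludingDroppedScheme(reported, in: line)
print(NSString(string: line).substring(with: fixed))
// https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
```

A range that already starts at the scheme is returned unchanged, so the helper is safe to apply to every link match.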

We treat the last case as a typo in the input and leave it alone: since its scheme does not start with http, the workaround returns nil and the match is ignored.

import Foundation

extension NSTextCheckingResult {
    // FIXME: - Workaround for an Apple API issue.
    // Returns a corrected (range, url) for an http(s) link match, or nil if the
    // match should be ignored. `str` must be the string passed to the detector.
    public func extendedResultForHttp(of str: String) -> (NSRange, URL)? {
        guard resultType == .link, let url = self.url else {
            return nil
        }

        guard url.scheme?.lowercased().hasPrefix("http") ?? false else {
            return nil
        }

        // NSRange counts UTF-16 code units, so slice via NSString rather than
        // offsetting String.Index by Characters.
        let nsStr = str as NSString

        // check bug with http:// or https:// dropped in front of the match
        for originalScheme in ["http://", "https://"] {
            let schemeLength = (originalScheme as NSString).length
            let location = self.range.location - schemeLength

            if location >= 0,
               nsStr.substring(with: NSRange(location: location, length: schemeLength)) == originalScheme,
               let urlString = urlStringWithOriginalScheme(originalScheme),
               let fixedURL = URL(string: urlString) {
                let fixedRange = NSRange(location: location, length: schemeLength + self.range.length)
                return (fixedRange, fixedURL)
            }
        }

        // check bug with other protocols (e.g. "ftp://" dropped in front of the match)
        let otherSchemeLength = ("x://" as NSString).length
        let location = self.range.location - otherSchemeLength

        if location >= 0 {
            let schemeStr = nsStr.substring(with: NSRange(location: location, length: otherSchemeLength))
            let regularExpression = try! NSRegularExpression(pattern: "^[a-zA-Z]+://")
            let fullRange = NSRange(location: 0, length: (schemeStr as NSString).length)

            if regularExpression.firstMatch(in: schemeStr, range: fullRange) != nil {
                return nil
            }
        }

        // good result
        return (self.range, url)
    }

    // Rebuilds the detected URL string with the scheme that actually appears in the text.
    private func urlStringWithOriginalScheme(_ originalScheme: String) -> String? {
        guard let url = self.url, let scheme = url.scheme else {
            return nil
        }

        let absolute = url.absoluteString
        let detectedPrefix = scheme + "://"
        guard absolute.hasPrefix(detectedPrefix) else {
            return nil
        }
        return originalScheme + String(absolute.dropFirst(detectedPrefix.count))
    }
}

Others

Swift 5 String Addendum

Font Weight and Line Height on macOS vs. iOS

When converting text to images, I found that the resulting images differed between macOS and iOS. The main differences were font weight and line height.

Font Weight

Even with the same font and the same weight, text rendered on macOS always looks thicker. I don't know why, but in my experience, if you use "HelveticaNeue-Light" on iOS, you should use "HelveticaNeue-Thin" on macOS.

Line Height

Line height is even trickier.

Equation

Apple's documentation provides the diagram below, from which we can derive a simple equation:

line height = ascent + descent + line gap (leading)

[Figure: Apple's font metrics diagram (textpg_intro_2x)]

So I ran the same test on macOS and on iOS in a playground.

// macOS
import AppKit

func getFontInfo(_ name:String) {
    let font = NSFont(name: name, size: 17.0)!
    print(font.ascender) // 13.09033203125
    print(font.descender) // -3.90966796875
    print(font.leading) // 0.0
    print(font.ascender - font.descender + font.leading) // 17.0
    
    let layoutManager = NSLayoutManager()
    print(layoutManager.defaultLineHeight(for: font)) // 20.0
}

getFontInfo("Helvetica")

// iOS
import UIKit

func getFontInfo(_ name:String) {
    let font = UIFont(name: name, size: 17.0)!
    print(font.ascender) // 15.64033203125
    print(font.descender) // -3.90966796875
    print(font.leading) // 0.0
    print(font.ascender - font.descender + font.leading) // 19.55
    print(font.lineHeight) // 19.55
}

getFontInfo("Helvetica")

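From the two tests, we can draw two conclusions:

  1. The equation holds on iOS but not on macOS, where NSLayoutManager reports a default line height of 20.0 instead of 17.0.
  2. For the same font at the same size, the ascender differs between the platforms (13.09 on macOS, 15.64 on iOS).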
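Plugging the Helvetica 17 pt values printed above into the equation makes both conclusions concrete (descent is taken as a positive magnitude, i.e. the negated descender):

```latex
\begin{aligned}
\text{macOS:}\quad & 13.09033203125 + 3.90966796875 + 0 = 17.0 \neq 20.0 \ (\texttt{defaultLineHeight}) \\
\text{iOS:}\quad & 15.64033203125 + 3.90966796875 + 0 = 19.55 = \texttt{lineHeight} \\
\text{ascender difference:}\quad & 15.64033203125 - 13.09033203125 = 2.55
\end{aligned}
```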

I didn't know why this happened, so I submitted an Apple Developer Technical Support request. Here is Apple's reply.

[Screenshot: Apple's reply]

According to Apple, if I wanted to use the equation, I should use the Core Text framework. However, Core Text does not provide a line height value directly; it only provides the ascent, descent, and leading.

Then I ran two more tests.

// macOS
import CoreText

func getLineHeightForFontName(_ name:String) {
    let font = CTFontCreateWithName(name as CFString, 17.0, nil)
    
    print(CTFontGetAscent(font)) // 13.09033203125
    print(CTFontGetDescent(font)) // 3.90966796875
    print(CTFontGetLeading(font)) // 0.0
}

getLineHeightForFontName("Helvetica")

// iOS
import CoreText

func getLineHeightForFontName(_ name:String) {
    let font = CTFontCreateWithName(name as CFString, 17.0, nil)
    
    print(CTFontGetAscent(font)) // 13.09033203125
    print(CTFontGetDescent(font)) // 3.90966796875
    print(CTFontGetLeading(font)) // 0.0
}

getLineHeightForFontName("Helvetica")

From all four tests, we can draw these conclusions:

  1. Although the equation holds on iOS, UIFont's ascender has been modified by Apple (15.64 vs. 13.09 from Core Text).
  2. On macOS, the ascender matches Core Text, but the line height (20.0 from NSLayoutManager) has been modified by Apple.
  3. Therefore, neither NSFont nor UIFont can be trusted here. The only reliable line height is the one we compute from Core Text.

Line Height

import CoreText

#if os(macOS)
import AppKit

// Line height = ascent + descent + leading, as measured by Core Text.
func getLineHeight(_ font: NSFont) -> CGFloat {
    let ctFont = CTFontCreateWithName(font.fontName as CFString, font.pointSize, nil)
    return CTFontGetAscent(ctFont) + CTFontGetDescent(ctFont) + CTFontGetLeading(ctFont)
}
#else
import UIKit

// Line height = ascent + descent + leading, as measured by Core Text.
func getLineHeight(_ font: UIFont) -> CGFloat {
    let ctFont = CTFontCreateWithName(font.fontName as CFString, font.pointSize, nil)
    return CTFontGetAscent(ctFont) + CTFontGetDescent(ctFont) + CTFontGetLeading(ctFont)
}
#endif

Others

NSTextView Best Practice

Text Programming Guide for iOS

Cocoa Text Architecture Guide

Core Text - Calculating line heights

Swift 5 String Addendum

In Swift 5, String is encoded in UTF-8, while NSString is encoded in UTF-16. The conversion between NSString and String is lazy, and that one short sentence hides a lot of danger.

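Here "lazy" refers to one of the most common patterns in Swift, usually called copy-on-write. Put simply, when a value needs to be copied, Swift does not copy it right away but only marks it as shared. As long as the following operations are reads, they keep reading the shared content; only when a write happens is the content actually separated and written. The benefit is better performance: if no write ever happens, both the copy and the extra memory are avoided entirely.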
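To make the copy-on-write idea concrete, here is a minimal sketch built on the standard library's `isKnownUniquelyReferenced(_:)`. The `Storage` and `CoWList` types are hypothetical and only illustrate the general technique; this is not how String, or the String/NSString bridge, is actually implemented.

```swift
// Hypothetical reference-type storage shared between copies.
final class Storage {
    var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

// A value type that defers copying its storage until the first write.
struct CoWList {
    private var storage: Storage

    init(_ elements: [Int]) { storage = Storage(elements) }

    // Reads never copy.
    var elements: [Int] { storage.elements }

    // Identity of the underlying storage, exposed only for demonstration.
    var storageID: ObjectIdentifier { ObjectIdentifier(storage) }

    mutating func append(_ value: Int) {
        // Copy only if another CoWList still shares this storage.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.elements)
        }
        storage.elements.append(value)
    }
}

let a = CoWList([1, 2, 3])
var b = a                         // "copy": a and b share one Storage
print(a.storageID == b.storageID) // true
b.append(4)                       // first write: b gets its own Storage
print(a.storageID == b.storageID) // false
print(a.elements, b.elements)     // [1, 2, 3] [1, 2, 3, 4]
```

The assignment `var b = a` costs only a reference-count increment; the real copy happens inside `append`, and only because `a` still holds the old storage.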

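However, because String and NSString use different encodings, this laziness causes a serious problem: when you get a String from some framework, you don't actually know whether it is a native String or a bridged NSString. For example, a String.Index you obtained earlier may already be invalid by the time you use it.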

Here is a simple example:

import Foundation

let ns:NSString = "ab两只老虎,两只老虎,跑得快,跑得快。"
var s = ns as String

let aIndex = s.firstIndex(of: "只")!
print(s[aIndex]) // 只
s += ""
print(s[aIndex]) // \270

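To deal with this problem, Swift has two strict rules:

  1. A String.Index is only valid for the String it was obtained from. Using an index with a different string can lead to unpredictable results.
  2. Whenever a String changes in any way, any String.Index into it must be obtained again.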

Solution

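Because a String.Index is easily invalidated and cannot be used across strings directly, an index from one string has to be converted before it can be used in another. Swift does not provide this conversion directly, so we have to compute it ourselves.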

import Foundation

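extension String {
    // Maps `index` (valid in `self`) to the index at the same Character offset in `str`.
    // Returns nil if that offset lies beyond str.endIndex.
    func sameIndex(_ index: String.Index, of str: String) -> String.Index? {
        let offset = self.distance(from: self.startIndex, to: index)
        return str.index(str.startIndex, offsetBy: offset, limitedBy: str.endIndex)
    }
}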

let ns:NSString = "ab两只老虎,两只老虎,跑得快,跑得快。"
var s = ns as String

let aIndex = s.firstIndex(of: "只")!
print(s[aIndex]) // 只
let s1 = s + ""
let i1 = s.sameIndex(aIndex, of: s1)!
print(s1[i1]) // 只

Similarly, Range<String.Index> has the same problem. More broadly, any type that conforms to Collection has this problem.
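A sketch of the same conversion for a `Range<String.Index>`, in the spirit of `sameIndex(_:of:)` above. The method name `sameRange(_:of:)` is hypothetical; it maps both bounds by Character offsets and returns nil when the other string is too short.

```swift
import Foundation

extension String {
    /// Hypothetical helper: maps `range` (valid in `self`) to the range at the
    /// same Character offsets in `other`, or nil if `other` is too short.
    func sameRange(_ range: Range<String.Index>, of other: String) -> Range<String.Index>? {
        let lowerOffset = distance(from: startIndex, to: range.lowerBound)
        let length = distance(from: range.lowerBound, to: range.upperBound)
        guard let lower = other.index(other.startIndex, offsetBy: lowerOffset, limitedBy: other.endIndex),
              let upper = other.index(lower, offsetBy: length, limitedBy: other.endIndex) else {
            return nil
        }
        return lower..<upper
    }
}

let s = "ab两只老虎,两只老虎,跑得快,跑得快。"
let r = s.range(of: "老虎")!       // valid only in s
let s1 = s + "!"                   // a different string
let r1 = s.sameRange(r, of: s1)!   // recomputed for s1
print(s1[r1]) // 老虎
```

Like `sameIndex(_:of:)`, this assumes the two strings share the same characters up to the mapped range; it converts positions, not content.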