What is a URL?
A Uniform Resource Locator (URL), colloquially termed a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it.
URL shown in the address bar of a browser
In a browser like Safari or Chrome, a URL is shown in a human-friendly way. For example, you may see a URL like https://zh.wikipedia.org/wiki/统一资源定位符, which is the Chinese version of the URL page on Wikipedia.
URLs handled by social media
However, human-friendly URLs are not standard, and social media platforms like Weibo and Twitter do not accept them as-is. Here are the results of sharing https://zh.wikipedia.org/wiki/统一资源定位符 on these two platforms.
Weibo
Twitter
Neither of them shows the URL correctly, because Chinese characters are not allowed to appear in a URL directly.
URL with percent-encoding
In fact, those characters are converted by a method called percent-encoding, which replaces each byte of a character's UTF-8 representation with a % followed by two hexadecimal digits. Here is a percent-encoded URL: https://zh.wikipedia.org/wiki/%E7%BB%9F%E4%B8%80%E8%B5%84%E6%BA%90%E5%AE%9A%E4%BD%8D%E7%AC%A6. Percent-encoded URLs can be used on social media platforms as well as in browsers.
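As a minimal sketch of the conversion, the snippet below percent-encodes only the last path component of the example URL with Foundation's addingPercentEncoding(withAllowedCharacters:) and then decodes it again. The variable names are illustrative, and encoding just the path component (rather than the whole URL string) is a choice made for this example.

```swift
import Foundation

// The last path component of the readable URL from the text.
let title = "统一资源定位符"

// Percent-encode every character that is not allowed in a URL path.
// Each UTF-8 byte of a disallowed character becomes "%XX".
let encodedTitle = title.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed) ?? title
let encodedURLString = "https://zh.wikipedia.org/wiki/" + encodedTitle
print(encodedURLString)

// Decoding goes the other way and restores the readable form.
let decodedTitle = encodedTitle.removingPercentEncoding ?? encodedTitle
print(decodedTitle)
```

Each Chinese character here takes three UTF-8 bytes, so it expands to three %XX groups.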
Detecting URLs from Strings
We use NSDataDetector to get the URLs from a String.
import Foundation

let urlString = "This is a URL: https://zh.wikipedia.org/wiki/统一资源定位符"
let dataDetector = try! NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue)
dataDetector.matches(in: urlString, range: NSRange(location: 0, length: (urlString as NSString).length))
    .forEach {
        print($0.range) // {15, 37}
        print(urlString[Range($0.range, in: urlString)!]) // https://zh.wikipedia.org/wiki/统一资源定位符
        print($0.url!.absoluteString) // https://zh.wikipedia.org/wiki/%E7%BB%9F%E4%B8%80%E8%B5%84%E6%BA%90%E5%AE%9A%E4%BD%8D%E7%AC%A6
    }
The benefit of using NSDataDetector is that we get the percent-encoded URL automatically.
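The same idea can be wrapped in a small helper. This is only a sketch: detectedLinks(in:) is a hypothetical name, not part of Foundation, and it avoids force-unwrapping by returning an empty array if the detector cannot be created.

```swift
import Foundation

/// Hypothetical helper: returns the URLs that NSDataDetector finds in `text`.
func detectedLinks(in text: String) -> [URL] {
    // Creating the detector can throw, so fall back to an empty result.
    guard let detector = try? NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue) else {
        return []
    }
    // NSRange(_:in:) converts String indices to UTF-16 offsets.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return detector.matches(in: text, options: [], range: fullRange).compactMap { $0.url }
}

let links = detectedLinks(in: "This is a URL: https://zh.wikipedia.org/wiki/统一资源定位符")
links.forEach { print($0.absoluteString) }
```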
Bugs in NSDataDetector with the link type
The above is enough for basic handling of strings and URLs. However, things do not always go that smoothly. In practice, I found some bugs in NSDataDetector that can cause fatal problems.
let s = """
// no issue
1. iOS版:https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
2. iOS:https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
3. iOS https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
// issue
4. iOS:https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
5. iOS·https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
6. iOSˆhttps://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
7. iOSøhttps://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
8. iOS_https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
// unknown scheme issue
9. iOShttps://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
"""
let dataDetector = try! NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue)
dataDetector.matches(in: s, range: NSRange(location: 0, length: (s as NSString).length))
    .enumerated().forEach { index, match in
        print(index + 1)
        print(match.url!.absoluteString)
        print(match.url!.scheme!)
        print(s[Range(match.range, in: s)!])
        print()
    }
The result is:
1
https://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
https
https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
2
https://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
https
https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
3
https://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
https
https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
4
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
5
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
6
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
7
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
8
http://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
http
itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
9
iOShttps://itunes.apple.com/cn/app/%E5%92%95%E5%94%A72/id1366583897?mt=8
iOShttps
iOShttps://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8
For cases 4 to 8, the range of the match dropped the https:// part of the original string, and the detected URL fell back to the http scheme. For the last case, the scheme of the URL (iOShttps) was not as expected.
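One way to spot an affected match is to compare the matched text with the scheme the detector reports. The sketch below uses a hypothetical helper, isMissingScheme(matchedText:detectedScheme:), and feeds it values copied from the output above instead of running the detector.

```swift
import Foundation

/// Hypothetical check: true when the matched text does not start with the
/// scheme that the detector reports for the URL.
func isMissingScheme(matchedText: String, detectedScheme: String) -> Bool {
    return !matchedText.lowercased().hasPrefix(detectedScheme.lowercased() + "://")
}

// Case 3 from the output: the matched text includes the scheme.
let case3 = isMissingScheme(
    matchedText: "https://itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8",
    detectedScheme: "https")
// Case 4 from the output: the matched text starts at the host.
let case4 = isMissingScheme(
    matchedText: "itunes.apple.com/cn/app/咕唧2/id1366583897?mt=8",
    detectedScheme: "http")
print(case3, case4) // false true
```

A bare host such as www.example.com would be flagged too, so this check only marks candidates. Whether a scheme was really dropped still depends on the characters right before the match.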
Workaround
NSDataDetector is an API provided by Apple. We could file a bug and wait until Apple fixes it. Meanwhile, we should write a workaround to keep our apps working.
For the issues in cases 4 to 8, we should check whether a scheme is missing right before the matched range. If it is, we should recalculate the NSRange and the URL, as the original ones are not accurate.
For the last issue, we assume it is a typing mistake and leave it alone.
import Foundation

extension NSTextCheckingResult {
    // FIXME: - Workaround for Apple API Issue
    /// Returns the corrected range and URL for an http(s) link match,
    /// or nil if the match is not a usable http(s) link.
    public func extendedResultForHttp(of str: String) -> (NSRange, URL)? {
        guard resultType == .link, let url = self.url else {
            return nil
        }
        guard url.scheme?.lowercased().hasPrefix("http") ?? false else {
            return nil
        }
        // NSRange counts UTF-16 code units, so slice through NSString
        // instead of offsetting String.Index by Character count.
        let text = str as NSString
        // check bug with http:// or https://
        for scheme in ["http://", "https://"] {
            let length = (scheme as NSString).length
            let location = range.location - length
            guard location >= 0 else { continue }
            let prefix = text.substring(with: NSRange(location: location, length: length))
            if prefix == scheme,
               let fixedString = urlStringWithOriginalScheme(scheme),
               let fixedURL = URL(string: fixedString) {
                // Extend the range backwards so that it covers the scheme.
                return (NSRange(location: location, length: length + range.length), fixedURL)
            }
        }
        // check bug with other protocols: a letter followed by "://"
        // right before the match means the text uses a non-http scheme
        let otherLength = ("x://" as NSString).length
        let location = range.location - otherLength
        if location >= 0 {
            let prefix = text.substring(with: NSRange(location: location, length: otherLength))
            if prefix.range(of: "^[a-zA-Z]+://", options: .regularExpression) != nil {
                return nil
            }
        }
        // good result
        return (range, url)
    }

    /// Replaces the scheme guessed by the detector with the one found in the text.
    private func urlStringWithOriginalScheme(_ originalScheme: String) -> String? {
        guard let url = self.url, let scheme = url.scheme else {
            return nil
        }
        let str = url.absoluteString
        return originalScheme + String(str.dropFirst((scheme + "://").count))
    }
}
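The range recalculation can also be tried in isolation, without NSDataDetector. The following is a sketch under that assumption: rangeIncludingScheme(_:in:) is a hypothetical free function, and the faulty match is simulated by searching for the host in a shortened sample string.

```swift
import Foundation

/// Hypothetical helper: if `range` is directly preceded by "http://" or
/// "https://" in `text`, returns a range extended to cover that scheme.
/// Otherwise returns `range` unchanged.
func rangeIncludingScheme(_ range: NSRange, in text: String) -> NSRange {
    let source = text as NSString
    for scheme in ["http://", "https://"] {
        let length = (scheme as NSString).length
        let location = range.location - length
        guard location >= 0 else { continue }
        let prefix = source.substring(with: NSRange(location: location, length: length))
        if prefix == scheme {
            return NSRange(location: location, length: length + range.length)
        }
    }
    return range
}

let sample = "iOS·https://itunes.apple.com/cn/app/id1366583897"
let nsSample = sample as NSString
// Simulate the faulty match: a range that starts at the host.
let faultyRange = nsSample.range(of: "itunes.apple.com/cn/app/id1366583897")
let fixedRange = rangeIncludingScheme(faultyRange, in: sample)
print(nsSample.substring(with: fixedRange)) // https://itunes.apple.com/cn/app/id1366583897
```

Working on NSString keeps every offset in UTF-16 code units, which is the unit NSRange uses.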