Two bugs found in Drupal 8 Core

In the previous Drupal 8 site upgrade story, I mentioned that the link field type does not correctly render URLs with certain query strings. Example: http://www.isfdb.org/cgi-bin/title.cgi?1580 is rendered incorrectly as http://www.isfdb.org/cgi-bin/title.cgi?0=.

When I didn't receive any satisfactory answers to my plea for help on the StackExchange site "Drupal Answers", I decided to go bug hunting myself. The result: I found two bugs deep inside Drupal 8 Core! For the first bug I have submitted a new Drupal Core issue (issue 3007243). For the second bug I have contributed my analysis to the discussions of two existing Drupal Core issues (issue 2987114 and issue 1464244).

If you're interested you can hit the "Read more" button to find my detailed analysis. It's the same text that I wrote as answer on "Drupal Answers", but I'm reproducing it here to give my site a little bit more content 😀

Let's start at the beginning. The Link module works with the Drupal Core class Url. At the appropriate time, the Link module gets the URI from the Url object by invoking its toString() method. For external links, this method makes use of the Drupal Core class UnroutedUrlAssembler, by invoking its assemble() method, which in turn invokes another method buildExternalUrl.

Bug #1 is in buildExternalUrl: The method contains this line:

$options['query'] = NestedArray::mergeDeep($parsed['query'], $options['query']);

This is wrong because it doesn't preserve integer keys! Remember that the query parameter in my URL is numeric (1580, see the example at the beginning of this post)? The above line causes the 1580 to be discarded, instead the result is the 0 (zero) that we have seen in the mangled query string. The correct line would be this. The all-important thing is the TRUE parameter which tells NestedArray::mergeDeepArray() to preserve integer keys.

$options['query'] = NestedArray::mergeDeepArray(array($parsed['query'], $options['query']), TRUE);

Unfortunately this is not the end of the road. Where does the "=" character in the mangled query string come from? The assemble() method of the UnroutedUrlAssembler class first breaks down the URI into its individual parts like this:

$parsed = UrlHelper::parse($uri);

Then it reassembles the URI, piece by piece. The query string gets reassembled with this line:

$uri .= '?' . UrlHelper::buildQuery($options['query']);

The UrlHelper::buildQuery function generates "key=value" pairs from the entries in the supplied array ($options['query']). It omits the "=value" part if - and only if - the value of an array entry is NULL (the check is made with the PHP function isset()). As I have found out, for my example URL the value of the array entry is not NULL, it is an empty string - and that's what causes the "=" character to be generated.

But where is the empty string coming from? The function UrlHelper::parse() uses the PHP built-in function parse_str() to break down the query string into its parts, and that PHP function is the culprit! Here's the proof:

php -r '$query="1580"; $arr = array(); parse_str($query, $arr); echo "isset: " . isset($arr[1580]) . "\n"; echo "+" . $arr[1580] . "+\n"; echo "gettype: " . gettype($arr[1580]) . "\n";'
isset: 1
++
gettype: string

So to ultimately fix the problem, either parse_str() would have to change its buggy query string parsing, or the Drupal function UrlHelper::parse() would have to use some other function to parse query strings.