
While testing an answer to another user's question, I found something I don't understand. The problem was to replace all literal \t, \n and \r sequences (a backslash followed by t, n or r) in a string with a single space.

Now, the first pattern I tried was:

/(?:\\[trn])+/

which surprisingly didn't work. I tried the same pattern in Perl and it worked fine. After some trial and error, I found that PHP wants 3 or 4 backslashes for that pattern to match, as in:

/(?:\\\[trn])+/

or

/(?:\\\\[trn])+/

These patterns, to my surprise, both work. Why are these extra backslashes necessary?

 Answers


You need 4 backslashes to represent 1 literal backslash in the regex because:

  • 2 backslashes are used for unescaping in the PHP string ('\\\\' -> \\)
  • 1 backslash is used for unescaping in the regex engine (\\ -> \)
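A quick way to see the string-level step is to look at what PHP actually hands to preg_* for each single-quoted literal. A minimal sketch; the array labels are only for display:

```php
<?php
// Single-quoted PHP literals for the three variants from the question.
// In single quotes PHP only treats \\ and \' as escapes; any other backslash is kept.
$patterns = [
    'two'   => '/(?:\\[trn])+/',   // regex engine receives /(?:\[trn])+/
    'three' => '/(?:\\\[trn])+/',  // regex engine receives /(?:\\[trn])+/
    'four'  => '/(?:\\\\[trn])+/', // regex engine receives /(?:\\[trn])+/
];

foreach ($patterns as $label => $pattern) {
    echo str_pad($label, 7), $pattern, "\n";
}
```

The 3- and 4-backslash literals produce the identical pattern string, which is why both work.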

From the PHP documentation on strings:

"escaping any other character will result in the backslash being printed too"

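Hence for \\\[:

  • 2 backslashes are used for unescaping in the PHP string, and the third one stays because \[ is not a recognized string escape ('\\\[' -> \\[)
  • 1 backslash is used for unescaping in the regex engine (\\[ -> a literal \ followed by the character class [trn])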

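Yes, it works, but it is not good practice: it relies on PHP passing the unrecognized escape \[ through unchanged, so writing all 4 backslashes is clearer.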
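Running both the 4-backslash and the 2-backslash pattern against a subject that contains literal backslash sequences shows the difference. A minimal sketch; the sample input is made up for illustration:

```php
<?php
// Single quotes: $input holds literal backslash sequences, not control characters.
$input = 'foo\tbar\r\nbaz';

// Four backslashes in the literal => \\ for PCRE => matches one literal backslash.
// The + lets the adjacent \r\n collapse into a single space.
$fixed = preg_replace('/(?:\\\\[trn])+/', ' ', $input);

// Two backslashes in the literal => \[ for PCRE => the literal text "[trn]",
// which never occurs in $input, so nothing is replaced.
$broken = preg_replace('/(?:\\[trn])+/', ' ', $input);

echo $fixed, "\n";   // foo bar baz
echo $broken, "\n";  // foo\tbar\r\nbaz
```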

Wednesday, March 31, 2021
 
ShadowZzz
answered 7 Months ago

Right, it is JSON with padding (JSONP). You have to remove the function name (and the parentheses), then you can parse the JSON with json_decode.

I once wrote a function for that:

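function jsonp_decode($jsonp, $assoc = false) { // PHP 5.3 adds depth as third parameter to json_decode
    $jsonp = trim($jsonp);
    if ($jsonp[0] !== '[' && $jsonp[0] !== '{') { // we have JSONP
        // drop the callback name, keeping everything from the opening parenthesis
        $jsonp = substr($jsonp, strpos($jsonp, '('));
    }
    // strip the wrapping parentheses and trailing semicolon (plus any whitespace)
    return json_decode(trim($jsonp, "();\r\n\t "), $assoc);
}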

Usage:

$data = jsonp_decode($response);

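If you prefer a stricter extraction, a regex can pull the payload out of callback(...) directly. This is a hypothetical variant, not the function above: the name jsonp_decode_strict() and the assumed callback-name characters [\w.$] are choices made for this sketch.

```php
<?php
// Hypothetical alternative: extract the JSONP payload with a regex.
// Assumes the callback name consists of word characters, dots or $.
function jsonp_decode_strict(string $jsonp, bool $assoc = false)
{
    // callback( payload ) with an optional trailing semicolon and whitespace;
    // the greedy (.*) backtracks to the last closing parenthesis
    if (preg_match('/^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$/s', $jsonp, $m)) {
        $jsonp = $m[1];
    }
    // plain JSON (no callback) falls through unchanged
    return json_decode($jsonp, $assoc);
}

$data = jsonp_decode_strict('callback({"id": 1, "tags": ["a", "b"]});' . "\n", true);
var_dump($data);
```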

Wednesday, March 31, 2021
 
akohout
answered 7 Months ago

Thanks, everybody, for the help.

My solution is based on bobbogo's answer. Thank you.

Regular expression:

(?=(XX.*?YY.*?ZZ))(?=(.*ZZ))

Result (from RegexBuddy):

#  Group 1                 Group 2
1  XXccYYeeXX_ZZ           XXccYYeeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ZZ
2  XX_ZZkkYYmmXX_ZZ        XX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ZZ
3  XX_ZZnnXXooYYuuXX_ZZ    XX_ZZnnXXooYYuuXX_ZZ
4  XXooYYuuXX_ZZ           XXooYYuuXX_ZZ

Can it be optimized further? I am not a regex expert.
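This pattern can report overlapping substrings because both groups sit inside zero-width lookaheads: every match is empty, so the engine moves on one character at a time and can start a new capture inside the previous one. A PHP sketch; the subject string is an assumption, taken from the second column of the first result row, since the full original input isn't shown:

```php
<?php
// Assumed subject: the longest string shown in the result table.
$subject = 'XXccYYeeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ZZ';

// Group 1: shortest XX...YY...ZZ from this position (lazy .*?).
// Group 2: everything up to the last ZZ (greedy .*).
// Both lookaheads are zero-width, so each match is empty and the next
// attempt starts one character later, which lets the captures overlap.
preg_match_all('/(?=(XX.*?YY.*?ZZ))(?=(.*ZZ))/', $subject, $matches, PREG_SET_ORDER);

foreach ($matches as $i => $set) {
    printf("%d %-22s %s\n", $i + 1, $set[1], $set[2]);
}
```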

Saturday, May 29, 2021
 
McAn
answered 5 Months ago

There are a couple of problems:

  1. Your regex pattern will also match an input of more than 15 characters.
  2. Your regex will also allow other disallowed characters in the middle, such as @ or #, due to the use of \S.

You can fix it by using a negative lookahead to disallow consecutive occurrences of period/hyphen/underscore, and by removing the \S from the middle of the regex, since it allows any non-space character:

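^[a-zA-Z0-9](?!.*[_.-]{2})[\w.-]{4,13}[a-zA-Z0-9]$

This accepts 6 to 15 characters in total: one letter or digit, then 4 to 13 word characters, periods or hyphens, then one letter or digit.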

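To use the corrected pattern from PHP, it only needs delimiters for preg_match(). A small sketch; the helper name is_valid_username() and the sample names are illustrative, not from the question:

```php
<?php
// The corrected pattern, wrapped in "/" delimiters for preg_match().
const USERNAME_RE = '/^[a-zA-Z0-9](?!.*[_.-]{2})[\w.-]{4,13}[a-zA-Z0-9]$/';

function is_valid_username(string $name): bool
{
    return preg_match(USERNAME_RE, $name) === 1;
}

// Valid, consecutive periods, disallowed "@", leading underscore, 15 and 16 characters.
$samples = ['john.doe', 'jo..hn', 'ab@cdef', '_abcdef', str_repeat('a', 15), str_repeat('a', 16)];
foreach ($samples as $s) {
    printf("%-18s %s\n", $s, is_valid_username($s) ? 'valid' : 'invalid');
}
```

Note that in PCRE $ also matches just before a trailing newline, so "john.doe\n" would pass; adding the D modifier (or ending with \z) closes that gap.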

Saturday, May 29, 2021
 
SuperString
answered 5 Months ago

HTML is not a regular language and cannot be correctly parsed with a regex. Use a DOM parser instead. Here's a solution using PHP's built-in DOMDocument class:

$string = '<ul id="value" name="Bob" custom-tag="customData">';

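$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect warnings about imperfect markup instead of printing them
$dom->loadHTML($string);

$result = array();

$ul = $dom->getElementsByTagName('ul')->item(0);
if ($ul !== null && $ul->hasAttributes()) { // item(0) is null when there is no <ul>
    foreach ($ul->attributes as $attr) {
        $name = $attr->nodeName;
        $value = $attr->nodeValue;
        $result[$name] = $value;
    }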
}

print_r($result);

Output:

Array
(
    [id] => value
    [name] => Bob
    [custom-tag] => customData
)
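If the markup contains several elements, the same DOMDocument approach extends to collecting the attributes of each one. A sketch assuming you want one array per <ul>; the helper name ul_attributes() is hypothetical:

```php
<?php
// Collect the attributes of every <ul> element in an HTML fragment.
function ul_attributes(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $all = [];
    foreach ($dom->getElementsByTagName('ul') as $ul) {
        $attrs = [];
        foreach ($ul->attributes as $attr) {
            $attrs[$attr->nodeName] = $attr->nodeValue; // attribute name => value
        }
        $all[] = $attrs;
    }
    return $all;
}

print_r(ul_attributes('<ul id="a" class="menu"></ul><ul id="b"></ul>'));
```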
Friday, July 30, 2021
 
the_e
answered 3 Months ago