Asked  7 Months ago    Answers:  5   Viewed   41 times

I've got this regular expression which removes common words ($commonWords) from a string ($input), and I would like to tweak it so that it ignores hyphenated words, as these sometimes contain common words.

return preg_replace('/\b('.implode('|',$commonWords).')\b/i','',$input);

thanks

 Answers

12

Try

return preg_replace('/(?<!-)\b('.implode('|',$commonWords).')\b(?!-)/i','',$input);

This adds negative lookaround expressions to the start and end of the regex so that a match is only allowed if there is no dash before or after the match.
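A minimal sketch of the lookaround approach in use. The function name removeCommonWords and the sample sentence are assumptions made for illustration, and the preg_quote() call is an addition that is not part of the answer above (it only guards against regex metacharacters in the word list).

```php
<?php
// Hypothetical helper; $input and $commonWords mirror the variables in the question.
function removeCommonWords(string $input, array $commonWords): string
{
    // Escape each word so metacharacters in the list are treated literally.
    $alternatives = implode('|', array_map(function ($word) {
        return preg_quote($word, '/');
    }, $commonWords));

    // (?<!-) rejects a match preceded by a hyphen, (?!-) one followed by a hyphen.
    $pattern = '/(?<!-)\b(' . $alternatives . ')\b(?!-)/i';

    return preg_replace($pattern, '', $input);
}

// "the" and "of" are removed as standalone words but kept inside "state-of-the-art".
$result = removeCommonWords('the state-of-the-art of it', ['the', 'of']);
```

Note that the surrounding spaces are left in place, so the result may need its whitespace collapsed afterwards.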

Saturday, May 29, 2021
 
VieStar
answered 7 Months ago
45

Use a callback. You can detect which pattern matched by using capturing groups, like (?:(pattern1)|(pattern2)|(etc)); only the matching pattern's capturing group(s) will be defined.

The only problem with that is that your current capturing groups would be shifted. To work around that you could use named groups. (A branch reset, (?|(foo)|(bar)), would also keep the numbering stable if your version supports it, but then you'd have to detect which pattern matched in some other way.)

Example

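function replace_callback($matches){
    // Groups that did not take part in the match are either absent or
    // empty strings, so test for a non-empty value rather than isset().
    if(($matches["m1"] ?? "") !== ""){
        return "foo";
    }
    if(($matches["m2"] ?? "") !== ""){
        return "bar";
    }
    if(($matches["m3"] ?? "") !== ""){
        return "baz";
    }
    return "something is wrong ;)";
}

// Single quotes keep \s and \g{-1} literal. Each alternative is wrapped in its
// own named group; \g{-1} is a relative back reference to the most recently
// opened group, so it still works after the groups are renumbered.
$re = '/(?<m1>regex1)|(?<m2>reg(\s*)ex|2)|(?<m3>(back refs) work as intended \g{-1})/';

$rep_string = preg_replace_callback($re, "replace_callback", $string);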

Not tested (don't have PHP here), but something like this could work.

Saturday, May 29, 2021
 
Anax
answered 7 Months ago
62

In a regular expression, you can "capture" parts of the matched string with (brackets); in this case, you are capturing the (^|_) and ([a-z]) parts of the match. These are numbered starting at 1, so you have back-references 1 and 2. Match 0 is the whole matched string.
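To make the numbering concrete, here is a small sketch using preg_match_all. The sample string 'foo_bar' is an assumption chosen for illustration; the pattern is the one described above.

```php
<?php
// PREG_SET_ORDER groups the results per match: index 0 is the whole match,
// index 1 is the (^|_) group and index 2 is the ([a-z]) group.
preg_match_all('/(^|_)([a-z])/', 'foo_bar', $sets, PREG_SET_ORDER);

// First match: start of string followed by "f", so group 1 is an empty string.
$first = $sets[0];
// Second match: the underscore followed by "b".
$second = $sets[1];
```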

The /e modifier takes a replacement string and substitutes a backslash followed by a number (e.g. \1) with the appropriate back-reference. Because you're inside a PHP string, you need to escape the backslash, so you get '\\1'. It then (effectively) runs eval to execute the resulting string as though it were PHP code, which is why it was deprecated in PHP 5.5 and removed in PHP 7.0: it's easy to use eval in an insecure way.

The preg_replace_callback function instead takes a callback function and passes it an array containing the matched back-references. So where you would have written '\\1', you instead access element 1 of that parameter. For example, if you have an anonymous function of the form function($matches) { ... }, the first back-reference is $matches[1] inside that function.

So a /e argument of

'do_stuff(\\1) . "and" . do_stuff(\\2)'

could become a callback of

function($m) { return do_stuff($m[1]) . "and" . do_stuff($m[2]); }

Or in your case

'strtoupper("\\2")'

could become

function($m) { return strtoupper($m[2]); }
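Put together as a runnable call, this might look like the sketch below. The pattern /(^|_)([a-z])/ is taken from the description at the top of this answer; the input string 'get_user_name' is an assumption for illustration.

```php
<?php
$input = 'get_user_name';

$output = preg_replace_callback(
    '/(^|_)([a-z])/',          // note: no /e modifier on the pattern
    function ($m) {
        // $m[1] is '' or '_' and is dropped; $m[2] is the letter to upper-case.
        return strtoupper($m[2]);
    },
    $input
);
```

Each match (the optional underscore plus the letter after it) is replaced by the callback's return value, so the underscores disappear and the following letters are capitalised.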

Note that $m and $matches are not magic names, they're just the parameter name I gave when declaring my callback functions. Also, you don't have to pass an anonymous function, it could be a function name as a string, or something of the form array($object, $method), as with any callback in PHP, e.g.

function stuffy_callback($things) {
    return do_stuff($things[1]) . "and" . do_stuff($things[2]);
}
$foo = preg_replace_callback('/([a-z]+) and ([a-z]+)/', 'stuffy_callback', 'fish and chips');

As with any function, you can't access variables outside your callback (from the surrounding scope) by default. When using an anonymous function, you can use the use keyword to import the variables you need to access, as discussed in the PHP manual. e.g. if the old argument was

'do_stuff(\\1, $foo)'

then the new callback might look like

function($m) use ($foo) { return do_stuff($m[1], $foo); }
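A self-contained sketch of the same idea. The variable $suffix, the closure $shout and the pattern are hypothetical stand-ins for $foo and do_stuff, chosen so the snippet runs on its own.

```php
<?php
$suffix = '!';

// "use ($suffix)" copies the outer variable into the closure's scope.
$shout = function ($m) use ($suffix) {
    return strtoupper($m[1]) . $suffix;
};

$result = preg_replace_callback('/\b(fish|chips)\b/', $shout, 'fish and chips');
```

Without the use clause, $suffix would be undefined inside the closure, because the closure has its own scope like any other function.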

Gotchas

  • Use of preg_replace_callback is instead of the /e modifier on the regex, so you need to remove that flag from your "pattern" argument. So a pattern like /blah(.*)blah/mei would become /blah(.*)blah/mi.
  • The /e modifier used a variant of addslashes() internally on the arguments, so some replacements used stripslashes() to remove it; in most cases, you probably want to remove the call to stripslashes from your new callback.
Tuesday, June 1, 2021
 
Gilko
answered 7 Months ago
17

According to article 11.5.2. Regular Expressions in MySQL's documentation, you can perform selections with a regular expression with the following syntax

SELECT field FROM table WHERE field REGEXP pattern

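In order to match simple URLs, you may use the following (the backslashes are doubled because, by default, MySQL string literals consume one level of backslash escaping before the pattern reaches the regex engine)

SELECT field FROM table
 WHERE field REGEXP "^(https?://|www\\.)[.A-Za-z0-9-]+\\.[a-zA-Z]{2,4}"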

This will match most urls like

  • www.google.il
  • http://google.com/
  • http://ww.google.net/
  • www.google.com/index.php?test=data
  • https://yahoo.dk/as
  • http://goo.gle.com/
  • http://wt.a.x24-s.org/ye/
  • www.website.info

But not

  • htp://google.com
  • ww.google.com/
  • www-google.com
  • http://google.c
  • http://goo#.com
  • httpf://google.com
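
The two lists above can be checked outside MySQL with a short PHP sketch. It assumes PCRE treats this particular pattern the same way MySQL's REGEXP does; one known difference is that preg_match is case-sensitive unless the i flag is added, whereas MySQL's REGEXP usually is not for non-binary strings. The helper name looksLikeUrl is hypothetical.

```php
<?php
// Same expression as the MySQL example, written as a PCRE pattern with ~ delimiters.
$pattern = '~^(https?://|www\.)[.A-Za-z0-9-]+\.[a-zA-Z]{2,4}~';

$expectedMatches = [
    'www.google.il', 'http://google.com/', 'http://ww.google.net/',
    'www.google.com/index.php?test=data', 'https://yahoo.dk/as',
    'http://goo.gle.com/', 'http://wt.a.x24-s.org/ye/', 'www.website.info',
];
$expectedMisses = [
    'htp://google.com', 'ww.google.com/', 'www-google.com',
    'http://google.c', 'http://goo#.com', 'httpf://google.com',
];

// Hypothetical helper: true when the pattern matches at the start of $url.
function looksLikeUrl(string $pattern, string $url): bool
{
    return preg_match($pattern, $url) === 1;
}

// Keep only the entries that behave as the lists claim.
$matched = array_filter($expectedMatches, function ($u) use ($pattern) {
    return looksLikeUrl($pattern, $u);
});
$missed = array_filter($expectedMisses, function ($u) use ($pattern) {
    return !looksLikeUrl($pattern, $u);
});
```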
Saturday, August 7, 2021
 
air
answered 4 Months ago
77

How about this?

([a-zA-Z]+)\s([A-Z][a-z]*)\s([a-zA-Z]+)

This doesn't take into account anything non-alphabetic though. It also assumes that all words are separated by a single whitespace character. You will need to modify it if you want more complex support.
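
A brief sketch of how the pattern behaves in PHP. The sample sentences are assumptions for illustration, since the original question is not shown here.

```php
<?php
$pattern = '/([a-zA-Z]+)\s([A-Z][a-z]*)\s([a-zA-Z]+)/';

// Three words separated by single spaces, the middle one capitalised.
$found = preg_match($pattern, 'visit New york today', $m);

// Two spaces between the first and second word: \s matches only one
// whitespace character, so the pattern finds nothing.
$foundDouble = preg_match($pattern, 'visit  New york');
```

Replacing each \s with \s+ would be one way to tolerate runs of whitespace.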

Sunday, November 28, 2021
 
bshacklett
answered 6 Days ago