Asked  7 Months ago    Answers:  5   Viewed   42 times

I have the following regex that I have been using successfully:

preg_match_all('/(\d+)\n(\w.*)\n(\d{3}.\d{3}.\d{2})\n(\d.*)\n(\d.*)/', $text, $matches)

However, I have just found that if the text that the (\w.*) part matches starts with a foreign character such as Ä, then it doesn't match anything.

Can anyone help me with what the correct pattern should be instead of (\w.*) to match a string that starts with any character?

Many thanks



If you do want to match umlauts, then add the regex /u modifier, or use \pL in place of \w. That will allow the regex to match letters outside of the ASCII range.
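As a rough illustration of the difference, here is a minimal sketch. The sample string is made up, and the "Ä" is written as its UTF-8 bytes so the result does not depend on how the file is saved. It assumes the default "C" locale and a PHP version where /u also turns on Unicode character classes (5.3.2 or later).

```php
<?php
// Hypothetical sample value: "Ärger", with "Ä" spelled out as UTF-8 bytes.
$line = "\xC3\x84rger";

// Without /u the subject is read byte by byte, and the first byte of "Ä"
// is not a word character, so \w cannot match at the start.
$ascii = preg_match('/^\w.*/', $line);

// With /u the pattern and subject are treated as UTF-8 and \w accepts
// letters outside the ASCII range.
$utf8 = preg_match('/^\w.*/u', $line);

// \pL matches any Unicode letter; /u is kept so the subject is decoded as UTF-8.
$letter = preg_match('/^\pL.*/u', $line);

var_dump($ascii, $utf8, $letter);
```

If the first character really can be anything (digits, punctuation), a plain dot, as in (.+), may be closer to what the question asks for.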


Saturday, May 29, 2021
answered 7 Months ago

You can use:

preg_match_all("!<span[^>]+>(.*?)</span>!", $str, $matches);

Then your text will be inside the first capture group (as seen on rubular)

With that out of the way, note that regex shouldn't be used to parse HTML. You will be better off using an XML parser, unless it's something really, really simple.
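For the parser route, something along these lines might do. This is only a sketch using PHP's bundled DOM extension; the markup in $str is a made-up example, not the asker's actual input.

```php
<?php
// Hypothetical markup, for illustration only.
$str = '<p><span class="a">first</span> and <span id="b">second</span></p>';

$doc = new DOMDocument();
// Well-formed input like this parses quietly; messy real-world HTML may
// trigger libxml warnings that you would need to handle.
$doc->loadHTML($str);

// Collect the text content of every <span> element.
$texts = [];
foreach ($doc->getElementsByTagName('span') as $span) {
    $texts[] = $span->textContent;
}

print_r($texts);
```

Unlike the regex, this keeps working when a span has no attributes or contains nested tags.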

Saturday, May 29, 2021
answered 7 Months ago

This regex will do the trick:

(\d+)d (\d+)h (\d+)m (\d+)s

Each value (day, hour, minute, second) will be captured in a group.

About your regex: I don't know what you mean by "isn't correct", but I guess it's probably failing because your regex is greedy instead of lazy (more info). Try using lazy operators, or using more specific matches (\d instead of ., for example).
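To make the greedy/lazy point concrete, here is a small sketch. The input string is invented for the demonstration and only the day and hour parts are used to keep it short.

```php
<?php
// Hypothetical input containing two "Nd Nh" fragments.
$s = 'up 12d 4h, idle 3d 1h';

// Greedy: the first (.+) swallows as much as it can and only backs off
// to the last "d " in the string.
preg_match('/(.+)d (.+)h/', $s, $greedy);

// Lazy: (.+?) stops at the first "d " it can, but still accepts non-digits.
preg_match('/(.+?)d (.+?)h/', $s, $lazy);

// Specific: \d+ can only ever match the digits themselves.
preg_match('/(\d+)d (\d+)h/', $s, $specific);

var_dump($greedy[1], $lazy[1], $specific[1]);
```

The first group comes out as "up 12d 4h, idle 3" for the greedy version, "up 12" for the lazy one and "12" for the \d version, which is why the more specific class is usually the safer fix.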


"I need them to be separate variables"

After matching, they will be put in different locations in the resulting array. Just assign them to variables. Check out an example here.

If you have trouble understanding the resulting array structure, you may want to use the PREG_SET_ORDER flag when calling preg_match_all (more information here).
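A minimal sketch of what that could look like with PREG_SET_ORDER follows. The sample text and the variable names ($days, $hours and so on) are placeholders chosen for the example.

```php
<?php
// Hypothetical input with two durations in it.
$text = "backup ran 3d 4h 5m 6s ago, previous one 10d 0h 15m 30s ago";
$pattern = '/(\d+)d (\d+)h (\d+)m (\d+)s/';

// PREG_SET_ORDER groups the result per match:
// $matches[0] holds everything for the first duration, $matches[1] for the second.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$durations = [];
foreach ($matches as $match) {
    // $match[0] is the whole match; entries 1..4 are the captured numbers.
    [, $days, $hours, $minutes, $seconds] = $match;
    $durations[] = [(int) $days, (int) $hours, (int) $minutes, (int) $seconds];
}

print_r($durations);
```

Without the flag, the default PREG_PATTERN_ORDER would instead give one array per capture group (all days together, all hours together, and so on).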

Saturday, May 29, 2021
answered 7 Months ago

This should work: ^.*\.(?!jpg$|png$)[^.]+$
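A quick way to try such a pattern is to run it over a handful of names. This sketch assumes the dot before the look-ahead is meant as a literal dot; the file names are invented.

```php
<?php
// Assumed reading of the pattern: a literal dot, then an extension
// that is not exactly "jpg" or "png".
$pattern = '/^.*\.(?!jpg$|png$)[^.]+$/';

// Hypothetical file names.
$names = ['photo.jpg', 'logo.png', 'notes.txt', 'archive.tar.gz', 'README'];

// preg_grep keeps the entries that match; array_values renumbers the keys.
$kept = array_values(preg_grep($pattern, $names));

print_r($kept);
```

Note that a name without any dot, such as README, is rejected as well, because the pattern requires a dot followed by an extension.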

Saturday, August 14, 2021
answered 4 Months ago

It's unclear from your wording if you want to match a string ending with .com AND NOT containing abc before that; or to match a string that doesn't have "abc followed by characters followed by .com".

Meaning, in the first case, "" does NOT match (no "abc" but doesn't end with ".com") but in the second case "" matches (because it's not "")

In the first case, you need to use negative look-behind:

# Use .* instead of .+ if you want "" to fail as well

IMPORTANT: your original expression using look-behind - #3 ( (?<!abc).*.com ) - didn't work because a look-behind ONLY checks the text immediately preceding the next term. Therefore, the "something after abc" should be included in the look-behind together with abc - as my RegEx above does.

PROBLEM: my RegEx above likely won't work with your specific RegEx Engine, unless it supports general look-behinds with variable length expression (like the one above) - which ONLY .NET does these days (A good summary of what does and doesn't support what flavors of look-behind is at ).

If that is indeed the case, you will have to do double match: first, check for .com; capturing everything before it; then negative match on abc. I will use Perl syntax since you didn't specify a language:

if (/^(.*)\.com$/) {
    if ($1 !~ /abc/) {
    # Or, you can just use a substring:
    # if (index($1, "abc") < 0) {
        # PROFIT!
    }
}

In the second case, the EASIEST thing to do is to do a "does not match" operator - e.g. !~ in Perl (or negate a result of a match if your language doesn't support "does not match"). Example using pseudo-code:

if (NOT string.match(/$/)) ...
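For what it's worth, both readings could be sketched in PHP roughly as follows. The function names are made up for the example, and the pattern used for the second case (abc, then at least one character, then .com at the end) is my assumption about what was meant, not something taken from the question.

```php
<?php
// Case 1: the string ends with ".com" and has no "abc" anywhere before it.
// Two-step check, so no variable-length look-behind is needed.
function endsWithComWithoutAbc(string $s): bool
{
    return preg_match('/^(.*)\.com$/', $s, $m) === 1
        && strpos($m[1], 'abc') === false;
}

// Case 2: everything that is NOT "abc, some characters, .com".
// Simply negate an ordinary match (assumed pattern, see above).
function isNotAbcThenCom(string $s): bool
{
    return preg_match('/abc.+\.com$/', $s) !== 1;
}

// Hypothetical inputs.
var_dump(endsWithComWithoutAbc('example.com'));
var_dump(endsWithComWithoutAbc('abcde.com'));
var_dump(isNotAbcThenCom('example.org'));
```

The two functions disagree on a string like example.org: the first rejects it because it doesn't end in .com, the second accepts it because it is not of the "abc ... .com" shape.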

Please note that you don't need ".+"/".*" when using negative lookbehind;

Sunday, August 15, 2021
answered 4 Months ago