Asked  9 Months ago    Answers:  5   Viewed   47 times

I need help with writing a regular expression in PHP. I am not friend enough to get it right. I need to find all integers in a string and format them as follows:

1 to 001000
0.13 to 000130
100 to 0100000 

and so on. In normal speech it means - edit the number prepend it with zeros or edit it to be in same "order". The point is to have all of the integers it the string in this format (6 numbers). Could someone help me please? I will not post my tryouts here because they are too foolish:-)

 Answers

43

By using the 'e' modifier on preg_replace() (or using preg_replace_callback()) you can use @codaddict's answer:

function my_format($d) {
    return sprintf('%06d', round($d * 1000));
}

$your_string = '0.12 100 number 1.12 something';

preg_replace('/d+(.d+)?/e', 'my_format(\0)', $your_string);
// Should give '000120 100000 number 001120 something'
Wednesday, March 31, 2021
 
cyber_truite
answered 9 Months ago
15

This will work only for non-nested parentheses:

    $regex = <<<HERE
    /  "  ( (?:[^"\\]++|\\.)*+ ) "
     | '  ( (?:[^'\\]++|\\.)*+ ) '
     | ( ( [^)]*                  ) )
     | [s,]+
    /x
    HERE;

    $tags = preg_split($regex, $str, -1,
                         PREG_SPLIT_NO_EMPTY
                       | PREG_SPLIT_DELIM_CAPTURE);

The ++ and *+ will consume as much as they can and give nothing back for backtracking. This technique is described in perlre(1) as the most efficient way to do this kind of matching.

Wednesday, March 31, 2021
 
KingCrunch
answered 9 Months ago
59

For the second one, you could look for a line that starts with "10", then "something" and then 9 consecutive spaces. After the spaces is what you need to capture. The regex is ^10.+s{9}(.*)

Friday, May 28, 2021
 
exxed
answered 7 Months ago
52

The standard disclaimer applies: Parsing HTML with regular expressions is not ideal. Success depends on the well-formedness of the input on a character-by-character level. If you cannot guarantee this, the regex will fail to do the Right Thing at some point.

Having said that:

<ab[^>]*>(.*?)</a>   // match group one will contain the link text
Saturday, May 29, 2021
 
lewiguez
answered 7 Months ago
35

In my opinion, when it comes to potentially complex rules-based string matching and replacing - you can't get much better than a Regex-based solution (despite the fact that they are so hard to read!). This offers the best performance and memory efficiency, in my opinion - you'll be surprised at just how fast this'll be.

I'd use the Regex.Replace overload that accepts an input string, regex pattern and a MatchEvaluator delegate. A MatchEvaluator is a function that accepts a Match object as input and returns a string replacement.

Here's the code:

public static string Capitalise(string input)
{
  //now the first character
  return Regex.Replace(input, @"(?<=(^|[.;:])s*)[a-z]",
    (match) => { return match.Value.ToUpper(); });
}

The regex uses the (?<=) construct (zero-width positive lookbehind) to restrict captures only to a-z characters preceded by the start of the string, or the punctuation marks you want. In the [.;:] bit you can add the extra ones you want (e.g. [.;:?."] to add ? and " characters.

This means, also, that your MatchEvaluator doesn't have to do any unnecessary string joining (which you want to avoid for performance reasons).

All the other stuff mentioned by one of the other answerers about using the RegexOptions.Compiled is also relevant from a performance point of view. The static Regex.Replace method does offer very similar performance benefits, though (there's just an additional dictionary lookup).

Like I say - I'll be surprised if any of the other non-regex solutions here will work better and be as fast.

EDIT

Have put this solution up against Ahmad's as he quite rightly pointed out that a look-around might be less efficient than doing it his way.

Here's the crude benchmark I did:

public string LowerCaseLipsum
{
  get
  {
    //went to lipsum.com and generated 10 paragraphs of lipsum
    //which I then initialised into the backing field with @"[lipsumtext]".ToLower()
    return _lowerCaseLipsum;
  }
 }
 [TestMethod]
 public void CapitaliseAhmadsWay()
 {
   List<string> results = new List<string>();
   DateTime start = DateTime.Now;
   Regex r = new Regex(@"(^|p{P}s+)(w+)", RegexOptions.Compiled);
   for (int f = 0; f < 1000; f++)
   {
     results.Add(r.Replace(LowerCaseLipsum, m => m.Groups[1].Value
                      + m.Groups[2].Value.Substring(0, 1).ToUpper()
                           + m.Groups[2].Value.Substring(1)));
   }
   TimeSpan duration = DateTime.Now - start;
   Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
 }

 [TestMethod]
 public void CapitaliseLookAroundWay()
 {
   List<string> results = new List<string>();
   DateTime start = DateTime.Now;
   Regex r = new Regex(@"(?<=(^|[.;:])s*)[a-z]", RegexOptions.Compiled);
   for (int f = 0; f < 1000; f++)
   {
     results.Add(r.Replace(LowerCaseLipsum, m => m.Value.ToUpper()));
   }
   TimeSpan duration = DateTime.Now - start;
   Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
 }

In a release build, the my solution was about 12% faster than the Ahmad's (1.48 seconds as opposed to 1.68 seconds).

Interestingly, however, if it was done through the static Regex.Replace method, both were about 80% slower, and my solution was slower than Ahmad's.

Saturday, August 14, 2021
 
huhushow
answered 4 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :
 
Share