
I am working with the Amazon Mechanical Turk API and it will only allow me to use regular expressions to filter a field of data.

I would like to input an integer range to a function, such as 256-311 or 45-1233, and return a regex that would match only that range.

A regex matching 256-321 would be:

\b((25[6-9])|(2[6-9][0-9])|(3[0-1][0-9])|(32[0-1]))\b
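A quick way to check a hand-written pattern like this is to brute-force it: run it against every integer in a window around the range and confirm that it accepts exactly the numbers inside. Here is a minimal JavaScript sketch. The helper name `acceptsExactly` and the 0..1000 window are arbitrary choices made for illustration.

```javascript
// The 256-321 pattern from above, with its \b word boundaries.
const range256to321 = /\b((25[6-9])|(2[6-9][0-9])|(3[0-1][0-9])|(32[0-1]))\b/;

// Hypothetical helper: true if `re` accepts exactly the integers lo..hi
// among 0..limit (each number is tested on its own as a decimal string).
function acceptsExactly(re, lo, hi, limit) {
  for (let n = 0; n <= limit; n++) {
    const inside = n >= lo && n <= hi;
    if (re.test(String(n)) !== inside) {
      return false;
    }
  }
  return true;
}

console.log(acceptsExactly(range256to321, 256, 321, 1000)); // true
console.log(acceptsExactly(range256to321, 256, 320, 1000)); // false (321 also matches)
```

Because each number is tested as a whole string here, `\b` behaves like an anchor. Inside longer text it only requires a word boundary, so the pattern still finds `300` in `1.300`. Anchoring with `^` and `$` is stricter when the whole field must be the number.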

That part is fairly easy, but I am having trouble with the loop to create this regex.

I am trying to build a function defined like this:

function getRangeRegex( int fromInt, int toInt)
{

      return regexString;
}

I searched all over the web and was surprised that nobody seems to have solved this before. It is a difficult problem...

Thanks for your time.

 Answers


Here's a quick hack:

<?php

function regex_range($from, $to) {

  if($from < 0 || $to < 0) {
    throw new Exception("Negative values not supported"); 
  }

  if($from > $to) {
    throw new Exception("Invalid range $from..$to, from > to"); 
  }

  $ranges = array($from);
  $increment = 1;
  $next = $from;
  $higher = true;

  while(true) {

    $next += $increment;

    if($next + $increment > $to) {
      if($next <= $to) {
        $ranges[] = $next;
      }
      $increment /= 10;
      $higher = false;
    }
    else if($next % ($increment*10) === 0) {
      $ranges[] = $next;
      $increment = $higher ? $increment*10 : $increment/10;
    }

    if(!$higher && $increment < 10) {
      break;
    }
  }

  $ranges[] = $to + 1;

  $regex = '/^(?:';

  for($i = 0; $i < sizeof($ranges) - 1; $i++) {
    $str_from = (string)($ranges[$i]);
    $str_to = (string)($ranges[$i + 1] - 1);

    for($j = 0; $j < strlen($str_from); $j++) {
      if($str_from[$j] == $str_to[$j]) {
        $regex .= $str_from[$j];
      }
      else {
        $regex .= "[" . $str_from[$j] . "-" . $str_to[$j] . "]";
      }
    }
    $regex .= "|";
  }

  return substr($regex, 0, strlen($regex)-1) . ')$/';
}

function test($from, $to) {
  try {
    printf("%-10s %s\n", $from . '-' . $to, regex_range($from, $to));
  } catch (Exception $e) {
    echo $e->getMessage() . "\n";
  }
}

test(2, 8);
test(5, 35);
test(5, 100);
test(12, 1234);
test(123, 123);
test(256, 321);
test(256, 257);
test(180, 195);
test(2,1);
test(-2,4);

?>

which produces:

2-8        /^(?:[2-7]|8)$/
5-35       /^(?:[5-9]|[1-2][0-9]|3[0-5])$/
5-100      /^(?:[5-9]|[1-9][0-9]|100)$/
12-1234    /^(?:1[2-9]|[2-9][0-9]|[1-9][0-9][0-9]|1[0-2][0-3][0-4])$/
123-123    /^(?:123)$/
256-321    /^(?:25[6-9]|2[6-9][0-9]|3[0-2][0-1])$/
256-257    /^(?:256|257)$/
180-195    /^(?:18[0-9]|19[0-5])$/
Invalid range 2..1, from > to
Negative values not supported

Not properly tested, use at your own risk!

And yes, the generated regex could be written more compactly in many cases, but I leave that as an exercise for the reader :)
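As the author warns, the hack is not fully tested. Some of the listed alternatives are wrong:

- For 256-321, the last alternative `3[0-2][0-1]` accepts only 300, 301, 310, 311, 320 and 321.
- For 12-1234, `1[0-2][0-3][0-4]` accepts only 60 of the 235 numbers from 1000 to 1234.

A per-digit character class is only safe when every digit after the first differing one runs the full 0-9. The sketch below is not the author's code. It splits the range a different way, in JavaScript:

- Walk up from `from` by filling trailing digits with nines (256 to 259 to 299).
- Walk down by zeroing trailing digits of `to + 1` and subtracting one (322 to 319 to 299).
- Use those numbers as sub-range boundaries.

It assumes non-negative integers well inside JavaScript's safe-integer range, written without leading zeros. The function names are made up for this example.

```javascript
// Replace the last k digits of n with nines: (256, 1) -> 259, (5, 2) -> 99.
const fillNines = (n, k) => Math.floor(n / 10 ** k) * 10 ** k + 10 ** k - 1;

// Replace the last k digits of n with zeros: (322, 1) -> 320.
const fillZeros = (n, k) => Math.floor(n / 10 ** k) * 10 ** k;

// Upper bounds of the sub-ranges, sorted ascending.
function splitStops(from, to) {
  const stops = new Set([to]);
  // Up from `from`: 256 -> 259 -> 299 -> 999 (stop once past `to`).
  for (let k = 1, s = fillNines(from, k); s <= to; k++, s = fillNines(from, k)) {
    stops.add(s);
  }
  // Down from `to`: 322 -> 319 -> 299 -> -1 (stop once at or below `from`).
  for (let k = 1, s = fillZeros(to + 1, k) - 1; s > from; k++, s = fillZeros(to + 1, k) - 1) {
    stops.add(s);
  }
  return [...stops].sort((a, b) => a - b);
}

// One sub-range -> pattern: equal digits stay literal, others become [a-b].
function rangeToPattern(start, stop) {
  const a = String(start);
  const b = String(stop);
  let pattern = "";
  for (let i = 0; i < a.length; i++) {
    pattern += a[i] === b[i] ? a[i] : `[${a[i]}-${b[i]}]`;
  }
  return pattern;
}

function regexRange(from, to) {
  if (!Number.isSafeInteger(from) || !Number.isSafeInteger(to) || from < 0 || to < 0) {
    throw new RangeError("Only non-negative integers are supported");
  }
  if (from > to) {
    throw new RangeError(`Invalid range ${from}..${to}, from > to`);
  }
  const parts = [];
  let start = from;
  for (const stop of splitStops(from, to)) {
    parts.push(rangeToPattern(start, stop));
    start = stop + 1;
  }
  return new RegExp(`^(?:${parts.join("|")})$`);
}

console.log(regexRange(256, 321)); // /^(?:25[6-9]|2[6-9][0-9]|3[0-1][0-9]|32[0-1])$/
console.log(regexRange(12, 1234));
// /^(?:1[2-9]|[2-9][0-9]|[1-9][0-9][0-9]|1[0-1][0-9][0-9]|12[0-2][0-9]|123[0-4])$/
```

For 256-321 this reproduces the alternatives from the question, anchored with `^...$` instead of `\b`.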

Wednesday, March 31, 2021
 
jab
answered 7 Months ago

Mark methods that are only called dynamically as used, with the phpDoc @uses tag. For example:

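class Dispatcher // hypothetical class name, added so the example runs
{
    /**
     * @uses  _iAmUsed()
     * @param string $whoAreYou
     */
    public function run($whoAreYou)
    {
        // e.g. run('Used') builds '_iAmUsed' and calls it
        $methodName = '_iAm' . $whoAreYou;
        if (method_exists($this, $methodName)) {
            $this->$methodName();
        }
    }

    // Only reached through the dynamic call above, hence the @uses tag.
    private function _iAmUsed()
    {
    }
}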
Wednesday, March 31, 2021
 
AlterPHP
answered 7 Months ago

In case you don't want a regex (this version uses strripos(), following xzyfer's comment):

<?php 

function stripTypeTitle($title) {
    $dvdpos = strripos($title, 'dvd');
    $bluraypos = strripos($title, 'bluray');
    if ($dvdpos !== false && $dvdpos > $bluraypos) {
        $title = substr($title, 0, $dvdpos);
    }
    if ($bluraypos !== false && $bluraypos > $dvdpos) {
        $title = substr($title, 0, $bluraypos);
    }
    return $title;
}

$title = "Avatar DVD 2009";
echo stripTypeTitle($title)."<br/>";
$title = "War of the Roses DVD 1989 Region 1 US import";
echo stripTypeTitle($title)."<br/>";
$title = "Wanted Bluray 2008 US Import";
echo stripTypeTitle($title)."<br/>";
$title = "This Bluray is Wanted DVD 2008 US Import";
echo stripTypeTitle($title)."<br/>";
$title = "This DVD is Wanted Bluray 2008 US Import";
echo stripTypeTitle($title)."<br/>";

?>

Prints:

Avatar
War of the Roses
Wanted
This Bluray is Wanted
This DVD is Wanted 
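Here is the same idea ported to JavaScript, as a sketch rather than a drop-in replacement. `lastIndexOf` on a lowercased copy stands in for `strripos()`, and `Math.max` picks whichever keyword occurs later. Unlike the PHP version, it also trims the trailing space left before the keyword.

```javascript
// Cut `title` at the later of the last "dvd" and the last "bluray"
// (case-insensitive); return it unchanged when neither occurs.
// Assumes lowercasing does not change the string's length (true for ASCII).
function stripTypeTitle(title) {
  const lower = title.toLowerCase();
  const cut = Math.max(lower.lastIndexOf("dvd"), lower.lastIndexOf("bluray"));
  return cut === -1 ? title : title.slice(0, cut).trimEnd();
}

for (const title of [
  "Avatar DVD 2009",
  "War of the Roses DVD 1989 Region 1 US import",
  "Wanted Bluray 2008 US Import",
  "This Bluray is Wanted DVD 2008 US Import",
  "This DVD is Wanted Bluray 2008 US Import",
]) {
  console.log(stripTypeTitle(title));
}
```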
Saturday, May 29, 2021
 
ShadowZzz
answered 5 Months ago

Functional programming isn't limited to reduce, filter, and map; it's about functions. This means we don't have to rely on perverse knowledge like Array.from ({ length: x }), where an object with a length property can be treated like an array. This kind of behavior is bewildering for beginners and mental overhead for everyone else. I think you'll enjoy writing programs that encode your intentions more clearly.

reduce starts with 1 or more values and reduces to (usually) a single value. In this case, you actually want the reverse of a reduce (or fold), here called unfold. The difference is we start with a single value, and expand or unfold it into (usually) multiple values.
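Here is a tiny illustration of that relationship. It uses the same recursive `unfold` defined in the examples below, repeated so the snippet runs on its own. A number is unfolded into its decimal digits, then the digits are reduced back into the number. The `digits`/`undigits` names are made up for this sketch.

```javascript
// Same recursive unfold as in this answer.
const unfold = (f, initState) =>
  f((value, nextState) => [value, ...unfold(f, nextState)], () => [], initState);

// unfold: one number -> its decimal digits, least significant first.
// digits(0) is [] because the unfold stops as soon as the state is 0.
const digits = (n) =>
  unfold((next, done, m) => (m === 0 ? done() : next(m % 10, Math.floor(m / 10))), n);

// reduce: digits -> one number again (reduceRight starts from the most significant digit).
const undigits = (ds) => ds.reduceRight((acc, d) => acc * 10 + d, 0);

console.log(digits(1234).join(", ")); // 4, 3, 2, 1
console.log(undigits(digits(1234)));  // 1234
```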

We start with a simplified example, alphabet. We begin unfolding with an initial value of 97, the char code for the letter a. We stop unfolding when the char code exceeds 122, the char code for the letter z.

const unfold = (f, initState) =>
  f ( (value, nextState) => [ value, ...unfold (f, nextState) ]
    , () => []
    , initState
    )

const alphabet = () =>
  unfold
    ( (next, done, char) =>
        char > 122
          ? done ()
          : next ( String.fromCharCode (char) // value to add to output
                 , char + 1                   // next state
                 )
    , 97 // initial state
    )
    
console.log (alphabet ())
// [ a, b, c, ..., x, y, z ]

Above, we use a single integer for our state, but other unfolds may require a more complex representation. Below, we show the classic Fibonacci sequence by unfolding a compound initial state of [ n, a, b ] where n is a decrementing counter, and a and b are numbers used to compute the sequence's terms. This demonstrates unfold can be used with any seed state, even arrays or objects.

const fib = (n = 0) =>
  unfold
    ( (next, done, [ n, a, b ]) =>
        n < 0
          ? done ()
          : next ( a                   // value to add to output
                 , [ n - 1, b, a + b ] // next state
                 )
    , [ n, 0, 1 ] // initial state
    )

console.log (fib (20))
// [ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765 ]
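One caveat applies to this `unfold`. It recurses once per element and copies the rest of the array with spread each time. As a result, long outputs cost quadratic time and can exceed the call stack. Below is a sketch of a loop-based variant with the same `(next, done, state)` interface. `unfoldLoop` and `fibLoop` are names made up for this example.

```javascript
// Loop-based unfold with the same f(next, done, state) contract:
// next(value, nextState) records a value and advances; done() stops.
// Assumes f calls exactly one of next or done on every step.
const unfoldLoop = (f, initState) => {
  const out = [];
  let state = initState;
  let running = true;
  while (running) {
    f(
      (value, nextState) => {
        out.push(value);
        state = nextState;
      },
      () => {
        running = false;
      },
      state
    );
  }
  return out;
};

// The Fibonacci example from above, using the loop-based unfold.
const fibLoop = (n = 0) =>
  unfoldLoop(
    (next, done, [k, a, b]) => (k < 0 ? done() : next(a, [k - 1, b, a + b])),
    [n, 0, 1]
  );

console.log(fibLoop(10).join(", ")); // 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55

// 100,000 elements: no deep recursion, no repeated array copies.
const counted = unfoldLoop((next, done, i) => (i === 100000 ? done() : next(i, i + 1)), 0);
console.log(counted.length); // 100000
```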

Now we have the confidence to write pagination. Again, our initial state is compound data [ page, count ] as we need to keep track of the page to add, and how many pages (count) we've already added.

Another advantage to this approach is that you can easily parameterize things like 10 or -5 or +1 and there's a sensible, semantic structure to place them in.

const unfold = (f, initState) =>
  f ( (value, nextState) => [ value, ...unfold (f, nextState) ]
    , () => []
    , initState
    )
    
const pagination = (totalPages, currentPage = 1) =>
  unfold
    ( (next, done, [ page, count ]) =>
        page > totalPages
          ? done ()
          : count > 10
            ? done ()
            : next (page, [ page + 1, count + 1 ])
    , [ Math.max (1, currentPage - 5), 0 ]
    )

console.log (pagination (40, 1))
// [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ]

console.log (pagination (40, 14))
// [ 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 ]

console.log (pagination (40, 38))
// [ 33, 34, 35, 36, 37, 38, 39, 40 ]

console.log (pagination (40, 40))
// [ 35, 36, 37, 38, 39, 40 ]

Above, there are two conditions that result in a call to done (). We can collapse them using ||, and the code reads a little nicer:

const pagination = (totalPages, currentPage = 1) =>
  unfold
    ( (next, done, [ page, count ]) =>
        page > totalPages || count > 10
          ? done ()
          : next (page, [ page + 1, count + 1 ])
    , [ Math.max (1, currentPage - 5), 0 ]
    )
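The earlier point about parameterizing can be made concrete. Two hard-coded values can become arguments: the window size (the `10` in `count > 10`) and the number of pages shown before the current one (the `- 5`). The sketch below uses hypothetical option names `size` and `before`. Its defaults reproduce the behaviour of `pagination` above.

```javascript
// Same recursive unfold as in this answer.
const unfold = (f, initState) =>
  f((value, nextState) => [value, ...unfold(f, nextState)], () => [], initState);

// Hypothetical options: `size` = pages in the window (11 matches `count > 10`),
// `before` = pages shown ahead of the current one (5 matches `currentPage - 5`).
const paginationWindow = (totalPages, currentPage = 1, { size = 11, before = 5 } = {}) =>
  unfold(
    (next, done, [page, count]) =>
      page > totalPages || count >= size ? done() : next(page, [page + 1, count + 1]),
    [Math.max(1, currentPage - before), 0]
  );

console.log(paginationWindow(40, 14).join(" "));                        // 9 10 11 12 13 14 15 16 17 18 19
console.log(paginationWindow(40, 14, { size: 5, before: 2 }).join(" ")); // 12 13 14 15 16
```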
Monday, August 2, 2021
 
Extrakun
answered 3 Months ago

Y'all are making this way too complicated. The original regex matches words made of letters only or numbers (integers, floating point including exponential notation).

If you need to match words made of letters and numbers, then the regex for that is [a-zA-Z\d]+. Per the module docs, you'll also want to specify what to skip, and that matches [^a-zA-Z\d]+.

$self->{tokenrex} = qr/([a-z\d]+)/i;
$self->{skiprex}  = qr/([^a-z\d]+)/i;

If you need to recognize numbers as the module documentation shows in its example, then please let me know, and I'll be happy to add that back in for you. From your description, that doesn't sound like what you need.
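The two patterns can also be tried outside Perl. In JavaScript, with the `i` flag and ASCII input, they split a string into alternating token and skip runs as shown below. The sample text is made up, and this says nothing about how the Perl module itself applies the patterns.

```javascript
// Same character classes as the Perl qr// patterns above, with the i flag.
const tokenRe = /[a-z\d]+/gi; // runs of letters and digits
const skipRe = /[^a-z\d]+/gi; // runs of everything else

const text = "Route 66, exit 12b!";
console.log(JSON.stringify(text.match(tokenRe))); // ["Route","66","exit","12b"]
console.log(JSON.stringify(text.match(skipRe)));  // [" ",", "," ","!"]
```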

Tuesday, August 24, 2021
 
Corsair
answered 2 Months ago