Asked  7 Months ago    Answers:  5   Viewed   40 times

I have texts in UTF-8 with diacritic characters also, and would like to check if first letter of this text is upper case or lower case. How to do this?

 Answers

74

It is my opinion that making a preg_ call is the most direct, concise, and reliable call versus the other posted solutions here.

echo preg_match('~^p{Lu}~u', $string) ? 'upper' : 'lower';

My pattern breakdown:

~      # starting pattern delimiter 
^      #match from the start of the input string
p{Lu} #match exactly one uppercase letter (unicode safe)
~      #ending pattern delimiter 
u      #enable unicode matching

Please take notice when ctype_ and < 'a' fail with this battery of tests.

Code: (Demo)

$tests = ['âa', 'Bbbbb', 'Éé', 'iou', '??'];

foreach ($tests as $test) {
    echo "n{$test}:";
    echo "ntPREG:  " , preg_match('~^p{Lu}~u', $test)      ? 'upper' : 'lower';
    echo "ntCTYPE: " , ctype_upper(mb_substr($test, 0, 1))  ? 'upper' : 'lower';
    echo "nt< a:   " , mb_substr($test, 0, 1) < 'a'         ? 'upper' : 'lower';

    $chr = mb_substr ($test, 0, 1, "UTF-8");
    echo "ntMB:    " , mb_strtoupper($chr, "UTF-8") == $chr ? 'upper' : 'lower';
}

Output:

âa:
    PREG:  lower
    CTYPE: lower
    < a:   lower
    MB:    lower
Bbbbb:
    PREG:  upper
    CTYPE: upper
    < a:   upper
    MB:    upper
Éé:               <-- trouble
    PREG:  upper
    CTYPE: lower  <-- uh oh
    < a:   lower  <-- uh oh
    MB:    upper
iou:
    PREG:  lower
    CTYPE: lower
    < a:   lower
    MB:    lower
??:               <-- extended beyond question scope
    PREG:  upper  <-- still holding up
    CTYPE: lower
    < a:   lower
    MB:    upper  <-- still holding up

If anyone needs to differentiate between uppercase letters, lowercase letters, and non-letters see this post.


It may be extending the scope of this question too far, but if your input characters are especially squirrelly (they might not exist in a category that Lu can handle), you may want to check if the first character has case variants:

p{L&} or p{Cased_Letter}: a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).

  • Source: https://www.regular-expressions.info/unicode.html

To include Roman Numerals ("Number Letters") with SMALL variants, you can add that extra range to the pattern if necessary.

https://www.fileformat.info/info/unicode/category/Nl/list.htm

Code: (Demo)

echo preg_match('~^[p{Lu}x{2160}-x{216F}]~u', $test) ? 'upper' : 'not upper';
Wednesday, March 31, 2021
 
ranhan
answered 7 Months ago
43

The core PHP string functions all assume 1 character = 1 byte. They have no concept of different encodings. To figure out how many characters are in a UTF-8 string (not how many bytes), use the mb_strlen equivalent and tell it what encoding the string is in:

echo mb_strlen('????', 'UTF-8');
Wednesday, March 31, 2021
 
CMOS
answered 7 Months ago
11

I got it. I just had to set the mbstring extension settings to handle internal strings in UTF-8. Thas extension is standard with my build of PHP 5.3.0.

Wednesday, March 31, 2021
 
Hat
answered 7 Months ago
Hat
55
$current_time = "4:59 pm";
$sunrise = "5:42 am";
$sunset = "6:26 pm";
$date1 = DateTime::createFromFormat('h:i a', $current_time);
$date2 = DateTime::createFromFormat('h:i a', $sunrise);
$date3 = DateTime::createFromFormat('h:i a', $sunset);
if ($date1 > $date2 && $date1 < $date3)
{
   echo 'here';
}

See it in action

Reference

  • DateTime
Thursday, June 3, 2021
 
ericstumper
answered 5 Months ago
27

This only works for PHP>=5.3.0, and isn't 100% reliable, but hey, it's pretty darn close.

// return mime type ala mimetype extension
$finfo = finfo_open(FILEINFO_MIME);

//check to see if the mime-type starts with 'text'
return substr(finfo_file($finfo, $filename), 0, 4) == 'text';

http://us.php.net/manual/en/ref.fileinfo.php

Saturday, July 3, 2021
 
PedroKTFC
answered 4 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :