
I know of this comment on PHP.net. I would like a tool similar to tr for PHP, so that I can simply run

tr -d " "

I tried the function php_strip_whitespace, without success:

$tags_trimmed = php_strip_whitespace($tags);

I also tried a regex function, again without success:

$tags_trimmed = preg_replace(" ", "", $tags);
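For reference, the two attempts fail for different reasons: php_strip_whitespace() expects a file name and returns that file's PHP source with comments and whitespace stripped, and preg_replace() needs a delimited pattern such as '/ /' rather than " ". A minimal sketch of a `tr -d " "` equivalent, assuming only ordinary spaces need to go (the sample value of $tags is made up):

```php
<?php
// Sample input; the value is hypothetical.
$tags = "php,  regex , whitespace";

// Closest analogue of `tr -d " "`: delete every ordinary space.
$tags_trimmed = str_replace(' ', '', $tags);

// strtr() with an array maps substrings to replacements; mapping to '' deletes them.
$no_blanks = strtr($tags, [' ' => '', "\t" => '']);

// The regex attempt works once the pattern has delimiters.
$via_regex = preg_replace('/ /', '', $tags);

echo $tags_trimmed, "\n"; // php,regex,whitespace
```

str_replace() is the most direct choice here; the regex version only pays off once the set of characters grows.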

 Answers


A PHP (PCRE) regular expression does not treat the subject as UTF-8 unless you add the u modifier. Without it, the \s meta-character only matches ASCII whitespace. Therefore, the following command only removes ASCII whitespace such as tabs, spaces, carriage returns and newlines:

// http://stackoverflow.com/a/1279798/54964
$str = preg_replace('/\s+/', '', $str);

With UTF-8 becoming mainstream, this expression will more and more often leave behind Unicode whitespace characters that \s cannot match.
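A small illustration of the problem, assuming a UTF-8 string that contains a no-break space (U+00A0, bytes C2 A0): the ASCII space is removed, but both bytes of the no-break space survive.

```php
<?php
// "a b" followed by a no-break space (U+00A0) and "c".
$input = "a b\u{00A0}c";

// Without the u modifier, \s matches only ASCII whitespace bytes.
$out = preg_replace('/\s+/', '', $input);

// bin2hex() makes the leftover bytes visible.
echo bin2hex($out), "\n"; // 6162c2a063
```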

To deal with the additional kinds of whitespace introduced by Unicode/UTF-8, a more extensive pattern is required to match and remove them.

Because the regex engine does not recognize multi-byte characters by default, each of these characters has to be matched as its complete byte sequence. Otherwise the pattern could remove individual bytes that also occur inside other UTF-8 characters: for example, matching a bare \x80 (a byte of the en quad, U+2000) would also strip the \x80 byte out of every smart quote.
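To make the byte collision concrete, here is a sketch assuming UTF-8 input: the en quad (U+2000) and the curly quotes (U+201C, U+201D) share the lead bytes E2 80, so a pattern on the lone byte \x80 corrupts the quotes, while the full three-byte sequence removes only the en quad.

```php
<?php
// EN QUAD (U+2000) and LEFT DOUBLE QUOTATION MARK (U+201C) share the bytes E2 80.
echo bin2hex("\u{2000}"), "\n"; // e28080
echo bin2hex("\u{201C}"), "\n"; // e2809c

// Left quote, "hi", en quad, right quote.
$text = "\u{201C}hi\u{2000}\u{201D}";

// Matching the lone byte \x80 strips it from every character that contains it,
// leaving invalid UTF-8 behind.
$broken = preg_replace('/\x80/', '', $text);

// Matching the complete three-byte sequence removes only the en quad.
$safe = preg_replace('/\xe2\x80\x80/', '', $text);
```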

$cleanedstr = preg_replace(
    "/(\t|\n|\v|\f|\r| |\xC2\x85|\xc2\xa0|\xe1\xa0\x8e|\xe2\x80[\x80-\x8D]|\xe2\x80\xa8|\xe2\x80\xa9|\xe2\x80\xaF|\xe2\x81\x9f|\xe2\x81\xa0|\xe3\x80\x80|\xef\xbb\xbf)+/",
    "",
    $str
);

This removes tabs, newlines, vertical tabs, form feeds, carriage returns and spaces, and additionally:

next line, non-breaking space, Mongolian vowel separator, [en quad, em quad, en space, em space, three-per-em space, four-per-em space, six-per-em space, figure space, punctuation space, thin space, hair space, zero width space, zero width non-joiner, zero width joiner] (U+2000 to U+200D), line separator, paragraph separator, narrow no-break space, medium mathematical space, word joiner, ideographic space, and the zero width no-break space (byte order mark).
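A different technique from the byte list above, named plainly: with the u modifier PCRE works on whole UTF-8 characters, so the class can use the Unicode property \p{Z} and \x{...} code points instead of raw bytes. A minimal sketch, assuming valid UTF-8 input; the function name strip_unicode_whitespace is hypothetical, and the extra zero-width code points mirror the list above.

```php
<?php
// Hypothetical helper: removes ASCII and Unicode whitespace from a UTF-8 string.
function strip_unicode_whitespace(string $s): string
{
    // \s                  ASCII whitespace
    // \p{Z}               Unicode separators: no-break space, en/em spaces, U+2028, U+2029, U+3000, ...
    // \x{85}              next line (NEL), a control character outside \p{Z}
    // \x{180E}            Mongolian vowel separator (a format character in current Unicode)
    // \x{200B}-\x{200D}, \x{2060}, \x{FEFF}   zero-width characters, also outside \p{Z}
    $out = preg_replace(
        '/[\s\p{Z}\x{85}\x{180E}\x{200B}-\x{200D}\x{2060}\x{FEFF}]+/u',
        '',
        $s
    );
    // With the u modifier, preg_replace() returns null for invalid UTF-8 input.
    if ($out === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $out;
}

echo strip_unicode_whitespace("a b\tc\u{00A0}d\u{3000}e\u{FEFF}f"), "\n"; // abcdef
```

The trade-off is that invalid UTF-8 is rejected instead of being processed byte by byte.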

Many of these wreak havoc in XML files exported from automated tools or websites: they foul up text searches and recognition, and they can be pasted invisibly into PHP source code. There the paragraph and line separators cause the parser to jump to the next command, so lines of code are skipped, resulting in intermittent, unexplained errors that we have begun referring to as "textually transmitted diseases".

[It's not safe to copy and paste from the web anymore. Use a character scanner to protect your code. lol]
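The character-scanner advice can be taken literally. A rough sketch, assuming UTF-8 text and reusing the code points listed above; the function name find_invisible_chars is hypothetical, and it reports byte offsets plus the raw bytes of each hit.

```php
<?php
// Hypothetical helper: lists invisible or unusual whitespace characters with their byte offsets.
function find_invisible_chars(string $text): array
{
    $pattern = '/[\x{85}\x{A0}\x{180E}\x{2000}-\x{200D}\x{2028}\x{2029}\x{202F}\x{205F}\x{2060}\x{3000}\x{FEFF}]/u';
    // preg_match_all() returns false on error, e.g. invalid UTF-8 under the u modifier.
    if (preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $found = [];
    foreach ($m[0] as [$char, $offset]) {
        // $offset is a byte offset; bin2hex() shows the raw UTF-8 bytes.
        $found[] = ['offset' => $offset, 'bytes' => bin2hex($char)];
    }
    return $found;
}

// Two statements separated by a LINE SEPARATOR (U+2028) instead of a newline.
$source = "\$x = 1;\u{2028}\$y = 2;";
print_r(find_invisible_chars($source));
```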

Wednesday, March 31, 2021
 
tdous
answered 7 Months ago

There is no need for a regex here: rtrim() is cleaner and faster. Note that it only strips trailing whitespace:

$str = rtrim($str);

But if you want a regex based solution you can use:

$str = preg_replace('/\s*$/', '', $str);

The regex used is /\s*$/

  • \s is short for any whitespace character, which includes the space.
  • * is the quantifier for zero or more
  • $ is the end anchor

Basically we replace trailing whitespace characters with nothing (''), effectively deleting them.
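The two approaches are close but not identical: by default rtrim() strips " \t\n\r\0\x0B", while \s also covers the form feed but not the NUL byte. A quick comparison on made-up inputs:

```php
<?php
$samples = ["abc \t\n", "abc\f", "abc\0"];

foreach ($samples as $s) {
    $viaTrim  = rtrim($s);
    $viaRegex = preg_replace('/\s*$/', '', $s);
    // bin2hex() makes the invisible trailing bytes visible.
    printf("%-12s rtrim: %-10s regex: %s\n", bin2hex($s), bin2hex($viaTrim), bin2hex($viaRegex));
}
```

If the exact character set matters, rtrim() also accepts an explicit list as its second argument.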

Wednesday, March 31, 2021
 
dirigibleplum
answered 7 Months ago

Use preg_match as suggested by Josh:

<?php

$foo = "Dave Smith";
$bar = "SamSpade";
$baz = "Dave\t\t\tSmith";

var_dump(preg_match('/\s/', $foo));
var_dump(preg_match('/\s/', $bar));
var_dump(preg_match('/\s/', $baz));

Outputs:

int(1)
int(0)
int(1)
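If a boolean is what you need, a small wrapper around the same call works. A sketch; has_whitespace is a hypothetical name:

```php
<?php
// Hypothetical helper built on the preg_match() call above.
function has_whitespace(string $s): bool
{
    // preg_match() returns 1 on a match, 0 if none, false on error,
    // so compare strictly rather than relying on truthiness.
    return preg_match('/\s/', $s) === 1;
}

var_dump(has_whitespace("Dave Smith")); // bool(true)
var_dump(has_whitespace("SamSpade"));   // bool(false)

// preg_match() stops at the first match; preg_match_all() counts all of them.
var_dump(preg_match_all('/\s/', "Dave\t\t\tSmith")); // int(3)
```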
Saturday, July 3, 2021
 
Gersom
answered 4 Months ago

You are getting it from the request, not the session.

It should be

session.getAttribute("MyAttribute")

I suggest using the JavaServer Pages Standard Tag Library (JSTL) or Expression Language (EL) instead of scriptlets; they are easier to use and less error-prone.

${sessionScope.MyAttribute}

or

<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core"%>

<c:out value="${sessionScope.MyAttribute}" />

You can also try ${MyAttribute} (which searches the page, request, session and application scopes, in that order) or ${sessionScope['MyAttribute']}.

Read more

  • Oracle Tutorial - Using JSTL

  • Oracle Tutorial - Expression Language

Friday, July 30, 2021
 
neon29
answered 3 Months ago

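As a wacky workaround, you could escape non-HTML angle brackets with:

$html = preg_replace_callback('# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi',
    fn($m) => htmlentities($m[0]), $html);

Apply strip_tags() afterwards. The e modifier, which evaluated the replacement as PHP code, was removed in PHP 7.0, so preg_replace_callback() is used instead; the arrow function needs PHP 7.4 or later. Note that this only works for your specific example and similar cases. It's a regular expression with some heuristics, not artificial intelligence that can tell HTML tags apart from unescaped angle brackets with another meaning.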
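Putting both steps together, a sketch assuming PHP 7.4+ and the same heuristics (a "<" not followed by a slash or letter, or a ">" preceded by whitespace and not followed by a letter); the function name strip_tags_keep_brackets is hypothetical:

```php
<?php
// Hypothetical helper: escape angle brackets that do not look like tag delimiters,
// then strip the real tags.
function strip_tags_keep_brackets(string $html): string
{
    $escaped = preg_replace_callback(
        '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi',
        fn(array $m): string => htmlentities($m[0]),
        $html
    );
    return strip_tags($escaped);
}

echo strip_tags_keep_brackets("<p>if a < b and c > d</p>"), "\n";
// if a &lt; b and c &gt; d
```

The escaped brackets stay as entities; run html_entity_decode() on the result if plain text is needed.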

Monday, August 23, 2021
 
j3d
answered 2 Months ago