Asked  7 Months ago    Answers:  5   Viewed   44 times

"something here ; and there, oh,that's all!"

I want to split it on both ; and ,

so that after processing I get:

something here

and there

oh

that's all!

 Answers

92
<?php

$pattern = '/[;,]/';

$string = "something here ; and there, oh,that's all!";

echo '<pre>', print_r( preg_split( $pattern, $string ), 1 ), '</pre>';

Updated answer to an updated question:

<?php

$pattern = '/[\x{ff0c},]/u';

//$string = "something here ; and there, oh,that's all!";
$string = 'hei,nihao?a ';


echo '<pre>', print_r( preg_split( $pattern, $string ), 1 ), '</pre>';
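
Note that splitting on [;,] alone keeps the spaces around each delimiter in the resulting pieces. The following is a minimal sketch in Python rather than PHP, assuming the surrounding whitespace should be dropped, as the expected output in the question suggests. It folds optional whitespace into the delimiter and also accepts the fullwidth comma U+FF0C. The helper name `split_on_delims` is made up for illustration.

```python
import re

# Delimiter: ';' or ',' (ASCII or fullwidth U+FF0C), plus any whitespace around it.
DELIMS = re.compile(r"\s*[;,\uff0c]\s*")

def split_on_delims(text):
    # strip() removes whitespace at the very ends, which the pattern cannot reach.
    return DELIMS.split(text.strip())

print(split_on_delims("something here ; and there, oh,that's all!"))
# ['something here', 'and there', 'oh', "that's all!"]
```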
Wednesday, March 31, 2021
 
astaykov
answered 7 Months ago
61

Your regular expression has a few problems, the main one being that it confuses group constructs with character classes. Inside a character class, a pipe | is a literal |. It doesn't have any special meaning.

What you need is this:

("[^"]*")|[!?.]+\s*|\R+

This first tries to match a string enclosed in double quotation marks (and captures it, quotes included, so it is kept in the result). Failing that, it matches a run of punctuation marks from the set [!?.], together with any trailing whitespace, and splits on it. Failing that, it matches any kind of newline sequence.

PHP:

var_dump(preg_split('~("[^"]*")|[!?.]+\s*|\R+~', <<<STR
hello! how are you? how is life
live life, live free. "isnt it?"
STR
, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY));

Output:

array(5) {
  [0]=>
  string(5) "hello"
  [1]=>
  string(11) "how are you"
  [2]=>
  string(11) "how is life"
  [3]=>
  string(20) "live life, live free"
  [4]=>
  string(10) ""isnt it?""
}
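
The same three-way alternation can be sketched in Python, with two assumptions that differ from the PHP version: Python's re module has no \R, so an explicit newline alternation stands in for it, and there is no PREG_SPLIT_NO_EMPTY flag, so empty and missing pieces are filtered out by hand. The function name `split_sentences` is made up for illustration.

```python
import re

# The quoted string is captured (so it is kept); punctuation and newlines are dropped.
# Python's re has no \R, so (?:\r\n|\r|\n)+ is used in its place.
PATTERN = re.compile(r'("[^"]*")|[!?.]+\s*|(?:\r\n|\r|\n)+')

def split_sentences(text):
    # re.split yields None when the capture group did not take part in a match,
    # and '' between adjacent delimiters; dropping both mimics PREG_SPLIT_NO_EMPTY.
    return [part for part in PATTERN.split(text) if part]

text = 'hello! how are you? how is life\nlive life, live free. "isnt it?"'
print(split_sentences(text))
# ['hello', 'how are you', 'how is life', 'live life, live free', '"isnt it?"']
```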
Wednesday, March 31, 2021
 
hnkk
answered 7 Months ago
82

You have one small mistake in your regex. Try this:

String[] Res = Text.split("[\\p{Punct}\\s]+");

[\p{Punct}\s]+ moves the + from inside the character class to the outside. Otherwise you are also splitting on a literal +, and consecutive delimiter characters are not combined into a single split.

So for this code I get

String Text = "But I know. For example, the word \"can't\" should";

String[] Res = Text.split("[\\p{Punct}\\s]+");
System.out.println(Res.length);
for (String s:Res){
    System.out.println(s);
}

this result

10
But
I
know
For
example
the
word
can
t
should

Which should meet your requirement.

As an alternative you can use

String[] Res = Text.split("\\P{L}+");

\P{L} matches any Unicode code point that does not have the property "Letter".
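
To see why the position of the + matters, here is a rough Python sketch. It assumes `string.punctuation` is an acceptable ASCII stand-in for Java's \p{Punct}, since Python's re module has no \p{...} classes. With the quantifier outside the class, a run of delimiters counts as one split point; without it, each delimiter character splits on its own and leaves empty strings behind.

```python
import re
import string

text = 'But I know. For example, the word "can\'t" should'

# ASCII punctuation, escaped for use inside a character class
# (an approximation of Java's \p{Punct}).
punct = re.escape(string.punctuation)

# '+' outside the class: a run of delimiters is treated as one split point.
grouped = re.split(rf"[{punct}\s]+", text)

# No quantifier: adjacent delimiters produce empty strings in the result.
single = re.split(rf"[{punct}\s]", text)

print(len(grouped), grouped)
print(len(single), single)
```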

Thursday, June 24, 2021
 
Chvanikoff
answered 4 Months ago
58

Try this:

import re
re.split(r'[,;]+', 'This,is;a,;string')
> ['This', 'is', 'a', 'string']
Wednesday, August 4, 2021
 
keisar
answered 3 Months ago
14

You could first do a Replace on the string and then do the split:

newString = Replace(origString, "-", " ")
newArray = Split(newString, " ")
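
The replace-then-split idea carries over to other languages. A minimal Python sketch follows, with a made-up sample string and a hypothetical helper name. One difference from the snippet above is an assumption of this sketch: `split()` with no argument also collapses runs of whitespace, so no empty pieces appear.

```python
def split_on_dash_and_space(text):
    # Normalise the second delimiter to the first, then split once.
    # split() with no argument treats any run of whitespace as one separator.
    return text.replace("-", " ").split()

print(split_on_dash_and_space("alpha-beta gamma - delta"))
# ['alpha', 'beta', 'gamma', 'delta']
```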
Thursday, August 12, 2021
 
kwhohasamullet
answered 2 Months ago