Asked  7 Months ago    Answers:  5   Viewed   26 times

OK, this is a pretty basic question, I'm sure, but I'm new to PHP and haven't been able to figure it out. The input string is $data, and I'm trying to pull out the first match and continue using only that. Is the code below incorrect? This may not even be the best way to do it; I'm just trying to pull the contents between two HTML tags (the first set found) and discard the rest of the data. I know there are similar questions, and I've read them all. My question is a mix: is there a better way to do this, and how can I use the match as the new input for the rest of the code? If I change $matches to $data2 and use it from there on, it returns errors.

preg_match('/<h2>(.*?)<\/h2>/s', $data, $matches);

 Answers

51

Using regular expressions is generally a good idea for your problem.

When you look at http://php.net/preg_match you see that $matches will be an array: index 0 holds the text that matched the full pattern, and the following indexes hold the text captured by each parenthesis-group. Try

print_r($matches);

to get an idea of how the result looks, and then pick the right index.

EDIT:

If there is a match, then you can get the text extracted between the parenthesis-group with

print($matches[1]);

If you had more than one parenthesis-group, they would be numbered 2, 3, etc. You should also consider the case where there is no match: preg_match then returns 0 and the array is empty.
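To tie this back to the question, here is a minimal sketch of keeping only the first captured group and using it as the input from then on. It assumes $data holds the HTML as a string; the sample markup is made up, $data2 is the name used in the question, and falling back to null when nothing matches is just one possible choice.

```php
<?php
// Sample input; in the question this would be the real page contents.
$data = '<h1>Title</h1><h2>First heading</h2><p>text</p><h2>Second heading</h2>';

// preg_match stops at the first match and returns 1 on a match, 0 on no match.
// Using "#" as the delimiter avoids having to escape the "/" in "</h2>".
if (preg_match('#<h2>(.*?)</h2>#s', $data, $matches) === 1) {
    // $matches[0] is the whole match including the tags,
    // $matches[1] is only the text captured by the parentheses.
    $data2 = $matches[1];
} else {
    $data2 = null; // assumption: null signals "no <h2>...</h2> pair found"
}

var_dump($data2); // string(13) "First heading"
```

The errors mentioned in the question probably come from treating $matches (an array) as if it were the string; assigning $matches[1] to $data2 gives the rest of the code a plain string to work with.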

Wednesday, March 31, 2021
 
Packy
answered 7 Months ago
19

Here is a way to do the job:

$in = array(
"%@-H-e-l-l-o-7-9#$%",
"Hi$73",
"????????!",
"!",
"",
"?55?W",
'$abc$$$',
"?????_",
"34.5",
'#_!',
);

foreach($in as $elem) {
    preg_match('/^([^\pL\pN]*)((?=[\pL\pN]|$)[^_]*(?<=[\pL\pN])|^)?([^\pL\pN]*)$/u', $elem, $m);
    printf("'%15s'%s'%10s'\t%s'%10s'\t%s'%10s'%s", "$elem","=> (1): ",$m[1],"(2): ",$m[2], "(3): ",$m[3],"\n");

}

Where:

  • \pL stands for any letter in any language
  • \pN stands for any number in any language

Output:

'%@-H-e-l-l-o-7-9#$%'=> (1): '       %@-'   (2): 'H-e-l-l-o-7-9'    (3): '       #$%'
'          Hi$73'=> (1): '          '   (2): '     Hi$73'   (3): '          '
'????????!'=> (1): '          ' (2): '????????' (3): '         !'
'              !'=> (1): '         !'   (2): '          '   (3): '          '
'               '=> (1): '          '   (2): '          '   (3): '          '
'        ?55?W'=> (1): '          ' (2): '   ?55?W' (3): '          '
'        $abc$$$'=> (1): '         $'   (2): '       abc'   (3): '       $$$'
'    ?????_'=> (1): '          '    (2): '?????'    (3): '         _'
'           34.5'=> (1): '          '   (2): '      34.5'   (3): '          '
'            #_!'=> (1): '       #_!'   (2): '          '   (3): '          '
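
As a small illustration of the two character classes used in this answer, the sketch below classifies single characters with \pL and \pN. It is far simpler than the pattern in the answer and is not a replacement for it; the helper name classify_char is made up for this example.

```php
<?php
// Hypothetical helper, not part of the answer above: classify one character.
// The "u" modifier makes PCRE treat the pattern and the subject as UTF-8.
function classify_char($ch) {
    if (preg_match('/^\pL$/u', $ch) === 1) return 'letter';
    if (preg_match('/^\pN$/u', $ch) === 1) return 'number';
    return 'other';
}

// Latin and non-Latin letters and digits, plus two symbols.
foreach (array('x', 'é', '7', '٣', '_', '!') as $ch) {
    printf("%s => %s\n", $ch, classify_char($ch));
}
```

Here 'é' and the Arabic-Indic digit '٣' fall into the letter and number classes respectively, while '_' and '!' match neither, which is why the answer's pattern can treat them as leading or trailing symbols.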
Wednesday, March 31, 2021
 
hohner
answered 7 Months ago
84

How about:

$sentence = "When it comes time to renew your auto insurance policy, be aware of how your carrier handles renewals";
$searches = array('aware', 'aware of', 'be aware', 'be aware of');
$replaces = array('conscious', 'conscious of', 'remember', 'concentrate on');

function cmp($a, $b) {
    if (strpos($a, $b) !== false) return -1;
    if (strpos($b, $a) !== false) return 1;
    return 0;
}

uasort($searches, 'cmp');
$replaces_new = array();
$i=0;
foreach($searches as $k=>$v) {
    $replaces_new[$i] = $replaces[$k];
    $i++;
}

$res = str_replace($searches, $replaces_new, $sentence);
echo $res;

Output:

When it comes time to renew your auto insurance policy, concentrate on how your carrier handles renewals
Saturday, May 29, 2021
 
cusejuice
answered 5 Months ago
85

You could be incredibly specific about it:

var regex = new Regex(@"<span id=""point_total"" class=""tooltip"" oldtitle="".*?"" aria-describedby=""ui-tooltip-0"">(.*?)</span>");

var match = regex.Match(@"<span id=""point_total"" class=""tooltip"" oldtitle=""Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again."" aria-describedby=""ui-tooltip-0"">31</span>");

var result = match.Groups[1].Value;
Saturday, June 26, 2021
 
daniel__
answered 4 Months ago
81

Updated to use a more generic method (see edit history for original answer):

You can extract child elements of the outer div by testing whether they are instances of NavigableString.

from bs4 import BeautifulSoup, NavigableString

html = '''<div id="1">
    <div id="2">
        this is the text i do NOT want
    </div>
    this is the text i want here
</div>'''

soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
outer = soup.div
inner_text = [element for element in outer if isinstance(element, NavigableString)]

This results in a list of strings contained in the outer div element.

>>> inner_text
[u'\n', u'\n    this is the text i want here\n']
>>> ''.join(inner_text)
u'\n\n    this is the text i want here\n'

For your second example:

html = '''<div id="1">
    this is the text i want here
</div>'''
soup2 = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
outer = soup2.div
inner_text = [element for element in outer if isinstance(element, NavigableString)]

>>> inner_text
[u'\n    this is the text i want here\n']

This will also work for other cases such as the outer div's text element being present before any child tags, between child tags, multiple text elements, or not present at all.

Thursday, August 12, 2021
 
Alex Okrushko
answered 3 Months ago