Asked  7 Months ago    Answers:  5   Viewed   64 times

I have a string of HTML and would like to check whether it contains any links and, if so, extract them into an array. I can do this easily in jQuery with its selectors, but I cannot find the right methods to use in PHP.

For example, the string may look like this:

<h1>Doctors</h1>
<a title="C - G" href="link1.html">C - G</a>
<a title="G - K" href="link2.html">G - K</a>
<a title="K - M" href="link3.html">K - M</a>

How (in PHP) can I turn it into an array that looks something like:

[1]=>"link1.html"
[2]=>"link2.html"
[3]=>"link3.html"

Thanks, Ian

 Answers

62

You can use PHP's DOMDocument class to parse XML and/or HTML. Something like the following should do the trick to get the href attribute of each link in a string of HTML.

$html = '<h1>Doctors</h1>
<a title="C - G" href="link1.html">C - G</a>
<a title="G - K" href="link2.html">G - K</a>
<a title="K - M" href="link3.html">K - M</a>';

// Collect the href of every <a> element found in the HTML string.
$hrefs = array();

$dom = new DOMDocument();
$dom->loadHTML($html);

$tags = $dom->getElementsByTagName('a');
foreach ($tags as $tag) {
    $hrefs[] = $tag->getAttribute('href');
}
Wednesday, March 31, 2021
 
nasty
answered 7 Months ago
78

This is very easy to do using SimpleXML, provided the markup is well-formed XML:

$a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');
echo $a['href']; // will echo www.something.com
Wednesday, March 31, 2021
 
ALH
answered 7 Months ago
48

The strcspn function is what you are looking for.

<?php

$mask = "abc";

$string = "log dog hat bat";

$result = substr($string,0,strcspn($string,$mask));

var_dump($result);

?>
Saturday, May 29, 2021
 
kwichz
answered 5 Months ago
11

Create an element, store the HTML in it, and get its textContent:

function extractContent(s) {
  var span = document.createElement('span');
  span.innerHTML = s;
  return span.textContent || span.innerText;
};
    
alert(extractContent("<p>Hello</p><a href='http://w3c.org'>W3C</a>"));

Here's a version that allows you to have spaces between nodes, although you'd probably want that for block-level elements only:

function extractContent(s, space) {
  var span= document.createElement('span');
  span.innerHTML= s;
  if(space) {
    var children= span.querySelectorAll('*');
    for(var i = 0 ; i < children.length ; i++) {
      if(children[i].textContent)
        children[i].textContent+= ' ';
      else
        children[i].innerText+= ' ';
    }
  }
  return [span.textContent || span.innerText].toString().replace(/ +/g,' ');
};
    
console.log(extractContent("<p>Hello</p><a href='http://w3c.org'>W3C</a>.  Nice to <em>see</em><strong><em>you!</em></strong>"));

console.log(extractContent("<p>Hello</p><a href='http://w3c.org'>W3C</a>.  Nice to <em>see</em><strong><em>you!</em></strong>",true));
Tuesday, June 29, 2021
 
kensil
answered 4 Months ago
98

Yes, match is the way to go:

var matches = str.match(/(\d+)sl(\d+)/);
var number1 = Number(matches[1]);
var number2 = Number(matches[2]);
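
One caveat with the snippet above: `match` returns `null` when the pattern does not match, so indexing into the result would throw. A minimal sketch of a guarded version follows. The helper name `extractNumberPair` is hypothetical, and the default pattern assumes the input is two runs of digits separated by a literal `sl`; adjust it to the real input format.

```javascript
// Hypothetical helper: the name and the default pattern are assumptions for illustration.
function extractNumberPair(str, pattern = /(\d+)sl(\d+)/) {
  const matches = str.match(pattern);
  // String.prototype.match returns null when nothing matches.
  if (matches === null) {
    return null;
  }
  // matches[0] is the full match; the capture groups start at index 1.
  return [Number(matches[1]), Number(matches[2])];
}

console.log(extractNumberPair("12sl34"));       // the pair 12 and 34
console.log(extractNumberPair("no digits"));    // null
```

Returning `null` rather than throwing lets the caller decide how to handle input that does not fit the expected shape.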
Thursday, August 5, 2021
 
elias
answered 3 Months ago