Asked  7 Months ago    Answers:  5   Viewed   44 times

Possible Duplicate:
PHP String Manipulation: Extract hrefs

I am using PHP and have a string with the following content:

<a href="www.something.com">Click here</a>

I need to get rid of everything except "www.something.com". I assume this can be done with regular expressions. Any help is appreciated! Thank you.

 Answers

78

This is very easy to do using SimpleXML:

$a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');
echo $a['href']; // will echo www.something.com
Wednesday, March 31, 2021
 
ALH
answered 7 Months ago
62

You can use PHP's DOMDocument library to parse XML and/or HTML. Something like the following should do the trick to get the href attributes from a string of HTML.

$html = '<h1>Doctors</h1>
<a title="C - G" href="link1.html">C - G</a>
<a title="G - K" href="link2.html">G - K</a>
<a title="K - M" href="link3.html">K - M</a>';

$hrefs = array();

$dom = new DOMDocument();
$dom->loadHTML($html);

$tags = $dom->getElementsByTagName('a');
foreach ($tags as $tag) {
    $hrefs[] = $tag->getAttribute('href');
}
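
Applied to the string from the question, the same approach could be wrapped up roughly as below. This is only a sketch: extract_hrefs() is a hypothetical helper name, not part of DOMDocument, and it assumes you would rather silence libxml's warnings about imperfect markup than see them in the output.

```php
<?php
// Hypothetical helper (not from the original answer):
// collect the href of every <a> element in an HTML fragment.
function extract_hrefs(string $html): array
{
    // loadHTML() rejects an empty string, so return early in that case.
    if ($html === '') {
        return [];
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $tag) {
        // Skip anchors that have no href attribute at all.
        if ($tag->hasAttribute('href')) {
            $hrefs[] = $tag->getAttribute('href');
        }
    }

    return $hrefs;
}

// Usage with the string from the question.
$links = extract_hrefs('<a href="www.something.com">Click here</a>');
echo $links[0] ?? '';
```

Returning an array keeps the helper usable for strings that contain several links, as in the example above.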
Wednesday, March 31, 2021
 
nasty
answered 7 Months ago
61

This is the simplest and cleanest way:

$str = 'http://www.youtube.com/watch?v=spsnQWtsUFM';
preg_match("#//(.+?)/#", $str, $matches);

$site_url = $matches[1];

EDIT: I assume that the $str had been checked to be a URL in the first place, so I left that out. Also, I assume that all the URLs will contain either 'http://' or 'https://'. In case the url is formatted like this www.youtube.com/watch?v=spsnQWtsUFM or even youtube.com/watch?v=spsnQWtsUFM, the above regexp won't work!
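
If scheme-less input such as www.youtube.com/watch?v=spsnQWtsUFM has to be handled as well, one possible alternative to the regexp is PHP's built-in parse_url(). The sketch below is an assumption on my part rather than part of the original answer: extract_host() is a hypothetical helper name, and it treats any input without a scheme as if it started with a host name.

```php
<?php
// Hypothetical helper: return the host part of a URL, with or without a scheme.
function extract_host(string $url): ?string
{
    // parse_url() only reports a host when a scheme or a leading "//" is present,
    // so prepend "//" to scheme-less input such as "www.youtube.com/watch?v=...".
    $hasScheme = preg_match('#^[a-z][a-z0-9+.-]*://#i', $url) === 1;
    if (!$hasScheme && strpos($url, '//') !== 0) {
        $url = '//' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);

    // parse_url() returns null or false when no host can be found.
    return (is_string($host) && $host !== '') ? $host : null;
}

// Usage with the URL from the answer above.
echo extract_host('http://www.youtube.com/watch?v=spsnQWtsUFM');
```

Unlike the regexp, this does not require a trailing slash after the host, so a bare youtube.com is handled too.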

EDIT2: I'm sorry, I didn't realize that you were trying to replace all the URLs in a whole text. In that case, this should work the way you want it:

$str = preg_replace('#(\A|[^=\]\'"a-zA-Z0-9])(http[s]?://(.+?)/[^()<>\s]+)#i', '\1<a href="\2">\3</a>', $str);
Wednesday, March 31, 2021
 
Corsair
answered 7 Months ago
73

You can use a regular expression like this; it should match your specification exactly:

$string = 'make6to12';
preg_match('{^.*?(?P<before>\d{1,2})to(?P<after>\d{1,2})}m', $string, $match);
echo $match['before'].', '.$match['after']; // 6, 12
Wednesday, March 31, 2021
 
mgraph
answered 7 Months ago
92

In Java, download the file as plain text/HTML and pass it through Jsoup or HtmlCleaner. Both are similar and can parse even malformed HTML 4.0 syntax. You can then use the familiar HTML DOM parsing methods such as getElementsByTagName("a"), or, even better in jsoup, simply use selectors:

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

Elements links = doc.select("a[href]"); // a with href
Elements pngs = doc.select("img[src$=.png]"); // img with src ending .png

Element masthead = doc.select("div.masthead").first();

and find all the links, then get the details using

String linkhref = links.attr("href"); // href of the first matched link

Taken from http://jsoup.org/cookbook/extracting-data/selector-syntax

The selectors use a syntax similar to jQuery's; if you know jQuery function chaining, you will certainly love it.

EDIT: In case you want more tutorials, you can try out this one made by mkyong.

http://www.mkyong.com/java/jsoup-html-parser-hello-world-examples/

Wednesday, June 23, 2021
 
twk
answered 4 Months ago