Get href value by class name with regex and PHP [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 7 years ago.
I need to get the href value of a link by its class name.
Can someone give me a regex that finds this class name and gets the href from there?
Thanks a lot!

You can do this with DOMDocument and XPath instead of a regex:
$str = '...'; // the HTML you want to search
$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);
// In XPath, attributes are addressed with "@"
$links = $xpath->query('//a[@class="boldOrange"]');
foreach ($links as $link) {
    $url = $link->getAttribute('href');
    print $url . PHP_EOL;
}
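An exact comparison on the class attribute only finds links whose class is exactly "boldOrange", so a link with class="big boldOrange" is missed. A minimal sketch that matches the class as a whitespace-separated token instead; the function name hrefsByClass and the sample HTML are hypothetical, and the class name is assumed to contain no quote characters because it is interpolated directly into the XPath expression:

```php
<?php
// Collect the href of every <a> whose class list contains $className as a whole
// token, e.g. both class="boldOrange" and class="big boldOrange".
// hrefsByClass is a hypothetical helper; $className must not contain quotes.
function hrefsByClass(string $html, string $className): array
{
    $doc = new DOMDocument();
    // Keep libxml warnings about sloppy real-world HTML out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad the normalized class attribute with spaces so only whole tokens match
    // ("boldOrange" must not match "boldOrangeX").
    $query = sprintf(
        "//a[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $className
    );

    $hrefs = [];
    foreach ($xpath->query($query) as $link) {
        $hrefs[] = $link->getAttribute('href');
    }
    return $hrefs;
}

$html = '<a class="boldOrange" href="/one">1</a>'
      . '<a class="big boldOrange" href="/two">2</a>'
      . '<a class="boldOrangeX" href="/three">3</a>';
print_r(hrefsByClass($html, 'boldOrange')); // /one and /two
```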

Related

How to preg_match this in PHP? [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 6 years ago.
How may I preg_match this in PHP?
$item[0] = '<tr><td rowspan=2>07/07/2016 14:55</td><td>AC MENDES - Mendes/RJ</td><td><font color="000000">Postado depois do horário limite da agência</font></td></tr>';
I've tried the code below, but it didn't work...
if(preg_match("#<td rowspan=[2]>(.*)</td><td>(.*)</td><td><FONT COLOR=\"[0-9A-F]{6}\">(.*)</font></td>#", $item[0], $d))
{
echo 'OK';
}
You can use DOMDocument with XPath to get the text of every td:
$html = <<<DATA
<tr><td rowspan=2>07/07/2016 14:55</td><td>AC MENDES - Mendes/RJ</td><td><font color="000000">Postado depois do horário limite da agência</font></td></tr>
DATA;
$dom = new DOMDocument('1.0', 'UTF-8');
// loadHTML() assumes ISO-8859-1 unless told otherwise; the XML encoding
// declaration makes it read the UTF-8 text ("horário", "agência") correctly.
$dom->loadHTML('<?xml encoding="UTF-8">' . $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$tds = $xpath->query('//td');
$res = array();
foreach ($tds as $td) {
    $res[] = $td->nodeValue;
}
print_r($res);
The //td expression selects all td nodes. You could also use the '//text()' XPath expression to grab just the text nodes.
Otherwise, if you know what you are doing, you can insert a temporary marker string after each <td> tag, strip the tags, and then explode on that marker:
explode("###", strip_tags(str_replace("<td>", "<td>###", $html)))
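The str_replace() one-liner only marks bare <td> tags, so a cell opened with attributes (such as <td rowspan=2>) is not marked, and a row whose first cell is a bare <td> yields an empty first element. A slightly more robust sketch of the same marker trick swaps in preg_replace() to mark every opening td tag and then drops the piece before the first cell; rowCells is a hypothetical helper, and the marker "###" is assumed never to occur in the cell text:

```php
<?php
// Marker-string trick: turn every opening <td ...> tag (with or without
// attributes) into a marker, strip the remaining tags, then split on the marker.
// rowCells is hypothetical; $marker must not appear inside the cell text.
function rowCells(string $rowHtml, string $marker = '###'): array
{
    $marked = preg_replace('/<td\b[^>]*>/i', $marker, $rowHtml);
    $parts = explode($marker, strip_tags($marked));
    // The first piece is whatever preceded the first cell (here the stripped <tr>).
    array_shift($parts);
    return $parts;
}

$row = '<tr><td rowspan=2>07/07/2016 14:55</td><td>AC MENDES - Mendes/RJ</td>'
     . '<td><font color="000000">Postado depois do horário limite da agência</font></td></tr>';
print_r(rowCells($row));
```

Like the original one-liner, this only holds up on simple, single-row markup; the DOM approach remains the safer choice.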
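As for why the pattern in the question does not match: it searches for "<FONT COLOR=" in uppercase, while the input contains lowercase "<font color=", and PCRE is case-sensitive by default. The sketch below keeps the question's pattern and only adds the i (case-insensitive) modifier:

```php
<?php
// The question's pattern with only the trailing "i" modifier added. Without it,
// "<FONT COLOR=" never matches the lowercase "<font color=" in the input.
$item = [
    '<tr><td rowspan=2>07/07/2016 14:55</td><td>AC MENDES - Mendes/RJ</td>'
    . '<td><font color="000000">Postado depois do horário limite da agência</font></td></tr>',
];

$pattern = '#<td rowspan=[2]>(.*)</td><td>(.*)</td><td><FONT COLOR="[0-9A-F]{6}">(.*)</font></td>#i';

if (preg_match($pattern, $item[0], $d)) {
    echo 'OK', PHP_EOL;
    // $d[1] = date/time, $d[2] = location, $d[3] = status text
    print_r(array_slice($d, 1));
}
```

The greedy (.*) groups work here because the string holds a single row; with several rows in one string they would run across row boundaries, which is one more reason to prefer the DOM version.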

Regular expression to not match certain href strings [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 7 years ago.
I have the following regex:
$regex = '<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>';
How can I change it so that it does NOT match links whose href contains the word "files" or "resize"?
Yes, parsing is the much better way to do this. Maybe someone will find this helpful:
$inhalt = new DOMDocument;
$inhalt->loadHTML($content->draw()[0][0]);
foreach ($inhalt->getElementsByTagName('a') as $node) {
    if ($node->hasAttribute('href')) {
        $href = $node->getAttribute('href');
        // Only rewrite links whose href contains neither "files" nor "resize"
        if (preg_match('/(files|resize)/', $href) === 0) {
            $node->setAttribute('href', 'mobile.php?uri=http://www.example.com' . str_replace('..', '', $href));
        }
    }
}
// Serialize the whole modified document once at the end
echo $inhalt->saveHtml();
You can use this regex to get all href strings:
<a[^>]*href=[\"\'](.*?)[\"\'][^>]*>(.*?)</a>
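That regex returns every href, including the ones the question wants to skip. One way to exclude them is a negative lookahead placed right after the opening quote. A sketch, assuming simple quoted href attributes; the sample links are hypothetical:

```php
<?php
// The href regex above, wrapped in "~" delimiters, plus a negative lookahead
// right after the opening quote. The lookahead scans the href value (up to the
// next quote) and rejects the link if "files" or "resize" appears anywhere in it.
// Capture 1 = href, capture 2 = link text.
$pattern = '~<a[^>]*href=["\'](?![^"\']*(?:files|resize))(.*?)["\'][^>]*>(.*?)</a>~is';

$html = '<a href="page.php">Page</a> '
      . '<a href="files/doc.pdf">Doc</a> '
      . "<a href='img/resize/1.jpg'>Img</a> "
      . '<a class="x" href="about.php">About</a>';

preg_match_all($pattern, $html, $matches);
print_r($matches[1]); // hrefs that passed the filter
```

Like preg_match("/(files|resize)/") in the DOM answer, this is a plain substring test, so an href such as "profiles.php" is also excluded; anchor the words (for example with \b or path separators) if that matters.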

Regular expressions: how to get everything between tags? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to parse and process HTML/XML with PHP?
Grabbing the href attribute of an A element
How can I get everything between the tags <td class="detail"></td>? I am using PHP.
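Why don't you use DOMXPath?
$doc = new DOMDocument();
$doc->loadHTML($yourHtml); // DOMXPath needs a DOMDocument, not a raw HTML string
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query("//td[@class='detail']");
foreach ($nodeList as $node)
{
    echo $node->nodeValue;
}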
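nodeValue returns only the text of each cell. If "everything between the tags" should include nested markup, one option is to serialize each child node with DOMDocument::saveHTML(), which accepts an optional node argument. A sketch; detailCells and the sample table are hypothetical:

```php
<?php
// Return the inner HTML (nested tags included) of every <td class="detail">.
// detailCells is a hypothetical helper.
function detailCells(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $cells = [];
    foreach ($xpath->query("//td[@class='detail']") as $td) {
        $inner = '';
        foreach ($td->childNodes as $child) {
            // saveHTML($node) serializes just that node, not the whole document
            $inner .= $doc->saveHTML($child);
        }
        $cells[] = $inner;
    }
    return $cells;
}

$html = '<table><tr><td class="detail">Price: <b>10</b></td><td>other</td></tr></table>';
print_r(detailCells($html));
```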
Parsing HTML with regex isn't a good choice

php: Extract text between specific tags from a webpage [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Best methods to parse HTML with PHP
I understand I should be using an HTML parser like PHP's DOMDocument (http://docs.php.net/manual/en/domdocument.loadhtml.php) or TagSoup.
How would I use DOMDocument to extract the text between specific tags, for example h1, h2, h3, p and table? It seems I can only do this for one tag at a time with getElementsByTagName.
Is there a better HTML parser for this task? Or how would I loop through the DOMDocument?
You are correct, use DOMDocument, since regex is NOT a good idea for parsing HTML.
getElementsByTagName gives you a DOMNodeList that you can iterate over to get the text of all the found elements. So, your code could look something like:
$document = new \DOMDocument();
$document->loadHTML($html);
$tags = array ('h1', 'h2', 'h3', 'h4', 'p');
$texts = array ();
foreach($tags as $tag)
{
$elementList = $document->getElementsByTagName($tag);
foreach($elementList as $element)
{
$texts[$element->tagName][] = $element->textContent;
}
}
return $texts;
Note that you should probably have some error handling in there, and you will also lose the context of the texts, but you can probably edit this code as you see fit.
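To keep the context that the per-tag loop loses, a single XPath expression can select all the wanted elements at once. Because it is one descendant step with a predicate, the nodes come back in document order, so headings and paragraphs stay interleaved as in the page. A sketch; orderedTexts and the sample markup are hypothetical:

```php
<?php
// Return [tagName, text] pairs for h1/h2/h3/p elements in document order.
// orderedTexts is a hypothetical helper.
function orderedTexts(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    // One location path with a self:: test per wanted tag
    $query = '//*[self::h1 or self::h2 or self::h3 or self::p]';
    foreach ($xpath->query($query) as $node) {
        $result[] = [$node->nodeName, trim($node->textContent)];
    }
    return $result;
}

print_r(orderedTexts('<h2>B</h2><p>x</p><h1>A</h1>'));
```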
You can also do this with a regex:
preg_match_all('#<h1>([^<]*)</h1>#Usi', $html_string, $matches);
// $matches[0] holds the full matches, $matches[1] the captured text
foreach ($matches[1] as $match)
{
    // do something with $match
}
I am not sure what your source is, so I fetch the content from a URL with file_get_contents():
$file = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($file);
// getElementsByTagName() returns a DOMNodeList; item(0) is the <body> element
$body = $doc->getElementsByTagName('body')->item(0);
$items = $body->getElementsByTagName('h1');
I am not sure of this part:
for ($i = 0; $i < $items->length; $i++) {
echo $items->item($i)->nodeValue . "\n";
}
Or:
foreach ($items as $item) {
echo $item->nodeValue . "\n";
}
Here is more info on nodeValue: http://docs.php.net/manual/en/function.domnode-node-value.php
Hope it helps!

Extract URL using PHP [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Grabbing the href attribute of an A element
Hi, I have this string in PHP:
<iframe frameborder="0" width="320" height="179" src="http://www.dailymotion.com/embed/video/xinpy5?width=320&wmode=transparent"></iframe><br />Le buzz Pippa Middleton agace la Reine ! <i>par direct8</i>
I would like to extract the URL from the anchor's href attribute using preg_match or other PHP functions.
Don't use regexes to parse HTML. Use the PHP DOM:
$DOM = new DOMDocument;
$DOM->loadHTML($str); // Your string
//get all anchors
$anchors = $DOM->getElementsByTagName('a');
//display all hrefs
for ($i = 0; $i < $anchors->length; $i++)
echo $anchors->item($i)->getAttribute('href') . "<br />";
You can check if the node has a href using hasAttribute() first if necessary.
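The loop above only looks at <a> elements, while the string as quoted here contains an <iframe src="...">. A sketch that collects both kinds of URL with one XPath query; selecting //a/@href only returns anchors that actually have an href, which covers the hasAttribute() check. extractUrls is a hypothetical helper:

```php
<?php
// Collect <a href> and <iframe src> values. extractUrls is hypothetical.
function extractUrls(string $html): array
{
    $doc = new DOMDocument();
    // The raw "&" in the iframe URL makes libxml warn; keep that out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $urls = [];
    // The query returns DOMAttr nodes; ->value is the attribute's text.
    foreach ($xpath->query('//a/@href | //iframe/@src') as $attr) {
        $urls[] = $attr->value;
    }
    return $urls;
}

$str = '<iframe frameborder="0" width="320" height="179" src="http://www.dailymotion.com/embed/video/xinpy5?width=320&wmode=transparent"></iframe><br />Le buzz Pippa Middleton agace la Reine ! <i>par direct8</i>';
print_r(extractUrls($str));
```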
You can use:
if (preg_match('#<a\s*[^>]*href="([^"]+)"#i', $string, $matches))
    echo $matches[1]; // the captured URL; $matches[0] is the whole matched text
Try this regex:
(?<=href=\")[\w://\.\-]+
