How to read the <strong> text and the link URL using DOMDocument? - php

I have this html:
<a href=" URL TO KEEP" class="class_to_check">
<strong> TEXT TO KEEP</strong>
</a>
I have a long HTML document with many links like the one above. I need to keep only the links that have a <strong> inside, and for each of them extract the HREF of the link and the text inside the <strong>. How can I do this using DOMDocument?
Thank you!

$html = "...";
$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom); // the XPath class for DOMDocument is DOMXPath
$a = $xp->query('//a')->item(0);
$href = $a->getAttribute('href');
$strong = $a->nodeValue;
Of course, this XPath stuff works for just this particular html snippet. You'll have to adjust it to work with a more fully populated HTML tree.
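As a rough sketch of that adjustment for a page with many links: the XPath below keeps only anchors that carry the class and contain a <strong>. The sample markup and the /keep and /skip URLs are made up for illustration, and the class name is taken from the question.

```php
<?php
// Hypothetical input: two links, only the first one contains a <strong>.
$html = '<div>'
      . '<a href="/keep" class="class_to_check"><strong> TEXT TO KEEP</strong></a>'
      . '<a href="/skip" class="class_to_check">plain text</a>'
      . '</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);
$links = [];

// Anchors with the class that have a <strong> somewhere inside them.
foreach ($xp->query('//a[@class="class_to_check"][.//strong]') as $a) {
    // Query relative to the current anchor to get its first <strong>.
    $strong = $xp->query('.//strong', $a)->item(0);
    $links[] = [
        'href' => trim($a->getAttribute('href')),
        'text' => trim($strong->textContent),
    ];
}

print_r($links);
```

The [.//strong] predicate does the filtering, so links without a <strong> never enter the loop.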

Related

appendXML stripping out img element

I need to insert an image with a div element in the middle of an article. The page is generated using PHP from a CRM. I have a routine to count the characters for all the paragraph tags, and insert the HTML after the paragraph that has the 120th character. I am using appendXML and it works, until I try to insert an image element.
When I put the <img> element in, it is stripped out. I understand it is looking for XML, however, I am closing the <img> tag which I understood would help.
Is there a way to use appendXML and not strip out the img elements?
$mcustomHTML = '<div style="position:relative; overflow:hidden;"><img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" /></img></div>';
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $content);
// read all <p> tags and count the text until reach character 120
// then add the custom html into current node
$pTags = $doc->getElementsByTagName('p');
foreach($pTags as $tag) {
$characterCounter += strlen($tag->nodeValue);
if($characterCounter > 120) {
// this is the desired node, so put html code here
$template = $doc->createDocumentFragment();
$template->appendXML($mcustomHTML);
$tag->appendChild($template);
break;
}
}
return $doc->saveHTML();
This should work for you. It uses a temporary DOM document to convert the HTML string that you have into something workable. Then we import the contents of the temporary document into the main one. Once it's imported we can simply append it like any other node.
<?php
$mcustomHTML = '<div style="position:relative; overflow:hidden;"><img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" /></div>';
$customDoc = new DOMDocument();
$customDoc->loadHTML($mcustomHTML, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$doc = new DOMDocument();
$doc->loadHTML($content);
$customImport = $doc->importNode($customDoc->documentElement, true);
// read all <p> tags and count the text until reach character 120
// then add the custom html into current node
$characterCounter = 0; // start the running total before the loop
$pTags = $doc->getElementsByTagName('p');
foreach($pTags as $tag) {
$characterCounter += strlen($tag->nodeValue);
if($characterCounter > 120) {
// this is the desired node, so put html code here
$tag->appendChild($customImport);
break;
}
}
return $doc->saveHTML();

PHP if string contains an <a> tag

Let's say I submit a form with the message:
Hi! What's up? Click here to check out my website.
How can I detect if the string contains <a> tags with PHP, and then add rel="nofollow" to them? So it would change to:
Hi! What's up? Click here to check out my website.
Roughly, I imagine the code would work like this:
$string = $_POST['message'];
if (*string contains <a> tags*) {
*add rel="nofollow"*
}
There's always the DOMDocument object.
<?php
$dom = new DOMDocument();
$dom->loadHTML('woo! examples!');
foreach ($dom->getElementsByTagName('a') as $item) {
$item->setAttribute('rel', 'nofollow');
}
echo $dom->saveHTML();
?>
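Here is a slightly fuller sketch of the same idea, for a message fragment rather than a whole page. The $message string and its example.com link are hypothetical stand-ins for $_POST['message']. The wrapper <div> is just a device so the fragment can be serialized again without the <html> and <body> tags that DOMDocument would otherwise add.

```php
<?php
// Hypothetical stand-in for $_POST['message'].
$message = 'Hi! What\'s up? <a href="http://example.com/">Click here</a> to check out my website.';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // user input is rarely clean HTML
// Wrap in a single root element; the flags stop libxml from adding
// <html>/<body> and a doctype around the fragment.
$dom->loadHTML('<div>' . $message . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$anchors = $dom->getElementsByTagName('a');
$hasLinks = $anchors->length > 0;   // answers "does the string contain <a> tags?"

foreach ($anchors as $a) {
    $a->setAttribute('rel', 'nofollow');
}

// Serialize only the children of the wrapper <div>, not the wrapper itself.
$result = '';
foreach ($dom->documentElement->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}

echo $result;
```

Note that loadHTML() assumes ISO-8859-1 unless told otherwise, so messages with non-ASCII characters need an encoding hint first.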

Obtain text between div tags within only certain span IDs with DOMdocument or SimpleDOM

This is my code:
<?php
$url = file_get_contents("http://www.youtube.com/watch?v=QR8A3T6sPzU");
$doc = new DOMDocument();
@$doc->loadHTML($url);
$xpath = new DOMXPath($doc);
$myNews = $xpath->query('//@id="watch7-views-info"')->item(0);
echo $myNews;
?>
How do I get all the text between div tags for only certain span IDs?
Thanks.
I'd use simpleHtmlDOM:
include 'simpledom.php';
$html = file_get_html('http://www.youtube.com/watch?v=QR8A3T6sPzU');
// Find all divs with id watch7-views-info and echo their contents
foreach($html->find('div[id=watch7-views-info]') as $element)
echo $element->plaintext . '<br>';
This finds all divs with that specific id. You mentioned something about spans, but you'll have to elaborate, because I don't know what you mean.
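If you'd rather stay with DOMDocument, the query from the question can be made to work by selecting the element that has the id instead of comparing the attribute. This is only a sketch: the markup below is invented to stand in for the fetched page, whose real structure may differ.

```php
<?php
// Hypothetical markup standing in for the downloaded page.
$html = '<div id="watch7-views-info"><span class="count">1,234 views</span></div>'
      . '<div id="other">ignored</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real pages usually trigger parser warnings
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select the div whose id attribute matches; query() returns a node list.
$node = $xpath->query('//div[@id="watch7-views-info"]')->item(0);

// A DOM node cannot be echoed directly; read its text content instead.
$text = $node !== null ? trim($node->textContent) : null;

echo $text, "\n";
```

textContent gives the text of the div and everything nested inside it, so any spans in the div are included.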

PHP dom to get tag class with multiple css class name

I'm having difficulty getting the second link's href and text. How do I select class="secondLink SecondClass" using PHP DOM? Thank you.
<td class="pos" >
<a class="firstLink" href="Search/?List=200003000112097&sr=1" >
Firs link value
</a>
<br />
<a class="secondLink SecondClass" href="/Search/?KeyOpt=ALL" >
Second Link Value
</a>
</td>
My code is
// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);
/*** discard white space ***/
$dom->preserveWhiteSpace = false;
// grab all the links on the page
$xpath = new DOMXPath($dom);
//$hrefs = $xpath->evaluate("/html/body//a[@class='firstLink']");// its working
$hrefs = $xpath->evaluate("/html/body//a[@class='secondLink SecondClass']");// not working
Thank you
$hrefs = $xpath->evaluate("/html/body//a[contains(concat(' ',@class,' '),' SecondClass ')
and (contains(concat(' ',@class,' '),' secondLink '))]");
from this answer
You can also select the td that has class pos and then select the anchor tags inside it. From the returned node list you can then pick out the specific anchor tag you need.
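To show the class-token test end to end, here is a small runnable sketch built on the markup from the question. The classPredicate() helper is a hypothetical convenience, not part of PHP's DOM API. XPath string comparison is case-sensitive, so the names must match the markup exactly.

```php
<?php
// Markup adapted from the question, wrapped in a table so the <td> is valid.
$html = '<table><tr><td class="pos">'
      . '<a class="firstLink" href="Search/?List=200003000112097&amp;sr=1">Firs link value</a><br />'
      . '<a class="secondLink SecondClass" href="/Search/?KeyOpt=ALL">Second Link Value</a>'
      . '</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Hypothetical helper: builds an XPath test for one whitespace-separated class token.
function classPredicate(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

// Require both tokens, in any order, regardless of other classes on the element.
$query = '//a[' . classPredicate('secondLink') . ' and ' . classPredicate('SecondClass') . ']';

$link = $xpath->query($query)->item(0);
$href = $link->getAttribute('href');
$text = trim($link->textContent);

echo $href, ' => ', $text, "\n";
```

Padding @class with spaces is what stops a token like secondLink from also matching a longer class name such as secondLinkExtra.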

How to retrieve all links from HTML document using DOMXPath

I have this code
<?PHP
$content = '<html>
<head>
<title></title>
</head>
<body>
<ul>
<li style="border:0px" class="list" id="list1111">
<a href="http://www.example.com/" style="font-size:10px" class="mylinks">
<img src="logo.gif" width="235" height="97" alt="logo example" border="0"/>
</a>
</li>
<li style="border:0px" class="list" id="list2222">
<a href="http://www.example.com/2222222" class="mylinks">
second link
</a>
</li>
</ul>
</body>
</html> ';
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
echo $url ."<br />";
}
?>
This code is very simple: it just retrieves all anchor tags from an HTML document.
I found it here.
What I want is more complex :)
I want to retrieve all anchor tags, plus all the children and parents of each anchor tag, along with their attributes.
For example, for the first anchor tag I want a result something like this:
1-html
2-body
3-ul
4-li(class:list,id:list1111,style:etc....)
5-a(href:www.example.com etc..)
6-img(width:257 etc)
I want to iterate from the top level to the lowest level for every anchor tag, and I want to be able to retrieve the attributes of each tag.
It is very difficult for me because of DOMXPath :( but it might be easy for some of you.
Do you have any questions?
Do you know how to solve this problem?
Thanks in advance.
XPaths should make it so you don't need to iterate. To pull the important attributes of li, use an XPath like:
//li/@class
or
//li/@id
which should give you an iterable object you can use.
Here's some more information on XPaths
Maybe you should write a simple XSLT stylesheet. Match the <a> tag, and then ancestor::* would give all parent nodes, child::* would give you all the children - you would have a lot more power using simple XPath syntax via XSLT.
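The same axes can also be used directly from DOMXPath, without XSLT. The sketch below is one possible way to do it, using a trimmed version of the markup from the question. The describe() helper is a hypothetical name, and the output format only imitates the list the asker gave.

```php
<?php
// Trimmed version of the question's markup (first list item only).
$content = '<html><body><ul>'
         . '<li style="border:0px" class="list" id="list1111">'
         . '<a href="http://www.example.com/" class="mylinks">'
         . '<img src="logo.gif" width="235" alt="logo example"/></a>'
         . '</li></ul></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

// Hypothetical helper: "tag(attr:value,...)" for one element.
function describe(DOMElement $el): string
{
    $attrs = [];
    foreach ($el->attributes as $attr) {
        $attrs[] = $attr->nodeName . ':' . $attr->nodeValue;
    }
    return $el->nodeName . ($attrs ? '(' . implode(',', $attrs) . ')' : '');
}

$lines = [];
foreach ($xpath->query('//a') as $a) {
    // Ancestors plus the anchor itself, returned in document order (top down).
    foreach ($xpath->query('ancestor-or-self::*', $a) as $el) {
        $lines[] = describe($el);
    }
    // Every element nested inside the anchor.
    foreach ($xpath->query('descendant::*', $a) as $el) {
        $lines[] = describe($el);
    }
}

echo implode("\n", $lines), "\n";
```

Passing the anchor as the second argument to query() makes the axis relative to that node, which is what lets one loop handle every anchor in the document.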
