Weird HTML DOM produced by PHP


I'm retrieving the RSS feed of a blog with this code:
<?php
$xml = ("https://serembangirl.wordpress.com/feed/");
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
$x=$xmlDoc->getElementsByTagName('item');
for ($i=0; $i<=5; $i++) {
$item_title=$x->item($i)->getElementsByTagName('title')
->item(0)->childNodes->item(0)->nodeValue;
$item_link=$x->item($i)->getElementsByTagName('link')
->item(0)->childNodes->item(0)->nodeValue;
$item_desc=$x->item($i)->getElementsByTagName('description')
->item(0)->childNodes->item(0)->nodeValue;
$item_content=$x->item($i)->getElementsByTagName('encoded')->item(0)->nodeValue;
?>
<a href='#'>
<div class="card">
<div class='inner'>
<p class='title'>
<?php echo $item_title;?>
</p>
<p class='desc'> <?php echo $item_desc; ?> </p>
</div>
</div>
</a>
<?php } ?>
With the above code, the <a> is supposed to wrap the <div class="card">, but it produced this instead:
http://i.imgur.com/YspeRe3.png
I've been scratching my head trying to solve this.

I think a div within an anchor tag is not recommended.

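Check the actual source code that is generated by PHP. It will have the div inside the a.
In HTML 4, div, p and other block-level elements are not allowed inside an a element, and no HTML version allows an a nested inside another a. When the markup breaks these rules, the browser tries to "fix" your document.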
Hint 1
Use XPath to fetch data from the DOM.
$xpath = new DOMXPath($xmlDoc);
foreach ($xpath->evaluate('//item') as $item) {
$item_title = $xpath->evaluate('string(title)', $item);
// ...
}
Hint 2
Don't forget the escaping if you output data as HTML source.
...
<p class='title'>
<?php echo htmlspecialchars($item_title); ?>
</p>
...
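The two hints can be combined into one loop. The following is only a minimal sketch: the inline feed in $rss is a made-up stand-in for the real feed, and putting the link inside the card instead of around it is an assumption about the desired markup, not something the question requires.

```php
<?php
// Hypothetical sample feed standing in for the real RSS document.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <item>
      <title>Tea &amp; cake</title>
      <link>https://example.com/tea?a=1&amp;b=2</link>
      <description>A &lt;b&gt;short&lt;/b&gt; summary</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
      <description>Another summary</description>
    </item>
  </channel>
</rss>
XML;

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($rss);
$xpath = new DOMXPath($xmlDoc);

$cards = '';
// position() <= 5 keeps at most the first five items of the channel.
foreach ($xpath->evaluate('//item[position() <= 5]') as $item) {
    // string(...) returns an empty string when the child element is missing.
    $title = $xpath->evaluate('string(title)', $item);
    $link  = $xpath->evaluate('string(link)', $item);
    $desc  = $xpath->evaluate('string(description)', $item);

    // Escape every value before it becomes part of the HTML source.
    $cards .= sprintf(
        "<div class=\"card\">\n  <a class=\"title\" href=\"%s\">%s</a>\n  <p class=\"desc\">%s</p>\n</div>\n",
        htmlspecialchars($link),
        htmlspecialchars($title),
        htmlspecialchars($desc)
    );
}

echo $cards;
```

Because the description is escaped, any markup inside the feed item is shown as text and cannot change the structure of the card.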

Related

Replace class content using php

I want to replace string from specific classes from HTML.
In HTML there is other content which I don't want to change.
In below code want to change data on class one and three only, class two content should be as it is.
I need to this in dynamic way.
<div class="one"> I want to change this </div>
<div class="two"> I don't want to change this </div>
<div class="three"> I want to change this </div>

The DOM functions are helpful here (see the PHP manual).
//your html file content
$str = '...<div class="one"> I want to change this </div>
<div class="two"> I don\'t want to change this </div>
<div class="three"> I want to change this </div>... ';
$dom = new DOMDocument();
$dom->loadHtml($str);
$domXpath = new DOMXPath($dom);
//query the nodes matched
$list = $domXpath->query('//div[@class!="two"]');
if ($list->length > 0) {
foreach ($list as $node) {
//change node value
$node->nodeValue = 'Content changed!';
}
}
//get the result
$new_str = $dom->saveHTML();
var_dump($new_str);
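The query above changes every div whose class attribute is not exactly "two". If the goal is to target the classes one and three by name, including elements that carry several classes, a token match on the class attribute is one option. This is a sketch under assumptions: hasClass() is a hypothetical helper and the "extra" class is added only to show the multi-class case.

```php
<?php
$str = '<div class="one"> I want to change this </div>
<div class="two"> I don\'t want to change this </div>
<div class="three extra"> I want to change this </div>';

// Hypothetical helper: returns an XPath test that matches one class token.
// It assumes $class contains no quotes or spaces.
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

// Select only divs that carry the class "one" or the class "three".
$query = '//div[' . hasClass('one') . ' or ' . hasClass('three') . ']';
foreach ($xpath->query($query) as $node) {
    $node->nodeValue = 'Content changed!';
}

$new_str = $dom->saveHTML();
echo $new_str;
```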

simple html dom traversal confusion when looping

I'm trying to use the php script simplehtmldom to loop over divs on a web page while scraping.
Right now I have this:
$url = "https://test.com/";
$html = new simple_html_dom();
$html->load_file($url);
$item_list = $html->find('div.main div[id]');
foreach ($item_list as $item)
{
echo $item->outertext . PHP_EOL;
}
This gives me many blocks like this (from the echo in the loop above):
<div id=1>
<div>
stuff here
</div>
<div>
<span class="title">name</span>
</div>
</div>
<div id=2>
<div>
stuff here
</div>
<div>
<span class="title">name 2</span>
</div>
</div>
What I'm trying to do is loop over the span with class=title, but no matter what I can't seem to quite get the right selector. Could someone help me out?

You can get the spans by adding span[class=title] to the selector:
$item_list = $html->find('div.main div[id] span[class=title]');
foreach ($item_list as $item)
{
echo $item->outertext . PHP_EOL;
}
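For comparison, the same lookup can be done with PHP's built-in DOM extension instead of simplehtmldom. This is a sketch, not the library's API: the wrapping div with class "main" and the shortened markup are assumptions based on the selector and output shown in the question.

```php
<?php
// Sample markup modeled on the question; the "main" wrapper is assumed.
$html = '<div class="main">
<div id=1><div>stuff here</div><div><span class="title">name</span></div></div>
<div id=2><div>stuff here</div><div><span class="title">name 2</span></div></div>
</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Spans with class "title" inside any div that has an id, inside div.main.
$titles = [];
foreach ($xpath->query('//div[@class="main"]//div[@id]//span[@class="title"]') as $span) {
    $titles[] = $span->textContent;
}

print_r($titles);
```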

Fetching Image from particular div Only via DOMDocument in PHP

I have a website where I have posted a few images inside a particular div:
<div class="posts">
<div class="separator">
<img src="http://www.example.com/image.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
<div class="separator">
<img src="http://www.example.com/imagesda.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
.... few more images
</div>
From my second website, I want to fetch all the images in that particular div. I have the code below.
<?php
$htmlget = new DOMDocument();
@$htmlget->loadHTMLFile('http://www.example.com');
$xpath = new DOMXPath( $htmlget);
$nodelist = $xpath->query( "//img/@src" );
foreach ($nodelist as $images){
$value = $images->nodeValue;
echo "<img src='".$value."' /><br />";
}
?>
But this fetches all the images from my website, not just those in the particular div. It also prints my RSS image, social icon images, etc.
Can I specify a particular div in my PHP code, so that it only fetches images from the div with the class posts?

First give the outer div container an id. Then fetch the container by that id and read its child nodes.
An example:
$container = $dom->getElementById('node_id');
if ($container !== null) {
//get the number of child nodes in the container
echo $container->childNodes->length;
//content of each child
foreach($container->childNodes as $child)
{
echo $child->ownerDocument->saveHTML($child);
}
}
Maybe this link will help you. It has a good tutorial.
http://www.binarytides.com/php-tutorial-parsing-html-with-domdocument/
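If adding an id to the markup is not an option, the class can be matched directly in the XPath expression the question already uses. A minimal sketch follows; the inline HTML replaces the remote page and the rss-icon.png image is a hypothetical example of an image outside the posts div.

```php
<?php
// Inline stand-in for the remote page; the first image is outside div.posts.
$html = '<html><body>
<img src="http://www.example.com/rss-icon.png" />
<div class="posts">
<div class="separator"><img src="http://www.example.com/image.jpg" /><p>...</p></div>
<div class="separator"><img src="http://www.example.com/imagesda.jpg" /><p>...</p></div>
</div>
</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Only src attributes of img elements nested inside the div with class "posts".
$sources = [];
foreach ($xpath->query('//div[@class="posts"]//img/@src') as $src) {
    $sources[] = $src->nodeValue;
}

foreach ($sources as $value) {
    echo '<img src="' . htmlspecialchars($value) . '" /><br />' . "\n";
}
```

The predicate @class="posts" matches only when the attribute is exactly "posts"; a div with additional classes would need a token match instead.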

With PHP Simple HTML Parser, this will be:
include('simple_html_dom.php');
$html=file_get_html("http://your_web_site.com");
foreach($html->find('div.posts img') as $img_posts){
echo $img_posts->src . '<br>'; // to show the source attribute
}
I'm still reading about PHP Simple HTML DOM Parser, but so far it is faster to implement than a regex.

Here is another piece of code that may help. You are looking for
$doc->getElementsByTagName
which lets you target a tag directly.
<?php
$myhtml = <<<EOF
<html>
<body>
<div class="posts">
<div class="separator">
<img src="http://www.example.com/image.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
<div class="separator">
<img src="http://www.example.com/imagesda.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
.... few more images
</div>
</body>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($myhtml);
$divs = $doc->getElementsByTagName('img');
foreach ($divs as $div) {
foreach ($div->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo "Attribute '$name' :: '$value'<br />";
}
}
?>
Demo here http://codepad.org/keZkC377
Also the answer here can provide further insights
Not finding elements using getElementsByTagName() using DomDocument

get complete 'div' content using class name or id using php

I read a page's source from a file using PHP, and the output is similar to this:
<div class="basic">
<div class="math">
<div class="winner">
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
</div>
</div>
</div>
From this I need to get only one particular div, with the whole div and the contents inside it, as shown below when I give 'under' (the class name) as input. Can anybody suggest how to do this using PHP?
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>

Try this:
$html = <<<HTML
<div class="basic">
<div class="math">
<div class="winner">
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
</div>
</div>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[@class="under"]');
$div = $div->item(0);
echo $dom->saveXML($div);
This will output:
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>

Function to extract the contents from a specific div id from any webpage
The below function extracts the contents from the specified div and returns it. If no divs with the ID are found, it returns false.
function getHTMLByID($id, $html) {
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$node = $dom->getElementById($id);
if ($node) {
return $dom->saveXML($node);
}
return FALSE;
}
$id is the ID of the <div> whose content you're trying to extract, $html is your HTML markup.
Usage example:
$html = file_get_contents('http://www.mysql.com/');
echo getHTMLByID('tagline', $html);
Output:
The world's most popular open source database

I'm not sure what you're asking, but this might be it:
preg_match_all('~<div class="under">(.*?)</div>~s', $htmlsource, $output);
$output[1] should now contain the inner content of that div, up to the first closing </div>.
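A pattern of the form (.*?)</div> stops at the first closing </div>, so nested divs come back truncated. For nested markup a DOM lookup is more robust. The sketch below is modeled on getHTMLByID above; getHTMLByClass is a hypothetical name, and it assumes the class name contains no quotes or spaces.

```php
<?php
// Hypothetical helper: returns the first div carrying $class, or false.
function getHTMLByClass(string $class, string $html)
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Token match, so class="under big" is found as well as class="under".
    $expr = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $node = $xpath->query($expr)->item(0);

    // saveHTML($node) serializes the node together with all of its children.
    return $node ? $dom->saveHTML($node) : false;
}

$html = '<div class="basic"><div class="math"><div class="winner">'
      . '<div class="under"><div class="checker"><strong>check</strong></div></div>'
      . '</div></div></div>';

$result = getHTMLByClass('under', $html);
echo $result;
```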

PHP remove all div with class="myclass" except first one + add another div instead of others

I have a $content with
<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
...
</div>
<div class="myclass">
...
</div>
...
...
<div class="myclass">
...
</div>
</div>
<div class="nav">
...
</div>
</div>
some other text here, <p></p> bla-bla-bla
I would like to remove via PHP all the divs with class="myclass" except the first one, and add another div instead of others, so that the result is:
<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
...
</div>
<div>Check all divs here</div>
</div>
<div class="nav">
...
</div>
</div>
some other text here, <p></p> bla-bla-bla
Would be grateful if someone can point me a solution.
UPDATE 2:
some similar question here
from that I came up with the following test code:
$content = '<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
</div>
<div class="myclass">
</div>
<div class="myclass">
</div>
</div>
<div class="nav">
</div>
</div>
some other text here, <p></p> bla-bla-bla';
$dom = new DOMDocument();
$dom->loadHtml($content);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@class="myclass" and position()>1]') as $liNode) {
$liNode->parentNode->removeChild($liNode);
}
echo $dom->saveXml($dom->documentElement);
Any ideas where I can test it?

Here is what you are looking for (similar to your edit, but it removes the added html tags):
$doc = new DOMDocument();
$doc->loadHTML($content);
$xp = new DOMXpath($doc);
$elements = $xp->query("//div[@class='myclass']");
if($elements->length > 1)
{
$newElem = $doc->createElement("div");
$newElem->appendChild($doc->createTextNode("Check all divs "));
$newElemLink = $newElem->appendChild($doc->createElement("a"));
$newElemLink->setAttribute("href", "myurl");
$newElemLink->appendChild($doc->createTextNode("here"));
$elements->item(1)->parentNode->replaceChild($newElem, $elements->item(1));
for($i = $elements->length - 1; $i > 1 ; $i--)
{
$elements->item($i)->parentNode->removeChild($elements->item($i));
}
}
echo $doc->saveXML($doc->getElementsByTagName('div')->item(0));
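A variation on the same idea, given only as a sketch: it swaps the index-based loop for array slicing and serializes the children of body, so text that follows the outer div is kept and no doctype, html or body wrappers are printed. The shortened $content with ASCII class names and numbered slides is an assumption made for illustration.

```php
<?php
// Shortened sample content; the numbered slide text is illustrative only.
$content = '<div class="slidesWrap"><div class="slidesContainer">'
         . '<div class="myclass">1</div><div class="myclass">2</div><div class="myclass">3</div>'
         . '</div><div class="nav">nav</div></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($content);
$xp = new DOMXPath($doc);

// Copy the matches into a plain array before changing the tree.
$nodes = iterator_to_array($xp->query('//div[@class="myclass"]'));
$extra = array_slice($nodes, 1); // everything except the first match

if ($extra) {
    // Replace the second match with the notice, then drop the remaining ones.
    $notice = $doc->createElement('div', 'Check all divs here');
    $extra[0]->parentNode->replaceChild($notice, $extra[0]);
    foreach (array_slice($extra, 1) as $node) {
        $node->parentNode->removeChild($node);
    }
}

// Serialize only what is inside <body>.
$result = '';
foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}

echo $result;
```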

// jQuery (client-side JavaScript), not PHP
var $rest = $('.myclass').slice(1);
$rest.removeClass("myclass");
$rest.addClass("some_other_Class");

If I understood you correctly, you have a string called $content with all of that markup in it.
I guess it's not the best solution, but here is my attempt (which works fine for me):
if( substr_count($content, '<div class="myclass') > 1 ) {
$parts = explode('<div class="myclass',$content);
echo '<div class="myclass'.$parts[1];
echo '<div>Check all divs here</div>';
}
else {echo $content;}
