I have a result from a curl request from a page like this:
$result =
<div class="c-wrapper">
<a href="link-to-a-page.php">
<div class="c-content-img">
<img src="...">
</div>
<div class="c-link-data">
<div class="c-link-data-title">
<h4>TITLE</h4>
</div>
</div>
</a>
</div>
<div class="c-wrapper">
<div class="c-content-img">
<img src="...">
</div>
<div class="c-link-data">
<div class="c-link-data-title">
<h4>TITLE 2</h4>
</div>
</div>
</div>
Now I need to count how many c-wrapper elements are present.
This works correctly:
$doc = new DOMDocument();
@$doc->loadHTML($result);
$xpath = new DOMXPath($doc);
$divs = $xpath->query("//div[contains(@class, 'c-wrapper')]");
echo $divs->length; //<--- printed: 2
Then I need to print all the titles.
This also works correctly:
$titles = $xpath->query("//div[contains(@class, 'c-link-data-title')]/h4");
foreach ($titles as $title) {
echo $title->textContent . "<br>";
}
Now the part I don't know how to do: the first div contains a link and the second one does not. I'd like to change the way I print the titles to something like this:
foreach ($titles as $title) {
if ( $link_extracted !="" )
echo "<a href='" . $link_extracted . "'>" . $title->textContent . "</a><br>";
else
echo $title->textContent . "<br>";
}
How can I edit $titles = $xpath->query("//div[contains(@class, 'c-link-data-title')]/h4"); to achieve this?
Rather than doing this in separate stages, the code below first finds the c-wrapper elements and then runs further XPath queries relative to each one to pick out the parts you want. So in
$link_extracted = $xpath->evaluate("a/@href", $div)[0];
the expression looks for an <a> element that is a child of the $div element. The [0] is there because you want only the first match.
$doc = new DOMDocument();
@$doc->loadHTML($result);
$xpath = new DOMXPath($doc);
$divs = $xpath->query("//div[contains(@class, 'c-wrapper')]");
echo $divs->length;
foreach ( $divs as $div ) {
    $link_extracted = $xpath->evaluate("a/@href", $div)[0];
    $title = $xpath->evaluate("descendant::div[contains(@class, 'c-link-data-title')]/h4/text()"
        , $div)[0];
    if ( !empty($link_extracted->nodeValue) ) {
        echo "<a href='" . $link_extracted->nodeValue . "'>" . $title->textContent . "</a><br>";
    }
    else {
        echo $title->textContent . "<br>";
    }
}
which for your test HTML gives...
2<a href='link-to-a-page.php'>TITLE</a><br>TITLE 2<br>
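A variation on the same idea, as a minimal sketch only: wrapping the path in the XPath string() function makes evaluate() return a plain PHP string, which is empty when nothing matches, so the missing-link case needs no node check. The function name renderTitles and the htmlspecialchars() escaping are my own additions and are not part of the code above.

```php
<?php
// Hypothetical helper (not from the answer above): one HTML line per c-wrapper.
function renderTitles(string $html): array
{
    $doc = new DOMDocument();
    // Suppress parser warnings for imperfect real-world HTML.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $lines = [];
    foreach ($xpath->query("//div[contains(@class, 'c-wrapper')]") as $div) {
        // string(...) yields '' when the path matches nothing.
        $href  = $xpath->evaluate("string(a/@href)", $div);
        $title = $xpath->evaluate(
            "string(descendant::div[contains(@class, 'c-link-data-title')]/h4)",
            $div
        );
        $text = htmlspecialchars($title);
        $lines[] = $href !== ''
            ? "<a href='" . htmlspecialchars($href, ENT_QUOTES) . "'>" . $text . "</a>"
            : $text;
    }
    return $lines;
}
```

Calling echo implode("<br>", renderTitles($result)); would then print the titles, linked where a link exists.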
I'm using some code to extract the items from an eBay RSS feed. The only problem is that it extracts just one item.
I suspected the foreach loop was the cause, but after searching this whole site I couldn't find a solution. The feed URL outputs 8 items (entriesPerPage=8). If you access the feed, you'll see that the full XML is there, but the parser only gets one item.
<?php
$feedurl = "http://rest.ebay.com/epn/v1/find/item.rss?keyword=%28jewelry%2Ccraft%2Cclothing%2Cshoes%2Cdiy%29&sortOrder=BestMatch&programid=1&campaignid=5337945426&toolid=10039&listingType1=All&lgeo=1&topRatedSeller=true&hideDuplicateItems=true&entriesPerPage=8&feedType=rss";
$rss = simplexml_load_file($feedurl);
foreach ($rss->channel->item as $item) {
$link = $item->link;
$title = $item->title;
$description = $item->description;
}
?>
<div class="mainproductebayfloatright-bottom">
<div class="aroundebay">
<?
print "<div class=\"titleebay\">" . $title . "</div>";
print $description;
?>
</div>
</div>
Move your HTML inside the loop. Currently your variables are overwritten on each iteration, so once the loop is over all you have left are the values of the last XML item:
foreach ($rss->channel->item as $item) {
$link = $item->link;
$title = $item->title;
$description = $item->description;?>
<div class="mainproductebayfloatright-bottom">
<div class="aroundebay">
<?php
// simple title
print "<div class=\"titleebay\">" . $title . "</div>";
// title-link
print "<a href=\"" . $link . "\">" . $title . "</a>";
print $description;
?>
</div>
</div>
<?php
}
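The same fix can also be sketched by building each item's markup inside the loop and printing everything afterwards. This is a minimal sketch under stated assumptions: the inline $sampleFeed stands in for the real feed so the snippet runs offline, the function name renderItems is hypothetical, and the title and link are escaped while the description is passed through as HTML, as in the question.

```php
<?php
// Hypothetical helper: builds one HTML block per <item> in an RSS document.
function renderItems(SimpleXMLElement $rss): array
{
    $blocks = [];
    foreach ($rss->channel->item as $item) {
        // Cast to string: SimpleXML properties are objects, not strings.
        $link        = htmlspecialchars((string) $item->link, ENT_QUOTES);
        $title       = htmlspecialchars((string) $item->title);
        $description = (string) $item->description;

        // The markup is built inside the loop, so every item is kept.
        $blocks[] = '<div class="mainproductebayfloatright-bottom">'
            . '<div class="aroundebay">'
            . '<div class="titleebay"><a href="' . $link . '">' . $title . '</a></div>'
            . $description
            . '</div></div>';
    }
    return $blocks;
}

// Assumed sample data standing in for the eBay feed.
$sampleFeed = <<<XML
<rss version="2.0"><channel>
<item><title>First</title><link>http://example.com/1</link><description>One</description></item>
<item><title>Second</title><link>http://example.com/2</link><description>Two</description></item>
</channel></rss>
XML;

$blocks = renderItems(simplexml_load_string($sampleFeed));
echo implode("\n", $blocks), "\n";
```

With the real feed, simplexml_load_file($feedurl) would replace the simplexml_load_string() call.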
I have the following content:
<div class="item">
<a href="ONE">
<img src="TWO">
</a>
</div>
I want to use XPath to pull out "ONE" and "TWO" from there.
The code I have right now is:
$html = file_get_contents($_POST['url']);
$document = new DOMDocument();
$document->loadHTML ($html);
$selector = new DOMXPath($document);
$query = '//div[@class="item"]';
$anchors = $selector->query($query);
foreach ($anchors as $node) {
// print ONE;
// print TWO;
}
Here is an example:
$html = <<<EOF
<div class="item">
<a href="ONE">
<img src="TWO">
</a>
</div>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($html);
$selector = new DOMXPath($doc);
$links = $selector->query(
'//div[@class="item"]//@href | //div[@class="item"]//@src'
);
foreach($links as $link) {
echo $link->nodeValue . PHP_EOL;
}
If you want to break it down by <div class="item"> you can use the following code:
foreach($selector->query('//div[@class="item"]') as $div) {
foreach($selector->query('.//@href | .//@src', $div) as $link) {
echo $link->nodeValue . PHP_EOL;
}
}
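If the two values need to stay paired per item rather than arrive as one flat list, the inner query can be split in two. A minimal sketch, assuming each div.item holds at most one link and one image; the function name extractItems is hypothetical:

```php
<?php
// Hypothetical helper: returns [['href' => ..., 'src' => ...], ...], one entry per div.item.
function extractItems(string $html): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // tolerate imperfect markup
    $selector = new DOMXPath($doc);

    $items = [];
    foreach ($selector->query('//div[@class="item"]') as $div) {
        $items[] = [
            // string(...) returns '' when the attribute is missing.
            'href' => $selector->evaluate('string(.//a/@href)', $div),
            'src'  => $selector->evaluate('string(.//img/@src)', $div),
        ];
    }
    return $items;
}

print_r(extractItems('<div class="item"><a href="ONE"><img src="TWO"></a></div>'));
```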
I am trying to scrape some content using simple_html_dom without luck.
I am trying to grab the title, image path and the link and display it.
The HTML structure is:
<div class="article_item clearfix">
<h2 class="title">My amazing Title</h2>
<p class="date">September 22 2014</p>
<p class="image_left">
<a href="http://www.demodomain/articleid=1">
<img src="http://www.demodomain/photos/cef78533cd5.jpg" alt="My amazing post ">
</a>
</p>
<p>This is a demo description<strong>of this amazing</strong> article</p>
<p class="more">Read more...</p>
</div>
My code so far:
foreach($html->find('article_item') as $article) {
$item['title'] = $article->find('.title, a', 0)->plaintext;
$item['thumb'] = $article->find('.image_left img', 0)->src;
$item['details'] = $article->find('p', 0)->plaintext;
$item['url'] = $article->find('.more, a', 0)->plaintext;
echo 'Title: ' . $item['title'];
echo "</br>";
echo "image url: " . $item['thumb'];
echo "</br>";
echo "Description: " . $item['details'];
echo "</br>";
echo "Read More Url: " . $item['url'];
}
// Clear dom object
$html->clear();
unset($html);
You didn't state what's not working, but consider this example:
foreach($html->find('div.article_item') as $div) {
// ^ point to div tag with class name article_item
$title = $div->find('h2.title a ', 0)->innertext;
// ^ target the h2 tag with class title with child anchor
// just same as accessing dom with jquery
$thumb = $div->find('p.image_left img ', 0)->src;
$details = $div->children(3)->plaintext;
// $url = $div->find('p.more', 0)->plaintext;
$url = $div->find('p.more a', 0)->href;
echo $title . '<br/>';
echo $thumb . '<br/>';
echo $details . '<br/>';
echo $url . '<br/>';
}
Basically, this works the same way as selecting elements with CSS selectors in jQuery.
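Can you try it like this? Note that find() needs an index as its second argument to return a single element rather than an array:
$item['title'] = $article->find('h2.title', 0)->plaintext;
$item['thumb'] = $article->find('p.image_left', 0)->find('img', 0)->src;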
Current situation:
I'm trying to parse a DOMDocument with XPath. The result should be an array of categories and subcategories.
The problem is that the person who wrote the HTML did not nest the subcategories inside the main categories; they are only separated visually by CSS.
The HTML looks like this:
<div class="menu_item">Main Category AC</div>
<div class="submenu_div">
<a href="http://www.link.com/313">
<div class="sub_item">
<h3>Sub Categ A</h3>
</div>
</a>
<a href="http://www.link.com/475">
<div class="sub_item">
<h3>Sub Categ B</h3>
</div>
</a>
<a href="http://www.link.com/321">
<div class="sub_item">
<h3>Sub Categ C</h3>
</div>
</a>
</div>
<div class="menu_item">Main Category BC</div>
<div class="submenu_div">
<a href="http://www.link.com/313">
<div class="sub_item">
<h3>Sub Categ X</h3>
</div>
</a>
<a href="http://www.link.com/475">
<div class="sub_item">
<h3>Sub Categ Y</h3>
</div>
</a>
<a href="http://www.link.com/321">
<div class="sub_item">
<h3>Sub Categ Z</h3>
</div>
</a>
</div>
With the PHP below I can extract the categories and subcategories, but the result is just a flat list. I don't know which subcategory belongs to which category, and I'm stuck.
How can I use XPath to extract the subcategories of each main category and assign a parent to every subcategory?
$doc = new DomDocument;
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach( $xpath->query('//div[@class="menu_item"]|//div[@class="submenu_div"]/a/div/h3') as $e ) {
echo $e->nodeValue, "<br />\n";
}
This is a sketch for a solution using XPath. The outer loop looks for the categories and prints them. It also keeps track of the position of the outer div in variable $i. The inner loop constructs another XPath that selects the $i'th div tag, then goes to the following sibling and finally descends to the subcategory text.
Note that you still have to store this data into an appropriate data structure. I'm not familiar with PHP so I cannot help you a lot there.
$i = 0;
foreach( $xpath->query('//div[@class="menu_item"]/text()') as $category ) {
$i = $i + 1;
echo "Category: " . $category->nodeValue . "\n";
foreach ( $xpath->query('//div[@class="menu_item"][' . $i . ']/following-sibling::div[1][@class="submenu_div"]/a/div/h3/text()') as $subcategory) {
echo " Subcategory: " . $subcategory->nodeValue . "\n";
}
}
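As for storing the result, one option is a nested array keyed by category name. The following is a minimal sketch, not the code from the answer above: it swaps the positional [$i] predicate for a query relative to each category node (the second argument of query()), and the function name extractCategories is hypothetical.

```php
<?php
// Hypothetical helper: maps each main category to the list of its subcategory names.
function extractCategories(string $html): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // tolerate imperfect markup
    $xpath = new DOMXPath($doc);

    $tree = [];
    foreach ($xpath->query('//div[@class="menu_item"]') as $category) {
        $name = trim($category->textContent);
        $tree[$name] = [];
        // Relative to this category: the immediately following div, if it is a submenu.
        $subs = $xpath->query(
            'following-sibling::div[1][@class="submenu_div"]/a/div/h3',
            $category
        );
        foreach ($subs as $sub) {
            $tree[$name][] = trim($sub->textContent);
        }
    }
    return $tree;
}
```

Calling print_r(extractCategories($html)); shows each category with its own subcategories. Two categories with the same name would share one key, so a list of records would be the safer shape if names can repeat.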
Based on the answer above, I made some modifications to use a for loop and to get the link as well:
for ($i = 0; $i <= 25; $i++) {
foreach( $xpath->query('//div[@class="menu_item"]['.$i.']/text()') as $category ) {
echo $i . " Category: " . $category->nodeValue . "<br/>\n";
foreach ( $xpath->query('//div[@class="menu_item"][' . $i . ']/following-sibling::div[1][@class="submenu_div"]/a') as $subcategory) {
echo '-----'. $i . " Subcategory: " . $subcategory->nodeValue . "<br/>\n";
echo '-----'. $i . " Link: " . $subcategory->getAttribute("href") . "<br/>\n";
}
echo "<br/>";
}
}
Thanks again, Marcus Rickert!
Here is my HTML:
<div id="main">
<div id="child1">
child1
link1
</div>
<div id="child2">
child2
link2
</div>
</div>
I am trying to return (echo in PHP) child1 and child2 as links.
This is part of a huge file, so I need to loop through it.
This is what I have so far, but it's not working:
$linkObjs = $html->find('#main');
foreach ($linkObjs as $linkObj) {
$title = trim($linkObj->fildchild()->plaintext);
$link = trim($linkObj->fildchild()->href);
echo '<p class="titro" ><a href="' . $link . '" >' . $title . '</a></p>';
}
Not sure exactly which part of the elements you needed so here's everything dissected.
// Find all divs in #main
foreach ($html -> find('#main div') as $div)
{
// Find plain text in div
foreach ($div -> find('text') as $text)
{
echo $text;
}
// Find <a> tags and href
foreach ($div -> find('a') as $a)
{
echo $a -> href;
}
}