I have the following HTML code:
<div id="b_changetext" class="FL gL_13 PT15"> <span class="gr_15 uparw_pc"><strong>5.80</strong></span> (+2.28%)</div>
I want to extract the content (+2.28%).
I tried the following code:
foreach($html->find('div[n_changetext]') as $e){
echo $e->innertext . '<br>';
echo "wwwwww";
}
When I run it, execution never enters the foreach loop ("wwwwww" is not displayed).
Can anyone please suggest a solution?
div[n_changetext] finds elements with an n_changetext attribute (which is not valid in HTML).
To find an element with a given id you must specify that the name of the attribute is id and specify the value.
The value, in your example, starts with a b not an n:
find('div[id=b_changetext]')
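Note that with Simple HTML DOM, innertext on the matched div returns the whole inner HTML, including the span. As a minimal sketch of getting only the trailing text, the following uses PHP's built-in DOM extension instead of Simple HTML DOM (a swapped-in technique, not the library from the question). The function name extractChangeText is hypothetical.

```php
<?php
// Sketch: PHP's built-in DOMDocument/DOMXPath, not Simple HTML DOM.
// extractChangeText is a hypothetical helper name.
function extractChangeText(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Text nodes that are direct children of the div, skipping whitespace-only ones.
    $nodes = $xpath->query('//div[@id="b_changetext"]/text()[normalize-space()]');

    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}

$html = '<div id="b_changetext" class="FL gL_13 PT15"> <span class="gr_15 uparw_pc"><strong>5.80</strong></span> (+2.28%)</div>';
echo extractChangeText($html), PHP_EOL; // (+2.28%)
```

The normalize-space() predicate is there so the result does not depend on whether the parser keeps the blank text node in front of the span.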
I'm trying to start from the <span> element that has text Value when transacted
Then get its parent <div> and get following sibling which is a <div> and from that <div> get the text of the child <span>.
From what I can tell, the code is correct and should echo $1,034.29.
It echoes $0.00 instead.
What am I missing here?
php code:
$a = new DOMXPath($doc);
$dep_val_txt = $a->query("//span[contains(text(), 'Value when transacted')]");
$dep_val_nxt_elem = $a->query("parent::div", $dep_val_txt[0]);
$dep_val_elem = $a->query("following-sibling::*[1]", $dep_val_nxt_elem[0]);
$dep_val = $dep_val_elem->item(0)->childNodes->item(0)->nodeValue;
echo $dep_val;
html code:
<div class="sc-8sty72-0 cyLejs">
<span class="sc-1ryi78w-0 bFGdFC sc-16b9dsl-1 iIOvXh sc-1n72lkw-0 bKaZjn" opacity="1">Value when transacted</span>
</div>
<div class="sc-8sty72-0 cyLejs">
<span class="sc-1ryi78w-0 bFGdFC sc-16b9dsl-1 iIOvXh u3ufsr-0 gXDEBk" opacity="1">$1,034.29</span>
</div>
In case someone else stumbles upon this question in the future, I will summarize the solution which was concluded by conversation with OP in the comments:
The issue here is not with the DOM selectors: the output is $0.00 even though the code does no currency formatting of its own. This led me to believe that the website being scraped serves placeholder values that are updated on the client side using JavaScript. Selectors cannot fix this, because the DOM received by PHP is the initial render, which does not contain the values we wish to scrape.
So the solution is to examine the website being scraped to determine where and how the values are being fetched before being added to the DOM on the client side. For example, if the website is using an API call to fetch the values, one can simply use the same API to fetch the intended data without having to scrape the HTML DOM at all.
If you follow the OP's question literally:
start from the <span> element that has text "Value when transacted"
get its parent <div>
get following sibling which is a <div>
get the text of the child <span>
then the xpath expression should be
//span[text()='Value when transacted']/parent::div/following-sibling::div/span
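As a rough check, the expression can be run against the static snippet from the question with DOMXPath. This sketch adds [1] to following-sibling::div so that only the immediately following div is used, and shortens the span class names; the function name transactedValue is hypothetical. It only helps when the value is really present in the HTML that PHP receives.

```php
<?php
// Sketch: evaluate the single XPath expression on the question's markup.
// transactedValue is a hypothetical helper name.
function transactedValue(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Label span -> its parent div -> the next div sibling -> that div's span.
    $nodes = $xpath->query(
        "//span[text()='Value when transacted']/parent::div/following-sibling::div[1]/span"
    );

    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

// Nowdoc, so the dollar sign is never treated as a variable.
$html = <<<'HTML'
<div class="sc-8sty72-0 cyLejs">
<span class="bFGdFC" opacity="1">Value when transacted</span>
</div>
<div class="sc-8sty72-0 cyLejs">
<span class="bFGdFC" opacity="1">$1,034.29</span>
</div>
HTML;

echo transactedValue($html), PHP_EOL; // $1,034.29
```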
You might find it easier and faster to process using a regex to match the price, here's a quick example in PHP:
<?php
// Your input HTML (as per your example)
$inputHtml = <<<HTML
<div class="sc-8sty72-0 cyLejs">
<span class="sc-1ryi78w-0 bFGdFC sc-16b9dsl-1 iIOvXh sc-1n72lkw-0 bKaZjn" opacity="1">Value when transacted</span>
</div>
<div class="sc-8sty72-0 cyLejs">
<span class="sc-1ryi78w-0 bFGdFC sc-16b9dsl-1 iIOvXh u3ufsr-0 gXDEBk" opacity="1">$1,034.29</span>
</div>
HTML;
$matches = [];
// Look for any div > span element which contains a string starting with $ and then match a number (allowing for a , or . within the price matched).
if (preg_match_all('#<div.*>\s*<span.*?>\$([0-9.,]+)</span>\s*</div>#mis', $inputHtml, $matches)) {
echo 'Price found: ' . $matches[1][0] . PHP_EOL;
}
Console output from this:
Price found: 1,034.29
<div class="menutitle" onclick="SwitchMenu('sub38');sorter=new table.sorter('sorter');sorter.init('taboastreams38',4);"><meta name="fe38" itemprop="startDate" content="2017-09-19T18:30"><span class="t">19:30</span> <span class="es" style="display: none;">qqqq</span><span class="en">fff</span> **text here** : <b><span itemprop="name">eee- rrr</span></b></div>
I am using PHP and filterXPath. I have tried many times to get the text that is not wrapped in any element, but I can't. I can get the data from any other selector, but not the text at this location (text here).
the code that i used
$EventlistNodeValues = $crawler->filterXPath('//div[@class="menutitle"]')->each(function (Crawler $node, $i) {
$event = $node->filterXPath('//'); // i need to change selector here to get text
return json_encode($event,true);
});
//div[@class='menutitle']//b/../text()
Using the XPath pattern above, I am able to select the text that precedes the <b> tag, which is the bare text (not wrapped in any element) that I need.
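A narrower variant is to ask only for the text node immediately before the <b> element. The sketch below uses PHP's built-in DOMXPath rather than the Crawler from the question, and a simplified copy of the markup (the meta tag, the hidden span, and the onclick attribute are left out). The function name textBeforeBold is hypothetical.

```php
<?php
// Sketch: built-in DOMXPath on simplified markup, not the Crawler from the question.
// textBeforeBold is a hypothetical helper name.
function textBeforeBold(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // The nearest text node that precedes <b> among the div's children.
    $nodes = $xpath->query('//div[@class="menutitle"]/b/preceding-sibling::text()[1]');

    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}

$html = '<div class="menutitle"><span class="t">19:30</span> <span class="en">fff</span> text here : <b><span itemprop="name">eee- rrr</span></b></div>';
echo textBeforeBold($html), PHP_EOL; // text here :
```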
I think this is a simple question, but I can't sort it out. I am trying to get all heading tags with the Simple HTML DOM parser, but my code works only one way. For example:
$heading['h2']=$html->find('h2 a');//works fine
I have found some sites wrap the h2 within the a tag like this
<a href='#'><h2> my heading</h2></a>
The problem is trying to get both tags so I can display the link with it. So when I do this
$heading['h2']=$html->find('a h2');
I get the h2 fine, but the link tag is not wrapped around it. That makes sense, since the selector finds all h2 tags that are children of an a tag. But how do I get the entire parent tag? What I want it to return is
<a href='#'><h2>My Headings</h2></a>
then I can just print the output with
echo $heading['h2']; // and the link will be there
If the <a href="[..]"> is just the outer element, you can do it like this:
$heading['h2']=$html->find('a h2');
foreach ($heading['h2'] as $h2) {
echo $h2->parent(), "\n";
}
You could also go up the DOM tree until you reach an <a> tag:
$heading['h2']=$html->find('a h2');
foreach ($heading['h2'] as $h2) {
$a = $h2;
while ($a && $a->tag != "a") $a = $a->parent(); // climb until an <a> is reached
if (!$a) continue; // no <a> above <h2>
echo $a, "\n";
}
Well, my first thought would be to use
$html->find('a');
But I'm guessing you have multiple links on your page. So the correct practice would then be to use an ID (or a class) to identify your link:
<a id="titleLink" href="#"><h2> my heading</h2></a>
And then search for that specific ID:
$html->find('a#titleLink');
I don't know what library you're using and what syntax it supports, but I hope you get the idea anyway.
According to the docs, $html->find('a > h2', 0)->parent() returns the anchor tag wrapping the first matching h2. Without an index, find returns an array, so if the page has multiple 'a > h2' matches, loop over the result with foreach and call parent() on each element.
$info = $html->find('a,h2');
echo '<a href='.$info[0]->href.'>'.$info[1]->innertext.'</a>';
I have a select menu in jquery mobile that I want to show a table from another website when the user selects an option, but the table needs to change when the user selects a different option.
I am using simple HTML dom parser, but I was wondering how to add the value of the selected option on to the url so if the user selects an option with the value of 32, it adds 32 onto the url so that the url used in the PHP code would be 'http://www.generalconvention.org/gc/deputations?diocese_id=32'. How do I do this using PHP?
<?php
include('simple_html_dom.php');
// get DOM from URL or file
$html = file_get_html('http://www.generalconvention.org/gc/deputations?diocese_id=');
// Find all tables
foreach($html->find('table') as $element)
echo $element;
?>
After you capture the value, concatenate it like this:
<?php
$value = "Your form value";
include('simple_html_dom.php');
// get DOM from URL or file
$html = file_get_html('http://www.generalconvention.org/gc/deputations?diocese_id=' . $value);
// Find all tables
foreach($html->find('table') as $element)
echo $element;
?>
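Since the value comes from user input, it is worth validating it before it goes into the URL. The following is a minimal sketch, assuming the select menu submits a field named diocese_id (a hypothetical name) and that valid IDs are positive integers; buildDeputationsUrl is also a hypothetical helper.

```php
<?php
// Sketch: validate the submitted value, then build the query string.
// The field name 'diocese_id' and the function name are assumptions.
function buildDeputationsUrl(array $input): ?string
{
    // Accept only positive integers; anything else yields false.
    $id = filter_var(
        $input['diocese_id'] ?? null,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );
    if ($id === false) {
        return null;
    }

    // http_build_query takes care of URL-encoding the parameter.
    return 'http://www.generalconvention.org/gc/deputations?'
        . http_build_query(['diocese_id' => $id]);
}

// In a real request handler this would typically receive $_GET or $_POST.
echo buildDeputationsUrl(['diocese_id' => '32']), PHP_EOL;
// http://www.generalconvention.org/gc/deputations?diocese_id=32
```

The returned string can then be passed to file_get_html() in place of the concatenated URL; a null result means the input was missing or not a valid ID.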
You can make an Ajax call:
$(".comboboxClass").change(function(){
$("#divHtml").load('http://www.generalconvention.org/gc/deputations?diocese_id='+
$(this).find('option:selected').val()+' table.deputies'
);
});
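This uses load() to fetch the page and insert the contents of the table.deputies element into #divHtml. Note that the browser's same-origin policy applies, so this only works if the remote site allows cross-origin requests.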
I am trying to get the text from a specific node's parent. For example:
<td colspan="1" rowspan="1">
<span>
<a class="info" shape="rect"
rel="empLinkData" href="/employee.htm?id=8468524">
Jack Johnson
</a>
</span>
(*)
</td>
I am able to successfully process the anchor tag by using:
$xNodes = $xpath->query('//a[@class="info"][@rel="empLinkData"]');
// $xNodes contains employee ids and names
foreach ($xNodes as $xNode)
{
$sLinktext = @$xNode->firstChild->data;
$sLinkurl = 'http://www.company.com' . $xNode->getAttribute('href');
if ($sLinktext != '' && $sLinkurl != '')
{
echo '<li><a href="' . $sLinkurl . '">' .
$sLinktext . '</a></li>';
}
}
Now, I need to retrieve the text from the <td> tag (in this case, the (*) appearing right after the span tag closes), but I can't seem to refer to it properly.
The xpath for this that seems to make the most sense to me is:
$xNodes = $xpath->query('//a[@class="info"]
[@rel="empLinkData"]/ancestor::*');
but it is retrieving the wrong data from elsewhere nested above this code.
It's not necessary to retreat back up the tree. Instead, directly select the td that contains the relevant element:
//td[descendant::a[@class="info"][@rel="empLinkData"]]/text()
Edit: As @Dimitre rightly pointed out, this selects all text children. Your td has two such nodes: the whitespace-only text node that precedes the span and the text node that follows it. If you only want the second text node, then use:
//td[descendant::a[@class="info"][@rel="empLinkData"]]/text()[2]
Or:
//td[descendant::a[@class="info"][@rel="empLinkData"]]/text()[last()]
As you can see, the resulting expressions are essentially the same, but you do need to target the correct text node (if you want only one). Note also that if the target text is truly in a td then it's safer to target that element type directly (without wildcards). As this is HTML, your actual document almost certainly contains several other elements, including multiple other anchors that you may not want to target.
Sample PHP:
$nodes = $xpath->query(
'//td[descendant::a[@class="info"][@rel="empLinkData"]]/text()[last()]');
echo "[". $nodes->item(0)->nodeValue . "]";
Deepest td ancestor:
//a[@class="info"][@rel="empLinkData"]/ancestor::td[1]
Use:
//*[a[@class="info"][@rel="empLinkData"]]/following-sibling::text()[1]
This selects a single text node -- exactly the wanted one.
Do note that an XPath expression like:
//td[descendant::a[@class="info"][@rel="empLinkData"]]/text()
selects more than one text node, not only the wanted one.
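For reference, here is a small sketch that runs the following-sibling expression on the markup from the question with DOMXPath. The td is wrapped in a table and tr, which is an assumption about the surrounding document, and textAfterSpan is a hypothetical helper name.

```php
<?php
// Sketch: select the single text node that follows the span holding the link.
// textAfterSpan is a hypothetical helper name.
function textAfterSpan(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Any element with a matching <a> child (here the span), then the first text node after it.
    $nodes = $xpath->query(
        '//*[a[@class="info"][@rel="empLinkData"]]/following-sibling::text()[1]'
    );

    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}

$html = <<<'HTML'
<table><tr>
<td colspan="1" rowspan="1">
<span>
<a class="info" shape="rect"
rel="empLinkData" href="/employee.htm?id=8468524">
Jack Johnson
</a>
</span>
(*)
</td>
</tr></table>
HTML;

echo textAfterSpan($html), PHP_EOL; // (*)
```

The trim() call is needed because the selected text node also contains the line breaks around (*).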