PHP preg_match: get the values between two div tags


How can I get these values out of the HTML below?
Desired output:
Year: 2012
KM: 69.000
Color: Blue
Price: 29.900
My attempt:
preg_match('#</div></td><td class=\"searchResultsAttributeValue\">(.*?)<\/td>#si', $string, $val);
$string = <<<'HTML'
<div class="classifiedSubtitle">Opel > Astra > 1.4 T Sport</div>
</td>
<td class="searchResultsAttributeValue">
2012</td>
<td class="searchResultsAttributeValue">
69.000</td>
<td class="searchResultsAttributeValue">
Blue</td>
<td class="searchResultsPriceValue">
<div> $ 29.900 </div></td>
<td class="searchResultsDateValue">
<span>21 Nov</span>
<br/>
<span>2016</span>
</td>
<td class="searchResultsLocationValue">
USA<br/>Texas</td>
HTML;

Regex isn't the best tool for this. Parse the HTML with DOMDocument and query it with XPath instead.
$dom = new DOMDocument();
$dom->loadHTML($string);
$xPath = new DOMXpath($dom);
$tdValue = $xPath->query('//td[@class="searchResultsAttributeValue"]')->item(0)->nodeValue;
This gives you the first td element with the class searchResultsAttributeValue. You should of course check that the element really exists before reading nodeValue (item(0) returns null when nothing matches), along with any other validation you need, but that's the general approach.
Hope I was helpful.
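As a minimal sketch of that approach, the snippet below collects all three attribute cells plus the price into one array. It assumes the fragment from the question sits inside a table row (the `<table><tr>` wrapper is added here for illustration), that the attribute cells always appear in the order Year, KM, Color, and the `$labels` and `$result` names are made up for this example.

```php
<?php
// Sample markup based on the question; the <table><tr> wrapper is an assumption.
$string = <<<'HTML'
<table><tr>
<td><div class="classifiedSubtitle">Opel &gt; Astra &gt; 1.4 T Sport</div></td>
<td class="searchResultsAttributeValue">
2012</td>
<td class="searchResultsAttributeValue">
69.000</td>
<td class="searchResultsAttributeValue">
Blue</td>
<td class="searchResultsPriceValue">
<div> $ 29.900 </div></td>
</tr></table>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);

// Collect the trimmed text of every attribute cell, in document order.
$values = [];
foreach ($xpath->query('//td[@class="searchResultsAttributeValue"]') as $td) {
    $values[] = trim($td->nodeValue);
}

// The price lives in a differently named cell; guard against a missing node.
$priceNode = $xpath->query('//td[@class="searchResultsPriceValue"]')->item(0);
$price = $priceNode !== null ? trim($priceNode->nodeValue) : null;

// Hypothetical labels, assuming the cells come in this fixed order.
$labels = ['Year', 'KM', 'Color'];
$result = count($values) === count($labels) ? array_combine($labels, $values) : [];
$result['Price'] = $price;

foreach ($result as $label => $value) {
    echo "$label: $value\n";
}
```

Checking the count before calling array_combine keeps the script from failing when the page layout changes and a cell goes missing.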

Related

HTML parsing with php

Can anyone help me parse this part of an HTML page? I use PHP with its DOM extension.
I would like to get "Klassifikation" and "Schlagwörter" into one PHP string.
How is this done?
Thanks
<tr style="display:table-row;">
<td id="TREFWOORD" class="onOffLink"></td>
<td class="rec_lable"><div>
<span>Schlagwörter</span><span>: </span>
</div></td>
<td class="rec_title"><div>
<span>*</span><span><a class="
link_gen
" href="MAT=/NOMAT=T/REL?PPN=106189719">Recht</a></span><span>
</span><span><a href="http://"
target=""><img src="http://"
alt="Subject" title="Subject" class="img_link"></a></span><span> / </span>
<span><a class="
link_gen
" href="MAT=/NOMAT=T/CMD?
ACT=SRCHA&IKT=5040&TRM=Wo%CC%88rterbuch">Wörterbuch</a></span>
</div></td>
</tr>
<tr style="display:table-row;">
<td></td>
<td class="rec_lable"><div><span>Klassifikation: </span></div></td>
<td class="rec_title"><div>
<span>Basisklassifikation: </span><span><a class="
link_gen
" target=""><img
src="http://" alt="Subject"
title="Subject" class="img_link"></a></span>
</div></td>
</tr>
I tried this without success:
<?php
$url='http://...';
$easycurlcmd=sprintf("curl '%s' -o ./libbvhtml.txt", $url);
printf("Execute: CURL1 ".$easycurlcmd."\n");
exec($easycurlcmd);
$html=file_get_contents('./libbvhtml.txt');
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$rec_lable = $xpath->query("//tr/*[contains(@class, 'rec_lable')]/div/span[1]");
echo $rec_lable->item(0)->nodeValue; // SchlagwÃ¶rter
echo $rec_lable->item(1)->nodeValue; // Klassifikation
Update: the cause turned out to be that curl has to be called with its follow-redirects option (-L), otherwise the downloaded file does not contain the real page.
Thanks to all.

Use DOMDocument::loadHTML to parse the HTML and DOMXPath::query to search the resulting DOM.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$rec_lable = $xpath->query("//tr/*[contains(@class, 'rec_lable')]/div/span[1]");
echo $rec_lable->item(0)->nodeValue; // SchlagwÃ¶rter
echo $rec_lable->item(1)->nodeValue; // Klassifikation
Check result in demo
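To get both labels into one PHP string, as the question asks, something like the following sketch may help. It uses a reduced version of the markup from the question and assumes the page is UTF-8 and declares that in a meta tag. The garbled `SchlagwÃ¶rter` in the comment above looks like what loadHTML tends to produce when UTF-8 input carries no charset declaration, though that is a guess about the cause here. The `$labels` and `$combined` names are made up for this example.

```php
<?php
// Reduced sample; the <meta> charset declaration is an assumption about the real page.
$html = <<<'HTML'
<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>
<body><table>
<tr>
<td class="rec_lable"><div><span>Schlagwörter</span><span>: </span></div></td>
<td class="rec_title"><div><span><a href="#">Recht</a></span><span> / </span><span><a href="#">Wörterbuch</a></span></div></td>
</tr>
<tr>
<td class="rec_lable"><div><span>Klassifikation: </span></div></td>
<td class="rec_title"><div><span>Basisklassifikation: </span></div></td>
</tr>
</table></body></html>
HTML;

libxml_use_internal_errors(true); // real-world pages rarely parse without warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Take the first span of every label cell and strip whitespace and the trailing colon.
$labels = [];
foreach ($xpath->query("//tr/*[contains(@class, 'rec_lable')]/div/span[1]") as $span) {
    $labels[] = rtrim(trim($span->nodeValue), ':');
}

// Join the labels into a single string.
$combined = implode(', ', $labels);
echo $combined, "\n";
```

Iterating over the node list instead of calling item(0) and item(1) directly avoids errors when a row is missing from the page.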

Extract attribute value of td tag (PHP)

<tr>
<td>New order info</td>
<td class="emailid"><input type="button" class="product product-info" value="View product" onclick="popupWindow('viewproduct.php?id=481244','emlmsg',650,400)" /></td>
</tr>
<tr
I want to get the id number in the td tag that follows the one containing 'New order info'. Above is an excerpt of the HTML code.
I tried to do this with both regex and DOMDocument but can't get the desired result. I'm thinking about getting all td elements using DOMDocument's getElementsByTagName method and, if the td's text value is 'New order info', reading the attributes in the next td tag. But I'm not sure how to do this or whether it is the right way. I tried nextSibling, but it does not work in this case. Is there any way to get the attribute values in the next td tag?
$DOMNodelist = $doc->getElementsByTagName('td');
foreach($DOMNodelist as $DOMElements) {
if ($DOMElements->nodeValue == "New order info") {
...................
}
}
Thank you very much!

Use XPath here:
$html = <<<EOF
<tr>
<td>New order info</td>
<td class="emailid"><input type="button" class="product product-info" value="View product" onclick="popupWindow('viewproduct.php?id=481244','emlmsg',650,400)" /></td>
</tr>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($html);
$selector = new DOMXPath($doc);
$td = $selector->query('//td[text() = "New order info"]/following-sibling::td')->item(0);
var_dump($td);
The example above selects the <td> node preceded by 'New order info'. However, the td tag has no id attribute.
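Since the number only appears inside the onclick attribute of the input element, one possible way to finish the job is to read that attribute and pull the id out with a small regex. This is a sketch, assuming the id is always a run of digits after `id=` in the query string; the `<table>` wrapper and the `$onclick` and `$id` names are added here for illustration.

```php
<?php
// Excerpt from the question, wrapped in <table> (an assumption) so it is valid HTML.
$html = <<<'HTML'
<table><tr>
<td>New order info</td>
<td class="emailid"><input type="button" class="product product-info" value="View product" onclick="popupWindow('viewproduct.php?id=481244','emlmsg',650,400)" /></td>
</tr></table>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// string(...) returns the attribute value directly, or '' when nothing matches.
$onclick = $xpath->evaluate(
    'string(//td[normalize-space() = "New order info"]/following-sibling::td[1]/input/@onclick)'
);

// Pull the digits that follow "id=" out of the JavaScript call.
$id = preg_match('/[?&]id=(\d+)/', $onclick, $m) ? $m[1] : null;

var_dump($id);
```

Using normalize-space() instead of text() makes the match tolerant of stray whitespace around the cell text.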

How to extract hyperlink using php

I have searched online and thought this would work, but for some reason it doesn't. I'm trying to extract a hyperlink from an HTML document; I only need its URL, and only for the link inside the td align="center". Here is a sample of the HTML I'm trying to extract from:
<td>
Aug 17
</td>
<td>
FT
</td>
<td align="right">
Arsenal ruby
</td>
**<td align="center">**
1-3
</td>
<td>Aston Villa</td>
<td style="text-align:right;">60,003</td>
And here is my PHP code to extract it from the td align="center":
<?php
//$searchURL = "site";
include 'simple_html_dom.php';
$site = 'website';
$html = file_get_html($site);
$tabledata = array();
// Find all TD tags with "align=center"
foreach($html->find('td[align=center]') as $e)
echo $e->href . '<br>';
?>
I know the code works in principle, because it extracts everything when the selector is just td without the brackets.

So you have identified the <td> elements themselves, but you did not go down to the next nesting level to grab the href from the <a> elements. You might do that like this:
foreach($html->find('td[align=center]') as $e)
echo $e->children(0)->href . '<br>';

Use the DOM and Xpath:
Select all td elements in the document:
//td
Only those whose align attribute equals "center":
//td[@align="center"]
Get the a elements nested inside them:
//td[@align="center"]//a
Get the href attribute nodes of those a elements:
//td[@align="center"]//a/@href
Source example:
$html = <<<'HTML'
<td>
FT
</td>
<td align="right">
Arsenal ruby
</td>
**<td align="center">**
1-3
</td>
<td>Aston Villa</td>
<td style="text-align:right;">60,003</td>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$nodes = $xpath->evaluate('//td[@align="center"]//a/@href');
foreach ($nodes as $node) {
var_dump($node->value);
}
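The sample markup shown in this thread has lost its anchor tags, so here is a small self-contained sketch of the same expression with links put back in. The href values are hypothetical and the `<table><tr>` wrapper is an assumption made for illustration.

```php
<?php
// Hypothetical markup: the href values below are invented for this example.
$html = <<<'HTML'
<table><tr>
<td align="right"><a href="/team/arsenal">Arsenal</a></td>
<td align="center"><a href="/match/1234">1-3</a></td>
<td><a href="/team/aston-villa">Aston Villa</a></td>
</tr></table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Each result is a DOMAttr node; its value property holds the URL.
$hrefs = [];
foreach ($xpath->evaluate('//td[@align="center"]//a/@href') as $attr) {
    $hrefs[] = $attr->value;
}

print_r($hrefs);
```

Only the link in the centered cell is returned; the links in the other cells are ignored because their td does not match the align predicate.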

You selected the td element. The anchor element is the child of the td element.
// Find all TD tags with "align=center"
foreach($html->find('td[align=center]') as $e)
echo $e->firstChild()->getAttribute('href') . '<br>';

DOMDocument Query

Hi, I am very new to DOMDocument and still learning how to use XPath queries with it. The HTML sometimes changes, so preg_match is not a good idea. I need to get some values from an HTML file. This is the part of the HTML I want to read. I would be happy if you could help me.
<?php
$doc = new DOMDocument();
@$doc->loadHTML('<table cellspacing="0" cellpadding="0" align="center" class="results">
<tr class="header" bgcolor="#0000FF">
<td>
</td>
<td>Name/AKAs</td>
<td>Age</td>
<td>Location</td>
<td>Possible Relatives</td>
</tr>
<tr>
<td>1.</td>
<td>
<a class="LN" href=""><b>Iron, Man E</b></a>
</td>
<td align="center">54</td>
<td>
Canada, AK<br />
California, AK<br />
</td>
<td>
</td>
<td>
View Details
</td>
</tr>
<tr><td>2.</td>
<td>
<a class="LN" href=""><b>Bat, Man E</b></a></td>
<td align="center">26</td>
<td>
Gotham, IA
<br /></td>
<td>
View Details</td></tr>
</table>');
$xpath = new DOMXPath($doc);
$xquery = '//a[@class="LN"]';
$links = $xpath->query($xquery);
foreach ($links as $el) {
echo strip_tags($doc->saveHTML($el)).'<br/>';
}
?>
How do I get the following output? So far I can only get "Iron, Man E" and "Bat, Man E".
Iron, Man E | 54 | Canada, AK;California, AK
Bat, Man E | 26 | Gotham, IA

My answer is not about DOMDocument queries, but it can solve your problem easily.
There is a library called Simple HTML DOM that handles this kind of task well.
Example :
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
The library's full documentation covers what else it can do.

Try this,
$xquery = '//a'; // you will get all anchor tags now
$links = $xpath->query($xquery);
foreach ($links as $el) {
echo strip_tags($doc->saveHTML($el)).'<br/>';
}
To get each row's values on a single line, try this:
$xpath = new DOMXPath($doc);
$xquery = '//tr[td[a]]';
$links = $xpath->query($xquery);
foreach ($links as $el) {
echo strip_tags($doc->saveHTML($el)).'<br/>';
}
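If the exact "Name | Age | Location;Location" layout from the question is needed, one option is to query each row relative to its tr node. The sketch below assumes the column positions stay fixed (name in the second cell, age in the third, locations in the fourth) and uses a condensed copy of the table from the question; the `$lines` name is made up for this example.

```php
<?php
// Condensed copy of the table from the question.
$html = <<<'HTML'
<table class="results">
<tr class="header"><td></td><td>Name/AKAs</td><td>Age</td><td>Location</td><td>Possible Relatives</td></tr>
<tr><td>1.</td>
<td><a class="LN" href=""><b>Iron, Man E</b></a></td>
<td align="center">54</td>
<td>
Canada, AK<br />
California, AK<br />
</td>
<td></td><td>View Details</td></tr>
<tr><td>2.</td>
<td><a class="LN" href=""><b>Bat, Man E</b></a></td>
<td align="center">26</td>
<td>
Gotham, IA
<br /></td>
<td>View Details</td></tr>
</table>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$lines = [];
// Only rows that contain a name link, which skips the header row.
foreach ($xpath->query('//tr[td/a[@class="LN"]]') as $row) {
    // The second argument makes each expression relative to the current row.
    $name = trim($xpath->evaluate('string(td[2])', $row));
    $age = trim($xpath->evaluate('string(td[3])', $row));

    // Each location is its own text node, separated by <br> elements.
    $places = [];
    foreach ($xpath->query('td[4]/text()', $row) as $text) {
        $place = trim($text->nodeValue);
        if ($place !== '') {
            $places[] = $place;
        }
    }
    $lines[] = "$name | $age | " . implode(';', $places);
}

echo implode("\n", $lines), "\n";
```

Reading the text nodes of the location cell one by one is what allows the semicolon separator; taking nodeValue of the whole cell would run the locations together.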

DOMXPath Query for a dynamic HTML

Suppose that I have this HTML from a source (I am scraping it):
<tr class="calendar_row" data-eventid="41675">
<td class="alt2 eventDate smallfont" align="center"/>
<td class="alt2 smallfont" align="center">9:00pm</td>
<td class="alt2 smallfont" align="center">AUD</td>
<td class="alt2 icon smallfont" align="center">
<div class="cal_imp_medium" title="Medium Impact Expected"/>
</td>
<td class="alt2 eventHigh smallfont" align="center">
<div class="calendar_detail level_1" data-level="1" title="Open Detail"/>
</td>
//I want to get this part below correctly
<td class="alt2 pad_left eventHigh smallfont" align="center">0.2%</td>
<td class="alt2 pad_left eventHigh smallfont" align="center"/>
<td class="alt2 pad_left eventHigh smallfont" align="center">
<span class="revised worse" title="Revised From -0.3%">-0.4%</span>
</td>
</tr>
And I want to get the values (nodeValue) of the td elements through XPath:
$query = $xpath->query('//tr[@data-eventid="41675"]/td[@class="alt2 pad_left eventHigh smallfont"]');
I can't figure out why I'm only getting the value -0.4%.
The HTML looks complicated, but regardless of how it is formatted, is there a query that retrieves the values between the tags, including the empty one in the second td?
Full Code
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query_results = $xpath->query('//tr[@data-eventid="'.$data_eventid.'"]/td[@class="alt2 pad_left eventHigh smallfont"]');
foreach($query_results as $values){
if($values->nodeValue!=' ' and $values->nodeValue!='' and $values->nodeName!='#text') { //Discards Empty Arrays
$table_values[$data_eventid][5] = $values->nodeValue;
}
}

Try this: //tr[@data-eventid="41675"]/td[@class="alt2 pad_left eventHigh smallfont"]/descendant-or-self::*/text()
You probably just want the nodes, though, so take the /text() off:
//tr[@data-eventid="41675"]/td[@class="alt2 pad_left eventHigh smallfont"]/descendant-or-self::*

Your XPath matches three td elements, the first contains 0.2%, then there is an empty one, and the last one contains <span class="revised worse" title="Revised From -0.3%">-0.4%</span>.
You assign the values of these nodes in sequence (skipping the empty ones) to the same variable, $table_values[$data_eventid][5], so it ends up holding the value of the last non-empty node, i.e. -0.4%.
If you want the values of all the nodes you should append them to a list, or place them in different elements of an array.
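A minimal sketch of that fix, using a shortened copy of the row from the question: every matching cell is appended to an array, and the empty one is kept as an empty string. The self-closing td is written as an explicit empty element here, and the `$eventId` and `$values` names are made up for this example.

```php
<?php
// Shortened copy of the row from the question.
$html = <<<'HTML'
<table>
<tr class="calendar_row" data-eventid="41675">
<td class="alt2 smallfont" align="center">9:00pm</td>
<td class="alt2 pad_left eventHigh smallfont" align="center">0.2%</td>
<td class="alt2 pad_left eventHigh smallfont" align="center"></td>
<td class="alt2 pad_left eventHigh smallfont" align="center">
<span class="revised worse" title="Revised From -0.3%">-0.4%</span>
</td>
</tr>
</table>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$eventId = '41675';
$query = '//tr[@data-eventid="' . $eventId . '"]'
       . '/td[@class="alt2 pad_left eventHigh smallfont"]';

// Append every cell instead of overwriting one slot; empty cells stay as ''.
$values = [];
foreach ($xpath->query($query) as $td) {
    $values[] = trim($td->nodeValue);
}

var_dump($values);
```

Keeping the empty string in the array preserves the column positions, so the third entry is always the revised value even when the second cell has no content.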
