How can I get the "85 mph" from this HTML code with PHP and regex?
I couldn't come up with the right regex.
This is the code:
http://pastebin.com/ffRH9K9Q
<td align="left">Los Angeles</td>
</tr>
<tr>
<td align="left">Wind Speed:</td>
<td align="left">85 mph</td>
</tr>
<tr>
<td align="left">Snow Load:</td>
<td align="left">0 psf</td>
(simplified example)
You've heard already about not using regex for the job, so I won't talk about that.
Let's try something here. Perhaps not the ideal solution, but could work for you.
<?php
$data = 'your table';
preg_match('|<td align="left">\s*(\d+\s*mph)\s*</td>|i', $data, $result);
print_r($result); // The value you want is in $result[1]
The \s* parts take surrounding whitespace into account; you may still need to trim the result if you loosen the pattern.
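If the label is known but the unit is not, a variation on the same idea is to anchor on the label cell and capture the cell that follows it. This is only a minimal sketch: the helper name valueAfterLabel is made up for illustration, and it assumes the value cell contains plain text with no nested tags.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of the
// <td> that directly follows the <td> containing $label, or null if not found.
function valueAfterLabel(string $html, string $label): ?string
{
    // [^<]* cannot cross a tag boundary, so the capture stays inside one cell.
    $pattern = '~<td[^>]*>\s*' . preg_quote($label, '~')
             . '\s*</td>\s*<td[^>]*>([^<]*)</td>~i';

    if (preg_match($pattern, $html, $m) === 1) {
        return trim($m[1]); // strip whitespace around the value
    }
    return null;
}

$data = '<tr>
<td align="left">Wind Speed:</td>
<td align="left"> 85 mph </td>
</tr>
<tr>
<td align="left">Snow Load:</td>
<td align="left">0 psf</td>
</tr>';

echo valueAfterLabel($data, 'Wind Speed:'), "\n"; // prints "85 mph"
```

This still breaks as soon as the markup changes shape (nested tags, reordered cells), which is the usual argument for a DOM parser.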
The first comment that links to the post about NOT PARSING HTML WITH REGEX is important. That said, try something like DOMDocument::loadHTML instead. That should get you started traversing the DOM with PHP.
To expand on DorkRawk's suggestion (in the hope of providing a relatively succinct answer that isn't overwhelming for a beginner), try this:
<?php
$yourhtml = '<td align="left">Los Angeles</td>
</tr>
<tr>
<td align="left">Wind Speed:</td>
<td align="left">85 mph</td>
</tr>
<tr>
<td align="left">Snow Load:</td>
<td align="left">0 psf</td>';
$dom = new DOMDocument();
$dom->loadHTML($yourhtml);
$xpath = new DOMXPath($dom);
$matches = $xpath->query('//td[.="Wind Speed:"]/following-sibling::td');
foreach($matches as $match) {
echo $match->nodeValue."\n\n";
}
How can I get the values from this text?
Desired output:
Year: 2012
KM: 69.000
Color: Blue
Price: 29.900
preg_match('#</div></td><td class=\"searchResultsAttributeValue\">(.*?)<\/td>#si', $string, $val);
$string = '<div class="classifiedSubtitle">Opel > Astra > 1.4 T Sport</div>
</td>
<td class="searchResultsAttributeValue">
2012</td>
<td class="searchResultsAttributeValue">
69.000</td>
<td class="searchResultsAttributeValue">
Blue</td>
<td class="searchResultsPriceValue">
<div> $ 29.900 </div></td>
<td class="searchResultsDateValue">
<span>21 Nov</span>
<br/>
<span>2016</span>
</td>
<td class="searchResultsLocationValue">
USA<br/>Texas</td>';
The best solution isn't regex. You should do it with the DOM.
$dom = new DOMDocument();
$dom->loadHTML($string);
$xPath = new DOMXPath($dom);
// Attributes are selected with @ in XPath, and DOMNodeList uses item(), not get()
$tdValue = trim($xPath->query('//td[@class="searchResultsAttributeValue"]')->item(0)->nodeValue);
This way you'll get the first td element with the class searchResultsAttributeValue. Of course, you should check that the element really exists (item(0) returns null when nothing matches) and add any other validation you need, but that's the general approach.
Hope I was helpful.
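To illustrate the existence checks mentioned above, here is a minimal sketch that collects all three attribute cells plus the price. It assumes the cells sit inside a table row (the surrounding table and tr tags are added here for the example) and that the cells always appear in the order Year, KM, Color; neither assumption is confirmed by the question.

```php
<?php
// Sample markup; the <table><tr> wrapper is an assumption for this sketch.
$string = '<table><tr>
<td class="searchResultsAttributeValue">
2012</td>
<td class="searchResultsAttributeValue">
69.000</td>
<td class="searchResultsAttributeValue">
Blue</td>
<td class="searchResultsPriceValue">
<div> $ 29.900 </div></td>
</tr></table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($string);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Collect every attribute cell instead of only the first one.
$values = [];
foreach ($xpath->query('//td[@class="searchResultsAttributeValue"]') as $td) {
    $values[] = trim($td->nodeValue);
}

$labels = ['Year', 'KM', 'Color']; // assumed column order
$result = [];
if (count($values) === count($labels)) {
    $result = array_combine($labels, $values);
}

// item(0) is null when the query matched nothing, so check before using it.
$priceNode = $xpath->query('//td[@class="searchResultsPriceValue"]')->item(0);
if ($priceNode !== null) {
    $result['Price'] = trim($priceNode->nodeValue);
}

print_r($result);
```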
I am really struggling attempting to scrape a table either via XPath or any sort of 'getElement' method. I have searched around and attempted various different approaches to solve my problem below but have come up short and really appreciate any help.
First, the HTML portion I am trying to scrape is the 2nd table on the document and looks like:
<table class="table2" border="1" cellspacing="0" cellpadding="3">
<tbody>
<tr><th colspan="8" align="left">Status Information</th></tr>
<tr><th align="left">Status</th><th align="left">Type</th><th align="left">Address</th><th align="left">LP</th><th align="left">Agent Info</th><th align="left">Agent Email</th><th align="left">Phone</th><th align="center">Email Tmplt</th></tr>
<tr></tr>
<tr>
<td align="left">Active</td>
<td align="left">Resale</td>
<td align="center">*Property Address*</td>
<td align="right">*Price*</td>
<td align="center">*Agent Info*</td>
<td align="center">*Agent Email*</td>
<td align="center">*Agent Phone*</td>
<td align="center"> </td>
</tr>
<tr>
<td align="left">Active</td>
<td align="left">Resale</td>
<td align="center">*Property Address*</td>
<td align="right">*Price*</td>
<td align="center">*Agent Info*</td>
<td align="center">*Agent Email*</td>
<td align="center">*Agent Phone*</td>
<td align="center"> </td>
</tr>
...etc
Additional trs follow, each containing 8 tds with the same kind of information as shown above.
What I need to do is iterate through the trs and internal tds to pick up each piece of information (inside the td) for each entry (inside of the tr).
Here is the code I have been struggling with:
<?php
$payload = array(
'http'=>array(
'method'=>"POST",
'content'=>'key=value'
)
);
stream_context_set_default($payload);
$dom = new DOMDocument();
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile('website-scraping-from.com');
libxml_clear_errors();
foreach ($dom->getElementsByTagName('tr') as $row){
foreach($dom->$row->getElementsByTagName('td') as $node){
echo $node->textContent . "<br/>";
}
}
?>
This code is not returning anything close to what I need, and I am having a lot of trouble figuring out how to fix it. Perhaps XPath is a better route for finding the table and the information I need, but I have come up empty with that method as well. Any information is much appreciated.
If it matters, my end goal is to be able to take the table data and dump it into a database if the first td has a value of "Active".
Can this be of any help?
$table = $dom->getElementsByTagName('table')->item(1);
foreach ($table->getElementsByTagName('tr') as $row){
$cells = $row->getElementsByTagName('td');
if ( $cells->length > 0 && trim($cells->item(0)->nodeValue) == 'Active' ) { // header and empty rows have no td
foreach($cells as $node){
echo $node->nodeValue . "<br/>";
}
}
}
This fetches the second table and displays the contents of every row whose first cell is "Active". Note that $dom->$row in your inner loop is not valid; call getElementsByTagName on $row directly.
Edit: Here is a more complete example:
$arr = array();
$table = $dom->getElementsByTagName('table')->item(1);
foreach ($table->getElementsByTagName('tr') as $row){
$cells = $row->getElementsByTagName('td');
if ( $cells->length > 0 && trim($cells->item(0)->nodeValue) == 'Active' ) { // header and empty rows have no td
$obj = new stdClass;
$obj->type = $cells->item(1)->nodeValue;
$obj->address = $cells->item(2)->nodeValue;
$obj->price = $cells->item(3)->nodeValue;
$obj->agent = $cells->item(4)->nodeValue;
$obj->email = $cells->item(5)->nodeValue;
$obj->phone = $cells->item(6)->nodeValue;
array_push( $arr, $obj );
}
}
print_r( $arr );
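Since the question also asks about XPath, the same filtering can be sketched with a single query. This is an illustrative variant, not the answer's method: the sample markup is shortened to three columns and the addresses are placeholders.

```php
<?php
// Shortened sample: the first table is a dummy so that the target is table #2.
$html = '<table><tr><td>first table</td></tr></table>
<table class="table2">
<tr><th>Status</th><th>Type</th><th>Address</th></tr>
<tr></tr>
<tr><td>Active</td><td>Resale</td><td>1 Example St</td></tr>
<tr><td>Sold</td><td>Resale</td><td>2 Example St</td></tr>
<tr><td> Active </td><td>New</td><td>3 Example St</td></tr>
</table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// (//table)[2] is the second table in document order; the predicate keeps
// only rows whose first td, after whitespace normalisation, is "Active".
$rows = $xpath->query('(//table)[2]//tr[normalize-space(td[1]) = "Active"]');

$listings = [];
foreach ($rows as $row) {
    $cells = [];
    foreach ($xpath->query('td', $row) as $td) { // td children of this row
        $cells[] = trim($td->nodeValue);
    }
    $listings[] = $cells;
}

print_r($listings);
```

Header rows and the empty tr drop out automatically, because normalize-space() of a missing td is an empty string.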
As you may know, Joomla components enable you to override their output by copying their template files into your site template. Joomla components generally use helper files which cannot be overridden.
I have a helper.php file that includes the string:
$specific_fields_text = '<tr><td class="key">'.$specific_field_title.': </td><td class="kr_sidecol_subaddress">'.$specific_fields[$i]->text.' '.$specific_fields[$i]->description.'</td></tr>';
In my template override is the code:
<table border="0" cellpadding="2" cellspacing="0">
<?php echo koparentHTML::getHTMLSpecificFields($this->specific_fields); ?>
</table>
The output is as follows:
<table border="0" cellpadding="2" cellspacing="0">
<tr>
<td class="key">title</td>
<td class="kr_sidecol_subaddress">value</td>
</tr>
<tr>
<td class="key">title</td>
<td class="kr_sidecol_subaddress">value</td>
</tr>
//.....etc......//
</table>
Basically, I want to get rid of the table and turn it into a definition list, but I cannot modify the helper.php file. I think the answer has something to do with str_replace.
I have tried using:
<dl>
<?php
$spec_fields = koparentHTML::getHTMLSpecificFields($this->specific_fields);
$spec_fields_dl = str_replace("<tr><td class='key'>'.$specific_field_title.': </td><td class='kr_sidecol_subaddress'>'.$specific_fields[$i]->text.' '.$specific_fields[$i]->description.'</td></tr>'", "<dt class='key'>'.$specific_field_title.': </dt><dd class='kr_sidecol_subaddress'>'.$specific_fields[$i]->text.' '.$specific_fields[$i]->description.'</dd>'", $spec_fields);
echo $spec_fields_dl;
?>
</dl>
This returns all of the text but with no html tags (no tr, td, dt, etc).
You can easily parse table data with PHP, like in this example:
$doc = new DOMDocument();
$doc->loadHTML(koparentHTML::getHTMLSpecificFields($this->specific_fields));
$rows = $doc->getElementsByTagName('tr');
$data = array();
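for ($i = 0; $i < $rows->length; $i++) {
$cols = $rows->item($i)->getElementsByTagName("td");
if ($cols->length < 2) { continue; } // skip rows without a title/value pair
// First cell is the key, second cell is the value
$data[$cols->item(0)->nodeValue] = $cols->item(1)->nodeValue;
}
var_dump($data);
This should convert your table into an associative array ('title' => 'value').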
I hope it helps.
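Building on the parsing approach above, the rows can be written back out as a definition list. The following is a rough sketch under a few assumptions: rowsToDefinitionList is a made-up name, the sample field names are invented, the helper output is wrapped in a table element before parsing, and any markup inside a cell is flattened to plain text.

```php
<?php
// Hypothetical helper: turns '<tr><td>title</td><td>value</td></tr>...' into a <dl>.
function rowsToDefinitionList(string $rowsHtml): string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the bare rows so the parser sees a complete table.
    $doc->loadHTML('<table>' . $rowsHtml . '</table>');
    libxml_clear_errors();

    $out = "<dl>\n";
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cols = $row->getElementsByTagName('td');
        if ($cols->length < 2) {
            continue; // not a title/value row
        }
        // nodeValue is plain text, so escape it again before output.
        $term = htmlspecialchars(trim($cols->item(0)->nodeValue));
        $desc = htmlspecialchars(trim($cols->item(1)->nodeValue));
        $out .= "<dt class=\"key\">$term</dt>"
              . "<dd class=\"kr_sidecol_subaddress\">$desc</dd>\n";
    }
    return $out . "</dl>";
}

// Stand-in for koparentHTML::getHTMLSpecificFields($this->specific_fields)
$rowsHtml = '<tr><td class="key">Bedrooms: </td><td class="kr_sidecol_subaddress">3 </td></tr>'
          . '<tr><td class="key">Garden: </td><td class="kr_sidecol_subaddress">Yes large</td></tr>';

$dl = rowsToDefinitionList($rowsHtml);
echo $dl, "\n";
```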
I have figured this out. The PHP bits such as '.$specific_field_title.' were stopping the str_replace from working, because they are not literally present in the rendered output. To get around this, I just searched for the HTML elements and put them in an array like so:
echo str_replace(array('<tr><td class="key">', '</td><td class="kr_sidecol_subaddress">', '</td></tr>'),
array('<dt class="key">', '</dt><dd class="kr_sidecol_subaddress">', '</dd>'),
koparentHTML::getHTMLSpecificFields($this->specific_fields));
And now this works perfectly. Thank you to everyone who contributed.
This code is at an external URL: www.example.com.
</head><body><div id="cotizaciones"><h1>Cotizaciones</h1><table cellpadding="3" cellspacing="0" class="tablamonedas">
<tr style="height:19px"><td class="1"><img src="../mvd/usa.png" width="24" height="24" /></td>
<td class="2">19.50</td>
<td class="3">20.20</td>
<td class="4"><img src="../mvd/Bra.png" width="24" height="24" /></td>
<td class="5">9.00</td>
<td class="6">10.50</td>
</tr><tr style="height:16px" valign="bottom"><td class="15"><img src="../mvd/Arg.png" width="24" height="24" /></td>
<td class="2">2.70</td>
<td class="3">3.70</td>
<td class="4"><img src="../mvd/Eur.png" width="24" height="24" /></td>
<td class="5">24.40</td>
<td class="6">26.10</td>
</tr></table>
I want to get the values of the td elements. Any suggestions? PHP, jQuery, etc.
You won't be able to do this with JavaScript in the browser: the same-origin policy prevents a page from reading content from another site unless that site explicitly allows it.
You will have to pull the content with php (using something as simple as file_get_contents) and then parse it.
For the parsing, take a read through this comprehensive post:
How do you parse and process HTML/XML in PHP?
DOM is likely going to be your best bet.
Try playing around with this:
$html = file_get_contents('/path/to/remote/page/');
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('td') as $node) {
echo "Full TD html: " . $dom->saveHtml($node) . "\n";
echo "TD contents: " . $node->nodeValue . "\n\n";
}
It's not possible to do this with jQuery; however, you can easily do it with PHP.
Use file_get_contents to read the entire source code of the page into a string.
Then parse (tokenise) the string that contains the page source in order to grab all the td values.
<?php
$srccode = file_get_contents('http://www.example.com/');
/* $srccode is a string that contains the source code of the web page http://www.example.com/ */
/* Now the only thing you have to do is write a function, say "parser", that tokenises or parses the string in order to grab all the td values */
$output=parser($srccode);
echo $output;
?>
You have to be very careful while parsing the string to get the desired output. For parsing, you can either use regular expressions or create your own lookup table. You can also use an HTML DOM parser written in PHP 5 that lets you manipulate HTML very easily. Many such free parsers are available.
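One possible shape for the "parser" function, sketched with PHP's built-in DOMDocument rather than a hand-written tokeniser. Unlike the snippet above, this version returns an array of cell values instead of a single string, and the sample markup is a shortened copy of the table from the question; both choices are assumptions made for the example.

```php
<?php
// Sketch of a "parser" function: returns the text of every non-empty <td>.
function parser(string $srccode): array
{
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom = new DOMDocument();
    $dom->loadHTML($srccode);
    libxml_clear_errors();

    $values = [];
    foreach ($dom->getElementsByTagName('td') as $td) {
        $text = trim($td->nodeValue);
        if ($text !== '') { // cells that only hold an <img> have no text
            $values[] = $text;
        }
    }
    return $values;
}

// Shortened sample; in practice this would come from file_get_contents().
$srccode = '<html><body><table class="tablamonedas">
<tr><td class="1"><img src="../mvd/usa.png" /></td>
<td class="2">19.50</td>
<td class="3">20.20</td></tr>
<tr><td class="15"><img src="../mvd/Arg.png" /></td>
<td class="2">2.70</td>
<td class="3">3.70</td></tr>
</table></body></html>';

$output = parser($srccode);
print_r($output);
```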
My first time here.
I got these lines as a response from the server and saved them in a file. They look like XML, right? My task is to read the content of those td tags and put it into another structured file (Excel). The problem is I don't know how to do that.
At the moment, I think I will strip the first and last lines of the file and then parse the rest as XML. But do you know other ways? Thanks.
<CallbackContent><![CDATA[
<table cellspacing="0" border="0" cellpadding="0" width="100%">
<tr class="rowcolor2">
<td align="left" style="padding:5px;">22/02/2010</td>
<td align="right" style="padding:5px;">510,02</td>
</tr>
</table>
]]></CallbackContent>
Btw, I'm using PHP.
Use an XML parser such as SimpleXML. It will allow you to extract the CDATA safely.
Then if the HTML is XML-compliant (in other words, it's XHTML) you can use SimpleXML to extract data from it. For example:
$xml='<CallbackContent><![CDATA[
<table cellspacing="0" border="0" cellpadding="0" width="100%">
<tr class="rowcolor2">
<td align="left" style="padding:5px;">22/02/2010</td>
<td align="right" style="padding:5px;">510,02</td>
</tr>
</table>
]]></CallbackContent>';
$CallbackContent = simplexml_load_string($xml);
$html = (string) $CallbackContent;
// If the HTML is well-formed XML (XHTML), parse it directly:
$table = simplexml_load_string($html);
// Otherwise, use the following instead of the line above:
$dom = new DOMDocument;
$dom->loadHTML($html);
$table = simplexml_import_dom($dom)->body->table;
foreach ($table->tr as $tr)
{
echo 'tr class=', $tr['class'], "\n";
foreach ($tr->td as $td)
{
echo 'td align=', $td['align'], ' - value: ', (string) $td, "\n";
}
}
You cannot read the table directly with an XML parser, because it is wrapped in a CDATA block, which is equivalent to a string literal.
First, read the whole thing using an XML parser so that you can pull out the contents of the CDATA section. Then take that and pass it through an HTML parser.
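The two steps can be sketched as follows, with a third step that writes the cells as CSV, which Excel can open. Using CSV as the target format and a semicolon as the separator are assumptions made for this example; the question only says "Excel".

```php
<?php
$xml = '<CallbackContent><![CDATA[
<table cellspacing="0" border="0" cellpadding="0" width="100%">
<tr class="rowcolor2">
<td align="left" style="padding:5px;">22/02/2010</td>
<td align="right" style="padding:5px;">510,02</td>
</tr>
</table>
]]></CallbackContent>';

// Step 1: XML parser. The CDATA section comes back as plain text.
$outer = new DOMDocument();
$outer->loadXML($xml);
$html = $outer->documentElement->textContent;

// Step 2: HTML parser on the extracted string.
libxml_use_internal_errors(true);
$inner = new DOMDocument();
$inner->loadHTML($html);
libxml_clear_errors();

$rows = [];
foreach ($inner->getElementsByTagName('tr') as $tr) {
    $row = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        $row[] = trim($td->nodeValue);
    }
    if ($row) {
        $rows[] = $row;
    }
}

// Step 3 (assumed output format): write the rows as semicolon-separated CSV.
// php://memory keeps the sketch self-contained; use a real file path in practice.
$out = fopen('php://memory', 'w+');
foreach ($rows as $row) {
    fputcsv($out, $row, ';', '"', '\\');
}
rewind($out);
$csv = stream_get_contents($out);
fclose($out);

echo $csv;
```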