Getting PHP str_replace to work with Joomla - php

As you may know, Joomla components enable you to override their output by copying their template files into your site template. Joomla components generally use helper files which cannot be overridden.
I have a helper.php file that includes the string:
$specific_fields_text = '<tr><td class="key">'.$specific_field_title.': </td><td class="kr_sidecol_subaddress">'.$specific_fields[$i]->text.' '.$specific_fields[$i]->description.'</td></tr>';
In my template override is the code:
<table border="0" cellpadding="2" cellspacing="0">
<?php echo koparentHTML::getHTMLSpecificFields($this->specific_fields); ?>
</table>
The output is as follows:
<table border="0" cellpadding="2" cellspacing="0">
<tr>
<td class="key">title</td>
<td class="kr_sidecol_subaddress">value</td>
</tr>
<tr>
<td class="key">title</td>
<td class="kr_sidecol_subaddress">value</td>
</tr>
//.....etc......//
</table>
Basically I want to get rid of the table and turn it into a definition list, but I cannot modify the helper.php file. I think the answer involves str_replace.
I have tried using:
<dl>
<?php
$spec_fields = koparentHTML::getHTMLSpecificFields($this->specific_fields);
$spec_fields_dl = str_replace("<tr><td class='key'>'.$specific_field_title.': </td><td class='kr_sidecol_subaddress'>'.$specific_fields[$i]->text.' '.$specific_fields[$i]->description.'</td></tr>'", "<dt class='key'>'.$specific_field_title.': </dt><dd class='kr_sidecol_subaddress'>'.$specific_fields[$i]->text.' '.$specific_fields[$i]->description.'</dd>'", $spec_fields);
echo $spec_fields_dl;
?>
</dl>
This returns all of the text but with no html tags (no tr, td, dt, etc).

You can parse the table data with PHP's DOMDocument, as in this example:
$doc = new DOMDocument();
$doc->loadHTML(koparentHTML::getHTMLSpecificFields($this->specific_fields));
$rows = $doc->getElementsByTagName('tr');
$data = array();
for ($i = 0; $i < $rows->length; $i++) {
    $cols = $rows->item($i)->getElementsByTagName('td');
    // The first cell holds the title, the second cell holds the value
    $data[$cols->item(0)->nodeValue] = $cols->item(1)->nodeValue;
}
var_dump($data);
This should convert your table into an associative array ('title' => 'value').
I hope it helps.

I have figured this out. For some reason the PHP bits such as '.$specific_field_title.' were stopping the str_replace from working. To get around this I just searched for the HTML elements and put them in an array like so:
echo str_replace(array('<tr><td class="key">', '</td><td class="kr_sidecol_subaddress">', '</td></tr>'),
array('<dt class="key">', '</dt><dd class="kr_sidecol_subaddress">', '</dd>'),
koparentHTML::getHTMLSpecificFields($this->specific_fields));
And now this works perfectly. Thank you to everyone who contributed.
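For anyone landing here later: str_replace only matches literal text, so the search string has to match the rendered HTML exactly, including the double-quoted attributes. The PHP expressions from helper.php never appear in that output, which is why the first attempt found nothing to replace. Here is a minimal sketch of the array form, assuming sample markup in place of the real helper output (the `$rows` string and its values are hypothetical):

```php
<?php
// Hypothetical sample standing in for the output of
// koparentHTML::getHTMLSpecificFields(); the real helper output may differ.
$rows = '<tr><td class="key">Colour: </td><td class="kr_sidecol_subaddress">Red </td></tr>'
      . '<tr><td class="key">Size: </td><td class="kr_sidecol_subaddress">Large </td></tr>';

// Each search fragment is replaced by the fragment at the same index.
$search  = array('<tr><td class="key">', '</td><td class="kr_sidecol_subaddress">', '</td></tr>');
$replace = array('<dt class="key">', '</dt><dd class="kr_sidecol_subaddress">', '</dd>');

// Wrap the converted rows in the definition list element.
$definitionList = '<dl>' . str_replace($search, $replace, $rows) . '</dl>';
echo $definitionList;
```

This stays fragile by design: if the component ever changes its markup (extra whitespace, another class), the fragments stop matching and the table rows pass through unchanged.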

Related

PHP DOMDocument - Using glob to get a specific element and class in every file of a directory

I'm using the glob function to get every .htm file, and then I'm trying to get the text from a specific table element where class = 'DataGrid_Item', using PHP's DOM extension with the following HTML (every file has the same structure) and the following code:
1. HTML
<div>
<table rules="all" id="GridViewAfiliacion" style="border-collapse:collapse;" border="1" cellspacing="0">
<tbody>
<tr class="DataGrid_Header" style="background-color:#98B676;">
<th scope="col">ESTADO</th>
<th scope="col">ENTIDAD</th>
<th scope="col">REGIMEN</th>
<th scope="col">FECHA DE AFILIACION ENTIDAD</th>
<th scope="col">TIPO DE AFILIADO</th>
</tr>
<tr class="DataGrid_Item" align="center">
<td>ACTIVO</td>
<td>NUEVA EPS S.A.</td>
<td>CONTRIBUTIVO</td>
<td>01/06/2016</td>
<td>COTIZANTE</td>
</tr>
</tbody>
</table>
2. PHP
// Directory of Files
$directory = "../fosyga/archivoshtml/";
$array_filename = glob($directory . "*.htm");
foreach($array_filename as $filename)
{
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($filename);
$content_node = $dom->getElementById("GridViewAfiliacion");
// Get the HTML as a string
$string = $content_node->C14N();
}
Is it possible to extract the class="DataGrid_Item" info into a string?
P.S.: I think the glob function is not working properly in this case; I may not be using it correctly.
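One possible direction, offered as a sketch rather than a tested fix for the real files: DOMDocument::loadHTML() expects markup, not a file name, so the file has to be read first (or loaded with loadHTMLFile()). A DOMXPath query can then pick the rows by class. The helper name `extractItemRows` and the ' | ' separator are assumptions made for illustration.

```php
<?php
// Returns the cell texts of every DataGrid_Item row, one string per row.
// extractItemRows is a hypothetical helper name, not part of any library.
function extractItemRows(string $html): array
{
    if (trim($html) === '') {
        return array();                 // loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy real-world HTML
    $dom->loadHTML($html);              // expects markup, not a file name
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows  = $xpath->query('//table[@id="GridViewAfiliacion"]//tr[@class="DataGrid_Item"]');

    $result = array();
    foreach ($rows as $row) {
        $cells = array();
        foreach ($row->getElementsByTagName('td') as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $result[] = implode(' | ', $cells);
    }
    return $result;
}

// Directory taken from the question; glob() yields nothing if it does not exist.
$directory = "../fosyga/archivoshtml/";
foreach (glob($directory . "*.htm") ?: array() as $filename) {
    // Read the file contents first; the path alone is not HTML.
    foreach (extractItemRows(file_get_contents($filename)) as $line) {
        echo basename($filename), ': ', $line, "\n";
    }
}
```

The glob call itself looks fine as written in the question; the path is resolved relative to the script's working directory, which is worth checking if no files are found.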

Scrape DOMDocument Table for Contents in PHP

I am really struggling attempting to scrape a table either via XPath or any sort of 'getElement' method. I have searched around and attempted various different approaches to solve my problem below but have come up short and really appreciate any help.
First, the HTML portion I am trying to scrape is the 2nd table on the document and looks like:
<table class="table2" border="1" cellspacing="0" cellpadding="3">
<tbody>
<tr><th colspan="8" align="left">Status Information</th></tr>
<tr><th align="left">Status</th><th align="left">Type</th><th align="left">Address</th><th align="left">LP</th><th align="left">Agent Info</th><th align="left">Agent Email</th><th align="left">Phone</th><th align="center">Email Tmplt</th></tr>
<tr></tr>
<tr>
<td align="left">Active</td>
<td align="left">Resale</td>
<td align="center">*Property Address*</td>
<td align="right">*Price*</td>
<td align="center">*Agent Info*</td>
<td align="center">*Agent Email*</td>
<td align="center">*Agent Phone*</td>
<td align="center"> </td>
</tr>
<tr>
<td align="left">Active</td>
<td align="left">Resale</td>
<td align="center">*Property Address*</td>
<td align="right">*Price*</td>
<td align="center">*Agent Info*</td>
<td align="center">*Agent Email*</td>
<td align="center">*Agent Phone*</td>
<td align="center"> </td>
</tr>
...etc
With additional trs continuing containing 8 tds with the same information as detailed above.
What I need to do is iterate through the trs and internal tds to pick up each piece of information (inside the td) for each entry (inside of the tr).
Here is the code I have been struggling with:
<?php
$payload = array(
'http'=>array(
'method'=>"POST",
'content'=>'key=value'
)
);
stream_context_set_default($payload);
$dom = new DOMDocument();
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile('website-scraping-from.com');
libxml_clear_errors();
foreach ($dom->getElementsByTagName('tr') as $row){
foreach($dom->$row->getElementsByTagName('td') as $node){
echo $node->textContent . "<br/>";
}
}
?>
This code does not return anything close to what I need, and I am having a lot of trouble figuring out how to fix it. Perhaps XPath is a better way to find the table and the information I need, but I have come up empty with that method as well. Any information is much appreciated.
If it matters, my end goal is to be able to take the table data and dump it into a database if the first td has a value of "Active".
Can this be of any help?
$table = $dom->getElementsByTagName('table')->item(1);
foreach ($table->getElementsByTagName('tr') as $row){
$cells = $row->getElementsByTagName('td');
if ( $cells->item(0)->nodeValue == 'Active' ) {
foreach($cells as $node){
echo $node->nodeValue . "<br/>";
}
}
}
This will fetch the second table and display the contents of the rows whose first cell is "Active".
Edit: Here is a more complete example:
$arr = array();
$table = $dom->getElementsByTagName('table')->item(1);
foreach ($table->getElementsByTagName('tr') as $row){
$cells = $row->getElementsByTagName('td');
if ( $cells->item(0)->nodeValue == 'Active' ) {
$obj = new stdClass;
$obj->type = $cells->item(1)->nodeValue;
$obj->address = $cells->item(2)->nodeValue;
$obj->price = $cells->item(3)->nodeValue;
$obj->agent = $cells->item(4)->nodeValue;
$obj->email = $cells->item(5)->nodeValue;
$obj->phone = $cells->item(6)->nodeValue;
array_push( $arr, $obj );
}
}
print_r( $arr );
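An XPath variant of the same idea, as a sketch: header rows contain only th cells, so $cells->item(0) is null for them, and filtering inside the query avoids touching those rows at all. The sample markup below is hypothetical and much shorter than the real page.

```php
<?php
// Hypothetical two-table page; the real page would come from loadHTMLFile().
$html = '<table><tr><td>ignored</td></tr></table>'
      . '<table class="table2">'
      . '<tr><th>Status</th><th>Type</th><th>Address</th></tr>'
      . '<tr></tr>'
      . '<tr><td>Active</td><td>Resale</td><td>1 Main St</td></tr>'
      . '<tr><td>Sold</td><td>Resale</td><td>2 Main St</td></tr>'
      . '</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// (//table)[2] is the second table in document order;
// td[1] is the first cell of a row (XPath positions start at 1).
$rows = $xpath->query('(//table)[2]//tr[normalize-space(td[1]) = "Active"]');

$active = array();
foreach ($rows as $row) {
    $record = array();
    // Query relative to the current row to collect its cells.
    foreach ($xpath->query('td', $row) as $cell) {
        $record[] = trim($cell->textContent);
    }
    $active[] = $record;
}
print_r($active);
```

Each entry of $active is then a plain array of cell values, which is a convenient shape for a prepared INSERT statement.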

Error when parsing html using php simple_html_dom.php

I'm new to simple_html_dom.php.
I'm trying to parse a small HTML page, but I'm getting an error:
Fatal error: Call to a member function find() on a non-object in C:\xampp\htdocs\result\do.php on line 8
My php code is here :
<?php
$html = new simple_html_dom();
$html->load_file('C:\xampp\htdocs\result\www.html');
$tableData = array();
$table = $html->find('table');
foreach($table->find('tr') as $row) {
$rowData = array();
foreach($row->find('td.text') as $cell) {
$rowData[] = $cell->innertext;
}
$tableData[] = $rowData;
}
echo "Result :<br/>";
foreach($tableData as $test)
echo "-".$test[0]."-".$test[1]."-".$test[2]."-".$test[3]."-".$test[4]."<br/>";
?>
and my html page is here (www.html):
<table>
<tr>
<td width=250>Subject</td>
<td width=60 align=center>External </td>
<td width=60 align=center>Internal</td>
<td align=center width=60>Total</td>
<td align=center width=60>Result</td>
</tr>
<tr>
<td width=250><i>Analog Communication (06EC53)</i></td>
<td width=60 align=center>0</td>
<td width=60 align=center>17</td>
<td width=60 align=center>17</td>
<td width=60 align=center><b>A</b>
</td>
I want to know why I'm getting this error and how can I solve this error.
Have you tried adding html and body tags around that? I believe the library requires it.
It should be <html><body> .... </body></html>
That error means that you called find() on something that is not an object. The culprit here is $table: called without an index, $html->find('table') returns an array of matches (empty if nothing matched) rather than a single element, so $table->find('tr') fails. Ask for the first match with $html->find('table', 0) and check that the result is not null before using it.

How to get data between <td> elements with Regex and Php

How can I get the "85 mph" from this HTML code with PHP and regex?
I couldn't come up with the right regex.
This is the code
http://pastebin.com/ffRH9K9Q
<td align="left">Los Angeles</td>
</tr>
<tr>
<td align="left">Wind Speed:</td>
<td align="left">85 mph</td>
</tr>
<tr>
<td align="left">Snow Load:</td>
<td align="left">0 psf</td>
(simplified example)
You've heard already about not using regex for the job, so I won't talk about that.
Let's try something here. Perhaps not the ideal solution, but could work for you.
<?php
$data = 'your table';
// [^<]* cannot cross a tag boundary, so the match stays inside a single cell
preg_match('|<td align="left">([^<]*)mph</td>|i', $data, $result);
print_r($result); // The captured value should be in $result[1]
You may need to trim the result or account for whitespace in the regex.
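If there can be more than one "mph" cell on the page, anchoring on the label is a bit safer. This is a sketch only, assuming the label cell is immediately followed by the value cell as in the simplified example; the pattern and variable names are my own.

```php
<?php
$data = '<td align="left">Los Angeles</td>
</tr>
<tr>
<td align="left">Wind Speed:</td>
<td align="left">85 mph</td>
</tr>';

// Anchor on the label cell, then capture the text of the next cell.
// [^<]*? cannot cross a tag boundary, so the capture stays inside one <td>.
$pattern = '~<td[^>]*>\s*Wind Speed:\s*</td>\s*<td[^>]*>\s*([^<]*?)\s*</td>~i';

// $m[1] holds the trimmed cell text when the pattern matches.
$windSpeed = preg_match($pattern, $data, $m) ? $m[1] : null;
var_dump($windSpeed);
```

It still breaks as soon as the markup nests another tag inside the cell, which is the usual argument for a DOM parser.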
The first comment that links to the post about NOT PARSING HTML WITH REGEX is important. That said, try something like DOMDocument::loadHTML instead. That should get you started traversing the DOM with PHP.
To expand on DorkRawk's suggestion (in the hope of providing a relatively succinct answer that isn't overwhelming for a beginner), try this:
<?php
$yourhtml = '<td align="left">Los Angeles</td>
</tr>
<tr>
<td align="left">Wind Speed:</td>
<td align="left">85 mph</td>
</tr>
<tr>
<td align="left">Snow Load:</td>
<td align="left">0 psf</td>';
$dom = new DOMDocument();
$dom->loadHTML($yourhtml);
$xpath = new DOMXPath($dom);
$matches = $xpath->query('//td[.="Wind Speed:"]/following-sibling::td');
foreach($matches as $match) {
echo $match->nodeValue."\n\n";
}

How to parse XML/HTML server's response?

This is my first time here.
I got these lines as a response from the server and saved them in a file. They look like XML, right? My task is to read the content of those td tags and put them into another structured file (Excel). The problem is that I don't know how to do that.
At the moment, I think I will strip the first and last lines of the file and then parse the rest as XML. But do you know of other ways? Thanks.
<CallbackContent><![CDATA[
<table cellspacing="0" border="0" cellpadding="0" width="100%">
<tr class="rowcolor2">
<td align="left" style="padding:5px;">22/02/2010</td>
<td align="right" style="padding:5px;">510,02</td>
</tr>
</table>
]]></CallbackContent>
Btw, I'm using PHP.
Use an XML parser such as SimpleXML. It will allow you to extract the CDATA safely.
Then if the HTML is XML-compliant (in other words, it's XHTML) you can use SimpleXML to extract data from it. For example:
$xml='<CallbackContent><![CDATA[
<table cellspacing="0" border="0" cellpadding="0" width="100%">
<tr class="rowcolor2">
<td align="left" style="padding:5px;">22/02/2010</td>
<td align="right" style="padding:5px;">510,02</td>
</tr>
</table>
]]></CallbackContent>';
$CallbackContent = simplexml_load_string($xml);
$html = (string) $CallbackContent;
// if XHTML
$table = simplexml_load_string($html);
// otherwise, use
$dom = new DOMDocument;
$dom->loadHTML($html);
$table = simplexml_import_dom($dom)->body->table;
foreach ($table->tr as $tr)
{
echo 'tr class=', $tr['class'], "\n";
foreach ($tr->td as $td)
{
echo 'td align=', $td['align'], ' - value: ', (string) $td, "\n";
}
}
You cannot read the table directly with an XML parser, because it is delivered as a CDATA block, which is equivalent to a string literal.
First, read the whole thing using a XML parser so that you can pull out the contents of the CDATA section. Then take that and stuff it through an HTML parser.
