I have to use the HTML_Template_Sigma PEAR module for a PHP assignment. It wraps all the HTML of a website into templates, so the same HTML isn't pasted over and over. All the content is added through variables, and at some point I have to loop through an array inside one of the string variables (which contains a table). I checked the documentation, which is rather sparse. It does have a loop implementation, but the examples are oriented to publications and I don't know how to apply them to my problem.
http://www.pixel2life.com/publish/tutorials/13/pear_module_html_template_sigma/
http://pear.php.net/manual/en/package.html.html-template-sigma.intro-syntax.php
Still, what they show is not exactly the same as this:
foreach ($data as $result) {
$plantilla->setCurrentBlock('table_row');
$plantilla->setVariable(array(
'date' => $result[0],
'epicentre' => $result[1],
'region' => $result[2],
'richter' => $result[3],
'mercalli' => $result[4]
));
$plantilla->parseCurrentBlock('table_row');
}
This is my variable:
$content = '
<table>
<thead>
<tr>
<th>Date</th>
<th>Epicentre</th>
<th>Region</th>
<th>Mw Richter</th>
<th>Mercalli</th>
</tr>
</thead>
<tbody>
<!-- BEGIN table_row -->
<tr>
<td>{date}</td>
<td>{epicentre}</td>
<td>{region}</td>
<td>{richter}</td>
<td>{mercalli}</td>
</tr>
<!-- END table_row -->
</tbody>
</table>';
My array contains 5 columns of data. I've tried this, but to no avail.
Thanks in advance!
I am trying to scrape a table's td tag, but first I need to check the th. For example, let's say the table structure is like below.
<tbody>
<tr>
<th>color</th>
<td>red</td>
</tr>
<tr>
<th>price</th>
<td>23.267$</td>
</tr>
<tr>
<th>brand</th>
<td>mustang</td>
</tr>
</tbody>
In this table I need to scrape the mustang value. But I can't use $crawler->filter('table td')->eq(3); for that, because the position is always changing. So I need to find the value by its th: if the th's value is brand, then get its td.
What is the best way to do this?
Not sure it's the best solution, but I solved it with this:
$props = $node->filter("table th")->each(function($th, $i){
return $th->text();
});
$vals = $node->filter("table td")->each(function($td, $i){
return $td->text();
});
$items = [
"brand" => "",
"color" => "",
];
for ($a=0; $a < count($props); $a++) {
switch ($props[$a]) {
case 'brand':
$items["brand"] = $vals[$a];
break;
}
}
If there is another or much better way to achieve this, please feel free to post it here. Thank you.
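One possible alternative is to let XPath pick the row by its th text, so no position is involved. The sketch below uses PHP's built-in DOMDocument and DOMXPath rather than the crawler from the question; `cellForHeader` is a hypothetical helper name, and it assumes the label contains no double quotes. With Symfony's DomCrawler, the same expression should be usable through `filterXPath()`.

```php
<?php
// Sample markup copied from the question; in practice this is the fetched page.
$html = <<<'HTML'
<table>
  <tbody>
    <tr><th>color</th><td>red</td></tr>
    <tr><th>price</th><td>23.267$</td></tr>
    <tr><th>brand</th><td>mustang</td></tr>
  </tbody>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: returns the text of the first <td> in the row whose
// <th> text equals $label, or null when no such row exists.
// Assumption: $label contains no double quotes (it is placed into the XPath).
function cellForHeader(DOMXPath $xpath, string $label): ?string
{
    $query = sprintf('//tr[th[normalize-space() = "%s"]]/td[1]', $label);
    $nodes = $xpath->query($query);

    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

$brand = cellForHeader($xpath, 'brand');
echo $brand, "\n";
```

Because the row is selected by its header text, the lookup keeps working when the rows are reordered.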
I am fetching html from a website with file_get_contents. I have a table (with a class name) inside html, and I want to get the data inside html tags.
This is how I fetch the html data from url:
$url = 'http://example.com';
$content = file_get_contents($url);
The html looks like:
<table class="space">
<thead></thead>
<tbody>
<tr>
<td class="marsia">1</td>
<td class="mars">
<div>Mars</div>
</td>
</tr>
<tr>
<td class="earthia">2</td>
<td class="earth">
<div>Earth</div>
</td>
</tr>
</tbody>
</table>
Is there a way to search DOM elements in PHP like we do in jQuery, so that I can access the values 1, 2 (first td) and the div's value inside the second td?
Something like
a) search the html for table with class name space
b) inside that table, inside tbody, return each tr's 'first td's value' and 'div's value inside second td'
So I get; 1 and Mars, 2 and Earth.
Use the DOM extension, for example. Its DOMXPath class is particularly useful for this kind of task.
You can easily set the listed conditions with an XPath expression like this:
//table[@class="space"]//tr[count(td) = 2]/td
where
- //table[@class="space"] selects all table elements in the document whose class attribute value equals the string "space";
- //tr[count(td) = 2] selects all tr elements having exactly two td child elements;
- /td represents the td elements.
Sample implementation:
$html = <<<'HTML'
<table class="space">
<thead></thead>
<tbody>
<tr>
<td class="marsia">1</td>
<td class="mars">
<div>Mars</div>
</td>
</tr>
<tr>
<td class="earthia">2</td>
<td class="earth">
<div>Earth</div>
</td>
</tr>
<tr>
<td class="earthia">3</td>
</tr>
</tbody>
</table>
HTML;
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$cells = $xpath->query('//table[@class="space"]//tr[count(td) = 2]/td');
$i = 0;
foreach ($cells as $td) {
if (++$i % 2) {
$number = $td->nodeValue;
} else {
$planet = trim($td->textContent);
printf("%d: %s\n", $number, $planet);
}
}
Output
1: Mars
2: Earth
The code above should be treated as a sample rather than something for practical use, as it is not very scalable. The logic depends on the XPath expression selecting exactly two cells for each row. In practice, you may want to select the rows, iterate over them, and put the extra conditions into the loop, e.g.:
$rows = $xpath->query('//table[@class="space"]//tr');
foreach ($rows as $tr) {
$cells = $xpath->query('.//td', $tr);
if ($cells->length < 2) {
continue;
}
$number = $cells[0]->nodeValue;
$planet = trim($cells[1]->textContent);
printf("%d: %s\n", $number, $planet);
}
DOMXPath::query() is called with an XPath expression relative to the current row ($tr), and the loop then checks whether the returned DOMNodeList contains at least two cells. The rest of the code is trivial.
You can also use the SimpleXML extension, which supports XPath as well, but it is much less flexible than the DOM extension.
For huge documents, use a streaming parser such as XMLReader.
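For comparison, the SimpleXML route could look roughly like the sketch below. It assumes the markup is first parsed with DOMDocument::loadHTML() (since the page is HTML rather than XML) and then handed over with simplexml_import_dom(). The `$planets` array is just an illustrative name.

```php
<?php
// Same sample table as above, shortened to one string.
$html = '<table class="space"><tbody>'
      . '<tr><td class="marsia">1</td><td class="mars"><div>Mars</div></td></tr>'
      . '<tr><td class="earthia">2</td><td class="earth"><div>Earth</div></td></tr>'
      . '<tr><td class="earthia">3</td></tr>'
      . '</tbody></table>';

// Parse as HTML with DOM, then switch to the SimpleXML view of the same tree.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

$planets = [];
// Select only the rows that have exactly two cells.
foreach ($xml->xpath('//table[@class="space"]//tr[count(td) = 2]') as $tr) {
    $number = (int) (string) $tr->td[0];      // text of the first cell
    $planet = trim((string) $tr->td[1]->div); // text of the div in the second cell
    $planets[$number] = $planet;
}

print_r($planets);
```

Child access by element name (`$tr->td[1]->div`) is convenient here, but anything beyond simple lookups tends to be easier with the DOM classes.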
The following is a code snippet from a Smarty template.
There is an associative array and we are showing its values in the Smarty template.
One element of the associative array is $ans.answer_text, and I have to check whether there is any <img> tag present in its content (value). If the <img> tag is present I have to do some logic, and if it isn't I have to do some other logic.
The main issue I'm facing is how to check for the presence of an <img> tag within the array element's data.
Please help me resolve this issue.
Thanks in advance.
<tr valign="top">
{foreach from=$qstn_ans.answer item=ans key=ans_no}
<td valign="top" valign="top">
{if $ans.answer_is_right==1}{assign var='correct_ans' value=$ans_no+1}{/if}
<b>{$ans_no+1}.</b>
{if $ans.answer_text!=''}{$ans.answer_text}{/if}
<br />
{if $ans.answer_file!=''}<img src="{$ans_thumb_img_path}{$ans.answer_id}_{$ans.answer_file}" />{/if}
</td>
{/foreach}
</tr>
This kind of usage defeats the purpose of using a template engine. Ideally you should do these checks in the controller.
I would do it in the following way:
$qstn_ans = array();
// PHP Controller
foreach ($qstn_ans as $key => $value) {
$imgPath = ''; // generate the name here
$qstn_ans[$key]['hasImage'] = (file_exists($imgPath))?1:0;
}
// Template file
{if $ans.hasImage}<img src="<!-- insert image here -->" />{/if}
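If the goal is to detect an <img> tag inside the answer text itself, rather than a file on disk, the same controller-side idea could be sketched as follows. The `$answers` sample data is hypothetical and only mimics the shape of $qstn_ans.answer. The pattern is a simple heuristic, not a full HTML parse.

```php
<?php
// Hypothetical sample data shaped like the answers passed to the template.
$answers = [
    ['answer_text' => 'Plain text answer'],
    ['answer_text' => 'See <img src="diagram.png" alt=""> for details'],
];

foreach ($answers as $key => $ans) {
    // Case-insensitive match for "<img" followed by whitespace, "/" or ">",
    // so that words like "<imgfoo" are not counted.
    $answers[$key]['hasImage'] = preg_match('/<img[\s\/>]/i', $ans['answer_text']) === 1;
}

var_dump($answers[0]['hasImage'], $answers[1]['hasImage']);
```

The template can then branch on {if $ans.hasImage} without inspecting the markup itself.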
Have you tried using {html_image}?
http://www.smarty.net/docsv2/en/language.function.html.image.tpl
I've stripped the tag data from a URL like this:
$url='http://abcd.com';
$d=stripslashes(file_get_contents($url));
echo strip_tags($d);
Unfortunately, all the tag values are run together, like user14036100 9.00user23034003 11.33, where the attributes of user1, user2 and user3 are stored. It is hard to analyse the attribute values because strip_tags() joins them all together.
Can someone help me strip each tag and store the data in an array, or place a delimiter at the end of each stripped tag's data?
Thanks in advance :)
You cannot achieve this with strip_tags(), since it just removes the tags. You want to replace them with, e.g., a whitespace character (new line, space, ...).
You should probably do this with a regex call that replaces every tag.
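As a rough sketch of the regex idea: replace every tag with a delimiter, then split on it. The sample markup in `$d` is hypothetical (the real page structure is not shown in the question), and the pipe delimiter assumes the data itself never contains a `|`.

```php
<?php
// Hypothetical markup standing in for the fetched page.
$d = '<table><tr><td>user1</td><td>4036100</td><td>9.00</td></tr>'
   . '<tr><td>user2</td><td>3034003</td><td>11.33</td></tr></table>';

// Replace each tag with a pipe instead of removing it outright.
$delimited = preg_replace('/<[^>]*>/', '|', $d);

// Split on runs of pipes and drop the empty pieces between adjacent tags.
$values = preg_split('/\|+/', $delimited, -1, PREG_SPLIT_NO_EMPTY);

// Trim stray whitespace and discard values that end up empty.
$values = array_values(array_filter(array_map('trim', $values), 'strlen'));

print_r($values);
```

This keeps the values separate, but it knows nothing about rows or columns.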
A better way would be to parse the fetched page with DOMDocument, so that you can derive the structure directly from the HTML structure.
Example of usage of DOMDocument
You have the following example html page:
<!DOCTYPE html>
<html>
<head>
<title>This is my title</title>
</head>
<body>
<table id="someDataHere">
<tr>
<th>Country</th>
<th>Population</th>
</tr>
<tr>
<td>Germany</td>
<td>81,779,600</td>
</tr>
<tr>
<td>Belgium</td>
<td>11,007,020</td>
</tr>
<tr>
<td>Netherlands</td>
<td>16,847,007</td>
</tr>
</table>
</body>
</html>
You can use DOMDocument to fetch the entries in the table:
$url = "...";
$dom = new DOMDocument("1.0", "UTF-8");
$dom->loadHTML(file_get_contents($url));
$preparedData = array();
$table = $dom->getElementById("someDataHere");
$tableRows = $table->getElementsByTagName('tr');
foreach ($tableRows as $tableRow)
{
$columns = $tableRow->getElementsByTagName('td');
// skip the header row of the table - it has no <td>, just <th>
if (0 == $columns->length)
{
continue;
}
$preparedData[ $columns->item(0)->nodeValue ] = $columns->item(1)->nodeValue;
}
$preparedData will now hold the following data:
Array
(
[Germany] => 81,779,600
[Belgium] => 11,007,020
[Netherlands] => 16,847,007
)
Some notes
Since you are developing a crawler (spider), you are highly dependent on the HTML structure of the target webpage. You may have to adjust your crawler every time they change something in their templates.
This is just a simple example, but it should make clear how you can use it to produce more advanced results.
Since DOMDocument implements the DOM methods, you have to work your way through the HTML structure with the possibilities they provide.
For very large HTML pages, DOMDocument can become quite expensive in terms of memory.
I have a table with the following structure. I cannot seem to get the data I want.
<table class="gsborder" cellspacing="0" cellpadding="2" rules="cols" border="1" id="d00">
<tr class="gridItem">
<td>Code</td><td>0adf</td>
</tr><tr class="AltItem">
<td>CompanyName</td><td>Some Company</td>
</tr><tr class="Item">
<td>Owner</td><td>Jim Jim</td>
</tr><tr class="AltItem">
<td>DivisionName</td><td> </td>
</tr><tr class="Item">
<td>AddressLine1</td><td>9314 W. SPRING ST.</td>
</tr>
</table>
This table is of course nested within another table within the page. How can I use DomDocument for example to refer to "Code" and "0adf" as a key value pair? They actually don't need to be in a key value pair but I should be able to call them each separately.
EDIT:
Using PHP Simple HTML, I was able to extract the data I needed using this:
$foo = $html->getElementById("d00")->childNodes(1)->childNodes(1);
The problem with this though is that I am getting the two <td></td> tags with my data. Is there a way to only grab the raw data without the tags?
Also, is this the right way to get my data out of this table?
If you're not dead set on using DOMDocument, try using the PHP Simple HTML DOM Parser. This has the benefit of allowing you to parse HTML which is not valid XML as well as providing a nicer interface to the parsed document.
You could write something like:
$html = str_get_html(...);
foreach($html->find('tr') as $tr)
{
print 'First td: ' . $tr->find('td', 0)->plaintext;
print 'Second td: ' . $tr->find('td', 1)->plaintext;
}
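If DOMDocument is preferred after all, a key/value array could be built along these lines. This is only a sketch: the table is a shortened copy of the one in the question, and `$data` is an illustrative name. Reading textContent returns the cell text without the surrounding td tags.

```php
<?php
// Shortened copy of the table from the question.
$html = <<<'HTML'
<table class="gsborder" id="d00">
<tr class="gridItem">
<td>Code</td><td>0adf</td>
</tr><tr class="AltItem">
<td>CompanyName</td><td>Some Company</td>
</tr><tr class="Item">
<td>Owner</td><td>Jim Jim</td>
</tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$data = [];
// Select the table by id so the outer (nesting) table is ignored.
foreach ($xpath->query('//table[@id="d00"]//tr') as $tr) {
    $cells = $xpath->query('./td', $tr);
    if ($cells->length < 2) {
        continue; // skip rows without a key/value pair
    }
    // textContent yields the raw text, without the <td> tags.
    $data[trim($cells->item(0)->textContent)] = trim($cells->item(1)->textContent);
}

echo $data['Code'], "\n";
```

Each value can then be read separately, for example $data['CompanyName'].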