Web scraping with PHP and HTML DOM Parser - php

I'm trying to scrape the site in the code below, but I would like the output in table format.
$url='http://www.arbworld.net/en/moneyway';
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->recover=true;
$dom->strictErrorChecking=false;
$dom->loadHTMLFile( $url );
libxml_clear_errors();
$xp=new DOMXPath( $dom );
$col=$xp->query('//table[@class="grid"]/tr[@class="belowHeader"]/td');
if( $col->length > 0 ){
foreach( $col as $node )echo $node->textContent;
}
Now the output is this:
Romanian Liga I22.Dec 18:00:00 FCSBUniversitat2.063.33.999.9 %€ 2070.1
%€ 00 %€ 0€ 207 22.Dec 18:00:00 Italian Serie A22.Dec 11:30:00
AtalantaAC Milan1.8844.499.7 %€ 21 5580.1 %€ 170.2 %€ 46€ 21 622
22.Dec 11:30:00 English League 221.Dec 15:0
0:00

You should retrieve the rows instead of the columns (without the /td at the end), then simply put everything into an HTML table, with one <tr> for each row:
<?php
// your current code
$xp = new DOMXPath($dom);
$rows = $xp->query('//table[@class="grid"]/tr[@class="belowHeader"]');
?>
<table>
<tbody>
<?php foreach ($rows as $row): ?>
<tr>
<?php foreach ($row->childNodes as $col): ?>
<?php if ($col instanceof DOMElement && $col->getAttribute('style') !== 'display:none'): ?>
<?php foreach ($col->childNodes as $colPart): ?>
<?php if ($colText = trim($colPart->textContent)): ?>
<td><?= htmlspecialchars($colText) ?></td>
<?php elseif ($colPart instanceof DOMElement && $colPart->tagName === 'a'): ?>
<?php
$href = $colPart->getAttribute('href');
if (strpos($href, 'javascript') !== 0):
?>
<td><?= htmlspecialchars($href) ?></td>
<?php endif ?>
<?php endif ?>
<?php endforeach ?>
<?php endif ?>
<?php endforeach ?>
</tr>
<?php endforeach ?>
</tbody>
</table>
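If you would rather collect the cells into an array first and print the table afterwards, here is a minimal sketch of the same row-then-cell idea. The sample markup is made up (it only imitates the class names from the question), and I use //tr instead of /tr so it matches whether or not the rows sit inside a <tbody>; both are my assumptions, not something taken from the real page.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = '<table class="grid">
<tr class="header"><th>League</th><th>Home</th><th>Away</th></tr>
<tr class="belowHeader"><td>Serie A</td><td>Atalanta</td><td style="display:none">x</td><td>AC Milan</td></tr>
</table>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);
$table = [];

// Select the rows first, then query the cells relative to each row.
foreach ($xp->query('//table[@class="grid"]//tr[@class="belowHeader"]') as $row) {
    $cells = [];
    foreach ($xp->query('./td', $row) as $td) {
        // Skip the cells the page hides with an inline style.
        if ($td->getAttribute('style') === 'display:none') {
            continue;
        }
        $cells[] = trim($td->textContent);
    }
    $table[] = $cells;
}

print_r($table);
```

From there you can loop over $table and print one <tr> per entry and one <td> per cell.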

Related

PHP with DOM Xpath - Remove childNode and arrange string

I have this html structure:
<html>
<body>
<section>
<div>
<div>
<section>
<div>
<table>
<tbody>
<tr></tr>
<tr>
<td></td>
<td></td>
<td>
<i></i>
<div class="first-div class-one">
<div class="second-div"> soft </div>
130 cm / 15cm
</div>
</td>
</tr>
<tr></tr>
</tbody>
</table>
</div>
</section>
</div>
</div>
</section>
</body>
</html>
Now, I have this XPath code:
$doc = new DOMDocument();
@$doc->loadHtmlFile('http://www.whatever.com');
$doc->preserveWhiteSpace = false;
$xpath = new DOMXPath( $doc );
$nodelist = $xpath->query( '/html/body/section/div[2]/section/div/table/tbody/tr[2]/td[3]/div' );
foreach ( $nodelist as $node ) {
$result = $node->nodeValue."\n";
}
This gets me 'soft 130 cm / 15cm' as a result.
But I want to know how to get only '15', so I need:
1. To know how to get rid of the childNode->nodeValue
2. Once I have '130 cm / 15cm', to know how to get only '15' as the nodeValue of a variable in PHP.
Can you guys help?
Thanks in advance
Text within a tag is also a node (a child), more particularly a DOMText.
By looking at the children of that div, you can find the DOMText and get its nodeValue. An example below:
$doc = new DOMDocument();
$doc->loadHTML("<html><body><p>bah</p>Test</body></html>");
echo $doc->saveHTML();
$xpath = new DOMXPath( $doc );
$nodelist = $xpath->query( '/html/body' );
foreach ( $nodelist as $node ) {
if ($node->childNodes)
foreach ($node->childNodes as $child) {
if($child instanceof DOMText)
echo $child->nodeValue."\n"; // should output "Test".
}
}
Your second point can easily be done with regular expressions:
$string = "130 cm / 15cm";
$matches = array();
preg_match('|/ ([0-9]+) ?cm$|', $string, $matches);
echo $matches[1];
Full Solution:
<?php
$strhtml = '
<html>
<body>
<section>
<div>
<div>
<section>
<div>
<table>
<tbody>
<tr></tr>
<tr>
<td></td>
<td></td>
<td>
<i></i>
<div class="first-div class-one">
<div class="second-div"> soft </div>
130 cm / 15cm
</div>
</td>
</tr>
<tr></tr>
</tbody>
</table>
</div>
</section>
</div>
</div>
</section>
</body>
</html>';
$doc = new DOMDocument();
@$doc->loadHTML($strhtml);
echo $doc->saveHTML();
$xpath = new DOMXPath( $doc );
$nodelist = $xpath->query( '/html/body/section/div/div/section/div/table/tbody/tr[2]/td[3]/div' );
foreach ( $nodelist as $node ) {
if ($node->childNodes)
foreach ($node->childNodes as $child) {
if($child instanceof DOMText && trim($child->nodeValue) != "")
{
echo 'Raw: '.trim($child->nodeValue)."\n";
$matches = array();
preg_match('|/ ([0-9]+) ?cm$|', trim($child->nodeValue), $matches);
echo 'Value: '.$matches[1]."\n";
}
}
}
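As a variation on the above (not what the answer itself does), you can let XPath pick the text node for you with text(), so no loop over childNodes is needed. This is only a sketch: the markup is trimmed down to the relevant div, and the contains(@class, ...) selector is my own choice for illustration.

```php
<?php
// Trimmed-down markup: only the div from the question.
$html = '<div class="first-div class-one"><div class="second-div"> soft </div> 130 cm / 15cm </div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// text() selects only the div's own text nodes, not the nested div;
// [normalize-space()] drops nodes that are nothing but whitespace.
$texts = $xpath->query('//div[contains(@class, "first-div")]/text()[normalize-space()]');

$raw = $texts->length > 0 ? trim($texts->item(0)->nodeValue) : '';

// Same regular expression as above; null when the text doesn't match.
$value = preg_match('|/ ([0-9]+) ?cm$|', $raw, $m) ? (int) $m[1] : null;

echo 'Raw: ' . $raw . "\n";
echo 'Value: ' . var_export($value, true) . "\n";
```

Checking the return value of preg_match also avoids reading $m[1] when nothing matched.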

PHP Show 2nd foreach in td next to the first loop's values

I have a SQLite file with values.
I get those values and display them in a table of 2 columns:
<?php
$db = new PDO("sqlite:$dbPath");
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_WARNING);
$stmt = $db->prepare('SELECT * FROM Object;');
$stmt->execute();
$res = $stmt->fetchAll(PDO::FETCH_ASSOC);
?>
<table id="table1">
<caption><em>caption</em></caption>
<?php foreach($res as $objekt):
$imageFile = basename($objekt['image']); ?>
<tr>
<td><?php if(isset($objekt['image'])): ?>
<figure class="objectPicture center">
<img src="img/bmo/250/<?php echo $imageFile;?>" alt="<?php echo $objekt['title'];?>">
<?php else: ?>no picture.<?php endif ?><br>
<figcaption><?php echo $objekt['title']; ?>
</figcaption>
</figure>
</td>
<td><?php echo $objekt['text']; ?><br>
<?php echo $objekt['owner']; ?></td>
</tr>
<?php endforeach; ?>
</table>
What I'd like to do is show the next iteration of the foreach loop (the values of row 2 in the SQLite file) in another 2 columns next to the first result, then the 3rd iteration under the first one, and so on.
How do I do that? (total newbie to PHP and SQL here, in case you didn't notice)
The results now: [screenshot not preserved]
What I want: [screenshot not preserved]
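There are 3 changes to make.
1. Add $idx to the foreach loop:
<?php foreach($res as $idx => $objekt):
2. Wrap an if around <tr> so a row is only opened on even indexes:
<?php if($idx % 2 == 0) { ?>
<tr>
<?php } ?>
3. Wrap an if around </tr> so the row is only closed on odd indexes:
<?php if($idx % 2 == 1) { ?>
</tr>
<?php } ?>
Note that the code above doesn't handle an odd number of rows: the last <tr> is then never closed.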
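One way around the odd-row case is to split the result into pairs with array_chunk, so every <tr> is opened and closed in the same place. A rough sketch, assuming rows shaped like the ones from fetchAll (the sample data is made up, and I left out the image markup to keep it short):

```php
<?php
// Hypothetical rows standing in for $stmt->fetchAll(PDO::FETCH_ASSOC).
$res = [
    ['title' => 'A', 'text' => 'first'],
    ['title' => 'B', 'text' => 'second'],
    ['title' => 'C', 'text' => 'third'],
];

$html = "<table>\n";

// array_chunk groups the rows two at a time; the last chunk may hold one row.
foreach (array_chunk($res, 2) as $pair) {
    $html .= '<tr>';
    foreach ($pair as $objekt) {
        $html .= '<td>' . htmlspecialchars($objekt['title']) . '</td>'
               . '<td>' . htmlspecialchars($objekt['text']) . '</td>';
    }
    // Pad an incomplete last row so every row has four cells.
    if (count($pair) < 2) {
        $html .= '<td></td><td></td>';
    }
    $html .= "</tr>\n";
}

$html .= "</table>\n";

echo $html;
```

With three rows this gives two table rows, the second one padded with two empty cells.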

php foreach. select specific value from array and use it as parameter for display loop

I want to be able to display titles of the books of the author who is currently logged in. I'm using PHP session
<? foreach ($books as $book ): ?>
<? foreach ($book as $selbook => $author): ?>
<option value="$selbook>"$author['author'] == $_SESSION["sess_username"] ? ' selected="selected"' : ''?>> $author?></option>
<li class="active"><a href=""><span class="pull-right"><input id="button" type="submit" name="submitr" value="Edit"></span><i class="icon-fire$
<? echo htmlspecialchars($book['Title'], ENT_QUOTES, 'UTF-8'); ?> <strong> - </strong><em>
<? echo htmlspecialchars($book['author'], ENT_QUOTES, 'UTF-8');?></em></a></li>
<? endforeach; ?>
<? endforeach; ?>
You start 2nd foreach with this:
<?php foreach ($book as $selbook => $author) { ?>
And end with this:
<?php endforeach; ?>
It is not correct. Use this:
<?php foreach ($book as $selbook => $author): ?>
// some code
<?php endforeach; ?>
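Also, you don't have to write <?php everywhere: if short_open_tag is enabled in php.ini you can use <?, which is quicker to type. Keep in mind it only works when that setting is on, so <?php is the portable choice. When you just want to echo a variable, use the short echo tag, which is always available since PHP 5.4:
<?=$variable;?>
It is quicker to write and more readable too.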
Update
Try this:
<? foreach ($books as $book ): ?>
<? foreach ($book as $selbook => $author): ?>
<option value="<?=$selbook;?>" <? if($author['author'] == $_SESSION["sess_username"]): ?>selected="selected"<? endif; ?> ><?=$author;?></option>
<li class="active"><a href=""><span class="pull-right"><input id="button" type="submit" name="submitr" value="Edit"></span><i class="icon-fire$
<? echo htmlspecialchars($book['Title'], ENT_QUOTES, 'UTF-8'); ?> <strong> - </strong><em>
<? echo htmlspecialchars($book['author'], ENT_QUOTES, 'UTF-8');?></em></a></li>
<? endforeach; ?>
<? endforeach; ?>

Looping through database and returning relevant data

In my controller I have this code to loop through the database and return the data:
$faultgroup = $this->booking_model->Get_Fault_Group_Display($grouptype);
$data['Get_Fault_Group_Display'] = $faultgroup; $getresults = array();
$data['get_fault_group_data'] = array();
foreach ($faultgroup as $key ) {
$show = $key->Showgroup;
$getresults = $this->booking_model->get_fault_group_data($grouptype,$show);
$data['get_fault_group_data'] = $getresults ;
}
In my view I have this code to loop through each record with the specific grouptype and display the records (to_do_item) from the database that match that grouptype:
<?php if ( ! is_null($Get_Fault_Group_Display)): ?>
<?php if (count($Get_Fault_Group_Display)): ?>
<?php foreach ($Get_Fault_Group_Display as $result): ?>
<?php echo $result->Showgroup; ?>
<?php foreach ($get_fault_group_data as $key) :?>
<?php echo $key->to_do_item; ?>
<?php endforeach ?>
<?php endforeach ?>
<?php else: ?>
<?php endif ?>
My problem is that only the last row is shown for all the grouptypes, because the loop keeps overriding $data['get_fault_group_data'] with the new $getresults.
Shouldn't you use the $data['get_fault_group_data'] as an array?
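Controller (give the foreach an index, e.g. foreach ($faultgroup as $i => $key), because $key itself is an object and can't be used as an array key):
$data['get_fault_group_data'][$i] = $getresults;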
View:
<?php if ( ! is_null($Get_Fault_Group_Display)): ?>
<?php if (count($Get_Fault_Group_Display)): ?>
<?php foreach ($Get_Fault_Group_Display as $i => $result): ?>
<?php echo $result->Showgroup; ?>
<?php foreach ($get_fault_group_data[$i] as $key) :?>
<?php echo $key->to_do_item; ?>
<?php endforeach ?>
<?php endforeach ?>
<?php else: ?>
<?php endif ?>
<?php endif ?>
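To show the idea end to end without the framework, here is a small sketch where the model calls are replaced by made-up data and a closure (the group names, the to-do items and $getFaultGroupData are all hypothetical). The point is only that the controller stores one result list per index and the view reads it back with the same index.

```php
<?php
// Hypothetical stand-ins for the two model calls in the question.
$faultgroup = [
    (object) ['Showgroup' => 'Engine'],
    (object) ['Showgroup' => 'Brakes'],
];
$lookup = [
    'Engine' => ['Check oil', 'Check belt'],
    'Brakes' => ['Check pads'],
];
$getFaultGroupData = function ($show) use ($lookup) {
    $rows = [];
    foreach (isset($lookup[$show]) ? $lookup[$show] : [] as $text) {
        $rows[] = (object) ['to_do_item' => $text];
    }
    return $rows;
};

// "Controller": keep one result list per group, keyed by the loop index.
$data = ['Get_Fault_Group_Display' => $faultgroup, 'get_fault_group_data' => []];
foreach ($faultgroup as $i => $group) {
    $data['get_fault_group_data'][$i] = $getFaultGroupData($group->Showgroup);
}

// "View": the same index picks the matching list for each group.
$out = '';
foreach ($data['Get_Fault_Group_Display'] as $i => $result) {
    $out .= $result->Showgroup . "\n";
    foreach ($data['get_fault_group_data'][$i] as $item) {
        $out .= '  ' . $item->to_do_item . "\n";
    }
}

echo $out;
```

Each group now prints its own items instead of the items of the last group.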

endforeach in loops?

I use brackets when using foreach loops. What is endforeach for?
It's mainly so you can make start and end statements clearer when creating HTML in loops:
<table>
<? while ($record = mysql_fetch_assoc($rs)): ?>
<? if (!$record['deleted']): ?>
<tr>
<? foreach ($display_fields as $field): ?>
<td><?= $record[$field] ?></td>
<? endforeach; ?>
<td>
<select name="action" onChange="submit">
<? foreach ($actions as $action): ?>
<option value="<?= $action ?>"><?= $action ?></option>
<? endforeach; ?>
</select>
</td>
</tr>
<? else: ?>
<tr><td colspan="<?= count($display_fields) ?>"><i>record <?= $record['id'] ?> has been deleted</i></td></tr>
<? endif; ?>
<? endwhile; ?>
</table>
versus
<table>
<? while ($record = mysql_fetch_assoc($rs)) { ?>
<? if (!$record['deleted']) { ?>
<tr>
<? foreach ($display_fields as $field) { ?>
<td><?= $record[$field] ?></td>
<? } ?>
<td>
<select name="action" onChange="submit">
<? foreach ($actions as $action) { ?>
<option value="<?= $action ?>"><?= $action ?></option>
<? } ?>
</select>
</td>
</tr>
<? } else { ?>
<tr><td colspan="<?= count($display_fields) ?>"><i>record <?= $record['id'] ?> has been deleted</i></td></tr>
<? } ?>
<? } ?>
</table>
Hopefully my example is sufficient to demonstrate that once you have several layers of nested loops, and the indenting is thrown off by all the PHP open/close tags and the contained HTML (and maybe you have to indent the HTML a certain way to get your page the way you want), the alternative syntax (endforeach) can make things easier for your brain to parse. With the normal style, each closing } sits on its own, which makes it hard to tell what it is actually closing.
It's the end statement for the alternative syntax:
foreach ($foo as $bar) :
...
endforeach;
Useful to make code more readable if you're breaking out of PHP:
<?php foreach ($foo as $bar) : ?>
<div ...>
...
</div>
<?php endforeach; ?>
As an alternative syntax, you can write foreach loops like so:
foreach($arr as $item):
//do stuff
endforeach;
This type of syntax is typically used when PHP is being used as a templating language, like this:
<?php foreach($arr as $item):?>
<!--do stuff -->
<?php endforeach; ?>
It's just a different syntax. Instead of
foreach ($a as $v) {
# ...
}
You could write this:
foreach ($a as $v):
# ...
endforeach;
They will function exactly the same; it's just a matter of style. (Personally I have never seen anyone use the second form.)
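If you want to convince yourself, here is a quick sketch that runs the same loop in both styles and compares what each one prints (the $items array is just sample data):

```php
<?php
$items = ['a', 'b', 'c'];

// Brace syntax, output captured in a buffer.
ob_start();
foreach ($items as $v) {
    echo "<li>$v</li>";
}
$braces = ob_get_clean();

// Alternative syntax, same loop body.
ob_start();
foreach ($items as $v):
    echo "<li>$v</li>";
endforeach;
$alternative = ob_get_clean();

// Both buffers hold the same markup.
var_dump($braces === $alternative);
```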
How about this?
<?php while ($items = array_pop($lists)) { ?>
<ul>
<?php foreach ($items as $item) { ?>
<li><?= $item ?></li>
<?php } // foreach ?>
</ul>
<?php } // while ?>
We can still use the more widely-used braces and, at the same time, increase readability.
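Using foreach: ... endforeach; makes templates more readable, but that is all it does: it is only an alternative syntax for the same loop, and I'm not aware of anything in the PHP docs that promises lower memory use.
So even for big apps with many users, choose it for readability rather than performance.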
How about that?
<?php
while($items = array_pop($lists)){
echo "<ul>";
foreach($items as $item){
echo "<li>$item</li>";
}
echo "</ul>";
}
?>
