How do I find this div? (PHP Simple HTML DOM Parser)

This is my code:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://www.google.com/search?q=BA236',false);
$title=$html->find('div#ires', 0)->innertext;
echo $title;
?>
It outputs all the results of the Google search page for the query "BA236".
The problem is that I don't need all of them; the information I need is inside a div that has no id, class, or any other attribute.
The div I need is inside the first
<div class="g">
on the page, so maybe I should try something like this:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://www.google.com/search?q=BA236',false);
$title=$html->find('div[class=g], 0')->innertext;
echo $title;
?>
But the problem is that when I load the page, it shows me nothing except this:
Notice: Trying to get property of non-object in
C:\xampp\htdocs...\simpletest2.php on line 4
So how can I get the div I'm searching for, and what am I doing wrong?
Edit:
Solution:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://www.google.com/search?q=BA236',false);
$e = $html->find("div[class=g]");
echo $e[0]->innertext;
?>
Or:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://www.google.com/search?q=BA236',false);
$title=$html->find('div[class=g]')[0]->innertext;
echo $title;
?>

I made a change to your code where I am searching for the class:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://www.google.com/search?q=BA236',false);
$e = $html->find("div[class=g]");
echo $e[0]->innertext;
?>
result:
British Airways Flight 236
Scheduled departs in 13 hours 13 mins
Departure DME 5:40 AM —
Moscow Dec 15
Arrival LHR 6:55 AM Terminal 5
London Dec 15
Scheduled departs in 1 day 13 hours
Departure DME 5:40 AM —
Moscow Dec 16
Arrival LHR 6:55 AM Terminal 5
London Dec 16
I looked for the div elements with class g, then printed the innertext of the first element (index 0):
$e = $html->find("div[class=g]");
echo $e[0]->innertext;
Your code:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://www.google.com/search?q=BA236',false);
$title=$html->find('div[class=g]')[0]->innertext;
echo $title;
?>
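The index must not go inside the selector string, so it is
not find('div[class=g], 0')
but find('div[class=g]')[0] (or find('div[class=g]', 0), with the index as a second argument).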

There is no need for simple_html_dom here; it's easy to do with the built-in DOMDocument and DOMXPath classes.
<?php
$html = file_get_contents('http://www.google.com/search?q=BA236');
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings about malformed real-world HTML
echo (new DOMXPath($doc))->query('//div[@class="g"]')->item(0)->textContent;
In my opinion, DOMDocument + DOMXPath make simple_html_dom.php rather pointless. The former two can do pretty much everything simple_html_dom can do and are built into PHP, so they are likely to be maintained as long as PHP itself is maintained, whereas the latter is a third-party project that seems nearly dead by the looks of it (the last commit was in 2014, with only 1 commit in all of 2014 and 2 commits in all of 2013).
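As a minimal sketch of the XPath approach, the snippet below runs against a small inline HTML string rather than a live Google page. The markup is made up for illustration and is not meant to reflect Google's real result structure. It selects the first div with class "g" and then the first div nested inside it, which is one way to reach a div that has no id or class of its own.

```php
<?php
// Hypothetical markup for illustration only; real search pages differ.
$html = <<<HTML
<html><body>
<div id="ires">
<div class="g"><div>British Airways Flight 236</div></div>
<div class="g"><div>Another result</div></div>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
// Collect parser warnings internally; real-world HTML is rarely well-formed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// (//div[@class="g"])[1] is the first matching div in document order;
// /div[1] is its first child div, which needs no id or class.
$nodes = $xpath->query('(//div[@class="g"])[1]/div[1]');

// Guard against an empty result instead of dereferencing a missing node.
$first = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $first, "\n";
```

Checking `$nodes->length` before calling `item(0)` avoids the "property of non-object" style of failure when the selector matches nothing.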

Related

Need some assistance with Regex (PHP)

I'd like to parse txt files to HTML using preg_replace to add formatting.
The format of the file is like this :
09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
This should be treated as a group and parsed into a table, like :
<table>
<tr><td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td></tr>
<tr><td>1234567</td><td>(optional)</td><td>Today is a beautiful day</td></tr>
<tr><td>1234568</td><td>(optional)</td><td>Tomorrow will be even better</td></tr>
<tr><td>1234569</td><td>(optional)</td><td>December is the best month of the year!</td></tr>
</table>
For now, I'm using two separate preg_replace calls: one for the first line (date) and a second one for the following lines, of which there can be just one or up to 100 or so. However, the file can contain other text as well, which the replacement should leave alone; yet if such a line has more or less the same format (7 digits and some text), it gets formatted as well:
$file = preg_replace('~^\s*((\[.*\]){0,2}\d{1,2}:\d{2}:\d{2}(\[/.*\]){0,2})\s(\d{2}-\d{2}-\d{2}(\[/.*\]){0,2})\s+(?:\d{2}/\d{3}\s+|)(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\s+(.+)$~m', '<table class="file"><tr class="entry"><td class="time">$1 $4</td><td class="day">$6</td><td class="message">$7</td></tr>', $file);
$file = preg_replace('~^\s*(.{0,11}?)\s*((\[.+?\])?\d{7}(\[/.+?\])?)\s+(.+?)$~m', '<tr class="id"><td class="optional">$1</td><td class="id">$2</td><td class="message">$5</td></tr>', $file);
How can I improve this? For example, if I have this content:
09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
Liverpool - WBA 2-2
1234570 This line should be ignored
19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better
So I'd like to match and replace only the first block and the last one, each starting with a time/date line followed by lines that start with a 7-digit ID.
So far, thanks for reading ;)
I think this accomplishes what you are trying to do.
There was one line where it was unclear to me why it should be ignored:
1234570 This line should be ignored
This line meets the 7 digits and some text requirement.
The regex I came up with was (the weekday group uses {3,6} so that names such as Tuesday or Wednesday also match):
/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*([a-zA-Z]{3,6}day)?\h*(.+)/m
Here is a regex101 demo: https://regex101.com/r/qB0gH6/1
and in PHP usage:
$string = '09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
Liverpool - WBA 2-2
1234570 This line should be ignored
19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better';
echo preg_replace('/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*([a-zA-Z]{3,6}day)?\h*(.+)/m', '<td>$1</td><td>$2</td><td>$3</td>', $string);
Output:
<td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td>
<td>1234567</td><td></td><td>Today is a beautiful day</td>
<td>1234568</td><td></td><td>Tomorrow will be even better</td>
<td>1234569</td><td></td><td>December is the best month of the year!</td>
Liverpool - WBA 2-2
<td>1234570</td><td></td><td>This line should be ignored</td>
<td>19:29:59 13-12-15</td><td>Sunday</td><td>Hello World</td>
<td>1234571</td><td></td><td>Today is a beautiful day</td>
<td>1234572</td><td></td><td>Tomorrow will be even better</td>
Okay, per your update it is a bit more complicated but I think this does it:
$string = '09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
Liverpool - WBA 2-2
1234570 This line should be ignored
19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better';
echo preg_replace_callback('/(?:^|\n)(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+([a-zA-Z]{3,6}day)?\h*(.+?)\n((\d{7})\h+(.+?)(\n|$))+/',
    function ($matches) {
        $lines = explode("\n", $matches[0]);
        $theoutput = '<table><tr>';
        foreach ($lines as $line) {
            if (preg_match('/(?:^|\n)(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+([a-zA-Z]{3,6}day)?\h*(.*)/', $line, $output)) {
                // it is the first date string line
                foreach ($output as $key => $values) {
                    // key 0 holds the whole match, so skip it
                    if (!empty($key)) {
                        $theoutput .= '<td>' . $values . '</td>';
                    }
                }
            } else {
                if (preg_match('/(\d{7})\h*(.*)/', $line, $output)) {
                    // an ID line: close the previous row and start a new one
                    $theoutput .= '</tr><tr>';
                    foreach ($output as $key => $values) {
                        if (!empty($key)) {
                            $theoutput .= '<td>' . $values . '</td>';
                        }
                    }
                }
            }
        }
        $theoutput .= '</tr></table>';
        return $theoutput;
    }, $string);
Output:
<table><tr><td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td></tr><tr><td>1234567</td><td>Today is a beautiful day</td></tr><tr><td>1234568</td><td>Tomorrow will be even better</td></tr><tr><td>1234569</td><td>December is the best month of the year!</td></tr></table>
Liverpool - WBA 2-2
1234570 This line should be ignored
<table><tr><td>19:29:59 13-12-15</td><td>Sunday</td><td>Hello World</td></tr><tr><td>1234571</td><td>Today is a beautiful day</td></tr><tr><td>1234572</td><td>Tomorrow will be even better</td></tr></table>
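An alternative to a single large pattern is to walk the text line by line and track whether a table is currently open. The sketch below is one hedged way to do that. The function name linesToTables and the {3,6} weekday quantifier are my own choices for illustration, not part of the answers above. Unlike the callback version, it also opens a table for a date line that has no ID lines after it, and it inserts values without HTML escaping.

```php
<?php
// Sketch: group a date line plus its consecutive 7-digit ID lines into one table.
function linesToTables(string $text): string
{
    $dateRe = '/^(\d{2}:\d{2}:\d{2}\h+\d{2}-\d{2}-\d{2})\h+([a-zA-Z]{3,6}day)\h+(.+)$/';
    $idRe   = '/^(\d{7})\h+(.+)$/';
    $out    = [];
    $open   = false; // true while we are inside a table

    foreach (explode("\n", $text) as $line) {
        if (preg_match($dateRe, $line, $m)) {
            if ($open) {
                $out[] = '</table>'; // a new date line ends the previous group
            }
            $out[] = '<table>';
            $out[] = "<tr><td>$m[1]</td><td>$m[2]</td><td>$m[3]</td></tr>";
            $open = true;
        } elseif ($open && preg_match($idRe, $line, $m)) {
            // ID lines only count while a group is open
            $out[] = "<tr><td>$m[1]</td><td></td><td>$m[2]</td></tr>";
        } else {
            if ($open) {
                $out[] = '</table>';
                $open = false;
            }
            $out[] = $line; // unrelated text passes through untouched
        }
    }
    if ($open) {
        $out[] = '</table>';
    }
    return implode("\n", $out);
}

$sample = "09:19:49 13-12-15 Sunday Hello World\n"
    . "1234567 Today is a beautiful day\n"
    . "Liverpool - WBA 2-2\n"
    . "1234570 This line should be ignored\n"
    . "19:29:59 13-12-15 Sunday Hello World\n"
    . "1234571 Today is a beautiful day";

$result = linesToTables($sample);
echo $result, "\n";
```

Because an ID line is only converted while a table is open, the line starting with 1234570 is left as plain text: the unrelated "Liverpool" line before it has already closed the group.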

Array size in PHP shown as one despite of having more elements

Following is the code I am using for testing purposes so far; I will use it further in my project.
It accesses a web service and retrieves data as XML. The XML is stored in the $arr array.
The address in $url and the 3rd parameter of client->call() are omitted on purpose.
$url = "";
$client= new nusoap_client($url);
$param = array("status"=>"p2");
$arr = $client->call('getAllVisitByStatus',$param,'');
echo $arr."\n";
$size = sizeof($arr);
echo $size;
for($num=0; $num<986; ++$num)
{
    echo $arr[$num], "\n";
    if($arr[$num] == '>')
    {
        echo "<br/> ";
    }
}
If I save the data returned by client->call() into an array and print it with a loop, it prints the XML like this:
<?xml version = "1.0" encoding = "UTF - 8" standalone = "yes" ?>
<lists>
<visitW>
<followup> 2015 - 01 - 30 00:50:00.0 </followup>
<person_id> 12 </person_id>
<remarks> nothing </remarks>
<treatment> doing </treatment>
<visit_date> 2015 - 01 - 04 - 00 - 24 - </visit_date>
<visit_id> 4 </visit_id>
<visit_type> Hesschart</visit_type>
</visitW>
</lists>
However, if I treat $arr as a string, it prints this:
2015-01-30 00:50:00.0 12 nothing doing 2015-01-04-00-24- 4 Hesschart
So, as a string it prints without the tags.
The problem is that when the size of the array is printed, it prints 1, even though the array contains the whole XML returned by the service call.
When I loop over the exact number of elements, i.e. 986, it prints the whole XML as it is.
My question is: why does it show 1 as the size of the array? Also, can this array containing XML be passed to a DOM parser?
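One plausible reading, assuming call() returned the XML as a single string here: indexing a string with $arr[$num] yields one character at a time, which would explain why a loop over 986 offsets prints the whole document, and older PHP versions report 1 for sizeof() on a non-array value (PHP 8 throws a TypeError instead). The tags "disappear" when echoed because a browser interprets them as markup. The sketch below uses a shortened stand-in payload, not the real service response.

```php
<?php
// Stand-in for the web service response; the real payload is assumed to be similar.
$response = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    . '<lists><visitW><person_id>12</person_id>'
    . '<visit_type>Hesschart</visit_type></visitW></lists>';

$isString = is_string($response); // a string, not an array
$length   = strlen($response);    // use strlen(), not sizeof(), for a string
$firstCh  = $response[0];         // string offsets return single characters

// A string of XML can be handed directly to DOMDocument.
$dom = new DOMDocument();
$dom->loadXML($response);
$personId = $dom->getElementsByTagName('person_id')->item(0)->textContent;

// Escape the markup so a browser shows the tags instead of hiding them.
echo htmlspecialchars($response), "\n";
echo $personId, "\n";
```

If is_string() turns out to be false for the real response, the value is an array after all and can be inspected with print_r() instead.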

Get date from earthtool - PHP & XML parsing

I found this web service which provides the date time of a timezone. http://www.earthtools.org/timezone-1.1/24.0167/89.8667
I want to call it and get values such as isotime with PHP.
So I tried
$contents = simplexml_load_file("http://www.earthtools.org/timezone-1.1/24.0167/89.8667");
$xml = new DOMDocument();
$xml->loadXML( $contents );
and also with
file_get_contents
With file_get_contents I get only a string of numbers, not the XML format. Something like this:
1.0 24.0167 89.8667 6 F 20 Feb 2014 13:50:12 2014-02-20 13:50:12 +0600 2014-02-20 07:50:12 Unknown
Nothing worked. Can anyone please help me get the isotime or other values from that link using PHP?
This works:
$url = 'http://www.earthtools.org/timezone-1.1/24.0167/89.8667';
$nodes = array('localtime', 'isotime', 'utctime');
$cont = file_get_contents($url);
$node_values = array();
if ($cont && ($xml = simplexml_load_string($cont))) {
    foreach ($nodes as $node) {
        if ($xml->$node) $node_values[$node] = (string)$xml->$node;
    }
}
print_r($node_values);
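Two details are worth spelling out. First, simplexml_load_file() returns a SimpleXMLElement object, not a string, so it cannot be passed on to DOMDocument::loadXML(). Second, the "string of numbers" from file_get_contents() is most likely the full XML with the tags hidden by the browser. The sketch below parses an inline sample so that it runs without network access. The element layout is an assumption reconstructed from the values quoted in the question.

```php
<?php
// Sample modeled on the values quoted in the question; the element layout is assumed.
$cont = '<timezone><version>1.0</version>'
    . '<location><latitude>24.0167</latitude><longitude>89.8667</longitude></location>'
    . '<offset>6</offset><suffix>F</suffix>'
    . '<localtime>20 Feb 2014 13:50:12</localtime>'
    . '<isotime>2014-02-20 13:50:12 +0600</isotime>'
    . '<utctime>2014-02-20 07:50:12</utctime>'
    . '<dst>Unknown</dst></timezone>';

$xml = simplexml_load_string($cont);
$values = [];
if ($xml !== false) {
    foreach (['localtime', 'isotime', 'utctime'] as $name) {
        // isset() checks that the child element exists before reading it
        if (isset($xml->$name)) {
            // Cast to string; otherwise the value stays a SimpleXMLElement
            $values[$name] = (string) $xml->$name;
        }
    }
}
print_r($values);
```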

Trouble using PHP to parse an XML document with identical tags

Here is a snippet from Yahoo Weather showing the identical tags:
<yweather:forecast day="Mon" date="16 Jan 2012" low="-1" high="6" text="Clear" code="31"/>
<yweather:forecast day="Tue" date="17 Jan 2012" low="3" high="7" text="Mostly Sunny" code="34"/>
To access the day in the first tag I use the following function:
function get_forecast_day(SimpleXMLElement $xml) {
    // Pull forecast day
    $forecast['day'] = $xml->channel->item->children('yweather', TRUE)->forecast->attributes()->day;
    echo $forecast['day'] . ", ";
    return $forecast['day']; // return the value that was just read
}
Any ideas how I can access the day in the second tag? Obviously, searching for the value "Tue" is no good, as these values change daily.
Thanks in advance.
->forecast can be used as an array, so access the second element with index 1:
$xml->channel->item->children('yweather', TRUE)->forecast[1]->attributes()->day
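When every forecast is needed rather than just the second one, the same node list can be iterated with foreach. The sketch below uses a trimmed-down inline feed. The namespace URI on the root element is an assumption added so that the sample parses on its own.

```php
<?php
// Trimmed-down feed for illustration; the namespace URI is an assumption.
$rss = '<rss xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0"><channel><item>'
    . '<yweather:forecast day="Mon" date="16 Jan 2012" low="-1" high="6" text="Clear" code="31"/>'
    . '<yweather:forecast day="Tue" date="17 Jan 2012" low="3" high="7" text="Mostly Sunny" code="34"/>'
    . '</item></channel></rss>';

$xml = simplexml_load_string($rss);

// children('yweather', true) selects child elements by namespace prefix.
$forecasts = $xml->channel->item->children('yweather', true)->forecast;

$days = [];
foreach ($forecasts as $forecast) {
    // The attributes themselves carry no namespace, so attributes() needs no arguments.
    $days[] = (string) $forecast->attributes()->day;
}

// Index access works on the same list: [0] is the first tag, [1] the second.
$second = (string) $forecasts[1]->attributes()->day;

echo implode(', ', $days), "\n";
```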

PHP Integers with leading zeros

Particularly, 08 and 09 have caused me some major trouble. Is this a PHP bug?
Explanation:
I have a calendar 'widget' on a couple of our clients' sites, where we have a hard-coded HTML calendar (I know a PHP function can generate any number of months, but the boss man said 'no').
Within each day, there is a PHP function to check for events on that day, passing the current day of the month like so:
<td valign="top">01<?php printShowLink(01, $events) ?></td>
$events is an array of all events on that month, and the function checks if an event is on that day:
function printShowLink($dayOfMonth, $eventsArray) {
    $show = array();
    $printedEvent = array();
    $daysWithEvents = array();
    foreach($eventsArray as $event) {
        if($dayOfMonth == $event['day'] && !in_array($event['id'], $printedEvent)){
            if(in_array($event['day'], $daysWithEvents)) {
                echo '<hr class="calendarLine" />';
            } else {
                echo '<br />';
            }
            $daysWithEvents[] = $event['day']; // string parsed from timestamp
            if($event['linked'] != 1) {
                echo '<div class="center cal_event '.$event['class'].'" id="center"><span title="'.$event['title'].'" style="color:#666666;">'.$event['shorttitle'].'</span></div>';
                $printedEvent[] = $event['id'];
            } else {
                echo '<div class="center cal_event '.$event['class'].'" id="center">'.$event['shorttitle'].'</div>';
                $printedEvent[] = $event['id'];
            }
        }
    }
}
On the 8th and 9th, no events will show up. Passing a string of the day instead of a zero-padded integer causes the same problem.
The solution is what it should have been in the first place: a non-padded integer. However, my question is: have you seen this odd behavior with 08 and/or 09?
I googled this and couldn't find anything out there.
Quote it. 0123 without quotes is an octal literal in PHP. It's in the docs:
$ php -r 'echo 01234, "\n01234\n";'
668
01234
$
So you should change your code to
<td valign="top">01<?php printShowLink('01', $events) ?></td>
It's been a while since I've had to wade through so much PHP; I've been doing mostly JavaScript for 3 years. But 08 and 09 being a problem makes me think they could be getting treated as octal (base 8), and the digits 8 and 9 do not exist in octal.
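The octal behaviour can be checked directly. The sketch below is a small illustration; the variable names are made up for the example. It deliberately avoids writing the literal 08: PHP 5 silently read it as 0, while PHP 7 and later reject it when the file is compiled.

```php
<?php
// A leading zero makes an integer literal octal: 010 is eight, not ten.
$octal = 010;

// Zero-padded strings convert in base 10, so '08' becomes 8.
$fromString = (int) '08';
$viaIntval  = intval('09', 10);

// To display padding, keep a plain integer and format it on output.
$padded = sprintf('%02d', 8);

var_dump($octal, $fromString, $viaIntval, $padded);
```

Keeping the day as a plain integer and padding only when printing avoids the literal problem altogether.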
