Need help pulling content from a PHP page with DOM / regex


So far this is my code:
<?php
$start = date("d/m/y", strtotime('today'));
$end = date("d/m/y", strtotime('tomorrow'));
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0"
));
$context = stream_context_create($opts);
$url = "http://www.hot.net.il/PageHandlers/LineUpAdvanceSearch.aspx?text=&channel=506&genre=-1&ageRating=-1&publishYear=-1&productionCountry=-1&startDate=$start&endDate=$end&pageSize=1";
$data = file_get_contents($url, false, $context);
$re = '/LineUpId=(.+\d)/';
preg_match($re, $data, $matches);
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0"
));
$context = stream_context_create($opts);
$url = "http://www.hot.net.il/PageHandlers//LineUpDetails.aspx?lcid=1037&luid=$matches[1]";
$data = file_get_contents($url, false, $context);
echo $data;
?>
I am trying to build a TV guide that shows the current program for a single channel.
Part of the HTML page:
<div class="GuideLineUpDetailsCenter">
<a class="LineUpbold">Name of the Show</a>
<br>
<div class="LineUpDetailsTime">2018 22:45 - 23:30</div>
<br>
<div class="show">Information about the program</div>
<br>
<div class="LineUpbold">+14</div>
<br>
</div>
I want to pull the content and do something like this:
echo $LineUpbold;
echo $LineUpDetailsTime;
echo $show;
echo $LineUpbold;

Use a DOM parser and appropriate xpath queries instead:
<?php
$data = <<<DATA
<div class="GuideLineUpDetailsCenter">
<a class="LineUpbold">Name of the Show</a>
<br>
<div class="LineUpDetailsTime">2018 22:45 - 23:30</div>
<br>
<div class="show">Information about the program</div>
<br>
<div class="LineUpbold">+14</div>
<br>
</div>
DATA;
# set up the dom
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
# set up the xpath
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//div[@class = 'GuideLineUpDetailsCenter']") as $container) {
$name = $xpath->query("a[@class = 'LineUpbold']/text()", $container)->item(0);
echo $name->nodeValue;
$details = $xpath->query("div[@class = 'LineUpDetailsTime']/text()", $container)->item(0);
echo $details->nodeValue;
# and so on...
}
The code loads your string, searches for divs with the class GuideLineUpDetailsCenter, loops over them and tries to find appropriate children within each div.
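To fill in the "and so on" part, here is a minimal sketch that collects all four values from the sample markup in the question. The helper `firstText()` and the array keys are made-up names for illustration, and the live page may be structured differently from the snippet.

```php
<?php
// Sample markup copied from the question; a live page may differ.
$data = <<<DATA
<div class="GuideLineUpDetailsCenter">
<a class="LineUpbold">Name of the Show</a>
<br>
<div class="LineUpDetailsTime">2018 22:45 - 23:30</div>
<br>
<div class="show">Information about the program</div>
<br>
<div class="LineUpbold">+14</div>
<br>
</div>
DATA;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Hypothetical helper: trimmed text of the first match relative to $context, or null.
function firstText(DOMXPath $xpath, string $query, DOMNode $context): ?string
{
    $nodes = $xpath->query($query, $context);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$programs = [];
foreach ($xpath->query("//div[@class = 'GuideLineUpDetailsCenter']") as $container) {
    $programs[] = [
        // The <a> and the last <div> share a class, so the tag name tells them apart.
        'name'   => firstText($xpath, "a[@class = 'LineUpbold']", $container),
        'time'   => firstText($xpath, "div[@class = 'LineUpDetailsTime']", $container),
        'info'   => firstText($xpath, "div[@class = 'show']", $container),
        'rating' => firstText($xpath, "div[@class = 'LineUpbold']", $container),
    ];
}

foreach ($programs as $p) {
    echo $p['name'], "\n", $p['time'], "\n", $p['info'], "\n", $p['rating'], "\n";
}
```

Returning null for a missing node avoids the "property of non-object" notice you would otherwise get when a program has no rating or description.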

Related

PHP DOM how get all image links

I'm trying to download pictures from a site as an exercise, but something isn't working: no image links are displayed. Can anyone tell me what I am doing wrong?
This is my code:
$li = 'https://gratka.pl/nieruchomosci/mieszkanie-katowice-dabrowka-mala/ob/20357919';
$options3 = array('http' => array('method'=>"GET",
'header'=>"Accept-language: pl\r\n" .
"Cookie: foo=bar\r\n",
'user_agent' => 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'));
$context3 = stream_context_create($options3);
$text1 = file_get_contents($li, false, $context3);
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($text1);
$query_string = '';
$divs = $dom->getElementsByTagName('span');
foreach ($divs as $div){
if(preg_match('/\bgallery__imageViewer\b/', $div->getAttribute('class'))) {
$links = $div->getElementsByTagName('img');
foreach($links as $link){
$foto = $link->getAttribute('src');
$query_string .= ('<center><img src="'.$foto.'"> </center><br/>');
}
}
}
print_r($query_string);
Thanks in advance to everyone for your help.

How to grab stream url with preg_match?

EDIT:
My updated PHP Code:
<?php
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Referer: http://www.ucaster.me/hembedplayer/shid05/1/1/1\r\n" .
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0"
));
$url = file_get_contents('http://www.ucaster.me/hembedplayer/shid05/1/1/1', false, stream_context_create($opts));
preg_match('/playlist\.m3u8\?id=([^&=>]+).*?enableVideo\("([^"]*)/s', $url, $m);
$video_id = $m[1];
$video_pk = $m[2];
echo $video_pk;
?>
What I need to do is extract the stream pk from the page source.
Page source (same as before):
<script type="text/javascript">
var videoPlayer = document.createElement("video");
videoPlayer.setAttribute("id", "videoplayer");
videoPlayer.setAttribute("width", "580");
videoPlayer.setAttribute("height", "450");
videoPlayer.setAttribute("poster", "/static/images/logo.png");
videoPlayer.setAttribute("autoplay", true);
videoPlayer.setAttribute("controls", "");
var em = document.createElement("em");
em.innerHTML = "Sorry, your browser doesn't support HTML5 video.";
videoPlayer.appendChild(em);
document.getElementById("player_div").appendChild(videoPlayer);
function setupVideo() {
if (Hls.isSupported()) {
var video = document.getElementById('videoplayer');
var player = new Hls();
player.attachMedia(video);
player.on(Hls.Events.MEDIA_ATTACHED, function () {
var hlsUrl = "http://" + ea + ":8088/live/shid04/playlist.m3u8?id=95328&pk=";
hlsUrl = hlsUrl + enableVideo("5be02e45f5917b29199f8e5326499a6f8c6c7c9df86920b38c09bee46b050289");
player.loadSource(hlsUrl);
player.on(Hls.Events.MANIFEST_PARSED, function (event, data) {
video.play();
});
});
}else {
em.innerHTML = "Sorry, your browser doesn't support HTML5 video.";
}
}
</script>
I am trying to echo the pk, like this:
5be02e45f5917b29199f8e5326499a6f8c6c7c9df86920b38c09bee46b050289

See https://regex101.com/r/aIxtsI/1
preg_match('/playlist\.m3u8\?id=([^&=>]+).*?enableVideo\("([^"]*)/s', $input, $m);
$video_id = $m[1];
$video_pk = $m[2];
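As a quick check, here is the same pattern run against the two relevant lines from the page source in the question. The pattern is written in single quotes so the double quotes inside it do not end the PHP string; `$input` here is just the pasted sample, not a live download.

```php
<?php
// Two relevant lines copied from the page source in the question.
$input = <<<'JS'
var hlsUrl = "http://" + ea + ":8088/live/shid04/playlist.m3u8?id=95328&pk=";
hlsUrl = hlsUrl + enableVideo("5be02e45f5917b29199f8e5326499a6f8c6c7c9df86920b38c09bee46b050289");
JS;

// Group 1: the id after "playlist.m3u8?id="; group 2: the argument of enableVideo("...").
// The /s flag lets ".*?" run across the line break between the two statements.
$pattern = '/playlist\.m3u8\?id=([^&=>]+).*?enableVideo\("([^"]*)/s';

if (preg_match($pattern, $input, $m) === 1) {
    $video_id = $m[1];
    $video_pk = $m[2];
    echo $video_id, "\n", $video_pk, "\n";
} else {
    echo "pattern did not match\n";
}
```

Checking the return value of preg_match before reading `$m` avoids undefined-offset notices if the page layout changes.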

member function error in data extraction

I am trying to get data from an external source, but I can't get the data and I am facing these errors:
Notice: Trying to get property of non-object in E:\xampp\htdocs\test\merit-list.php on line 38
Fatal error: Call to a member function find() on null in E:\xampp\htdocs\test\merit-list.php on line 39
Here is my code
<?php
require('resources/inc/simple_html_dom.php');
$linksrc = 'http://58.65.172.36/Portal/WebSiteUpdates/Achievements.aspx';
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => $linksrc,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_FOLLOWLOCATION => 1,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 3000,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
));
$file = curl_exec($curl);
$error = curl_error($curl);
curl_close($curl);
$dom = new simple_html_dom();
$dom->load($file);
$doctorDivs = $dom->find("table#Farooq", 0)->children();
$doctors = array();
foreach($doctorDivs as $div){
$doctor = array();
// line 38
$image = $doctor["image"] = $div->find('img', 0)->src;
$details = $div->find('tr', 0)->find("td");
$name = $doctor["name"] = trim($details[1]->plaintext);
$spec = $doctor["desc"] = trim($details[2]->plaintext);
$doctors[] = $doctor;
echo $image;
echo $name;
echo $spec;
}
?>

The problem is that the first row of the table doesn't have an <img>, because it's the row of headings, so $div->find("img", 0) returns null. You get the first error when you try to access null->src.
The second error is because $div is the <tr> element. $div->find("tr") searches the descendants of $div and doesn't include $div itself, so $div->find("tr", 0) always returns null. Also, this code won't work on the heading row either, because it contains <th> rather than <td> elements.
You could just skip over the heading row by putting:
array_shift($doctorDivs);
before the foreach loop. This will remove the first element of the array.
And change $details to:
$details = $div->find("td");
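The same idea can be sketched with PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom, which is a swap made here only so the example runs without an external library. The table below is hypothetical, since the real page's markup is not shown in the question; the column order (image, name, description) is assumed from the indexes used in the original code.

```php
<?php
// Hypothetical table; the real page's markup is not shown in the question.
$html = <<<HTML
<table id="Farooq">
<tr><th>Photo</th><th>Name</th><th>Details</th></tr>
<tr><td><img src="a.jpg"></td><td> Alice </td><td> Cardiology </td></tr>
<tr><td><img src="b.jpg"></td><td> Bob </td><td> Neurology </td></tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$doctors = [];
// "tr[td]" keeps only rows that have <td> cells, so the <th> heading row is left out.
foreach ($xpath->query("//table[@id='Farooq']//tr[td]") as $row) {
    $cells = $xpath->query('td', $row);
    if ($cells->length < 3) {
        continue; // not a full data row
    }
    $img = $xpath->query('.//img', $row)->item(0);
    $doctors[] = [
        'image' => $img instanceof DOMElement ? $img->getAttribute('src') : null,
        'name'  => trim($cells->item(1)->textContent),
        'desc'  => trim($cells->item(2)->textContent),
    ];
}
print_r($doctors);
```

Filtering rows by their content rather than removing the first element also copes with tables that have no heading row or more than one.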

PHP parsing stops

I'm trying to parse pages. I've read that I need to set a header to avoid a 500 server error, so I did.
But after five or so pages, the parsing stops. There is no error; it just stops.
The code:
$url = 'http://www.someurlhere.com';
$options = array('http' => array('header' => "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4"));
$context = stream_context_create($options);
$html = file_get_html($url, false, $context);
Edit:
foreach($html->find('table.votes tr.even,tr.odd') as $tr) {
if ($tr->find('td', 3) == '<td>absent</td>') {
$absent = $absent + 1;
}
$possible = $possible + 1;
}
echo 'absent=> ' . $absent . ' out of => ' . $possible . '<br>';

Using Xpath with php to parse html from a website

Currently I'm trying to use xpath to parse an html page from a website.
I need to get a result in the format:
Time of the program : Program name
For example:
1.00PM : Ye Hai Mohabbatein
I am using the following code (as shown here) to obtain it but it is not working.
<?php
libxml_use_internal_errors(true);
$dom = new DomDocument;
$dom->loadHTMLFile("www.starplus.in/schedule.aspx");
$xpath = new DomXPath($dom);
$nodes = $xpath->query("//table");
foreach ($nodes as $i => $node) {
echo "hy";
echo "Node($i): ", $node->nodeValue, "\n";
}
?>
I would be thankful if anybody could help me with this issue.

Basically, just target the div/table that contains the name of the show and the time slot.
Rough example:
// it seems it doesn't work when there is no user agent
$ch = curl_init('http://www.starplus.in/schedule.aspx');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($ch);
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($page);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$shows = array();
$tables = $xpath->query("//div[@class='sech_div_bg']/table"); // target that table
foreach ($tables as $table) {
$time_slot = $xpath->query('./tr[1]/td/span', $table)->item(0)->nodeValue;
$show_name = $xpath->query('./tr[3]/td/span', $table)->item(0)->nodeValue;
$shows[] = array('time_slot' => $time_slot, 'show_name' => $show_name);
echo "$time_slot - $show_name <br/>";
}
// echo '<pre>';
// print_r($shows);
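To try the queries without hitting the site, here is a small offline sketch that prints the "Time : Program name" format asked for in the question. The markup is hypothetical and only shaped the way the queries above expect (time in the first row, show name in the third); the real page may differ or may have changed.

```php
<?php
// Hypothetical markup shaped like the queries above expect.
$page = <<<PAGE
<div class="sech_div_bg">
<table>
<tr><td><span>1.00PM</span></td></tr>
<tr><td><span>thumbnail</span></td></tr>
<tr><td><span>Ye Hai Mohabbatein</span></td></tr>
</table>
</div>
PAGE;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($page);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$lines = [];
foreach ($xpath->query("//div[@class='sech_div_bg']/table") as $table) {
    $time = $xpath->query('./tr[1]/td/span', $table)->item(0);
    $name = $xpath->query('./tr[3]/td/span', $table)->item(0);
    if ($time === null || $name === null) {
        continue; // skip tables that do not have the expected rows
    }
    $lines[] = trim($time->nodeValue) . ' : ' . trim($name->nodeValue);
}
echo implode("\n", $lines), "\n";
```

The null checks matter on a live page, where a table missing one of the rows would otherwise trigger a notice or error on `->nodeValue`.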
