I'm trying to parse pages. I've read that I need to set a header to avoid a 500 server error, so I did.
But after 5 or so pages, the parsing stops. There is no error; it just stops.
The code:
$url = 'http://www.someurlhere.com';
$options = array('http' => array('header' => "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4"));
$context = stream_context_create($options);
$html = file_get_html($url, false, $context);
Edit:
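// Initialise the counters so the first increment does not act on an undefined variable.
$absent = 0;
$possible = 0;
foreach($html->find('table.votes tr.even,tr.odd') as $tr) {
    if ($tr->find('td', 3) == '<td>absent</td>') {
        $absent = $absent + 1;
    }
    $possible = $possible + 1;
}
echo 'absent=> ' . $absent . ' out of => ' . $possible . '<br>';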
I need a little help.
We have test code which uses fsockopen. Code:
<?php
//require_once "../common.php";
$url = "https://xxxxx/xxxx/fsockopen_called_file.php";
$close = true;
echo "<pre>";
error_log("\n\n");
trigger_error("1. calling fsockopen url: ".$url);
$result = call_url($url, $close);
trigger_error("2. result: ".var_export($result, true)."\n\n");
function call_url($url, $close = TRUE) {
$user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36";
$parts = parse_url($url);
if (@$parts["scheme"] === "https" && (!@$parts["port"] || $parts["port"] == 80)) {
$parts["port"] = 443;
$parts["scheme"] = "ssl";
}
$out = "GET ".$parts["path"]." HTTP/1.1\r\n";
$out.= "Host: ".$parts["host"]."\r\n";
$out.= "User-Agent: ".$user_agent."\r\n";
$out.= "Content-Length: 0\r\n";
if ($close) { $out.= "Connection: Close\r\n"; }
$out.= "\r\n";
$fsock_url = ($parts["scheme"] !== "http"? ($parts["scheme"]."://") : "").$parts["host"];
$fp = fsockopen($fsock_url, isset($parts["port"])? $parts["port"] : 80, $errno, $errstr, 30);
fwrite($fp, $out);
$result = "";
//If we close the connection, we do not wait for the response.
if (!$close) {
while (!feof($fp)) {
$result .= fgets($fp, 128);
}
}
fclose($fp);
if (is_bool($fp) && !$fp && !$errno) {
//fsockopen returned false and there is no error code in the errno variable.
$errno = 1;
$errstr = "Could not connect to the server: ".$url." => ".$fsock_url;
}
return ["errno" => $errno, "errstr" => $errstr, "result" => $result];
}
My problem is that the file the code calls (fsockopen_called_file.php) does not appear in the access log.
Access log:
x.x.x.x - [19/Jun/2022:20:20:32 +0200] "GET /xxx/fsockopen.php HTTP/1.1" 200 4890 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"
That's it. The fsockopen_called_file.php file doesn't appear. It's as if the call doesn't work.
Server info:
Ubuntu 20.04
Apache 2.4.54-1+ubuntu20.04.1+deb.sury.org+1
PHP 7.4 1:7.4.30-1+ubuntu20.04.1+deb.sury.org+1
I tried the following:
- turned the firewall off
- built an identical server with the same settings, and there it worked
Does anyone have any ideas that could help me?
Thanks,
Balee
I'm trying to download pictures from the site as an exercise. But something does not work for me: the links are not displayed. Can anyone tell me what I am doing wrong?
This is my code:
$li = 'https://gratka.pl/nieruchomosci/mieszkanie-katowice-dabrowka-mala/ob/20357919';
$options3 = array('http' => array('method'=>"GET",
'header'=>"Accept-language: pl\r\n" .
"Cookie: foo=bar\r\n",
'user_agent' => 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'));
$context3 = stream_context_create($options3);
$text1 = file_get_contents($li, false, $context3);
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($text1);
$query_string = '';
$divs = $dom->getElementsByTagName('span');
foreach ($divs as $div){
if(preg_match('/\bgallery__imageViewer\b/', $div->getAttribute('class'))) {
$links = $div->getElementsByTagName('img');
foreach($links as $link){
$foto = $link->getAttribute('src');
$query_string .= ('<center><img src="'.$foto.'"> </center><br/>');
}
}
}
print_r($query_string);
Thanks in advance to everyone for your help.
The reason I cannot post test data is that the IDs are used up once someone tries them out.
It is basically a signup. It needs a secret ID plus an unused email. Then there is a second step: if both requirements were met in step 1, one can enter full_name and password. The second step is what fails for me in curl but works in Postman, and I have no clue why. Here is the Postman export:
https://gist.github.com/Jossnaz/31983240d57038ccb10afa88dd0765ae
Here is the naive code I run. I use this curl wrapper:
https://github.com/php-curl-class/php-curl-class
Note: I started with the minimal set of things I had in Postman, but the second call wouldn't work, so I added more things and left them in so you can see what I tried, without much expectation that they would actually make it work.
Things I tried:
- $c->get("https://fond.co/public#/id_signup/themotcard.com");
- setting cookies
- setting user agent, referer and origin headers
require(__DIR__ . '/vendor/autoload.php');
use Curl\Curl; // the php-curl-class wrapper linked above
$c = new Curl();
$c->get("https://fond.co/public#/id_signup/themotcard.com");
$c->setHeader('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36');
$cookies = $c->getResponseCookies();
$c = new Curl();
$c->setCookies($cookies);
$c->setHeader('Content-Type', 'application/json');
$c->setHeader('Authorization', 'letmein');
$c->setHeader('X-Anyperk-Client-Id', 'AnyPerk-PublicApp/1/2');
$c->setHeader('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36');
$c->post('https://fond.co/api/public/company_staff_signup/signup',
[
"company_domain_name" => "themotcard.com",
"staff_identifier" => "35962466970",
"email" => "tesasdfaftasdxasdfaasdsasdfxabc22321@testabc.com",
]);
if ($c->error) {
echo 'Error: ' . $c->errorCode . ': ' . $c->errorMessage . "\n <br/>" . var_dump($c->response);
} else {
echo 'Response:' . "\n";
var_dump($c->response);
}
sleep(4);
echo "<br/>";
echo "<br/>";
echo "<br/>";
echo "<br/>";
$c = new Curl();
$c->setCookies($cookies);
$c->setHeader('Content-Type', 'application/json');
$c->setHeader('Authorization', 'letmein');
$c->setHeader('X-Anyperk-Client-Id', 'AnyPerk-PublicApp/1/2');
$c->setHeader('Host', 'fond.co');
$c->setHeader('Origin', 'https://fond.co');
$c->setHeader('Referer', 'https://fond.co/public');
$c->setHeader('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36');
$c->post('https://fond.co/api/public/company_staff_invitations/staff_details',
[
"full_name" => "lukas curl",
"password" => "test1x2x3x45"
]);
if ($c->error) {
echo 'Error: ' . $c->errorCode . ': ' . $c->errorMessage . "\n";
} else {
echo 'Response:' . "\n";
var_dump($c->response);
}
var_dump($c->response);
die('bb');
this is the response I get:
//step1, the logo indicates a successful submission and creation
Response: object(stdClass)#2592 (1) { ["logo_url"]=> string(104) "https://d17yvb56124x4l.cloudfront.net/company_informations/customized_logos/8709/original.png?1512057076" }
//step2
Error: 401: HTTP/1.1 401 Unauthorized object(stdClass)#2599 (1) { ["error"]=> string(12) "Unauthorized" } bb
So the issue was cookies.
This app uses cookies for a session ID. Neither PHP cURL, nor Guzzle, nor any other HTTP client I tried stored the cookies for the following request, which resulted in the 401 Unauthorized.
I ended up using Guzzle, by the way. I was just trying different HTTP clients, and Guzzle is where I figured out how to set the cookies for the second call.
This is my final code (the sleep is set to 3 seconds; one can probably remove it entirely or set it to a lot less. I had just found the fix, so I am posting it with the 3 s sleep):
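require(__DIR__ . '/vendor/autoload.php');
use GuzzleHttp\Cookie\SessionCookieJar; // needed for new SessionCookieJar(...) below
use GuzzleHttp\Psr7\Request; // needed for new Request(...) below
$client = new GuzzleHttp\Client();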
$res = $client->post('https://fond.co/api/public/company_staff_signup/signup', [
GuzzleHttp\RequestOptions::JSON =>
[
"company_domain_name" => "themotcard.com",
"staff_identifier" => "37952992345",
"email" => "tessyyyz3321@testabc.com",
]
,
GuzzleHttp\RequestOptions::HEADERS => [ 'x-anyperk-client-id' => 'AnyPerk-PublicApp/1/2',
'authorization' => 'letmein',
'content-type' => 'application/json']
]);
$cookieJar = new SessionCookieJar('motr_guzzle_sessioncookies', true);
sleep(3);
$request = new Request('POST', 'https://fond.co/api/public/company_staff_invitations/staff_details',
['x-anyperk-client-id' => 'AnyPerk-PublicApp/1/2',
'authorization' => 'letmein',
'content-type' => 'application/json'],
\GuzzleHttp\json_encode([
"full_name" => "lukas guzzle",
"password" => "test1x2x3x45"
])
);
$cookieJar->extractCookies($request, $res);
$request = $cookieJar->withCookieHeader($request);
$res = $client->send($request);
echo '<pre>';
var_dump($res);
echo '</pre>';
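To make the cookie point concrete, here is a minimal sketch of what "carrying the session cookie over" means at the HTTP level, assuming you have the raw response header lines of the first call. The function name buildCookieHeader and the cookie names in the sample data are made up for illustration, and the sketch ignores attributes such as Path, Domain and Expires that a real cookie jar honours.

```php
<?php
/**
 * Build a Cookie request header value from raw response header lines.
 * Hypothetical helper for illustration; not part of Guzzle or php-curl-class.
 */
function buildCookieHeader(array $responseHeaderLines): string
{
    $pairs = [];
    foreach ($responseHeaderLines as $line) {
        // Only Set-Cookie lines matter; header names are case-insensitive.
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep the leading "name=value" part and drop attributes after the first ";".
        $cookie = trim(substr($line, strlen('Set-Cookie:')));
        $nameValue = trim(explode(';', $cookie, 2)[0]);
        if (strpos($nameValue, '=') === false) {
            continue;
        }
        [$name, $value] = explode('=', $nameValue, 2);
        // A later cookie with the same name replaces the earlier one.
        $pairs[$name] = $value;
    }

    $parts = [];
    foreach ($pairs as $name => $value) {
        $parts[] = $name . '=' . $value;
    }
    return implode('; ', $parts);
}

// Sample data (assumed, not taken from the real site).
$headers = [
    'HTTP/1.1 200 OK',
    'Set-Cookie: _session_id=abc123; path=/; HttpOnly',
    'set-cookie: locale=en; path=/',
];
echo buildCookieHeader($headers), "\n";
```

The resulting string is what would go into the Cookie header of the second request. In practice it is usually less work to let the client do this for you: Guzzle, for instance, has a cookies request option that can share one cookie jar across all requests made by a client.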
So far this is my code:
<?php
$start = date("d/m/y", strtotime('today'));
$end = date("d/m/y", strtotime('tomorrow'));
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0"
));
$context = stream_context_create($opts);
$url = "http://www.hot.net.il/PageHandlers/LineUpAdvanceSearch.aspx?text=&channel=506&genre=-1&ageRating=-1&publishYear=-1&productionCountry=-1&startDate=$start&endDate=$end&pageSize=1";
$data = file_get_contents($url, false, $context);
$re = '/LineUpId=(.+\d)/';
preg_match($re, $data, $matches);
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0"
));
$context = stream_context_create($opts);
$url = "http://www.hot.net.il/PageHandlers//LineUpDetails.aspx?lcid=1037&luid=$matches[1]";
$data = file_get_contents($url, false, $context);
echo $data;
?>
I am trying to prepare a TV guide for a single channel and the current program.
Part of the HTML page:
<div class="GuideLineUpDetailsCenter">
<a class="LineUpbold">Name of the Show</a>
<br>
<div class="LineUpDetailsTime">2018 22:45 - 23:30</div>
<br>
<div class="show">Information about the program</div>
<br>
<div class="LineUpbold">+14</div>
<br>
</div>
I want to pull the content and do something like this:
echo $LineUpbold;
echo $LineUpDetailsTime;
echo $show;
echo $LineUpbold;
Use a DOM parser and appropriate xpath queries instead:
<?php
$data = <<<DATA
<div class="GuideLineUpDetailsCenter">
<a class="LineUpbold">Name of the Show</a>
<br>
<div class="LineUpDetailsTime">2018 22:45 - 23:30</div>
<br>
<div class="show">Information about the program</div>
<br>
<div class="LineUpbold">+14</div>
<br>
</div>
DATA;
# set up the dom
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
# set up the xpath
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//div[@class = 'GuideLineUpDetailsCenter']") as $container) {
$name = $xpath->query("a[@class = 'LineUpbold']/text()", $container)->item(0);
echo $name->nodeValue;
$details = $xpath->query("div[@class = 'LineUpDetailsTime']/text()", $container)->item(0);
echo $details->nodeValue;
# and so on...
}
The code loads your string, searches for divs with the class GuideLineUpDetailsCenter, loops over them and tries to find appropriate children within each div.
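For the remaining fields, a possible way to finish the "and so on" part is sketched below. It assumes the markup looks exactly like the sample in the question; the function name extractShow and the array keys are my own choices, not anything the site defines. Note that the class LineUpbold is used twice, once on an a element (the show name) and once on a div (the age rating), so the element name is what tells them apart.

```php
<?php
$data = <<<DATA
<div class="GuideLineUpDetailsCenter">
<a class="LineUpbold">Name of the Show</a>
<br>
<div class="LineUpDetailsTime">2018 22:45 - 23:30</div>
<br>
<div class="show">Information about the program</div>
<br>
<div class="LineUpbold">+14</div>
<br>
</div>
DATA;

/**
 * Return the fields of the first GuideLineUpDetailsCenter block as an array.
 * Hypothetical helper; the keys are illustrative names.
 */
function extractShow(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // One relative query per field, evaluated against the container div.
    $queries = [
        'name'        => "a[@class = 'LineUpbold']",
        'time'        => "div[@class = 'LineUpDetailsTime']",
        'description' => "div[@class = 'show']",
        'rating'      => "div[@class = 'LineUpbold']",
    ];

    $result = [];
    foreach ($xpath->query("//div[@class = 'GuideLineUpDetailsCenter']") as $container) {
        foreach ($queries as $key => $query) {
            $node = $xpath->query($query, $container)->item(0);
            // A missing element yields null rather than a fatal error.
            $result[$key] = $node !== null ? trim($node->textContent) : null;
        }
        break; // only the first container is needed for a single program
    }
    return $result;
}

$show = extractShow($data);
echo $show['name'], "\n";
echo $show['time'], "\n";
echo $show['description'], "\n";
echo $show['rating'], "\n";
```

With the real page you would pass the result of file_get_contents() to extractShow() instead of the sample string.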
I want to get the browser details of the client, so I am using $_SERVER['HTTP_USER_AGENT']. But it returns some extra information as well, like
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36
in the Chrome browser. What I have done is convert the string into an array like
Array ( [0] => Mozilla/5.0 [1] => (Windows [2] => NT [3] => 10.0; [4] => Win64; [5] => x64) [6] => AppleWebKit/537.36 [7] => (KHTML, [8] => like [9] => Gecko) [10] => Chrome/60.0.3112.90 [11] => Safari/537.36 )
If I search for Chr, I want it to search the array and return Chrome/60.0.3112.90.
Please suggest a solution for this. Thanks.
Code :
echo $browser = $_SERVER['HTTP_USER_AGENT'];
$strArray = explode(' ',$browser);
print_r($strArray);
The way you're converting the user-agent to an array is quite faulty, for example "Windows NT" becomes ['Windows','NT'] and this is not something you want.
You might want to use ua-parser to extract the user-agent information in a better way.
require_once 'vendor/autoload.php';
use UAParser\Parser;
$ua = "Mozilla/5.0 (Macintosh; Intel Ma...";
$parser = Parser::create();
$result = $parser->parse($ua);
print $result->ua->family; // Safari
print $result->ua->major; // 6
print $result->ua->minor; // 0
print $result->ua->patch; // 2
print $result->ua->toString(); // Safari 6.0.2
print $result->ua->toVersion(); // 6.0.2
print $result->os->family; // Mac OS X
print $result->os->major; // 10
print $result->os->minor; // 7
print $result->os->patch; // 5
print $result->os->patchMinor; // [null]
print $result->os->toString(); // Mac OS X 10.7.5
print $result->os->toVersion(); // 10.7.5
print $result->device->family; // Other
print $result->toString(); // Safari 6.0.2/Mac OS X 10.7.5
print $result->originalUserAgent; // Mozilla/5.0 (Macintosh; Intel Ma...
With the added need to search for the OS, as written in the comments, the accepted answer will not work well.
@Nabil's answer will work quite well, but since it splits the string into very small pieces it may be hard to use.
I thought I could use preg_split and create a good array to search and I think I made it.
I don't know all variations of user agents but they seem to follow a pattern.
This will split on space but also on ( and ).
$input_line = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36";
$arr = preg_split("/( \()|(\) )/", $input_line);
$arr2 = explode(" ", end($arr)); // explode "Chrome/60.0.3112.90 Safari/537.36" on space
unset($arr[count($arr)-1]); // remove above exploded
$arr = array_merge($arr, $arr2); // reinsert them as two items
//var_dump($arr);
$search = "Chr";
foreach ($arr as $val) {
    if ($search == substr($val, 0, 3)) echo $val;
}
See here how it works: https://3v4l.org/qF65g
You can use regex on the string.
$str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36";
$search = "Chr";
preg_match("/". $search . ".*?\/\d+\.\d+\.\d+\.\d+/", $str, $match);
var_dump($match);
It will capture the search term and any characters until the /, then your digits with dots.
https://3v4l.org/uYpoP
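The pattern above expects exactly four dot-separated numbers, so it would not return a token such as Safari/537.36. A slightly more general variant could look like the sketch below; the function name findUaToken is made up, and the assumption is that the token you want has the form name/version and starts at the beginning of the string or right after a space.

```php
<?php
/**
 * Find the first "name/version" token whose name starts with $prefix.
 * Hypothetical helper; returns null when nothing matches.
 */
function findUaToken(string $userAgent, string $prefix): ?string
{
    // (?<!\S)      the token must start the string or follow whitespace
    // preg_quote   escapes regex metacharacters in the search term
    // [^\s\/;()]*  the rest of the product name
    // \/[\d.]+     a slash followed by a version of any length
    $pattern = '/(?<!\S)' . preg_quote($prefix, '/') . '[^\s\/;()]*\/[\d.]+/i';

    return preg_match($pattern, $userAgent, $m) === 1 ? $m[0] : null;
}

$str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36";

var_dump(findUaToken($str, "Chr"));
var_dump(findUaToken($str, "saf"));
var_dump(findUaToken($str, "Firefox"));
```

The i flag makes the search case-insensitive, so "saf" finds the Safari token; drop it if you want an exact-case match.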
You can create a function to search the array:
echo $browser = $_SERVER['HTTP_USER_AGENT'];
$strArray = explode(' ',$browser);
print_r($strArray);
function search_from_array($array,$string)
{
$return_str = "";
foreach ($array as $key => $value) {
$pos = stripos($value, $string);
if($pos !== false)
$return_str=$value;
}
return $return_str;
}
echo search_from_array($strArray,"chr");