Get last available URL - PHP

I am using an API to get the user's profile picture. The call looks something like this:
https://familysearch.org/platform/tree/persons/{$rid}/portrait?access_token={$_SESSION['fs-session']}&default=https://eternalreminder.com/dev/graphics/{$default_gender}_default.svg
This link only works for about an hour because the user's session token expires after that. I was wondering if there is any way to retrieve the last returned URL, which would be the direct link to the image, so I could store that in a database.
I have tried Google but I don't really know where to start.
Thanks in advance!

I was able to solve my own problem. The request was doing a redirect to get the image, and I just needed that final URL. Here is the code that got me there.
$url="http://libero-news.it.feedsportal.com/c/34068/f/618095/s/2e34796f/l/0L0Sliberoquotidiano0Bit0Cnews0C12735670CI0Esaggi0Eper0Ele0Eriforme0Ecostituzionali0EChiaccherano0Ee0Eascoltano0Bhtml/story01.htm";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Must be set to true so that PHP follows any "Location:" header
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$a = curl_exec($ch); // $a will contain all headers
$url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // This is what you need, it will return you the last effective URL
// Uncomment to see all headers
/*
echo "<pre>";
print_r($a);echo"<br>";
echo "</pre>";
*/
echo $url; // Voila
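Applied to the portrait endpoint from the original question, the same technique can be wrapped in a small helper. This is only a sketch: resolve_final_url() is a hypothetical name, and $rid, $default_gender, and the session token come from the question.
// Hypothetical helper wrapping the CURLINFO_EFFECTIVE_URL technique above
function resolve_final_url($url) {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_FOLLOWLOCATION => true, // follow the redirect to the image
        CURLOPT_RETURNTRANSFER => true, // don't echo the response body
    ));
    curl_exec($ch);
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $final;
}
$portraitUrl = "https://familysearch.org/platform/tree/persons/{$rid}/portrait"
             . "?access_token={$_SESSION['fs-session']}"
             . "&default=" . urlencode("https://eternalreminder.com/dev/graphics/{$default_gender}_default.svg"); // urlencode so the nested URL survives
$directImageUrl = resolve_final_url($portraitUrl); // store this in the database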

Related

How do PHP scripts for data mining from web pages work?

[Edited for better explanation and code included]
Hi! I have a PHP script on my web server that logs in to my heat pump's web interface at nibeuplink.com, reads all my temperature values and so forth, and returns them in JSON format.
freeboard.io is a free service for visualizing data, so I'm building a freeboard.io dashboard for my heat pump values. In freeboard.io I can add any JSON data as a data source, so I have added the link to my PHP script. It fetches the data once, but after that it seems to reuse some kind of cached values, so they are never updated with new values from the script. freeboard.io fetches the URL with a GET request and has a setting to automatically update the data source every 5 seconds. If I run the PHP script in a normal web browser and refresh it, the values are updated - and also immediately updated in freeboard.io.
So something triggers the script correctly when it is fetched from my web browser, but not when it is fetched from freeboard.io with a GET request every 5 seconds.
In freeboard I can add headers to the GET request. Is there a header that would help me discard any cached data?
I hope that explains my problem better.
Is there anything I can add at the beginning of my code to always force an override of any cached data?
<?php
/*
* read nibe heatpump values from nibeuplink status web page and return them in json format.
* based on: https://www.symcon.de/forum/threads/25663-Heizung-Nibe-F750-Nibe-Uplink-auslesen-auswerten
* to get the code which is required as parameter, log into nibe uplink, open status page of your heatpump, and check url:
* https://www.nibeuplink.com/System/<code>/Status/Overview
*
* usage: nibe.php?email=<email>&password=<password>&code=<code>
*/
// to add additional debug output to the resulting page:
$debug = false;
date_default_timezone_set('Europe/Helsinki');
$date = time();
// Create temp file to store cookies
$ckfile = tempnam("/tmp", "CURLCOOKIE");
// URL to login page
$url = "https://www.nibeuplink.com/LogIn";
// Get Login page and its cookies and save cookies in the temp file
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Disables certificate verification (accepts any certificate)
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile); // Stores cookies in the temp file
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
// Now you have the cookie, you can POST login values
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, "Email=".urlencode($_GET['email'])."&Password=".urlencode($_GET['password'])); // urlencode in case the credentials contain special characters
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile); // Uses cookies from the temp file
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Tells cURL to follow redirects
$output = curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, "https://www.nibeuplink.com/System/".$_GET['code']."/Status/ServiceInfo");
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_POST, 0);
$result = curl_exec($ch);
$pattern = '/<h3>(.*?)<\/h3>\s*<table[^>]*>.+?<tbody>(.+?)<\/tbody>\s*<\/table>/s';
if ($debug) echo "pattern: <xmp>".$pattern."</xmp><br>";
$pattern2 = '/<tr>\s*<td>(.+?)<span[^>]*>[^<]*<\/span>\s*<\/td>\s*<td>\s*<span[^>]*>([^<]*)<\/span>\s*<\/td>\s*<\/tr>/s';
if ($debug) echo "pattern2: <xmp>".$pattern2."</xmp><br>";
preg_match_all($pattern, $result, $matches);
// build json format from matches
echo '{';
$first = true;
foreach ($matches[1] as $i => $title) {
    echo ($first ? '"' : ',"').trim($title).'":{';
    $content = $matches[2][$i];
    preg_match_all($pattern2, $content, $values);
    $nestedFirst = true;
    foreach ($values[1] as $j => $field) {
        echo ($nestedFirst ? '"' : ',"').trim($field).'":"'.$values[2][$j].'"';
        $nestedFirst = false;
    }
    echo "}";
    $first = false;
}
echo ",\"time\":{\"Last fetch\":\"$date\"}";
echo "}";
if ($debug) {
    echo "<pre><xmp>";
    print_r($matches); // print_r echoes directly; no echo needed
    echo "<br><br>";
    echo $result;
    echo "</xmp></pre>";
}
?>
You can make an Ajax call to a PHP script to refresh part of the web page. I don't understand what you mean by freeboard.io here - are you asking about fetching data from a database, so that only newly added records are fetched when the database changes? If you mean it in that sense, you can use a cookie to track new records added to the database, and only when new records are found make an Ajax call to the PHP script to run your algorithm on the full fetched dataset.
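On the caching itself: if freeboard.io or a proxy in between is caching the HTTP response, a common approach is to send no-cache headers from the PHP script before any other output. A minimal sketch, assuming nothing has been sent to the client yet:
<?php
// Emit these before any other output to discourage HTTP-level caching
header('Cache-Control: no-store, no-cache, must-revalidate, max-age=0');
header('Pragma: no-cache'); // for older HTTP/1.0 caches
header('Expires: 0');
header('Content-Type: application/json');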

How do I get my browser session from a GET request?

How do I use my current browser session and create a GET request in PHP based on that?
I tried the following:
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "http://localhost/x/");
// return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// enable headers
curl_setopt($ch, CURLOPT_HEADER, 1);
// get only headers
curl_setopt($ch, CURLOPT_NOBODY, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
$headers = array();
$data = explode("\n", $output);
$headers['status'] = $data[0];
array_shift($data);
foreach ($data as $part) {
    // limit explode() to 2 pieces so header values containing ":" stay intact
    $middle = explode(":", $part, 2);
    if (count($middle) == 2) {
        $headers[trim($middle[0])] = trim($middle[1]);
    }
}
// print all headers as array
echo "<pre>";
print_r($headers);
echo "</pre>";
After checking, it seems that it is not really using my session to create the GET request; rather, it is creating an entirely new GET request.
I also tried to pull only the headers with get_headers(), but that seemed to have the same issue for some reason. Is there a way to pull up my current session from the browser and use it in a GET request, rather than having a new one every time? (I will be on the same machine, so the session is mine.) In short, I want to initiate a GET request using my current session.
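A cURL request is a brand-new HTTP client, so it only shares the browser's session if the session cookie is forwarded explicitly. A minimal sketch of that idea, assuming PHP's default PHPSESSID cookie name:
<?php
session_start();
$sessionId = session_id();
// Release the session lock first, or a sub-request to the same server
// will block waiting for the session file
session_write_close();

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/x/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Forward the browser's session cookie so the target script sees the same session
curl_setopt($ch, CURLOPT_COOKIE, "PHPSESSID=" . $sessionId);
$output = curl_exec($ch);
curl_close($ch);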

cURL taking long time to get the final URL of redirect URL

The code snippet below gets the final URL (which points to a media/zip/rar file) from a redirect URL using cURL. It does get the final URL, no doubt about it, but the time it takes varies with the size of the file at that URL.
If the file at the final URL is 1 MB, it takes around 5 seconds to retrieve; if the file is about 35 MB, it takes around 150 seconds. I think cURL is downloading the whole result and only then reporting the URL.
<?php
echo get_rurl("x_url"); // 1.2MB -> 5-10sec
//echo get_rurl("y_url"); // 31.6MB -> 150sec

function get_rurl($url) {
    // initialize cURL
    $curl = curl_init($url);
    curl_setopt_array($curl, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    ));
    // execute the request
    $result = curl_exec($curl);
    // fail if the request was not successful
    if ($result === false) {
        curl_close($curl);
        return null;
    }
    // extract the target url
    $redirectUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
    curl_close($curl);
    return $redirectUrl;
}
?>
I can't use file_get_contents() because I just want the final URL from the given redirect URL.
So, in short: how do I get the final URL from a redirect URL without downloading the result?
I hope that makes it clear. Any help will be appreciated.
This works fine with CURLINFO_EFFECTIVE_URL, but for it the option CURLOPT_FOLLOWLOCATION must be set to true. That is because CURLINFO_EFFECTIVE_URL returns exactly what it says: the effective URL that ends up getting loaded. If CURLOPT_FOLLOWLOCATION is false, the effective URL will be the requested URL; otherwise it will be the final URL that was redirected to.
I did this using curl_getinfo(), which gives information about the last transfer:
<?php
echo get_rurl("xurl");
//echo get_rurl("yurl");

function get_rurl($url) {
    // initialize cURL
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url); // specify your URL
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // do not follow redirects
    $http_data = curl_exec($ch); // hit the $url
    $curl_info = curl_getinfo($ch);
    curl_close($ch);
    return $curl_info['redirect_url']; // extract the URL the response redirects to
}
?>
or
You can also use CURLINFO_REDIRECT_URL or CURLINFO_EFFECTIVE_URL, depending on your use case; see the curl_getinfo() documentation.
<?php
echo get_rurl("xurl");
//echo get_rurl("yurl");

function get_rurl($url) {
    // initialize cURL
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url); // specify your URL
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // do not follow redirects
    $http_data = curl_exec($ch); // hit the $url
    $redirect = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    curl_close($ch);
    return $redirect;
}
?>
Hope this helps other users too.
According to the libcurl documentation (https://curl.haxx.se/libcurl/c/CURLOPT_FOLLOWLOCATION.html), this is exactly what is expected when using CURLOPT_FOLLOWLOCATION => true. You probably want to change it to false.
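If what's needed is the final URL after all redirect hops without downloading any body, another option (my own suggestion, not from the answers above) is to combine CURLOPT_FOLLOWLOCATION with CURLOPT_NOBODY so that cURL sends HEAD requests:
<?php
function get_final_url($url) {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_NOBODY         => true, // HEAD requests: headers only, no body is downloaded
        CURLOPT_FOLLOWLOCATION => true, // still follow every redirect hop
        CURLOPT_RETURNTRANSFER => true,
    ));
    curl_exec($ch);
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $final;
}
?>
Caveat: some servers answer HEAD requests differently from GET (or not at all), so this needs testing against the actual URLs.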

PHP cURL does not work when calling multiple URLs?

I am trying to take a list of URLs from a textbox, one URL per line; each URL is a redirect, and I am trying to get the URL that it redirects to.
When I run the code below on a single URL, it returns the redirected URL, which is what I want...
function getRedirect($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $result = curl_exec($ch);
    $info = curl_getinfo($ch); // Some information on the fetch
    curl_close($ch);
    echo '<pre>';
    print_r($info);
    echo '</pre>';
}
$url = 'http://www.domain.com/go?a:aHR0cDovL2xldGl0Yml0Lm5ldC9kb3d';
getRedirect($url);
Now my problem is when I try to run it on multiple URL's with this code...
if (isset($_POST['urls'])) {
    $rawUrls = explode("\n", $_POST['urls']);
    foreach ($rawUrls as $url) {
        getRedirect($url);
    }
}
When I run it on my list of URLs, instead of giving me the redirected URL like the first example correctly does, it gives me back the URL that I passed into cURL.
Can someone help me figure out why or how to fix this please?
It's already covered in the question comments, but it seems the problem is extra whitespace at the end of each URL.
Calling getRedirect(trim($url)) would fix it.
The trailing whitespace is most likely encoded into the query string (e.g. as %20) and changes the value of the query string parameters.
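A minimal sketch of that fix applied to the loop from the question (trim() also strips the "\r" that explode("\n", ...) leaves behind with Windows line endings):
if (isset($_POST['urls'])) {
    // trim every line; this removes trailing spaces and "\r"
    $rawUrls = array_map('trim', explode("\n", $_POST['urls']));
    foreach ($rawUrls as $url) {
        if ($url !== '') { // skip blank lines
            getRedirect($url);
        }
    }
}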

Updating Twitter background via API

I'm having a little trouble updating backgrounds via Twitter's API.
$target_url = "http://www.google.com/logos/11th_birthday.gif";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Expect:'));
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
$content = $to->OAuthRequest('http://twitter.com/account/update_profile_background_image.xml', array('profile_background_image_url' => $html), 'POST');
When I try to pull the raw data via cURL or file_get_contents(), I get this:
Expectation Failed - The expectation given in the Expect request-header field could not be met by this server. The client sent "Expect: 100-continue" but we only allow the 100-continue expectation.
OK, you can't point Twitter at a URL; it won't accept that. Looking around a bit, I've found that the best way is to download the image to the local server and then pass that over to Twitter, almost like a form upload.
Try the following code, and let me know what you get.
// The URL from an external (or internal) server we want to grab
$url = 'http://www.google.com/logos/11th_birthday.gif';
// We need to grab the file name of this, unless you want to create your own
$filename = basename($url);
// This is where we'll be saving our new file to. Replace LOCALPATH with the path you would like to save the file to, i.e. www/home/content/my_directory/
$newfilename = 'LOCALPATH' . $filename;
// Copy it over, PHP will handle the overheads.
copy($url, $newfilename);
// Now it's OAuth time... fingers crossed!
$content = $to->OAuthRequest('http://twitter.com/account/update_profile_background_image.xml', array('profile_background_image_url' => $newfilename), 'POST');
// Echo something so you know it went through
print "done";
Well, given the error message, it sounds like you should load the URL's contents yourself, and post the data directly. Have you tried that?
