If I get title of the page, I can tell the download link is active or dead.
For example: "Free online storage" is title of dead link and "[file name]" is the title of active link (mediafire). But my page takes too long to respond, so is there any other way to check if a download link is active or dead?
That is what i have done:
<?php
function getTitle($Url){
$str = file_get_contents($Url);
if(strlen($str)>0){
preg_match("/\<title\>(.*)\<\/title\>/",$str,$title);
return $title[1];
}
}
?>
Do not perform a GET request, which downloads the whole page/file, but HEAD request, which gets only the HTTP headers, and check if the status is 200, and the content-type is not text/html
Something like this...
function url_validate($link)
{
#[url]http://www.example.com/determining-if-a-url-exists-with-curl/[/url]
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $link);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops
$data = curl_exec($ch);
curl_close($ch);
preg_match_all("/HTTP\/1\.[1|0]\s(\d{3})/",$data,$matches);
$code = end($matches[1]);
if(!$data)
{
return(false);
}
else
{
if($code==200)
{
return(true);
}
elseif($code==404)
{
return(false);
}
}
}
You can safely use any cURL library function. It is legitimate and thus would not regarded as a hacking attempt. The only requirement is that your web hosting company has cURL extension installed, which is very likely.
cURL should do the job. You can check the headers returned and the text content as well if you want.
Related
I'm trying to get all links at concrete web-page. I need only 'a' tags with concrete parameter. But firstly, as far as I know, I have to download the whole page.
I use this (mostly not mine) code:
<?php
function file_get_contents_curl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); //Устанавливаем параметр, чтобы curl возвращал данные, вместо того, чтобы выводить их в браузер.
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FORBID_REUSE, true);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$startUrl = 'address';
$data = file_get_contents_curl($startUrl);
echo($data);
?>
By this I'm getting error "Too many requests". The question is: can I change the amount of requests for finging elements of links array?
I think about curl_multi, but as far as I understand, it assumes that I already have the array and only need to make multiple threads.
Help, please.
I couldn't convert a double shortened URL to expanded URL successfully using the below function I got from here:
function doShortURLDecode($url) {
$ch = #curl_init($url);
#curl_setopt($ch, CURLOPT_HEADER, TRUE);
#curl_setopt($ch, CURLOPT_NOBODY, TRUE);
#curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
#curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$response = #curl_exec($ch);
preg_match('/Location: (.*)\n/', $response, $a);
if (!isset($a[1])) return $url;
return $a[1];
}
I got into trouble when the expanded URL I got was again a shortened URL, which has its expanded URL.
How do I get final expanded URL after it has run through both URL shortening services?
Since t.co uses HTML redirection through the use of JavaScript and/or a <meta> redirect we need to grab it's contents first. Then extract the bit.ly URL from it to perform a HTTP header request to get the final location. This method does not rely on cURL to be enabled on server and uses all native PHP5 functions:
Tested and working!
function large_url($url)
{
$data = file_get_contents($url); // t.co uses HTML redirection
$url = strtok(strstr($data, 'http://bit.ly/'), '"'); // grab bit.ly URL
stream_context_set_default(array('http' => array('method' => 'HEAD')));
$headers = get_headers($url, 1); // get HTTP headers
return (isset($headers['Location'])) // check if Location header set
? $headers['Location'] // return Location header value
: $url; // return bit.ly URL instead
}
// DEMO
$url = 'http://t.co/dd4b3kOz';
echo large_url($url);
Finally found a way to get the final url of a double shortened url. The best way is to use longurl api for it.
I am not sure if it is the correct way, but i am at last getting the output as the final url needed :)
Here's what i did:
<?php
function TextAfterTag($input, $tag)
{
$result = '';
$tagPos = strpos($input, $tag);
if (!($tagPos === false))
{
$length = strlen($input);
$substrLength = $length - $tagPos + 1;
$result = substr($input, $tagPos + 1, $substrLength);
}
return trim($result);
}
function expandUrlLongApi($url)
{
$format = 'json';
$api_query = "http://api.longurl.org/v2/expand?" .
"url={$url}&response-code=1&format={$format}";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $api_query );
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_HEADER, false);
$fileContents = curl_exec($ch);
curl_close($ch);
$s1=str_replace("{"," ","$fileContents");
$s2=str_replace("}"," ","$s1");
$s2=trim($s2);
$s3=array();
$s3=explode(",",$s2);
$s4=TextAfterTag($s3[0],(':'));
$s4=stripslashes($s4);
return $s4;
}
echo expandUrlLongApi('http://t.co/dd4b3kOz');
?>
The output i get is:
"http://changeordie.therepublik.net/?p=371#proliferation"
The above code works.
The code that #cryptic shared is also correct ,but i could not get the result on my server (maybe because of some configuration issue).
If anyone thinks that it could be done by some other way, please feel free to share it.
Perhaps you should just use CURLOPT_FOLLOWLOCATION = true and then determine the final URL you were directed to.
In case the problem is not a Javascript redirect as in t.co or a <META http-equiv="refresh"..., this is reslolving stackexchange URLs like https://stackoverflow.com/q/62317 fine:
public function doShortURLDecode($url) {
$ch = #curl_init($url);
#curl_setopt($ch, CURLOPT_HEADER, TRUE);
#curl_setopt($ch, CURLOPT_NOBODY, TRUE);
#curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
#curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$response = #curl_exec($ch);
$cleanresponse= preg_replace('/[^A-Za-z0-9\- _,.:\n\/]/', '', $response);
preg_match('/Location: (.*)[\n\r]/', $cleanresponse, $a);
if (!isset($a[1])) return $url;
return parse_url($url, PHP_URL_SCHEME).'://'.parse_url($url, PHP_URL_HOST).$a[1];
}
It cleans the response of any special characters, that can occur in the curl output before cuttoing out the result URL (I ran into this problem on a php7.3 server)
I mine data from rss links and get a bunch of urls like:
http://feedproxy.google.com/~r/electricpig/~3/qoF8XbocUbE/
.... and if I access the links in my web browser, I am redirected to something like:
http://www.electricpig.co.uk/stuff.
Is there a way in php to write a function that, when given a url "a" that redirects the user to an url "b", returns you the url "b" ?
Here you go:
function getRedirect($oldUrl) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $oldUrl);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$res = curl_exec($ch);
$newUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
return $newUrl;
}
The function requires cURL, and makes use of CURLINO_EFFECTIVE_URL. You can look it up on phpdoc here
EDIT:
if you are certain the oldUrl is not redirecting to newUrl via javascript, then you can also avoid fetching the body of the newUrl using
curl_setopt($ch, CURLOPT_NOBODY, TRUE); // remove body
Put the above line before $res = curl_exec($ch); in the function getRedirect to achiever faster execution.
public function getRedirect($url) {
$headers = get_headers($url, 1);
if (array_key_exists("Location", $headers)) {
$url = getRedirect($headers["Location"]);
}
return $url;
}
If you just enter the urls into the browser you can see that both work, cdon works even without javascript, have they blocked cURL somehow?
I'm trying to build a scraper to benifit legal movies online which would benifit them a whole lot, seems stupid blocking scrapers in general imho. Although I'm far from sure that's whats going on here! Might be just an error somewhere..
// Works
get_file1('http://sfanytime.com/sv-SE/Sokresultat/?field=all&q=The+Matrix', '/', 'sfanytime.html');
// Saves a blank 0 KB file
get_file1('http://downloads.cdon.com/index.phtml?action=search&search_terms=The+Matrix', '/', 'cdon.html');
function get_file1($file, $local_path, $newfilename) {
$out = fopen($newfilename, 'wb');
if ($out === FALSE) {
return false;
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_FILE, $out);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_URL, $file);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
$error = curl_error($ch);
if (strlen($error) > 0) {
echo "<br>Error is : ". $error;
return false;
}
curl_close($ch);
return true;
}
You should change the line
curl_setopt($ch, CURLOPT_FAILONERROR, true);
...to...
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
CURLOPT_FAILONERROR will cause a "silent fail" - which from what you say, is not what you want. I have replaced this with CURLOPT_FOLLOWLOCATION, because when I visit the second URL, I get redirected to a "choose your country" type page, which will be a response with an empty body - which is why you get an empty file.
There is no problem with your code as such, simply a problem with the way you handle the response from the second URL. You don't see an error because, technically, there wasn't one.
I have a frontend code
$ch = curl_init();
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
//curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept: application/json'));
curl_setopt($ch, CURLOPT_URL, $url);
//make the request
$responseJSON = curl_exec($ch);
$response_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($response_status == 200) { // success
// remove any "problematic" characters from the json string and then decode
if (debug) {
echo "----finish API of getAPI inside basic_function with status==200---";
echo "<br>";
echo "-------the json response is-------" ; //.$responseJSON;
var_dump($responseJSON);
//print_r($responseJSON);
echo "<br>";
}
return json_decode( preg_replace( '/[\x00-\x1F\x80-\xFF]/', '', $responseJSON ) );
}
and I have a backend code which executed when cURL fired its operation with its URL. The backend code would therefore activated. So, I know cURL is operating.
$output=array (
'status'=>'OK',
'data'=>'12345'
)
$output=json_encode($output)
echo $output;
and $output shown on browser as {"status":"OK","data":"12345"}
However, I gone back to the frontend code and did echo $responseJSON, I got nothing. I thought the output of {"status":"OK","data":"12345"} would gone to the $responseJSON. any idea?
Here's output on Browser, something is very odd! the response_status got 200 which is success even before the parsing of API by the backend code. I expect status =200 and json response after the {"status":"OK","data":"12345"}
=========================================================================================
inside the get API of the basic functions
-------url of cURL is -----http://localhost/test/api/session/login/?device_duid=website&UserName=joe&Password=1234&Submit=Submit
----finish API of getAPI inside basic_function with status==200---
-------the json response is-------string(1153)
"************inside Backend API.php******************
---command of api is--/session/login/
---first element of api is--UserName=joe
--second element of api is---Password=1234
---third element of api is----Submit=Submit
----fourth element of api is---
-------inside session login of api-------------
{"status":"OK","data":"12345"}
Have you tried with curl_setopt($ch, CURLOPT_TIMEOUT, 10); commented?
See what happends if you comment that line.
Also try with the a basic code, if that works, smthing you added later is wrong:
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, false);
// grab URL and pass it to the browser
curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
Try var_dump($responseJSON)
If it returns false try
curl_error ( $ch )
Returns a clear text error message for the last cURL operation.
Are you sure your $url is correct?