How do I open the content without changing the domain? - php

How do I open the content without changing the domain?
Example:
A user opens domain1.com/test.php but should receive the content of domain2.com/test.php.
Should this be done with .htaccess or a PHP script?

It depends what you're looking to do:
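If you want calls to domain1.com/test.php to permanently redirect to domain2.com/test.php, set up an HTTP redirect, for example in .htaccess or through the URL forwarding settings at your hosting provider. A DNS A record alone cannot do this, because it only maps a host name to an IP address. Be sure to use a 301 (not 302) redirect type. Note that a redirect changes the address shown in the browser.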
If you simply want to have domain1.com/test.php return the payload from domain2.com/test.php, have domain1.com/test.php call domain2.com/test.php like this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://domain2.com/test.php");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);
echo "<pre>$output</pre>";

Related

Detect URL redirect path php

I want to get full redirect path of the url.
Let's say if source.com redirects to destination.com after multiple redirects like this:
http://www.source.com/ -> http://www.b.com/ -> http://www.c.com/ -> http://www.destination.com/
How do I get all the redirected URLs?
Using the code below I get only http://www.destination.com/. How do I detect the full redirect chain?
<?php
$url='windows.com';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the redirects
curl_setopt($ch, CURLOPT_HEADER, false); // no needs to pass the headers to the data stream
curl_setopt($ch, CURLOPT_NOBODY, true); // get the resource without a body
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // accept any server certificate
curl_exec($ch);
// get the last used URL
$lastUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
echo $lastUrl;
?>
This code has another problem: it can't detect the target URL of YouTube redirects.
Tested URL : https://www.youtube.com/redirect?redir_token=QUFFLUhqbkVxUFZUME9NbWF4RThxdFpGV3pmTTJEdFVWQXxBQ3Jtc0tubGJqU016TzJ6WnlfeUItX0ZmOUItUE1jRlZoZXhxMzNpQllpM0NLSk4ycnBLMGNidTFsX3N6WkU2X3RsUTRZb1lXQVp5SEZjbnU3eDFuZS1VU3dhdzg2QW9ZMTl1azFCZFZHcHRLdFF3dTM1MlRWdw%3D%3D&event=video_description&v=KEa2XWRGf_4&q=https%3A%2F%2Fwww.facebook.com%2Fabhiandniyu
My question is how do I detect full url redirect chain for all types of redirect requests.
You're probably missing:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
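Add them to your cURL configuration and it should work. (Note that CURLOPT_BINARYTRANSFER has had no effect since PHP 5.1.3; CURLOPT_RETURNTRANSFER alone already returns the raw output.)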
Don't follow HTTP redirects: curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
And output HTTP headers, while testing: curl_setopt($ch, CURLOPT_HEADER, true);
Then you can obtain the Location header from the received HTTP 302 response.
When there is more than one redirect, this has to run in a loop until HTTP 200 has been received. In this context HTTP 200 means that the final destination has been reached.
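The loop described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names extractLocation() and traceRedirects() and the default limit of 10 hops are made up for illustration, the sketch stops on any non-3xx status rather than only on HTTP 200, and it assumes every Location value is an absolute URL.

```php
<?php
// Hypothetical helper: pull the Location value out of a raw header block.
// Returns null when the response carries no Location header.
function extractLocation(string $rawHeaders): ?string
{
    if (preg_match('/^Location:\s*(.+?)\s*$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: request each URL without following redirects and
// record every hop. Assumes absolute Location values.
function traceRedirects(string $url, int $maxHops = 10): array
{
    $chain = [$url];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // follow manually
    curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in the output
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    for ($hop = 0; $hop < $maxHops; $hop++) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $headers = curl_exec($ch);
        if ($headers === false) {
            break; // network or protocol error
        }
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $next = extractLocation($headers);
        if ($status < 300 || $status >= 400 || $next === null) {
            break; // not a redirect: final destination reached
        }
        $chain[] = $next;
        $url = $next;
    }
    curl_close($ch);
    return $chain;
}
```

Calling traceRedirects('http://www.source.com/') would then return the list of URLs visited, starting with the one passed in. As an alternative to parsing the header by hand, curl_getinfo($ch, CURLINFO_REDIRECT_URL) reports the next URL when CURLOPT_FOLLOWLOCATION is disabled.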

file_get_contents request to external site

I am trying to do a file_get_contents of this Demo URL
However the server has trouble with getting data from external sites. This is the error I get if I echo the file_get_contents:
Found The document has moved
here. Apache/2.4 Server at spotifycharts.com Port 80
I have turned the register_global on in the php.ini file, but this doesn't help.
What would be the most logical thing to check to make sure my website is able to get data from external sites?
Just use the https url instead of the http url:
https://spotifycharts.com/api/?type=regional&country=nl&recurrence=daily&date=latest&limit=200
You may need to request with cURL if file_get_contents() is not following the 302 redirect in your setup.
Something like this should work...
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$a = curl_exec($ch);
// Extract the redirect target from the Location header, if present
if (preg_match('#Location: (.*)#', $a, $r)) {
    $l = trim($r[1]);
}
How to get the real URL after file_get_contents if redirection happens?
Source

How to avoid "Moved Permanently: The document has moved here"

I'm on one site and want to call an API that is on another site, so I built a cURL request:
$url = ........
$curl_data = array('name'=>$name);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curl_data);
$output = curl_exec($ch);
$info = curl_getinfo($ch, CURLINFO_HTTP_CODE);
When I execute the cURL request and print the value of $output, I get "Moved Permanently. The document has moved here." This is wrong: I want to call this API, obtain a value, and come back to the page where the process started. Can anyone help me?
After a day I resolved it by adding this line before calling curl_exec():
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,true);
Check whether the URL really returns a 301 (Moved Permanently) response. Use Fiddler, since it can capture HTTP status codes.
See this: PHP cURL says Moved Permanently when POSTing to a virtual host
curl_setopt($curl, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($curl, CURLOPT_POSTREDIR, 3);
Adding this line worked for me:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
I faced the same issue. You can fix it by changing your URL prefix from http to https (or, if it was already https, to http).
The problem is in .htaccess, something like:
RewriteRule ............ [R=301,L]

How can I properly follow all redirects on sites I am trying to scrape with cURL in PHP?

I am using cURL to try to scrape an ASP site that is not on my server, with the following option to automatically follow redirects it comes across:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
but it is not properly following all redirects that the website sends back: it is putting several of the redirect URLs as relative to my server and my PHP script's path, instead of the website's server and the path that the website's pages should be relative to. Is there any way to set the base path or server path in cURL, so my script can properly follow the relative redirects it comes across when scraping through the other website?
For example: If I authenticate on their site and then try to access "https://www.theirserver.com/theirapp/mainForm/securepage.aspx" with my script at "https://www.myserver.com/php/myscript.php", then, under some circumstances, their website tries to redirect back to their login page, but this causes a big problem, because the redirect sends my cURL client to "https://www.myserver.com/php/mainForm/login.aspx", that is, '/mainForm/login.aspx' relative to my script on my server, instead of the correct "https://www.theirserver.com/theirapp/mainForm/login.aspx" relative to the site I am scraping on their server.
I would expect cURL's FOLLOWLOCATION option to properly follow relative redirects based on the "Location:" header of the web pages I am accessing, but it seems that it doesn't and can't. Since this seems to not work, preferably I want a way to tell cURL a base path for the server or for all relative redirects it sees, so I can just use FOLLOWLOCATION. If not, then I need to figure out some code that will do the same thing FOLLOWLOCATION does, but that can let me specify a base path to handle these relative URLs when it comes across them.
I see several similar questions about following relative paths with cURL, but none of the answers have any good suggestions for dealing with this problem, where I don't own the website's server and I don't know every single redirect that might come up. In fact, none of the answers I've seen for similar questions seem to even understand that a person might be trying to scrape an external website and would want any relative redirects they come across while scraping the site to just be relative to that site.
EDIT: Here is the code in question:
$urlLogin = "https://www.theirsite.com/theirApp/MainForm/login.aspx";
$urlSecuredPage = "https://www.theirsite.com/theirApp/ContentPages/content.aspx";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $urlLogin);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_ENCODING, "");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; yie8)");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
// GET login page
$data=curl_exec($ch);
// Read ASP viewstate and eventvalidation fields
$viewstate = parseExtract($data,$regexViewstate, 1);
$eventval = parseExtract($data, $regexEventVal, 1);
//set POST data
$postData = '__EVENTTARGET='.$eventtarget
.'&__EVENTARGUMENT='.$eventargument
.'&__VIEWSTATE='.$viewstate
.'&__EVENTVALIDATION='.$eventval
.'&'.$nameUsername.'='.$valUsername
.'&'.$namePassword.'='.$valPassword
.'&'.$nameLoginBtn.'='.$valLoginBtn;
// POST authentication
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_URL, $urlLogin);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
$data = curl_exec($ch);
/******************************************************************
GET secure page (This is where a redirect fails... when getting
the secure page, it redirects to /mainForm/login.aspx relative to my
script, instead of /mainForm/login.aspx on their site.
*****************************************************************/
curl_setopt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_URL, $urlSecuredPage);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
$data = curl_exec($ch);
echo $data; // Page Not Found
You may be running into redirects that are JavaScript redirects.
To find out what is there:
This will give you additional info.
curl_setopt($ch, CURLOPT_FILETIME, true);
You should set fail on error:
curl_setopt($ch, CURLOPT_FAILONERROR,true);
You may also need to see all the Request and Response headers:
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
The big thing you are missing is curl_getinfo($ch);
It has information on the redirects and the headers.
You may want to turn off CURLOPT_FOLLOWLOCATION
and do each request individually. You can get the redirect location from curl_getinfo($ch, CURLINFO_REDIRECT_URL).
Or you can set CURLOPT_MAXREDIRS to the number of successful redirects, then do a separate cURL request for the problem redirect location.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
When you get the response and there is no cURL error, split off the response header (this requires CURLOPT_HEADER to be true):
$data = curl_exec($ch);
if (curl_errno($ch)) {
    echo 'Retrieve Base Page Error: ' . curl_error($ch);
} else {
    // The header size tells us where the headers end and the body begins
    $skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
    $responseHeader = substr($data, 0, $skip);
    $data = substr($data, $skip);
    $info = var_export(curl_getinfo($ch), true);
    echo $responseHeader . $info . $data;
}
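When following redirects by hand like this, a relative Location value has to be resolved against the URL that was just requested, not against the script's own URL. The following is a simplified sketch of that resolution; the function name resolveLocation() is hypothetical, and the sketch ignores user info, fragments, and the trailing-slash subtleties of full RFC 3986 resolution.

```php
<?php
// Hypothetical helper: resolve a Location header value against the URL
// that produced it. Simplified; not a complete RFC 3986 implementation.
function resolveLocation(string $baseUrl, string $location): string
{
    // Already absolute (has a scheme): use as-is.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location;
    }

    $base   = parse_url($baseUrl);
    $scheme = $base['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($base['host'] ?? '')
            . (isset($base['port']) ? ':' . $base['port'] : '');

    // Protocol-relative reference: //host/path
    if (strncmp($location, '//', 2) === 0) {
        return $scheme . ':' . $location;
    }

    // Split the query string off so it is not touched by path handling.
    $query = '';
    $qpos = strpos($location, '?');
    if ($qpos !== false) {
        $query    = substr($location, $qpos);
        $location = substr($location, 0, $qpos);
    }

    $basePath = $base['path'] ?? '/';
    if ($location === '') {
        $path = $basePath;                      // query-only reference
    } elseif ($location[0] === '/') {
        $path = $location;                      // absolute path on the same host
    } else {
        // Relative path: replace the last segment of the base path.
        $dir  = substr($basePath, 0, strrpos($basePath, '/') + 1);
        $path = $dir . $location;
    }

    // Remove "." and ".." segments, never climbing above the root.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }

    return $origin . implode('/', $out) . $query;
}
```

For example, resolveLocation('https://www.theirserver.com/theirapp/mainForm/securepage.aspx', 'login.aspx') returns 'https://www.theirserver.com/theirapp/mainForm/login.aspx', which can then be passed to CURLOPT_URL for the next request.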
A better way to scrape a web page is to use two PHP packages: Guzzle and DomCrawler.
I ran a lot of tests with this combination and came to the conclusion that it is the best choice.
Here you will find an example for your implementation.
Let me know if you have any problem! ;)

PHP Header redirection problem within page called by cURL

I have a site that uses cURL to access some pages, stores the returned results in variables, and then uses these variables within its own page. The script works well except where the target cURL page has a header('Location: ...') command inside it. It seems to just ignore this header command.
The cURL command is as follows...
//Load result page into variable so portions can be allocated to correct variables
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); # URL to post to
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1 ); # return into a variable
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
$loaded_result = curl_exec( $ch ); # run!
curl_close($ch);
I've tried changing the CURLOPT_HEADER to 1 but it doesn't do anything.
So how can I allow script redirection within the target urls using cURL to grab the results? By the way, the pages work fine if accessed other than via cURL but iFrames are not an option in this instance.
If you want cURL to follow redirections add this:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
You'll want the options CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS. See the manual.
try
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
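Putting the two suggested options together, the request from the question might look something like the sketch below. The helper names followRedirectOptions() and postFollowingRedirects(), the default limit of 5 redirects, and the assumption that $postfields is an array are all illustrative and not taken from the original code.

```php
<?php
// Hypothetical helper: build the cURL options for a POST request that
// follows at most $maxRedirects redirects.
function followRedirectOptions(string $url, array $postfields, int $maxRedirects = 5): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the page instead of printing it
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($postfields),
        CURLOPT_FOLLOWLOCATION => true,  // honour Location headers sent by the target
        CURLOPT_MAXREDIRS      => $maxRedirects, // guard against redirect loops
    ];
}

// Hypothetical helper: run the request and return the final page body.
// By default cURL switches from POST to GET after a 301, 302 or 303;
// CURLOPT_POSTREDIR can change that if the POST must be repeated.
function postFollowingRedirects(string $url, array $postfields, int $maxRedirects = 5): string
{
    $ch = curl_init();
    curl_setopt_array($ch, followRedirectOptions($url, $postfields, $maxRedirects));
    $result = curl_exec($ch);
    if ($result === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);
    return $result;
}
```

With this in place, $loaded_result = postFollowingRedirects($url, $postfields); would hold the content of the page the target script redirected to, rather than the empty redirect response.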
