PHP cURL: scrape SEO URLs when you only know the ID

I want to use curl to scrape multiple pages of an online shop. The problem is that the URLs are SEO-friendly (or something like that) and look like this:
https://shopname.com/product-id-title-of-a-product.html
If I use the entire URL it works and I get the data I'm looking for, but the only part of that URL I know is the ID:
https://shopname.com/product-294
Is there a way to scrape that URL in this case?
The URL that contains only the ID does redirect to the full URL.
This is the code I'm using:
$curl = curl_init();
$url = 'https://shopname.com/product-294';
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the response as a string
$result = curl_exec($curl);
curl_close($curl);

Curl provides the option CURLOPT_FOLLOWLOCATION.
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
The documentation states:
TRUE to follow any "Location: " header that the server sends as part of the HTTP header (note this is recursive, PHP will follow as many "Location: " headers that it is sent, unless CURLOPT_MAXREDIRS is set).
Therefore it is advisable to set CURLOPT_MAXREDIRS as well, for example to limit curl to a single redirect:
curl_setopt($curl, CURLOPT_MAXREDIRS, 1);
This way curl should follow the redirect to the full URL automatically, without any further programming.
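Putting those options together, a minimal sketch might look like the following. The URL pattern comes from the question; the helper names product_url() and fetch_product() are hypothetical. It also reads CURLINFO_EFFECTIVE_URL, which holds the last URL curl actually requested, so you can see which SEO URL the ID resolved to.

```php
<?php
// Sketch: fetch a product page by its ID and let curl follow the redirect
// to the SEO-friendly URL. product_url() and fetch_product() are hypothetical
// helper names; the URL pattern is the one from the question.

function product_url(int $id): string
{
    return 'https://shopname.com/product-' . $id;
}

/**
 * Returns ['body' => string, 'final_url' => string], or null on failure.
 */
function fetch_product(int $id): ?array
{
    $curl = curl_init(product_url($id));
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow the "Location:" redirect
    curl_setopt($curl, CURLOPT_MAXREDIRS, 1);         // but follow at most one hop
    $body = curl_exec($curl);
    // Last URL curl requested, i.e. the full SEO URL after the redirect.
    $finalUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
    curl_close($curl);

    if ($body === false) {
        return null;
    }
    return ['body' => $body, 'final_url' => $finalUrl];
}
```

If the shop chains more than one redirect (for example http to https and then to the SEO URL), curl will stop with a too-many-redirects error at a limit of 1, so raise CURLOPT_MAXREDIRS accordingly.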

I think you need to capture the response headers from the curl request; they should contain the redirect URL, which you can parse out and then request with a second curl call.
Try a tool like Postman or Insomnia to inspect the responses while you work this out.
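The manual approach described above could be sketched like this, assuming the server answers the short URL with a 3xx status and a Location header. The helper names first_location_header() and resolve_redirect() are hypothetical. Note that a Location value may be relative, in which case you would need to resolve it against the original host; curl_getinfo($ch, CURLINFO_REDIRECT_URL) is an alternative that gives you the redirect target without parsing the headers yourself.

```php
<?php
// Sketch: request the short URL without following redirects, read the
// Location header, and return it so a second request can fetch that URL.
// first_location_header() and resolve_redirect() are hypothetical names.

/**
 * Pull the first "Location:" value out of a raw HTTP header block.
 * Header names are case-insensitive, hence the /i flag.
 */
function first_location_header(string $rawHeaders): ?string
{
    if (preg_match('/^Location:\s*(\S+)\s*$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

function resolve_redirect(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in the returned string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stop at the 3xx response
    $response = curl_exec($ch);
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE); // length of the header part
    curl_close($ch);

    if ($response === false) {
        return null;
    }
    return first_location_header(substr($response, 0, $headerSize));
}
```

Usage would be something like $full = resolve_redirect('https://shopname.com/product-294'); followed by an ordinary curl request for $full.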

Related

Parsing Redirect Response Codes From Redirect URL - PHP

Thank you for your time.
I'm trying to work out how to parse the authorization code I receive from the redirect, so that I can pass it to a subsequent function that obtains the OAuth bearer token.
Which PHP libraries, similar to Python's requests and web.py modules, can I use to achieve the same thing as the following code?
class ILToken(object):
    def GET(self):
        form = web.input(code=None, scope=None)
For example, my redirect URL is set to "http://redirect_example.com/oauth/". When I click a button, the instant login link ("https://api.provider.com/oauth/authorize?response_type=code&redirect_uri=http://redirect_example.com/oauth/...") redirects me to "http://redirect_example.com/oauth/?code=ExampleAuthCode".
I've tried the curl library in PHP, without success, using code similar to the following:
# Initialize curl request parameters
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, env('INSTANT_LOGIN_URI'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, TRUE); // We'll parse redirect url from header.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
# Set variables and execute
$output = curl_exec($ch);
$effective_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // last URL requested, not a status code
$redirect = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
curl_close($ch);
# Print Results
print $redirect;
Related articles (e.g. "How can I get the destination URL using cURL?" and "PHP CURL redirect") say to set FOLLOWLOCATION to true and read CURLINFO_EFFECTIVE_URL to get the code. That did not work for me. I am using PHP 7.4 on an Ubuntu machine; the app uses the Laravel framework and runs behind an nginx 1.21 web server.
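A rough PHP counterpart of web.input(code=None, scope=None) is to parse the query string of the redirect URL. In a typical authorization-code flow the provider sends the user's browser to the redirect URI, so the code usually arrives as a query parameter on a request to your own /oauth/ route, where $_GET['code'] ?? null (or $request->query('code') in Laravel) reads it directly. If you do have the redirect URL as a string, for example from CURLINFO_REDIRECT_URL with FOLLOWLOCATION disabled as in the code above, a sketch could be (oauth_params_from_url() is a hypothetical helper):

```php
<?php
// Sketch: extract "code" and "scope" from a redirect URL, defaulting to
// null like web.input(code=None, scope=None) does in web.py.
// oauth_params_from_url() is a hypothetical helper name.

/**
 * Returns ['code' => ?string, 'scope' => ?string] parsed from $url.
 */
function oauth_params_from_url(string $url): array
{
    // parse_url() returns null when there is no query string; cast to ''.
    $query = (string) parse_url($url, PHP_URL_QUERY);
    parse_str($query, $params); // decodes "a=1&b=2" into ['a' => '1', 'b' => '2']

    return [
        'code'  => $params['code'] ?? null,
        'scope' => $params['scope'] ?? null,
    ];
}
```

The returned code would then be exchanged for the bearer token in a separate POST to the provider's token endpoint.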

Redirect to another page after CURL POST

I can't figure out how to redirect after curl has executed. I found something like
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
But I need something like this hypothetical option:
curl_setopt($ch, CURLOPT_AFTER_SUCCESS_GO_TO, "http://anotherpage.com");
Use CURLOPT_FOLLOWLOCATION if you want curl to automatically follow a "redirect" (which is a 3XX response and a Location: response header).
If you just want to fetch another URL after the first request succeeds, simply issue a second request.
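Since curl has no "go here after success" option, one way to sketch "issue another one" is to check the status of the first response yourself and only then start a second request. The helper names and URLs here are hypothetical, and "success" is assumed to mean a 2xx status.

```php
<?php
// Sketch: POST, check the result, then issue a second independent request.
// is_success_status() and post_then_fetch() are hypothetical helper names.

function is_success_status(int $status): bool
{
    return $status >= 200 && $status < 300;
}

function post_then_fetch(string $postUrl, array $fields, string $nextUrl): ?string
{
    $ch = curl_init($postUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    $ok = curl_exec($ch) !== false;
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if (!$ok || !is_success_status($status)) {
        return null; // first request failed, so don't continue
    }

    // Second request to the follow-up page.
    $ch = curl_init($nextUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? null : $body;
}
```

If what you actually want is to send the visitor's browser to another page after the POST, that is done with header('Location: http://anotherpage.com'); exit; rather than with a curl option.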

Header() substitute

Hi, I am new to PHP and want to know an alternative to header('location:mysit.php');
Currently I am sending the request like this:
header('Location: http://localhost/(some external site).php?var=test');
Something like this, but what I want is to send the values of some variables to the external site without that page popping up.
I mean, the variables should be sent to some external site/page, but on screen I want to be redirected to my login page. I don't know any alternative, so please guide me. Thanks.
You are looking for PHP's cURL extension:
You are searching for PHP cUrl:
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
// grab URL and pass it to the browser
curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
Set the location header to the place you actually want to redirect the browser to and use something like cURL to make an HTTP request to the remote site.
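Combining both parts of that answer, a sketch might send the variables with curl (discarding the external page's output) and then redirect the browser to the login page. The external URL, login.php, and the helper names are placeholders, not real endpoints.

```php
<?php
// Sketch: send variables to an external page server-side, keep its output
// off the screen, then redirect the visitor to your own login page.
// build_url() and notify_then_redirect() are hypothetical helper names.

function build_url(string $base, array $vars): string
{
    // http_build_query() URL-encodes keys and values.
    return $base . '?' . http_build_query($vars);
}

function notify_then_redirect(string $externalBase, array $vars, string $loginPage): void
{
    $ch = curl_init(build_url($externalBase, $vars));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // capture (and ignore) the external page
    curl_exec($ch);
    curl_close($ch);

    header('Location: ' . $loginPage); // must run before any output is sent
    exit;
}
```

A call might look like notify_then_redirect('http://localhost/external.php', ['var' => 'test'], 'login.php');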
The way you usually would do that is by sending those parameters by cURL, parse the return values and use them however you need.
By using cURL you can pass POST and GET variables to any URL.
Like so:
$ch = curl_init('http://example.org/?aVariable=theValue');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
Now, in $result you have the response from the URL passed to curl_init().
If you need to post data, the code needs a little more:
$ch = curl_init('http://example.org/page_to_post_to.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'variable1=value1&variable2=value2');
$result = curl_exec($ch);
curl_close($ch);
Again, the result of your POST request is saved to $result.
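Instead of hand-writing 'variable1=value1&variable2=value2', the POST body can be built from an array, which takes care of URL-encoding. This is a sketch with a placeholder endpoint and hypothetical helper names.

```php
<?php
// Sketch: build CURLOPT_POSTFIELDS from an array.
// form_body() and post_form() are hypothetical helper names.

function form_body(array $fields): string
{
    // Produces an application/x-www-form-urlencoded string.
    return http_build_query($fields);
}

function post_form(string $url, array $fields): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, form_body($fields));
    $result = curl_exec($ch);
    curl_close($ch);
    return $result === false ? null : $result;
}
```

The string form is used on purpose: passing the array itself to CURLOPT_POSTFIELDS makes curl send multipart/form-data instead of a URL-encoded body.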
You could connect to another URL in the background in numerous ways. There's cURL ( http://php.net/curl - already mentioned in previous answers ), there's fopen ( http://php.net/manual/en/function.fopen.php ), and there's fsockopen ( http://php.net/manual/en/function.fsockopen.php - a little more advanced ).
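As an illustration of the non-curl route, the same POST can be sent through PHP's http:// stream wrapper with a stream context, which file_get_contents() and fopen() both accept. This is a sketch assuming allow_url_fopen is enabled; the URL and helper names are placeholders.

```php
<?php
// Sketch: POST without curl, using a stream context with the http wrapper.
// post_context() and post_with_streams() are hypothetical helper names.

function post_context(array $fields)
{
    return stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => http_build_query($fields),
        ],
    ]);
}

function post_with_streams(string $url, array $fields): ?string
{
    // Requires allow_url_fopen to be enabled in php.ini.
    $result = file_get_contents($url, false, post_context($fields));
    return $result === false ? null : $result;
}
```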

Right way to set the OAuth Authorization header?

I want to set a request header for a URL (xyz.com).
Is this the right way to set it in PHP?
header('Authorization: AuthSub token="xxxxxx"');
header('location:https://www.google.com/accounts/AuthSubRevokeToken');
I am trying to set the header for this URL for a call, but the Authorization: AuthSub header doesn't show up in the request headers section of Firefox's Net panel, which shows the requests.
Any idea why?
Thanks.
I was using curl previously, but it didn't seem to issue any request, as I can't see it in Firefox's Net panel.
The code is as follows:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL,"https://www.google.com/accounts/AuthSubRevokeToken");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FAILONERROR, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_HTTPHEADER, array('Authorization: AuthSub token="1/xxx"'
));
$result = curl_exec($curl);
curl_close($curl);
echo 'hererer'.$result;exit;
header() sets response headers, not request headers. (If you were trying to send an HTTP request elsewhere, it would have no effect.)
Please also note what the manual says: "Remember that header() must be called before any actual output is sent, ...".
And turn on error_reporting(E_ALL); before using header() to see whether that is the issue for you.
Header names and values should be separated by a colon followed by a space, so the location "header" is wrong; it should be:
header('Location: https://www.google.com/accounts/AuthSubRevokeToken');
(Capitalizing the header name this way is also common, but not required.)
Beyond that, the header function sets response headers, not request headers, so you're using the wrong tool.
With header() you cannot set request headers; those are sent by the client (e.g. the browser), not the server. So header() looks wrong here. Which HTTP client are you using?
A call, as in using CURL to request another page? The header() function applies only for web-browser<->server communications. It cannot affect any requests your server-side script does to other webservers. For that, you need to modify the particular method you're using, e.g. curl or streams.
For curl, see CURLOPT_HTTPHEADER here: http://php.net/curl_setopt
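A sketch of the curl route follows. One point that likely explains the missing Net panel entry: this request is made by PHP on the server, so it never passes through the browser and cannot appear in Firefox's network panel. To verify what curl sent, the sketch enables CURLINFO_HEADER_OUT and reads the outgoing headers back. The token value and helper names are placeholders; SSL peer verification is left at its default (on).

```php
<?php
// Sketch: send an Authorization request header with curl and inspect the
// headers curl actually sent. authsub_headers() and revoke_token() are
// hypothetical helper names; the token is a placeholder.

function authsub_headers(string $token): array
{
    return ['Authorization: AuthSub token="' . $token . '"'];
}

function revoke_token(string $token): array
{
    $ch = curl_init('https://www.google.com/accounts/AuthSubRevokeToken');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, authsub_headers($token)); // request headers
    curl_setopt($ch, CURLINFO_HEADER_OUT, true); // keep a copy of the outgoing request headers
    $body = curl_exec($ch);
    $sent = curl_getinfo($ch, CURLINFO_HEADER_OUT); // request headers as a string
    $error = curl_error($ch);                       // empty string if no error
    curl_close($ch);

    return ['body' => $body, 'sent_headers' => $sent, 'error' => $error];
}
```

Dumping $result['sent_headers'] shows whether the Authorization line went out.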

php to translate GET request to POST request

I have a hosted script somewhere that only accepts POST requests,
for example some.hosted/script.php.
How can I set up another simple PHP script that accepts a GET request and then POSTs it to the hosted script,
so that I can put up a link like this: other.site/post2hostedscript.php?postthis=data
and have it POST postthis=data to the hosted script?
Thanks.
Edit:
post2hostedscript.php does not produce any result itself.
The result should go directly to some.hosted/script.php,
just as if the user had POSTed to the hosted script directly.
Your post2hostedscript.php will have to:
Fetch all parameters received as GET
Construct a POST query
Send it
And, probably, return the result of that POST request.
This can probably be done using curl, for instance; something like this should get you started:
// Forward the incoming query string (e.g. postthis=data) as the POST body
$queryString = $_SERVER['QUERY_STRING'];
$ch = curl_init();
// Target the hosted script, not this proxy script itself
curl_setopt($ch, CURLOPT_URL, "http://some.hosted/script.php");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_POST, true);
// false: output the hosted script's response directly instead of returning it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
curl_setopt($ch, CURLOPT_POSTFIELDS, $queryString);
curl_exec($ch);
curl_close($ch);
For a list of options that can be used with curl, take a look at the curl_setopt page.
Here, you'll have to use at least:
CURLOPT_POST : as you want to send a POST request, and not a GET
CURLOPT_RETURNTRANSFER : depending on whether you want curl_exec to return the result of the request, or to just output it.
CURLOPT_POSTFIELDS : The data that will be posted -- i.e. what you have in the query string of your incoming request.
And note that the response to the POST request might include some interesting HTTP headers; if needed, you'll have to fetch them (see the CURLOPT_HEADER option) and re-send the interesting ones in your own response (see the header function).
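One way to sketch the header forwarding is with CURLOPT_HEADERFUNCTION, which calls a callback once per response header line, instead of splitting CURLOPT_HEADER output by hand. Which headers count as "interesting" (here Content-Type and Set-Cookie) is an assumption, and the helper names are hypothetical.

```php
<?php
// Sketch: collect selected upstream response headers and re-send them.
// header_collector() and forward_post() are hypothetical helper names;
// the list of forwarded headers is an assumption.

const FORWARDED_HEADERS = ['content-type', 'set-cookie'];

/**
 * Returns a CURLOPT_HEADERFUNCTION callback that appends matching
 * header lines (trimmed) to $collected.
 */
function header_collector(array &$collected): callable
{
    return function ($ch, string $line) use (&$collected): int {
        // Header name is the text before the first colon (status line has none).
        $name = strtolower(trim(strstr($line, ':', true) ?: ''));
        if (in_array($name, FORWARDED_HEADERS, true)) {
            $collected[] = trim($line);
        }
        return strlen($line); // curl expects the number of bytes handled
    };
}

function forward_post(string $url, string $body): void
{
    $headers = [];
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, header_collector($headers));
    $response = curl_exec($ch);
    curl_close($ch);

    foreach ($headers as $h) {
        header($h, false); // false: don't replace, so several Set-Cookie lines survive
    }
    echo $response === false ? '' : $response;
}
```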
Take a look at the "curl" functions, they provide everything you need.
You might consider replacing all instances of $_POST in the old script with $_REQUEST, which will make it accept both GET and POST alike.
