How to get Alexa audience geography using curl? [closed] - php

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 8 years ago.
I am trying to get the top three countries from an Alexa report, but I am unable to access the site using curl. When I try, I get an error from Alexa telling me to sign up with Amazon. I thought curl requests could not be blocked, but they seem to have managed it.
$url="http://www.alexa.com/siteinfo/google.com";
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
echo('<textarea>'.$result.'</textarea>');

This should work. Note that I used a standard set of curl options that I like to use; feel free to adjust them to your actual needs. I did that because, while you set $agent, you never actually pass it to curl in any way. My options properly set CURLOPT_USERAGENT, as well as a few other things.
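$url = "http://www.alexa.com/siteinfo/google.com";
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// Return the response as a string instead of printing it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Follow any redirects the server sends.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Certificate checks are switched off here; that is acceptable for a quick test only.
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// 3 selects SSLv3, which current servers generally reject; drop this line for HTTPS URLs.
curl_setopt($ch, CURLOPT_SSLVERSION, 3);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
// This is the line the original script was missing: actually pass $agent to curl.
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
$result = curl_exec($ch);
curl_close($ch);
echo('<textarea>'.$result.'</textarea>');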
And here are my results from my local test environment where I am using PHP 5.4 via MAMP on a Macintosh.
EDIT: According to the original poster, this script works on one host but not on another, where it is met with a “403: Forbidden” error. That points to some kind of blocking happening on the Alexa server. I would recommend debugging by using curl -I from the command line, like this:
curl -I http://www.alexa.com/siteinfo/google.com
And on my local Mac OS X 10.9.4 setup, I get this in response to the request:
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Date: Thu, 10 Jul 2014 01:24:51 GMT
Server: Apache
Set-Cookie: rpt=%21; expires=Fri, 11-Jul-2014 02:24:51 GMT; domain=alexa.com
Set-Cookie: lv=1404955491; expires=Fri, 10-Jul-2015 07:24:51 GMT; path=/; domain=alexa.com
Vary: Accept-Encoding
X-Frame-Options: SAMEORIGIN
Connection: keep-alive
The HTTP/1.1 200 OK means all is good. If you run the same command from the command line and get anything other than that (a 403, for example), you are most likely being blocked. It could be a block based just on an IP range, or a block by something like ModSecurity, which does heuristic analysis of traffic to catch and block non-standard web requests. Regardless, if you are being blocked on the server side, there is not much you can do to unblock yourself.
That said, note how I properly set $agent in my version of the script but you did not? It could be that, while testing, you ran so many curl requests without a proper user agent that your IP is now temporarily blocked. So wait a day or two and try again, but with my version of the script so that a proper user agent is set. I bet it will work fine then.
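The curl -I check can also be done from PHP. The following is only a minimal sketch: the helper names parse_status_code and fetch_headers are made up for illustration, and the sample header text is a shortened copy of the response shown above.

```php
<?php
// Hypothetical helper: extract the HTTP status code from a raw header block,
// such as the output of `curl -I` or of PHP curl with CURLOPT_HEADER enabled.
function parse_status_code(string $rawHeaders): ?int
{
    // When redirects are followed there can be several header blocks,
    // so collect every status line and keep the last one.
    if (preg_match_all('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/m', $rawHeaders, $matches) > 0) {
        return (int) end($matches[1]);
    }
    return null; // no status line found
}

// Hypothetical helper: send a HEAD request, the PHP counterpart of `curl -I`.
function fetch_headers(string $url, string $agent): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, no body
    curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the output as a string
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    $headers = curl_exec($ch);
    curl_close($ch);
    return $headers === false ? '' : $headers;
}

// Offline demonstration with a shortened copy of the headers from the answer.
$sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\nServer: Apache\r\n\r\n";
var_dump(parse_status_code($sample)); // int(200)
```

On the host that gets blocked, parse_status_code(fetch_headers($url, $agent)) would be expected to return 403 rather than 200.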

It appears to be a simple coding error. Try the following:
$url="http://www.alexa.com/siteinfo/google.com";
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
echo '<textarea>'.$result.'</textarea>';
You missed the periods (.) after '<textarea>' and before '</textarea>' that concatenate them with the result. Also, echo is a language construct rather than a function, so it does not require parentheses.
I have tested this, and it worked for me.

Related

How can I emulate a request like a web browser does?

When I look at
https://www.tutti.ch/de/vi/zaurich/haushalt/geraate-utensilien/tassen-und-unterteller-arv-ikea-blaue-streifen/27002681
in a browser, I see a completely different page than when I use:
file_get_contents(...) // or
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL,...);
$result=curl_exec($ch);
var_dump($result);
How can I get the HTML as it is seen in the browser?
The HTML on this website is rendered on the client side by the browser using JavaScript, so the markup that curl or file_get_contents receives does not contain the content you see. If you are trying to parse some content from the website, try using a headless browser. A headless browser is a browser that works without a graphical interface but otherwise behaves like a normal browser. Both Chrome and Firefox have headless versions.
Here is a useful library for driving headless browsers from PHP: https://github.com/php-webdriver/php-webdriver
You can also interact with the page's JavaScript and send commands as a real user would.
You may install the browser and the driver on a different machine (or even your own PC) if you don't have the necessary permissions to do so in your hosting account.
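As a rough sketch of what that could look like with php-webdriver: the example below assumes the library was installed with Composer (php-webdriver/webdriver) and that a chromedriver instance is listening on http://localhost:9515. Both are assumptions, and the function name fetch_rendered_html is made up for illustration.

```php
<?php
// Assumption: installed via `composer require php-webdriver/webdriver`.
require __DIR__ . '/vendor/autoload.php';

use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;

// Hypothetical helper: load a page in headless Chrome and return the HTML
// as it stands after the page's JavaScript has run.
// Assumption: chromedriver is reachable at $serverUrl.
function fetch_rendered_html(string $url, string $serverUrl = 'http://localhost:9515'): string
{
    $options = new ChromeOptions();
    $options->addArguments(['--headless']); // run Chrome without a window

    $capabilities = DesiredCapabilities::chrome();
    $capabilities->setCapability(ChromeOptions::CAPABILITY, $options);

    $driver = RemoteWebDriver::create($serverUrl, $capabilities);
    try {
        $driver->get($url);              // navigate and let the browser render
        return $driver->getPageSource(); // serialized DOM of the current page
    } finally {
        $driver->quit();                 // always close the browser session
    }
}
```

Content that is loaded late may still be missing when getPageSource() is called, so a real script would probably wait for a specific element first.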

Browser opens a link successfully, but not curl and file_get_contents

I'm trying to use the Instagram API. When I open the following link in a browser, it works completely fine; you can click on it and see the JSON response:
https://www.instagram.com/nasa/?__a=1
When I tried to open the same URL via file_get_contents(), I got a 403 Forbidden error.
So I tried to use curl. Here is my code:
$url = "https://www.instagram.com/nasa/?__a=1";
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
curl_close($ch);
var_dump($result);
The problem is that $result is an empty string. When I try to get the contents using file_get_contents, I get a 403 Forbidden error, and when I try to get the contents using curl, it returns an empty string.
Can somebody help? Thanks.
Edit
I don't get 403 Forbidden in my browser because I'm logged in.
You need to enable cookie support (e.g. CURLOPT_COOKIEFILE) and log in before you can access https://www.instagram.com/nasa/?__a=1, and your curl code never attempts to log in.
Here you can see how to log in to Instagram with PHP: https://stackoverflow.com/a/41684531/1067003
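The cookie part on its own could be sketched roughly as follows. The helper names and the cookie file location are assumptions made for illustration, and the login request itself is not shown. Returning the status code and curl_error() alongside the body also helps explain an empty $result.

```php
<?php
// Hypothetical helper: curl options that keep cookies between requests.
function cookie_session_options(string $cookieJarPath, string $agent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => $agent,
        CURLOPT_COOKIEFILE     => $cookieJarPath, // cookies are read from this file
        CURLOPT_COOKIEJAR      => $cookieJarPath, // cookies are written here when the handle is closed
    ];
}

// Hypothetical helper: run one request and report the status and any curl
// error together with the body, instead of returning a bare string.
function fetch_with_cookies(string $url, array $options): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    $result = [
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => curl_error($ch),
        'body'   => $body === false ? '' : $body,
    ];
    curl_close($ch);
    return $result;
}

// Assumption: the system temp directory is writable by the PHP process.
$options = cookie_session_options(
    sys_get_temp_dir() . '/curl_cookies.txt',
    'Mozilla/5.0 (X11; Linux x86_64)'
);
// Usage (not executed here): $response = fetch_with_cookies($url, $options);
```

The same $options array would be used for the login request first and then for the profile URL, so that the session cookies from the login are sent along.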

PHP curl returns different results from the same URL in a browser

I am using PHP Curl with this code:
curl_setopt($ch, CURLOPT_URL, 'https://www.segundamano.mx/anuncios/ciudad-de-mexico/alvaro-obregon/florida/renta-inmuebles/departamentos?precio=0-10000');
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookies);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
//curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0");
$uagent = 'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Firefox/22.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/36.0.1985.125 Chrome/36.0.1985.125 Safari/537.36';
curl_setopt($ch, CURLOPT_USERAGENT, $uagent);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com');
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
My question is: why does PHP curl give a different result than opening the URL in a browser?
PHP curl returns a large body of content that contains this line:
In Spanish: "No encontramos resultados para tu búsqueda..."
In English: "There are no results for your search..."
What is happening with this URL?
How can I request this URL with curl and read, in code, the real results that the browser shows?
Please help me!
Thanks!
The link you mentioned is a single-page web application, that is, a website that interacts with the user by dynamically rewriting the current page rather than loading entire new pages from a server.
This website also uses Vue.js.
See the links below for more details.
https://en.wikipedia.org/wiki/Single-page_application
https://vuejs.org/
Because JavaScript is the root of all evil. The website fetches the search results you want with AJAX after the page has loaded successfully. Just open the "Network" tab of your browser's inspection tool and watch the requests flying around.
Fun part: the website does have a (seemingly authorized) API that it talks to; maybe you can try that? https://webapi.segundamano.mx/nga/api/v1.1/public

Access suddenly denied for curl in PHP

I used the following function to get access to an API (live working example)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.halteverbotszonen.com/api/numbers');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
$output = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
For a few days now (I can't tell exactly since when), the curl call has been returning a 403 error. Accessing https://www.halteverbotszonen.com/api/numbers directly is possible. I have not changed anything on either of the two servers. What could possibly cause this, and where could I see it (are there any logs for this)?
I have a second API where the same thing happens (accessing it directly works, but a curl call does not).
Both are with the same hoster; could they have changed something so that incoming curl calls are no longer allowed?
Any hint is appreciated.
- Maybe it is due to https/http?
- Maybe there is a different configuration in your Apache/PHP?
- Maybe the remote server banned your IP?
In most cases, when something works and then suddenly stops working another day, it is a software update problem (such as a changed configuration file), a network problem (such as the IP), or a problem on the remote side (such as the server). That is my guess.
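To narrow down which of these it is, it can help to look at the headers and body that come back with the 403. Since the code in the question already sets CURLOPT_HEADER, $output contains both. The following is a minimal sketch of separating them; the helper name split_response is made up, and the sample response is invented for the demonstration.

```php
<?php
// Hypothetical helper: split a response that was captured with CURLOPT_HEADER
// enabled. $headerSize is the value of curl_getinfo($ch, CURLINFO_HEADER_SIZE),
// which has to be read before curl_close($ch).
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Offline demonstration with an invented 403 response.
$raw = "HTTP/1.1 403 Forbidden\r\nServer: Apache\r\n\r\nAccess denied";
$headerSize = strpos($raw, "\r\n\r\n") + 4; // stands in for CURLINFO_HEADER_SIZE
$parts = split_response($raw, $headerSize);

echo $parts['headers'];        // status line and response headers
echo $parts['body'], PHP_EOL;  // whatever the server sent along with the 403
```

The Server header and the body of the error page often hint at whether the block comes from the application, the web server, or a firewall in front of it.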

file_get_contents (and wget) very slow

I'm using the Google text-to-speech API, but for some reason it is really slow when I connect to it via PHP or the command line.
I'm doing this:
$this->mp3data = file_get_contents("http://translate.google.com/translate_tts?tl=en&q={$text}");
Where $text is just a urlencoded string.
I've also tried doing it via wget on the command line:
wget http://translate.google.com/translate_tts?tl=en&q=test
Either way, it takes about 20 seconds or more. Via PHP, it does eventually get the contents and add them to a new file on my server, as I want it to. Via wget, the connection times out.
However, if I just go to that URL in the browser, it's pretty much instant.
Could anyone shed any light on why this might be occurring?
Thanks.
It's due to how Google treats robots. You need to spoof the User-Agent header so that the request looks like it comes from a regular browser.
Some information on how to go about this is here:
https://duckduckgo.com/?q=php%20curl%20spoof%20user%20agent
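Because the question uses file_get_contents, the User-Agent can also be set there through a stream context. This is only a sketch: the helper name browser_like_context and the 10-second timeout are assumptions, and the User-Agent string is just an example of a browser-like value.

```php
<?php
// Hypothetical helper: build an HTTP stream context that sends a
// browser-like User-Agent header with file_get_contents().
function browser_like_context(string $agent, float $timeoutSeconds = 10.0)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $agent,          // sent as the User-Agent header
            'timeout'    => $timeoutSeconds, // read timeout in seconds
        ],
    ]);
}

$agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13';
$context = browser_like_context($agent);

// Usage (not executed here):
// $mp3data = file_get_contents($url, false, $context);
```

The 'http' options are used for both http:// and https:// URLs.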
I managed to sort this out. This is what I ended up doing, and now it only takes a few seconds:
$header=array("Content-Type: audio/mpeg");
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $uri);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$this->mp3data = curl_exec($ch);
curl_close($ch);
