How can I get HTML data from a site that uses Cloudflare? - php

First of all, sorry for my bad English.
I'm trying to get the HTML code from https://www.uasd.edu.do/, but neither the PHP function "file_get_contents()" nor cURL works.
With "file_get_contents()" I get an HTTP 403 error. With cURL I get a CAPTCHA page, but the CAPTCHA itself never appears.
I tried sending cookies with cURL and setting a user agent, but nothing changed. I also tried to find the real IP address of the site, without success. Please help me! I'd really appreciate it.
The code:
$curl = curl_init();
if (!$curl) {
die("Is not working");
}
curl_setopt($curl, CURLOPT_URL, "https://uasd.edu.do/");
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0');
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_FAILONERROR, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl, CURLOPT_TIMEOUT, 50);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
$html = curl_exec($curl);
echo $html;
curl_close($curl);
The output:
Please enable cookies. One more step Please complete the security
check to access www.uasd.edu.do Why do I have to complete a CAPTCHA?
Completing the CAPTCHA proves you are a human and gives you temporary
access to the web property. What can I do to prevent this in the
future?
If you are on a personal connection, like at home, you can run an
anti-virus scan on your device to make sure it is not infected with
malware.
If you are at an office or shared network, you can ask the network
administrator to run a scan across the network looking for
misconfigured or infected devices.
Cloudflare Ray ID: 4fcbf50d18679f88 • Your IP: ... •
Performance & security by Cloudflare
Note: the "Please enable cookies" message appears whether or not I send cookies.
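To tell this challenge page apart from the real HTML, a script can inspect the status code and body it received. The following is only a minimal sketch: the function name looksLikeCloudflareChallenge is hypothetical, and the marker string is taken from the output quoted above, so it is an assumption about what Cloudflare returns rather than a documented contract.

```php
<?php
// Hypothetical helper: guesses whether a response is a Cloudflare challenge page.
// The marker string comes from the output quoted in the question and may change.
function looksLikeCloudflareChallenge(int $status, string $body): bool
{
    // The challenge page in the question ends with a "Cloudflare Ray ID" footer.
    if (stripos($body, 'Cloudflare Ray ID') !== false) {
        return true;
    }
    // Assumption: a 403 whose body mentions Cloudflare is also a challenge or block.
    return $status === 403 && stripos($body, 'cloudflare') !== false;
}
```

With cURL, the status code would come from curl_getinfo($curl, CURLINFO_HTTP_CODE). Keep in mind that with CURLOPT_FAILONERROR enabled, curl_exec() returns false for status codes of 400 and above, so there is no body to inspect in that case.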

Related

cURL 35 error when connecting to http website

I couldn't find an answer to this question. Sometimes* when I try to retrieve data from an http (NOT https) site, I get error 35 (SSL connect error).
The URL I'm trying to reach is, e.g., http://www.aliexpress.com/item//32566080839.html. I then get redirected to the "full URL": http://www.aliexpress.com/item/NEW-Sport-Headband-Bike-Halloween-Skull-face-mask-balaclava-Skull-Bandana-Paintball-Ski-Motorcycle-Helmet-Neck/32566080839.html
My cURL code:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://aliexpress.com/item//'. $id .'.html');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_TIMEOUT, 3);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0');
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($curl);
I tried adding curl_setopt($curl, CURLOPT_SSLVERSION , 3); but it doesn't help.
Why does an http site give error 35? Is that normal?
Is it possible that aliexpress is blocking my requests?
Sometimes I also get error 28 (timeout reached), even with a 10-second timeout.
*Sometimes: it works for a few hours, then stops working for about 10 minutes, and then works again.
It looks like you are spidering their site by ID, and as a consequence the site blocks you. Since you mention an SSL error, it is very likely that during the block period they redirect you to an error page whose URL starts with https://
For debugging, you can enable verbose mode and inspect the headers; the Location: response header will show where you are being redirected.
curl_setopt ($curl, CURLOPT_VERBOSE, true);
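Verbose output is written to STDERR, so it can be awkward to inspect from a script. As an alternative, here is a minimal sketch that pulls the Location values out of raw response headers, such as those returned by curl_exec() when CURLOPT_HEADER is enabled. The function name extractLocationHeaders and the sample headers are invented for illustration.

```php
<?php
// Hypothetical helper: returns every Location header value found in raw
// response headers, in the order they appear.
function extractLocationHeaders(string $rawHeaders): array
{
    $locations = [];
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        // Header names are case-insensitive, hence the "i" flag.
        if (preg_match('/^Location:\s*(.+)$/i', $line, $matches)) {
            $locations[] = trim($matches[1]);
        }
    }
    return $locations;
}

// Sample headers, invented for illustration: a redirect to an https:// page.
$raw = "HTTP/1.1 302 Found\r\nLocation: https://example.com/blocked\r\n\r\nHTTP/1.1 200 OK\r\n";
foreach (extractLocationHeaders($raw) as $location) {
    // An https:// target here would explain an SSL error on an http:// request.
    echo $location, PHP_EOL;
}
```

If one of the listed targets starts with https://, the SSL error comes from the redirect target rather than from the original http:// URL.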

Load cross-browser site with mobile emulator in iframe using php

I'm trying to check and display a cross-browser site in an iframe to emulate a mobile environment. The iframe should display the website in its mobile format. My iframe is only 320px wide, and mobile sites that use CSS for a responsive layout load as expected. But websites that use other techniques to detect mobile devices do not load correctly, and I would like to catch them all. My major problem is the origin of the sites: they differ, and different URLs are loaded on specific actions. I'm not developing an emulator for its own sake; I need to load these URLs to check whether they are fully responsive.
I saw this site:
http://php-drops.blogspot.se/2013/07/mobile-emulator-with-php.html
But I can't get the hang of it. How can I load the true responsive site in my iframe? I suppose the header tells the environment to load a different site, like m.site.com. If there is a separate mobile site that the request redirects to, how can I get that URL?
Got it working, this is what I did:
$ch = curl_init();
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3');
curl_setopt($ch, CURLOPT_URL, htmlspecialchars_decode($url));
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
return $info['url'];
And I got the mobile URL back. So if somewebsite redirects to m.somewebsite or anywhere else, the iframe loads the correct layout :)

Using CURL and PHPSimpleHTMLDOMParser gives me - 500 Internal Server error

I am using PHP Simple HTML DOM Parser; you can read more about it here: http://simplehtmldom.sourceforge.net/
I am also using cURL because the address http://www.sportsdirect.com does not load with the standard SimpleHTMLDom examples.
Here is the code I use:
<?php
include_once('../simple_html_dom.php');
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://www.sportsdirect.com/');
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
$str = curl_exec($curl);
curl_close($curl);
$html= str_get_html($str);
echo $html->plaintext;
?>
When I try to load the script it gives me: 500 Internal Server Error
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, webmaster#superweb.bg and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
The script fails only for this address; when I load another website such as mandmdirectDOTcom, it works fine!
Where is my mistake, and how can I make this work?
Try this for the curl fetch. It works for me in this case. This is a standard set of curl options & settings I use that work well:
include_once('simple_html_dom.php');
$url = "http://www.sportsdirect.com";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSLVERSION, 3);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$str = curl_exec($curl);
curl_close($curl);
$html = str_get_html($str);
echo $html->plaintext;
I believe the issue with your original curl settings was the missing user agent. Try the same script with the CURLOPT_USERAGENT line commented out to see what I mean.
Many servers have firewall rules that reject requests without a proper user agent. The one I have set here is a fairly generic Firefox user agent, so feel free to experiment with something else.
Try setting a Host header in the request. It's possible that the target domain is on a shared server, and without a Host header, the server doesn't know what to do.
curl_setopt($curl, CURLOPT_HTTPHEADER, array('Host: www.sportsdirect.com'));

checking US-only website status using curl

[Problem]
There is a website which works for US-citizens only (shows info "A" for US-citizens, info "B" for non-US citizens). I need to constantly monitor this webpage for changes ("A" info) - an email should be sent when something is changed! How do I do it? The problem is that I live in Europe!
[Already accomplished]
I have a linux server, daemon and curl PHP script which accomplishes the following task! It works great for all "non-US-only" websites.
[Question]
One way to solve the problem might be to rent a US server but that's not acceptable at all and it is going to cost a lot! I believe that another way to solve the problem might be - to use a US VPN on my server, but for some reasons I won't do that. Is there a way to run curl through proxy maybe? Any ideas?
Current code is the following:
function getrequest($url_site/*,$post_data*/) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url_site);
curl_setopt ($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3');
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIE_FILE); // Cookie management.
curl_setopt($ch, CURLOPT_COOKIEFILE, COOKIE_FILE);
$result = curl_exec($ch); // run the whole process
curl_close($ch);
return $result;
}
and
$sleep_time = 1;
$login_wp_url = "http://www.mysite.com";
set_time_limit(60*10);
$result = getrequest($login_wp_url);
How do I grab contents from US-only website?
P.S. To get an idea of what I mean, try visiting Hulu from a European country.
P.P.S. It's not Hulu, and it's not homework.
Many cloud service providers, e.g. Heroku and Amazon, offer their smallest instances for free. You could simply set up one of these for free, make sure that you are provisioned on an US-located server and run your script there.
Another possibility would be to use a (free) proxy for these requests. Here is a list of free proxy servers: http://www.xroxy.com/proxy-country-US.htm.
curl_setopt($ch, CURLOPT_PROXY, "http://160.76.xxx.xxx:8080");
curl_setopt($ch, CURLOPT_PROXYPORT, 8080);
curl_setopt ($ch, CURLOPT_PROXYUSERPWD, "xxx:xxx");

CURL Request crashes SSL

I've noticed a little problem with CURL in PHP. Whenever I request a https:// connection it returns "false", and every website that I try to reach while I have my PHP page open reports to have an Untrusted certificate.
This is my request method:
private function request($url, $params, $method = "GET") {
if ($method == "GET")
$url = $this->structGET($url, $params);
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
if (isset($_SERVER['HTTP_USER_AGENT'])) {
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
} else {
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.X.Y.Z Safari/525.13.');
}
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$header[] = 'Accept-Language: EN';
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
if ($method == "POST") {
curl_setopt($ch, CURLOPT_POST, true);
if ($params)
curl_setopt($ch, CURLOPT_POSTFIELDS, $params);
}
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
And this is what Chrome returns when I try visiting Facebook.
The site's security certificate is not trusted!
You attempted to reach
www.facebook.com, but the server presented a certificate issued by an
entity that is not trusted by your computer's operating system. This
may mean that the server has generated its own security credentials,
which Google Chrome cannot rely on for identity information, or an
attacker may be trying to intercept your communications. You cannot
proceed because the website operator has requested heightened security
for this domain.
Don't use techniques that disable certificate verification. While they may "solve" your problem on the surface, they only ignore the problem, rather than fixing it. Never do this in production code.
The most likely cause is that you're on a network where there is a MITM corporate proxy. However legitimate these devices may be, they are effectively MITM devices.
What they do is that they will replace the original certificate with a certificate issued using their own internal CA, so as to be able to monitor the traffic.
If this device was legitimately set up by your network administrator, you should be able to get its CA certificate (in those circumstances, the CA certificate would typically be installed on all end-user machines centrally administered).
It's quite likely that, as a developer, you set up your own machine and don't have the CA certificate installed. Ask your network administrator for that CA certificate, and install it with the certificates used by your browser and by curl within PHP (two different locations). The default location for curl may depend on the system you're using, but you can also configure it via CURLOPT_CAINFO.
Presumably, you're developing within a local network, but may possibly deploy that service on a different network when it's done. Make sure that this is configurable.
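One way to keep the CA certificate location configurable is to read it from the environment and only set CURLOPT_CAINFO when a path is given. This is a minimal sketch, assuming the curl extension is loaded; the function name buildTlsOptions and the environment variable CURL_CA_BUNDLE_PATH are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical helper: builds cURL options that keep certificate verification
// enabled and optionally point cURL at a custom CA bundle.
function buildTlsOptions(?string $caBundlePath): array
{
    $options = [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => true, // verify the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,    // verify the certificate matches the host name
    ];
    // Only override the CA bundle when one is configured; otherwise the
    // system default is used (e.g. after deploying to a different network).
    if ($caBundlePath !== null && $caBundlePath !== '') {
        $options[CURLOPT_CAINFO] = $caBundlePath;
    }
    return $options;
}

// CURL_CA_BUNDLE_PATH is an assumed variable name, not a cURL or PHP convention.
$caPath = getenv('CURL_CA_BUNDLE_PATH') ?: null;
$options = buildTlsOptions($caPath);
```

The resulting array could then be applied to a handle with curl_setopt_array($ch, $options) before calling curl_exec($ch).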
Yeah, that happens when cURL checks whether the SSL certificate is verified. Facebook normally has a valid certificate, but the network may be causing it to come back as invalid (this happens in my case: I'm behind a FortiGuard proxy and Facebook is blocked!)
So what you can do is ignore that error entirely.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
That should fix it. But if you want to fix it properly, you should probably use a proxy or get a certificate for the server (if it is yours).
You need to add these two lines for SSL
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
