Receiving url from the curl source

Receiving url from the curl source - php

I have a simple question to ask. I am using CURL to send data using post fields to my destination server. I am using the following code,
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL, $url);
curl_setopt($ch,CURLOPT_POST, count($_POST));
curl_setopt($ch,CURLOPT_POSTFIELDS, $postStr);
$result = curl_exec($ch);
curl_close($ch);
And then i am capturing the data using
$post = $_POST;
So is there anyway i can determine from which server the curl data is coming from? Since its coming from 5 different servers. I want to capture the URL of the source from which postfields are being sent. Thanks

Use the verbose mode (-v) on the cURL program.
Example:
C:\curl www.google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
here.
</BODY></HTML>
Or:
C:\curl -v www.google.com
* Rebuilt URL to: www.google.com/
* Trying 216.58.222.4... <<< --- <<< --- YOUR SERVER IS THIS ONE
* Connected to www.google.com (216.58.222.4) port 80 (#0)
> GET / HTTP/1.1
> Host: www.google.com
> User-Agent: curl/7.42.0
> Accept: * / * <
< HTTP/1.1 302 Found
< Cache-Control: private
< Content-Type: text/html; charset=UTF-8
< Location: http://www.google.com.br/?gfe_rd=cr&ei=A76WVZvWCO-p8wf334HIDg
< Content-Length: 262
< Date: Fri, 03 Jul 2015 16:53:23 GMT
< Server: GFE/2.0
< Alternate-Protocol: 80:quic,p=0
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
here.
</BODY></HTML>
* Connection #0 to host www.google.com left intact

Related

ipinfo.io via PHP's file_get_contents() and cURL fail

I try to get a GeoLocation data from http://ipinfo.io,
Here is my way :
$resp = file_get_contents('http://ipinfo.io/json');
$data = json_decode($resp);
It return an error :
Warning: file_get_contents(http://ipinfo.io/json): failed to open stream: Permission denied in ....
But then I access the link (http://ipinfo.io/json) manually in the URL box of my browser, it shows a correct json.
I also try it with cURL :
$curlSession = curl_init();
curl_setopt($curlSession, CURLOPT_URL, "ipinfo.io/json");
curl_setopt($curlSession, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curlSession, CURLOPT_RETURNTRANSFER, true);
$resp = curl_exec($curlSession);
if (FALSE === $resp) {
echo curl_errno($curlSession);
}
curl_close($curlSession);
It echo a number of 7, and i look up in the internet, error 7 means Couldn't connect to the server.
Any idea why ?
Thank you

I run http://ipinfo.io, and we don't block access to any IPs (we do rate limit requests from IPs, but that'd result in a HTTP status code, not a blocked connection). This sounds like a config issue with your server to me. Some hosts lock down file_get_contents so it can't open URLs, or might have blocked http://ipinfo.io. Are few ways to track this down:
1) Can you open another URL with file_get_contents? Eg. what happens when you file_get_contents('http://google.com'). If you get a permission denied error there then you should speak to your hosting provider
2) Does command line curl work for ipinfo.io? The -i -v flags should give you more information about what's going on here. Here's what a successful request looks like:
$ curl -iv ipinfo.io
* Rebuilt URL to: ipinfo.io/
* Trying 54.68.119.255...
* Connected to ipinfo.io (54.68.119.255) port 80 (#0)
> GET / HTTP/1.1
> Host: ipinfo.io
> User-Agent: curl/7.49.1
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
Access-Control-Allow-Origin: *
< Content-Type: application/json; charset=utf-8
Content-Type: application/json; charset=utf-8
< Date: Sun, 15 Jan 2017 18:38:44 GMT
Date: Sun, 15 Jan 2017 18:38:44 GMT
< Server: nginx/1.8.1
Server: nginx/1.8.1
< Set-Cookie: first_referrer=; Path=/
Set-Cookie: first_referrer=; Path=/
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< Content-Length: 252
Content-Length: 252
< Connection: keep-alive
Connection: keep-alive
<
{
"ip": "24.6.61.239",
"hostname": "c-24-6-61-239.hsd1.ca.comcast.net",
"city": "Mountain View",
"region": "California",
"country": "US",
"loc": "37.3845,-122.0881",
"org": "AS7922 Comcast Cable Communications, LLC",
"postal": "94040"
* Connection #0 to host ipinfo.io left intact
}

Quite often a server will be configured to prevent requests where there is no User-Agent string present in the request headers so you can add a context argument to file_get_contents that supplies a User-Agent and any other headers you need.
$args=array(
'http'=>array(
'method' => "GET",
'header' => implode( "\n", array(
'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Host: ipinfo.io'
)
)
)
);
/* create the context */
$context=stream_context_create( $args );
$resp = file_get_contents( 'http://ipinfo.io/json', FILE_TEXT, $context );
$data = json_decode( $resp );
echo '<pre>',print_r( $data,true ),'</pre>';

i see 2 plausible explanations here.
1: you use a shitty DNS server. try GoogleDNS (8.8.8.8) instead.
curl_setopt($curlSession,CURLOPT_DNS_LOCAL_IP4,'8.8.8.8');
if that fixes it, contact your DNS provider and sort it out with them
2: you're IP banned. try to just create a TCP socket to their ip, see if you can do that.
<?php
$sock=socket_create(AF_INET,SOCK_STREAM,SOL_TCP);
var_dump($sock,socket_connect($sock,gethostbyname('ipinfo.io'),80),socket_close($sock));
if you can't do that, you're probably IP banned

Authenticating to webservice with Kerberos from PHP in IIS

I am writing a PHP webapplication that has to connect to a webservice using Kerberos 5 authentication (Active Directory). My PHP website is hosted on IIS 7.5 with PHP 5.5. The application pool is running under the account that is authorized in Active Directory and for the target webservice.
I tried every example code that I could find on this site and other sites but to no avail.
This is the PHP code I am using now:
$url = 'http://mywebservice/login/kerberos';
$ch = curl_init();
$options = [
CURLOPT_RETURNTRANSFER => true,
CURLOPT_VERBOSE => true,
CURLOPT_HTTPAUTH => CURLAUTH_GSSNEGOTIATE,
CURLOPT_HTTPHEADER => ['Authorization: Negotiate'],
CURLOPT_RETURNTRANSFER => true,
CURLOPT_USERPWD => 'myuser',
CURLOPT_URL => $url,
CURLOPT_HEADER => 1
];
curl_setopt_array( $ch, $options);
$result = curl_exec($ch);
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$header = substr($result, 0, $header_size);
$body = substr($result, $header_size);
print $result;
This gives me the following message:
HTTP/1.1 302 Found Date: Fri, 21 Oct 2016 14:49:15 GMT X-Robots-Tag: noindex,nofollow WWW-Authenticate: Location: http://mywebservice/login?login_fail Content-Length: 0
When I remove the CURLOPT_HTTPHEADER => ['Authorization: Negotiate'] l get an Internal Server error from the curl module.
When I use curl commandLine I get the following result:
curl --negotiate http://mywebservice/login/kerberos -umyuser#mydomain --verbose -c "c:\cookie.txt" -b "c:\cookie.txt"
Enter host password for user 'myuser#mydomain':
* Trying (192.168.1.1...
* Connected to mywebservice (192.168.1.1) port 80 (#0)
> GET /login/kerberos HTTP/1.1
> User-Agent: curl/7.41.0
> Host: mywebservice
> Accept: */*
> Cookie: JSESSIONID_PUBLIC=X(MASKED)XXXXXXXXXXXXXXXXX
>
< HTTP/1.1 401 Unauthorized
< Date: Fri, 21 Oct 2016 14:52:46 GMT
< X-Robots-Tag: noindex,nofollow
< WWW-Authenticate: Negotiate
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Last-Modified: Thu, 20 Oct 2016 14:52:46 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< P3P: CP=CAO PSA OUR
< Content-Type: text/html; charset=UTF-8
< Content-Length: 3643
<
* Ignoring the response-body
* Connection #0 to host mywebservice left intact
* Issue another request to this URL: 'http://mywebservice/login/kerberos'
* Found bundle for host mywebservice: 0xXXXXXXX
* Re-using existing connection! (#0) with host mywebservice
* Connected to mywebservice (192.168.1.1) port 80 (#0)
* Server auth using Negotiate with user 'myuser#mydomain'
> GET /login/kerberos HTTP/1.1
> Authorization: Negotiate X(MASKED)XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXD
w==
> User-Agent: curl/7.41.0
> Host: mywebservice
> Accept: */*
> Cookie: JSESSIONID_PUBLIC=X(MASKED)XXXXXXXXXXXXXXXXX
>
< HTTP/1.1 302 Found
< Date: Fri, 21 Oct 2016 14:52:46 GMT
< X-Robots-Tag: noindex,nofollow
< WWW-Authenticate:
< Location: http://mywebservice/?login_fail
< Content-Length: 0
<
* Connection #0 to host mywebservice left intact
When I test with the KerberosAuthenticationTester tool (http://blog.michelbarneveld.nl/michel/archive/2009/12/05/kerberos-authentication-tester.aspx) it authenticates me right away when I pass the url and credentials.
I assume that it is not working because I am missing the krb5 library. I could not find it as a DLL so I tried recompiling it with the PHP source in Visual Studio. This is not working for me as well, I am missing the config.w32 file. If necessary I can elaborate on that but first I want to know if this is really needed.
I also installed MIT Kerberos but this did not help aswell.
Is it correct that I need the krb5 DLL, or am I on the wrong track? If I need this DLL, where can I get it or how can I compile it? If there is another solution I would be very happy to hear it.
Thanks everyone for taking your time for me and replying!

xml download - blocked

I am trying to download xml file from one polish website. For first days it worked but then I could download this file to my server (but I can open and download it on my computer). In file on my server in which there should be xml content is html content telling me that I have been blocked.
I was trying to contact with webmaster from website from which I want to get xml and he told me that I am not blocked by IP address. So the question is what I should sent in headers or what to download this file?
My code to download xml file is below and here is the xml which I want to download: http://www.polskatimes.pl/rss/fakty_kraj.xml
$headers[] = "User-Agent:Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
$headers[] = "Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$headers[] = "Accept-Language:pl-PL,pl;q=0.8";
$headers[] = "Accept-Encoding:gzip,deflate,sdch";
$headers[] = "Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$headers[] = "Keep-Alive:115";
$headers[] = "Connection:keep-alive";
$headers[] = "Cache-Control:max-age=0";
$xml_data = file_get_contents($xml,false,stream_context_create(
array("http" => array('header' => $headers)))); // your file is in the string "$xml" now.
file_put_contents($xml_md5, $xml_data); // now your xml file is saved.
Request the URL in verbose mode (-v):
* About to connect() to www.polskatimes.pl port 80 (#0)
* Trying 195.8.99.38... connected
* Connectede to www.polskatimes.pl (195.8.99.38) port 80 (#0)
> GET /rss/fakty_kraj.xml HTTP/1.1
> User-Agent: curl/7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6
> Host: www.polskatimes.pl
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx
< Date: Thu, 18 Apr 2013 10:40:15 GMT
< Content-Type: text/html; charset=utf8
< Transfer-Encoding: chunked
< Connection: close
< Vary: Accept-Encoding
< Expires: Thu, 18 Apr 2013 10:40:15 GMT
< Cache-Control: max-age=0
(html page with message that I am temporary blocked)
* Closing connection #0

To inspect what happens behind the scene (and which headers you actually need or not) you need to analyze a little. That is nothing magic, you can do it on the commandline with a software called curl. It is available for many (even all?) computer platforms.
First step most often is to request the URL in verbose mode (-v):
$ curl -v http://www.polskatimes.pl/rss/fakty_kraj.xml
* About to connect() to www.polskatimes.pl port 80 (#0)
* Trying 195.8.99.38... connected
* Connected to www.polskatimes.pl (195.8.99.38) port 80 (#0)
> GET /rss/fakty_kraj.xml HTTP/1.1
> User-Agent: curl/7.21.1 (i686-pc-mingw32) libcurl/7.21.1 OpenSSL/0.9.8r zlib/1.2.3
> Host: www.polskatimes.pl
> Accept: */*
>
< HTTP/1.1 302 Found
< Date: Wed, 17 Apr 2013 17:39:51 GMT
< Server: Apache
< Set-Cookie: sprawdz_cookie=1; expires=Thu, 17-Apr-2014 17:39:51 GMT
< Location: http://www.polskatimes.pl/rss/fakty_kraj.xml?cookie=1
< Vary: Accept-Encoding
< Content-Length: 0
< Connection: close
< Content-Type: text/html; charset=iso-8859-2
<
* Closing connection #0
That shows you the request (prefixed with >) and response (prefixed with <) headers and the response body (empty in this case). As you can see the status is 302 Found which means as 3xx a redirect and the location header tells where to:
Location: http://www.polskatimes.pl/rss/fakty_kraj.xml?cookie=1
As the query parameter suggests, this is a cookie-check. The cookie itself is set as well:
Set-Cookie: sprawdz_cookie=1; expires=Thu, 17-Apr-2014 17:39:51 GMT
So in the next step we will replay the last command but this time setting the cookie which can be done with the -b argument:
$ curl -v -b prawdz_cookie=1 http://www.polskatimes.pl/rss/fakty_kraj.xml
* About to connect() to www.polskatimes.pl port 80 (#0)
* Trying 195.8.99.38... connected
* Connected to www.polskatimes.pl (195.8.99.38) port 80 (#0)
> GET /rss/fakty_kraj.xml HTTP/1.1
> User-Agent: curl/7.21.1 (i686-pc-mingw32) libcurl/7.21.1 OpenSSL/0.9.8r zlib/1.2.3
> Host: www.polskatimes.pl
> Accept: */*
> Cookie: prawdz_cookie=1
>
< HTTP/1.1 200 OK
< Date: Wed, 17 Apr 2013 17:43:52 GMT
< Server: Apache
< Set-Cookie: sesja_gratka=e38fa0eb93705c8de7ae906198494439; expires=Wed, 24-Apr-2013 17:43:52 GMT; path=/; domain=polskatimes.pl
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< Vary: Accept-Encoding
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/xml; charset=utf-8
<
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title><![CDATA[Fakty - Kraj]]></title>
<link>http://www.polskatimes.pl/fakty/kraj/</link>
<atom:link href="http://www.polskatimes.pl/rss/fakty_kraj.xml" rel="self" type="application/rss+xml"/>
<description><![CDATA[Materia┼éy z dzia┼éu Kraj]]></description>
... (cutted)
So this is immediately successful. And now the real good part: You know that you need to set the cookie for the request and curl shows you already all headers it used:
> GET /rss/fakty_kraj.xml HTTP/1.1
> User-Agent: curl/7.21.1 (i686-pc-mingw32) libcurl/7.21.1 OpenSSL/0.9.8r zlib/1.2.3
> Host: www.polskatimes.pl
> Accept: */*
> Cookie: prawdz_cookie=1
Most of them you do not need to care about with file_get_contents, the first line as well as the Host: and the Accept: line.
The User-Agent: header does not look like it really plays a role as curl is accepted.
So all what is left is the Cookie: header. Let's try in PHP:
$ php -r "echo file_get_contents('http://www.polskatimes.pl/rss/fakty_kraj.xml', null,
stream_context_create(['http'=>['header'=>['Cookie: prawdz_cookie=1']]]));"
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title><![CDATA[Fakty - Kraj]]></title>
<link>http://www.polskatimes.pl/fakty/kraj/</link>
<atom:link href="http://www.polskatimes.pl/rss/fakty_kraj.xml" rel="self"
type="application/rss+xml"/>
... (cutted)
And this is the direct test that only the Set-Cookie: prawdz_cookie=1 header is needed.

PHP: Validating URL with Curl

$fileSource = "http://google.com";
$ch = curl_init($fileSource);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_exec($ch);
$retcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($retcode != 200) {
$error .= "The source specified is not a valid URL.";
}
curl_close($ch);
Here's my issue. When I use the above and set $fileSource = "http://google.com"; it does not work, whereas if I set it to $fileSource = "http://www.google.com/"; it works.
What's the issue?

One permanently redirects (301) to the www. domain, while the other one just replies OK (200).
Why are you only considering only the 200 status code as valid? Let CURL handle that for you:
curl_setopt($ch, CURLOPT_FAILONERROR, true);
From the manual:
TRUE to fail silently if the HTTP code returned is greater than or
equal to 400. The default behavior is to return the page normally,
ignoring the code.

Try explicitly telling curl to follow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
If that doesn't work you may need to spoof a user agent on some sites.
Also, if they are using JS redirects your out of luck.

What you're seeing is actually a result of a 301 redirect. Here's what I got back using a verbose curl from the command line
curl -vvvvvv http://google.com
* About to connect() to google.com port 80 (#0)
* Trying 173.194.43.34...
* connected
* Connected to google.com (173.194.43.34) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.25.0 (x86_64-apple-darwin11.3.0) libcurl/7.25.0 OpenSSL/1.0.1b zlib/1.2.6 libidn/1.22
> Host: google.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Fri, 04 May 2012 04:03:59 GMT
< Expires: Sun, 03 Jun 2012 04:03:59 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 219
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
* Connection #0 to host google.com left intact
* Closing connection #0
However, if you do a curl on the actual www.google.com suggested in the 301 redirect, you'll get the following.
curl -vvvvvv http://www.google.com
* About to connect() to www.google.com port 80 (#0)
* Trying 74.125.228.19...
* connected
* Connected to www.google.com (74.125.228.19) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.25.0 (x86_64-apple-darwin11.3.0) libcurl/7.25.0 OpenSSL/1.0.1b zlib/1.2.6 libidn/1.22
> Host: www.google.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 04 May 2012 04:05:25 GMT
< Expires: -1
< Cache-Control: private, max-age=0
< Content-Type: text/html; charset=ISO-8859-1
I've truncated the remainder of google's response just to show the primary difference in the 200 OK vs 301 REDIRECT

curl returns "Failure when receiving data from the peer"

I'm retrieving data from an URL using curl.
Everything works fine if the php code is called via a HTTP request or if the URL is entered in Firefox. If the very same code is executed from a PHP CLI script curl_exec returns false. The error message is "Failure when receiving data from the peer".
Any ideas why curl is not working?
When I set the curl output to verbose I get:
Setting curl to verbose gives:
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Last-Modified: Mon, 01 Aug 2011 13:04:59 GMT
< Cache-Control: no-store
< Cache-Control: no-cache
< Cache-Control: must-revalidate
< Cache-Control: pre-check=0
< Cache-Control: post-check=0
< Cache-Control: max-age=0
< Pragma: no-cache
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Content-Type: text/xml
< Transfer-Encoding: chunked
< Date: Mon, 01 Aug 2011 13:04:58 GMT
<
* Trying 153.46.254.70... * Closing connection #0
* Failure when receiving data from the peer
This is the Code:
// if curl is not installed we trigger an alert, and exit the function
if (!function_exists('curl_init')){
watchdog('sixtk_api', 'curl is not installed, api call cannot be executed',array(),WATCHDOG_ALERT);
return $this;
}
// OK cool - then let's create a new cURL resource handle
$ch = curl_init();
// Set URL to download
curl_setopt($ch, CURLOPT_URL, $this->request);
// Should cURL return or print out the data? (true = return, false = print)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Timeout in seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 180);
// Download the given URL, and return output
$output = curl_exec($ch);
if (!$output) {
$error = curl_error($ch);
echo($error);
}
// Close the cURL resource, and free system resources
curl_close($ch);

try wget . if that fails too but you can access the address from another IP/device , this probably means your IP is being blocked or filtered out by either firewall/nginx anti ddos attack . try proxy .

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Receiving url from the curl source - php

Related

ipinfo.io via PHP's file_get_contents() and cURL fail

Authenticating to webservice with Kerberos from PHP in IIS

xml download - blocked

PHP: Validating URL with Curl

curl returns "Failure when receiving data from the peer"

Categories

Resources