Detect bad link referrer neighborhood using CURL - php

I'm trying to use CURL to assess the visitors on my site. I'd like to see if they are being linked from a bad neighborhood or not. Most of the time my current code works, but not always.
I'm having a bit of trouble making my CURL able to fool all servers. How do I make my CURL headers totally convincing, and remove any possible clues that I'm using CURL?
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"http://www.example.com");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS,$vars); //Post Fields
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$headers = array();
$headers[] = 'X-Apple-Tz: 0';
$headers[] = 'X-Apple-Store-Front: 143444,12';
$headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
$headers[] = 'Accept-Encoding: gzip, deflate';
$headers[] = 'Accept-Language: en-US,en;q=0.5';
$headers[] = 'Cache-Control: no-cache';
$headers[] = 'Content-Type: application/x-www-form-urlencoded; charset=utf-8';
$headers[] = 'Host: www.example.com';
$headers[] = 'Referer: http://www.example.com/index.php'; //Your referrer address
$headers[] = 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:28.0) Gecko/20100101 Firefox/28.0';
$headers[] = 'X-MicrosoftAjax: Delta=true';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$server_output = curl_exec ($ch);
print $server_output;
curl_close ($ch);
if (strpos($server_output, 'sex') !== false)
{
echo 'sex';
}
?>
For example, a certain well known adult video site with an orange logo that looks a lot like the YouTube logo (maybe you guys know the one) responded with this:
403 Forbidden
Request forbidden by administrative rules.
__SERVERNAME__

In Chrome DevTools, you can obtain the full HTTP request that Chrome sent for a URL as follows:
Open DevTools.
Go to the "Network" tab.
Request the URL you want; if you are already on the target page, press F5 or reload.
DevTools will then list the HTTP requests (and responses) that were made.
Right-click the HTTP request / URL you are interested in.
Choose "Copy" > "Copy as cURL"; the full HTTP request details (as a command-line curl command) are now on your clipboard.
If you send these values with your own HTTP request, it will ostensibly appear to come from a Chrome web browser.
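As a rough sketch of how the copied command could be reused from PHP, the helper below pulls the -H values out of a "Copy as cURL" string so they can be handed to CURLOPT_HTTPHEADER. The function name headersFromCurlCommand and the sample command are made up for illustration, and the pattern does not handle escaped quotes inside a header value.

```php
<?php
// Hypothetical helper: collect the -H '...' values from a "Copy as cURL" command.
function headersFromCurlCommand(string $command): array
{
    // Matches -H 'name: value' or -H "name: value" (escaped quotes are not handled).
    preg_match_all('/-H\s+(["\'])(.*?)\1/s', $command, $matches);
    return $matches[2];
}

// Made-up sample of what DevTools might put on the clipboard.
$command = <<<'CMD'
curl 'https://www.example.com/' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64)'
CMD;

$headers = headersFromCurlCommand($command);
print_r($headers);
```

The resulting array can be passed unchanged to curl_setopt($ch, CURLOPT_HTTPHEADER, $headers).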

Related

Difference between command line cURL and PHP cURL

I have a cURL command like this:
curl 'https://www.example.com' \
-H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36' \
-H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3' \
-H 'accept-language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7' \
-H 'authority: www.example.com'
Executing this on the command line, for example in the Terminal app on my Mac, produces the expected output.
(In case you test it yourself: if the output contains the word Sicherheitsüberprüfung, the request is geo-blocked and you need a German IP to test it.)
I transferred the exact command to PHP cURL like this:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
$headers = array();
$headers[] = 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36';
$headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3';
$headers[] = 'Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7';
$headers[] = 'Authority: www.example.com';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
curl_close($ch);
echo $result;
?>
When I run this code, I get a message saying that my request was recognized as an automated request/robot: it says Sicherheitsüberprüfung, which means "security check".
Of course, I'm using the same IP for both the command-line and the PHP cURL request.
Why is that? Isn't command-line cURL the same as PHP cURL?
Or is there something wrong with my PHP script?
UPDATE
I fortuitously found out the following: I'm using Coda as my code editor on my Mac. It has a built-in PHP rendering engine. When I run my PHP script with it, the result is as expected: the same result I get on the command line.
UPDATE 2
I did what Jannes Botis suggested in his answer. I then ran the PHP script in the Coda code editor (which produced the expected output) and with MAMP on localhost (which is always recognized as an automated request).
I figured out that the code executed with MAMP was using HTTP/2, while the code executed in Coda was using HTTP/1.1. To solve this, I added the following to the script:
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
Now both output exactly the same string:
GET / HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7
Authority: www.example.com
But the result is still the same: one works, and the other is recognized as an automated request.
Try to debug the request in both cases:
a) Terminal: use curl's verbose mode (curl -v) and check the HTTP request that is sent, especially the header list.
b) PHP cURL: print the HTTP request using CURLINFO_HEADER_OUT:
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_exec($ch);
$info = curl_getinfo($ch);
print_r($info['request_header']);
After testing different headers, what made it work for me was adding a "Pragma: no-cache" header to the request:
$headers[] = 'Pragma: no-cache';
On the other hand, in terminal curl, I had to uppercase the request headers, e.g. User-Agent etc.
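Once both header dumps are available (from curl -v and from CURLINFO_HEADER_OUT), a small comparison can make the differences easier to spot. The following is only a sketch: parseHeaderDump and diffHeaderDumps are hypothetical helper names, and the two dumps are shortened, made-up samples.

```php
<?php
// Hypothetical helper: parse a raw request-header dump into [lowercased name => value].
function parseHeaderDump(string $dump): array
{
    $parsed = [];
    foreach (preg_split('/\r?\n/', trim($dump)) as $line) {
        if (strpos($line, ':') === false) {
            continue; // skips the request line, e.g. "GET / HTTP/1.1"
        }
        [$name, $value] = explode(':', $line, 2);
        $parsed[strtolower(trim($name))] = trim($value);
    }
    return $parsed;
}

// Returns the headers whose presence or value differs between the two dumps.
function diffHeaderDumps(string $terminalDump, string $phpDump): array
{
    $left = parseHeaderDump($terminalDump);
    $right = parseHeaderDump($phpDump);
    $diff = [];
    foreach (array_unique(array_merge(array_keys($left), array_keys($right))) as $name) {
        $l = $left[$name] ?? null;
        $r = $right[$name] ?? null;
        if ($l !== $r) {
            $diff[$name] = ['terminal' => $l, 'php' => $r];
        }
    }
    return $diff;
}

// Made-up sample dumps for illustration.
$terminal = "GET / HTTP/1.1\r\nHost: www.example.com\r\nAccept: */*\r\nPragma: no-cache\r\n";
$php      = "GET / HTTP/1.1\r\nHost: www.example.com\r\nAccept: */*\r\n";
$diff = diffHeaderDumps($terminal, $php);
print_r($diff);
```

Header names are lowercased before comparison, so a difference in capitalization alone is not reported; if case matters for the server in question, that normalization would have to be removed.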
Try to create a tcp connection with fsockopen:
$fp = fsockopen("ssl://"."www.example.com", 443, $errno, $errstr, 30);
if (!$fp) {
echo "$errstr ($errno)<br />\n";
} else {
$out = "GET / HTTP/1.1\r\n";
$out .= "Host: www.example.com\r\n";
$headers = array();
$headers[] = 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36';
$headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3';
$headers[] = 'Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7';
$headers[] = 'Authority: www.example.com';
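// Join the header lines with CRLF; appending the array directly would add the text "Array".
$out .= implode("\r\n", $headers) . "\r\n";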
$out .= "Connection: Close\r\n\r\n";
fwrite($fp, $out);
while (!feof($fp)) {
echo fgets($fp, 1024);
}
fclose($fp);
}
Then test whether this works. The issue may be either that PHP cURL adds some information to the HTTP request, or that the problem lies at the TCP connection level, with some information added there.
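When writing the request by hand, every header line has to end with CRLF and the header block has to be closed by an empty line. A minimal sketch of a helper that assembles such a request string is shown here; buildRawRequest is a hypothetical name, and the header values are shortened versions of those in the question.

```php
<?php
// Hypothetical helper: assemble the raw HTTP/1.1 request text to pass to fwrite().
function buildRawRequest(string $host, string $path, array $headers): string
{
    $lines = array_merge(
        ["GET $path HTTP/1.1", "Host: $host"],
        $headers,
        ['Connection: Close']
    );
    // Each line ends with CRLF; an empty line terminates the header block.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

$out = buildRawRequest('www.example.com', '/', [
    'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6)',
    'Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7',
]);
echo $out;
```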
References
cURL works from Terminal, but not from PHP
PHP cURL: modify/overwrite Connection header
Sending TCP Data with PHP
Command line curl :
It is a tool for transferring data to or from a server using any of the supported protocols (HTTP, FTP, IMAP, POP3, SCP, SFTP, SMTP, TFTP, TELNET, LDAP or FILE). curl is powered by libcurl. The tool is well suited to automation, since it is designed to work without user interaction. curl can transfer multiple files at once.
For more details for Command line curl
Syntax:
curl [options] [URL...]
Example:
curl "http://site.{one,two,three}.com"
PHP cURL
$ch = curl_init('http://example.com/wp-login.php');
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
if($this->getRequestType() == 'POST')
{
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS,
array(
'user[name]' => 'Generic+Username',
'user[email]' => 'mahekpatel04@gmail.com'
)
);
}
$response = curl_exec($ch);
The issue is with the ciphers that PHP's cURL selects by default.
Running the curl command with the -Ivs options shows which ciphers it uses:
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
Setting them in PHP allows the request to pass this mysterious check:
curl_setopt($ch,
CURLOPT_SSL_CIPHER_LIST,
'ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH'
);
Also, it seems that a Host header should be added and HTTP/2 should be used:
$headers[] = 'Host: www.11880.com';
// ...
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);

file_get_contents returns data only for certain accounts but not for others

Hi. I am setting up a system where you enter an Instagram username and the website then gets information about that account (username, profile picture, followers, following...).
I am doing this with simple PHP code: I use file_get_contents to fetch the profile page, extract the ID of the user, and then request the info page by combining this ID with the preset Instagram info link.
$username = $_POST['username'];
$html = file_get_contents('https://instagram.com/'.$username);
$subData = substr($html, strpos($html, 'window._sharedData'), strpos($html, '};'));
$userid=strstr($subData, '"id":"');
$userid=str_replace('"id":"', '', $userid);
$userid=strstr($userid, '"', true);
$userData = file_get_contents('https://i.instagram.com/api/v1/users/'.$userid.'/info/');
$userDecodedData=json_decode($userData);
session_start();
$username = $userDecodedData->user->username;
$profilepicurl = $userDecodedData->user->hd_profile_pic_url_info->url;
$followers = $userDecodedData->user->follower_count;
$following = $userDecodedData->user->following_count;
$bio = $userDecodedData->user->biography;
$_SESSION['scoreinsta'] = $followers - $following;
This works just fine when I type my own Instagram username or a friend's, but not when I try Kylie Jenner's username, Instagram's, or Ariana Grande's. I tried Cristiano Ronaldo's account to see whether Instagram was blocking all the most famous people, but it works with his account. I'm kind of lost...
file_get_contents(https://i.instagram.com/api/v1/users/12281817/info/): failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error in C:\wamp64\www\Fame\addinsta.php on line 11
(This is the error message I get when I try with Kylie Jenner.)
What I don't understand is that you can open the URL from the error message yourself and it works just fine (you can see the info as an array or whatever), yet the error message says it cannot be accessed.
Edit: I'm currently trying every one of the most followed accounts I can, and it doesn't work with Taylor Swift either.
You are missing the cookies needed to retrieve data from the URL.
You can try the following (the values are from my own case):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://i.instagram.com/api/v1/users/12281817/info/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
$headers = array();
$headers[] = 'Authority: i.instagram.com';
$headers[] = 'Pragma: no-cache';
$headers[] = 'Cache-Control: no-cache';
$headers[] = 'Upgrade-Insecure-Requests: 1';
$headers[] = 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36';
$headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3';
// $headers[] = 'Accept-Encoding: gzip, deflate, br';
$headers[] = 'Accept-Language: vi-VN,vi;q=0.9,fr-FR;q=0.8,fr;q=0.7,en-US;q=0.6,en;q=0.5';
$headers[] = 'Cookie: mid=XS3ghgALAAGXu10Eb58jOsW7SAEi; fbm_124024574287414=base_domain=.instagram.com; csrftoken=tNVs4niJr2fLiLh76dPPGJuFaMlihIEd; ds_user_id=3043596499; sessionid=3043596499%3ArcZihGFrIEdqkX%3A6; shbid=14335; shbts=1565197428.8656168; rur=FTW; urlgen=^^^{\"113.177.118.128\":';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close($ch);
var_dump($result);
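Separately from the cookie issue, the ID extraction in the question is fragile: the third argument of substr() is a length, not an end position, and the first "id" found in the page is not necessarily the profile owner's. One possible alternative, sketched below, is to capture the JSON assigned to window._sharedData and decode it. The helper name extractSharedData and the sample markup are made up, and the real page structure is an assumption based on the question; it may differ or have changed.

```php
<?php
// Hypothetical helper: extract and decode the JSON assigned to window._sharedData.
// The page layout is an assumption taken from the question, not a documented format.
function extractSharedData(string $html): ?array
{
    if (!preg_match('/window\._sharedData\s*=\s*(\{.*?\});\s*<\/script>/s', $html, $m)) {
        return null; // marker not found
    }
    $data = json_decode($m[1], true);
    return is_array($data) ? $data : null;
}

// Made-up sample markup for illustration only.
$html = '<script>window._sharedData = {"user":{"id":"12281817","bio":"a};b"}};</script>';
$data = extractSharedData($html);
var_dump($data['user']['id'] ?? null);
```

Working on the decoded array also makes it possible to check for a missing key before building the second request URL, instead of requesting a URL with an empty ID.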

Why is my POST with cURL not returning JSON correctly?

In PHP, I'm trying to retrieve the url for a specific page in DocuSign that constantly refreshes. The POST to retrieve this url is in the form:
POST http://demo.docusign.net/restapi/{apiVersion}/accounts/{accountId}/envelopes/{envelopeId}/views/recipient
This should return a json file in the form:
{
"url": "example.example.com"
}
However, I am extremely new to using PHP and POST methods and don't believe I'm doing this correctly. The API explorer for this method in particular is here. I am using cURL methods to make this request. Here is my code ($recipient,$account_id,$access_token are found accurately within another file):
$url = "http://demo.docusign.net/restapi/v2/accounts/$account_id/envelopes/$envelope_id/views/recipient";
$body = array("returnUrl" => "http://www.docusign.com/devcenter",
"authenticationMethod" => "None",
"email" => "$recipient",
"userName" => "$recipient");
$body_string = json_encode($body);
$header = array(
'Accept: application/json',
'Content-Type: application/json',
'Content-Length: '.strlen($body_string),
);
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, $body_string);
$json_response = curl_exec($curl);
$response = json_decode($json_response, true);
var_dump($response);
I am able to get the correct return on the API explorer, but not when making the request with PHP. I believe this is due to the fact that I am not incorporating the $header or $body correctly, but at this point I am just not sure.
ADDED: This is the raw output for the request when correctly running the method on the API Explorer:
Accept: application/json
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8,fa;q=0.6,sv;q=0.4
Cache-Control: no-cache
Origin: https://apiexplorer.docusign.com
Referer: https://apiexplorer.docusign.com/
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
Authorization: Bearer fGehcK7fkRvFguyu/7NGh01UUFs=
Content-Length:
Content-Type: application/json
This is the JSON request being formed in my code:
{
"returnUrl":"http:\/\/www.docusign.com\/devcenter",
"authenticationMethod":"Password",
"email":"example@example.com",
"userName":"example@example.com",
"clientUserId":"4c6228f4-fcfe-47f9-bee1-c9d5e6ab6a41",
"userId":"example@example.com"
}
You are not hitting a valid DocuSign URL in your cURL code. Right now you are sending requests to:
http://demo.docusign.net/apiVersion/v2/accounts/{accountId}/envelopes/{envelopeId}/views/recipient
Instead of "apiVersion" it should be "restapi", like this:
http://demo.docusign.net/restapi/v2/accounts/{accountId}/envelopes/{envelopeId}/views/recipient
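To make the pieces of such a request easier to inspect before sending it, the URL, headers and JSON body can be built in one place. This is only a sketch: buildRecipientViewRequest is a hypothetical helper, the IDs and token in the usage line are placeholders, and the Authorization header is an assumption based on the API Explorer dump in the question (the question's own code does not set one).

```php
<?php
// Hypothetical helper: build URL, headers and JSON body for the recipient-view POST.
function buildRecipientViewRequest(
    string $accountId,
    string $envelopeId,
    string $recipient,
    string $accessToken
): array {
    $url = sprintf(
        'http://demo.docusign.net/restapi/v2/accounts/%s/envelopes/%s/views/recipient',
        rawurlencode($accountId),
        rawurlencode($envelopeId)
    );
    $body = json_encode([
        'returnUrl' => 'http://www.docusign.com/devcenter',
        'authenticationMethod' => 'None',
        'email' => $recipient,
        'userName' => $recipient,
    ]);
    $headers = [
        'Accept: application/json',
        'Content-Type: application/json',
        // Assumption: the token is sent the way the API Explorer dump shows it.
        'Authorization: Bearer ' . $accessToken,
    ];
    return ['url' => $url, 'headers' => $headers, 'body' => $body];
}

// Placeholder values for illustration.
$request = buildRecipientViewRequest('1234', 'abcd-ef', 'user@example.com', 'TOKEN');
echo $request['url'], "\n", $request['body'], "\n";
```

The three parts map onto CURLOPT_URL, CURLOPT_HTTPHEADER and CURLOPT_POSTFIELDS respectively; printing them first makes a malformed URL visible before any request is sent.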
We can't send post fields, because we want to send JSON, not pretend to be a form (the merits of an API which accepts POST requests with data in form-format is an interesting debate). Instead, we create the correct JSON data, set that as the body of the POST request, and also set the headers correctly so that the server that receives this request will understand what we sent:
$data = array("name" => "Hagrid", "age" => "36");
$data_string = json_encode($data);
$ch = curl_init('http://api.local/rest/users');
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_string);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Content-Type: application/json',
'Content-Length: ' . strlen($data_string))
);
$result = curl_exec($ch);
All these settings are pretty well explained on the curl_setopt() page, but basically the idea is to set the request to be a POST request, set the json-encoded data to be the body, and then set the correct headers to describe that post body. The CURLOPT_RETURNTRANSFER is purely so that the response from the remote server gets placed in $result rather than echoed. If you're sending JSON data with PHP, I hope this might help!
I know this question was asked more than 3 years ago, but this may help someone who finds this question because they are having the same problem. I do not see a cURL option that will decode the response in your code. I have found that I need to use the cURL option CURLOPT_ENCODING like this: curl_setopt($ch,CURLOPT_ENCODING,""); According to the PHP manual online, it says, 'CURLOPT_ENCODING - The contents of the "Accept-Encoding: " header. This enables decoding of the response. Supported encodings are "identity", "deflate", and "gzip". If an empty string, "", is set, a header containing all supported encoding types is sent.' You can find this option at https://www.php.net/manual/en/function.curl-setopt.php. I hope this helps save someone from having a headache.
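The effect of a missing decoding step can be reproduced without any network access. The sketch below simulates a gzip-compressed JSON body with gzencode() (this requires PHP's zlib extension) and shows that json_decode() only succeeds after decompression; the sample JSON is the one from the question.

```php
<?php
// Simulate a gzip-compressed JSON response body (requires the zlib extension).
$json = '{"url":"example.example.com"}';
$compressed = gzencode($json);

// Decoding the compressed bytes directly fails: json_decode() returns null.
$direct = json_decode($compressed, true);

// After decompression the JSON parses normally.
$decoded = json_decode(gzdecode($compressed), true);

var_dump($direct === null, $decoded['url']);
```

With CURLOPT_ENCODING set, cURL performs the decompression itself, so the string returned by curl_exec() can be passed to json_decode() directly.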

Can I fill a form on an external site from my own site?

I am not sure whether this is possible or not, but if it is, I need to know where to start at least. I want to fill the form on the site I do not own, using a field on the site I do own.
Specifically this computer game site http://www.g2a.com/, and the search form there which is
<input type="text" class="mp-h-main ui-autocomplete-input" id="product-autocomplete" placeholder="Search a game" data-rel="active" autocomplete="off" state="closed">
It seems to be Ajax or jQuery loaded, and there seems to be no regular search function on the site. Is there a good known way, or do I stand little chance in this case?
Kind regards, John
Sniff the form post content using something like live http headers for firefox and emulate it with php curl, i.e.:
<?php
$productName = rawurlencode("DOOM STEAM CD-KEY PREORDER GLOBAL");
$url = "https://www.g2a.com/lucene/search/quick?jsoncallback=jQuery111005943281338131983_1462669099509&phrase=$productName&isWholesale=false&cn=&skip=28837%2C28838%2C28847%2C28849%2C28852%2C28856%2C28857%2C28858%2C28859%2C28860%2C28861%2C28862%2C28863%2C28867%2C28868%2C29472%2C29473%2C29474%2C33104&start=0&rows=5&_=1462669099513";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
"Host: www.g2a.com",
"Accept: text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br",
"X-Requested-With: XMLHttpRequest",
"Referer: https://www.g2a.com/",
"Cookie: store=englishus; user_time_offset=-120; _ga=GA1.2.1616293500.1462668986; PHPSESSID=eemnk12t6fml8l0l0mf3l63tr5; currency=USD; _gat=1; __ar_v4=WZC2HGDHXZBR7NN565K5H7%3A20160507%3A3%7CY5G5B7MZYJA65OM2BVC43V%3A20160507%3A3%7CJOM3QZF4VBESRIYVTRCJ3R%3A20160507%3A3; Hm_lvt_11391e2f2164ca5838ee836fac473f57=1462668991,1462669101; Hm_lpvt_11391e2f2164ca5838ee836fac473f57=1462669101; external_no_cache=1",
"Connection: keep-alive"
));
$response = curl_exec($ch);
echo $response;
curl_close ($ch);
Output:
jQuery111005943281338131983_1462669099509({"numFound":1,"start":0,"docs":[{"id":27581,"name":"DOOM STEAM CD-KEY PREORDER GLOBAL","type":"egoods","preOrder":1,"slug":"/doom-steam-cd-key-preorder-global.html","addUrl":"uenc/aHR0cDovLw,,/product/27581/","minPrice":32.99,"g2aQty":1,"g2aPrice":32.99,"retailQty":0,"wholesaleQty":0,"thumbnail":"https://images.g2a.com/m/58x58/1x1x1/thumbnail/d/o/ef1f8c916783_doom_2d_3.png","brandsDirectOnSearch":0,"bdPrice":0}]})
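The output above is JSONP, that is, JSON wrapped in a call to the callback named in the jsoncallback parameter, so json_decode() cannot parse it as-is. A minimal sketch for stripping the wrapper follows; decodeJsonp is a hypothetical helper name, and the sample response is a shortened version of the output shown.

```php
<?php
// Hypothetical helper: strip a JSONP callback wrapper and decode the payload.
function decodeJsonp(string $jsonp): ?array
{
    // callbackName( ...json... ) with an optional trailing semicolon
    if (!preg_match('/^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$/s', $jsonp, $m)) {
        return null; // not in callback(...) form
    }
    $data = json_decode($m[1], true);
    return is_array($data) ? $data : null;
}

// Shortened version of the response shown above.
$response = 'jQuery111005943281338131983_1462669099509({"numFound":1,"start":0,"docs":[{"id":27581,"minPrice":32.99}]})';
$result = decodeJsonp($response);
echo $result['docs'][0]['id'], "\n";
```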

How to avoid "HTTP/1.1 999 Request denied" response from LinkedIn?

I'm making a request to a LinkedIn page and receiving an "HTTP/1.1 999 Request denied" response.
I get this response when running on AWS EC2.
On localhost everything works fine.
This is a sample of the code I use to get the HTML of the page.
<?php
error_reporting(E_ALL);
$url= 'https://www.linkedin.com/pulse/5-essential-strategies-digital-michelle';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$response = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
var_dump($response);
var_dump($info);
I don't need the whole page content, just the meta tags (title, og tags).
Note that status code 999 does not exist in the W3C Hypertext Transfer Protocol - HTTP/1.1 specification; it is probably a custom code (it sounds like a joke).
LinkedIn doesn't allow direct access. The probable reasons for blocking requests that come from other web servers are to:
Prevent unauthorized copying of information
Prevent intrusions
Prevent abuse of requests
Force use of the API
Some server IP addresses are blocked, whereas IPs from domestic ISPs are not; when you access LinkedIn with a web browser, you are using the IP of your internet provider.
The only way to access the data is to use their APIs. See:
Accessing LinkedIn public pages using Python
Heroku requests return 999
Note: Search engines like Google and Bing probably have their IPs on a whitelist.
<?php
header("Content-Type: text/plain");
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.linkedin.com/company/technistone-a-s-");
$header = array();
$header[] = "Host: www.linkedin.com";
$header[] = "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0";
$header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$header[] = "Accept-Language: en-US,en;q=0.5";
$header[] = "Accept-Encoding: gzip, deflate, br";
$header[] = "Connection: keep-alive";
$header[] = "Upgrade-Insecure-Requests: 1";
curl_setopt($ch,CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_HTTPHEADER , $header);
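// Return the response as a string; without this, curl_exec() prints it and returns true.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$my_var = curl_exec($ch);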
echo $my_var;
LinkedIn does not support the default encoding 'identity', so if you set the header
'Accept-Encoding': 'gzip, deflate'
you should get the response, but you will have to decompress it.
I ran into this while doing local web development and using the LinkedIn badge feature (profile.js). I was only getting the 999 Request denied in Chrome, so I just cleared my browser cache and localStorage and it started to work again.
UPDATE - Clearing cache was just a coincidence and the issue came back. LinkedIn is having issues with their badge functionality.
I submitted a help thread to their forums.
https://www.linkedin.com/help/linkedin/forum/question/714971
