999 Error Code on HEAD request to LinkedIn - php

We're using a curl HEAD request in a PHP application to verify the validity of generic links. We check the status code just to make sure that the link the user has entered is valid. Links to all websites have succeeded, except LinkedIn.
While it seems to work locally (Mac), when we attempt the request from any of our Ubuntu servers, LinkedIn returns a 999 status code. Not an API request, just a simple curl like we do for every other link. We've tried on a few different machines and tried altering the user agent, but no dice. How do I modify our curl so that working links return a 200?
A sample HEAD request:
curl -I --url https://www.linkedin.com/company/linkedin
Sample Response on Ubuntu machine:
HTTP/1.1 999 Request denied
Date: Tue, 18 Nov 2014 23:20:48 GMT
Server: ATS
X-Li-Pop: prod-lva1
Content-Length: 956
Content-Type: text/html
To respond to @alexandru-guzinschi a little better: we've tried masking the user agent. To sum up our trials:
Mac machine + Mac UA => works
Mac machine + Windows UA => works
Ubuntu remote machine + (no UA change) => fails
Ubuntu remote machine + Mac UA => fails
Ubuntu remote machine + Windows UA => fails
Ubuntu local virtual machine (on Mac) + (no UA change) => fails
Ubuntu local virtual machine (on Mac) + Windows UA => works
Ubuntu local virtual machine (on Mac) + Mac UA => works
So now I'm thinking they block any curl requests that don't provide an alternate UA, and also block hosting providers?
Is there any other way I can check if a link to linkedin is valid or if it will lead to their 404 page, from an Ubuntu machine using PHP?

It looks like they filter requests based on the user-agent:
$ curl -I --url https://www.linkedin.com/company/linkedin | grep HTTP
HTTP/1.1 999 Request denied
$ curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -I --url https://www.linkedin.com/company/linkedin | grep HTTP
HTTP/1.1 200 OK
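On the PHP side, the same user-agent override can be sketched with the curl extension. This is only a minimal sketch under assumptions: the helper names `headStatus` and `classifyStatus` are made up for illustration, and treating 999 as "blocked, validity unknown" is an interpretation of the responses shown in this thread, not documented LinkedIn behavior.

```php
<?php
// Hypothetical helper: send a HEAD request with a custom User-Agent and
// return the HTTP status code (0 if no response was received at all).
function headStatus(string $url, string $userAgent): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request, like `curl -I`
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_TIMEOUT        => 10,
    ]);
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Hypothetical helper: map a status code to a verdict. 999 is kept separate
// (assumption) because it says the request was denied, not that the link is bad.
function classifyStatus(int $code): string
{
    if ($code === 0) {
        return 'unreachable';
    }
    if ($code === 999) {
        return 'blocked';
    }
    if ($code >= 200 && $code < 400) {
        return 'valid';
    }
    return 'invalid';
}

// Usage (performs a real request, so it is left commented out):
// $ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3';
// echo classifyStatus(headStatus('https://www.linkedin.com/company/linkedin', $ua)), PHP_EOL;
```

Keeping 'blocked' apart from 'invalid' lets the application avoid rejecting a link just because the server refused to answer.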

I found a workaround. It is important to set the accept-encoding header:
curl --url "https://www.linkedin.com/in/izman" \
--header "user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36" \
--header "accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
--header "accept-encoding:gzip, deflate, sdch, br" \
| gunzip
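A rough PHP equivalent of the command above, assuming the curl extension is available. The helper name `browserLikeOptions` is hypothetical. Setting `CURLOPT_ENCODING` to an empty string makes curl send an Accept-Encoding header listing every encoding it supports and decode the body itself, so no separate gunzip step is needed. Whether this is enough to avoid the 999 response is not something this sketch can guarantee.

```php
<?php
// Hypothetical helper: curl options approximating the browser-like headers above.
function browserLikeOptions(string $userAgent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        ],
        // Empty string: advertise all encodings this curl build supports
        // and decompress the response body automatically.
        CURLOPT_ENCODING       => '',
    ];
}

// Usage (performs a real request, so it is left commented out):
// $ch = curl_init('https://www.linkedin.com/in/izman');
// curl_setopt_array($ch, browserLikeOptions($ua)); // $ua: a browser UA string
// $html = curl_exec($ch);
```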

It seems that LinkedIn filters on both the user agent AND the IP address. I tried this both at home and from a DigitalOcean node:
curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -I --url https://www.linkedin.com/company/linkedin
From home I got a 200 OK; from DO I got 999 Request denied.
So you need a proxy service like HideMyAss or similar (I haven't tested it, so I can't say whether it works). Here is a good comparison of proxy services.
Or you could set up a proxy on your home network, for example using a Raspberry Pi to proxy your requests. Here is a guide on that.

A proxy would work, but I think there's another way around it. From AWS and other clouds, the request is blocked by IP; I can issue the same request from my own machine and it works just fine.
I did notice that the response sent to the cloud host contains some JS that the browser has to execute to take you to a login page. Once there, you can log in and access the page. The login page is only shown to those accessing via a blocked IP.
If you use a headless client that executes JS, or maybe go straight to the subsequent link and provide the credentials of a LinkedIn user, you may be able to bypass it.

Related

file_get_contents() works differently on different machines

I have written a piece of PHP code that uses file_get_contents() to download a .js file from a site. I ran the code on 2 different machines and they produce different results. The code is:
$link = "https://www.scotchwhiskyauctions.com/scripting/store-scripting_frontend.js";
$options = array(
'http'=>array(
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"User-Agent: Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.102011-10-16 20:23:10\r\n" ),
'ssl'=>array(
'verify_peer'=>false,
'verify_peer_name'=>false),
);
$context = stream_context_create($options);
$line = file_get_contents($link, false, $context);
var_dump($http_response_header);
echo $line;
exit;
When I run this piece of code in a Debian 8.11 machine it produces the following error:
PHP Warning: file_get_contents(https://www.scotchwhiskyauctions.com/scripting/store-scripting_frontend.js): failed to open stream: Connection timed out in /var/www/test.php on line 4
PHP Notice: Undefined variable: http_response_header in /var/www/test.php on line 4
NULL
However, when I ran the exact same code on a different machine (Debian 4.16.12-1kali1), it could obtain the file content, and the variable $http_response_header contained all the response headers. Both machines use php7.2. After spending days trying to figure out why the Debian 8.11 machine could not read the file, I used wget on both machines and noticed that, again, the Debian 8.11 (jessie) machine failed to read the file.
I suspected it has something to do with the ssl certificates so I ran
sudo update-ca-certificates
sudo update-ca-certificates --fresh
but it does not help at all.
Can anyone please point me to some direction?
Finally I got the problem fixed by following someone's comment on this post
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
I found the following in the Linux Advanced Routing & Traffic Control HOWTO article.
/proc/sys/net/ipv4/tcp_timestamps
Timestamps are used, amongst other things, to protect against
wrapping sequence numbers. A 1 gigabit link might conceivably
re-encounter a previous sequence number with an out-of-line value,
because it was of a previous generation. The timestamp will let it
recognize this 'ancient packet'.
However I have no idea why it works. Can someone please explain?
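For comparing the two machines, a small diagnostic sketch may help. It does not explain why the setting matters. It only reads the current value of the /proc entry and performs the fetch with an explicit timeout, so a failure comes back as a value instead of an undefined $http_response_header. The helper names `readProcSetting` and `fetchWithTimeout` are made up for illustration.

```php
<?php
// Hypothetical helper: read a /proc setting; returns null if it cannot be read.
function readProcSetting(string $path): ?string
{
    if (!is_readable($path)) {
        return null;
    }
    $value = file_get_contents($path);
    return $value === false ? null : trim($value);
}

// Hypothetical helper: fetch a URL with an explicit timeout and report the
// failure reason instead of relying on warnings.
function fetchWithTimeout(string $url, float $timeoutSeconds): array
{
    $context = stream_context_create([
        'http' => ['method' => 'GET', 'timeout' => $timeoutSeconds],
    ]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        $error = error_get_last();
        return ['body' => null, 'headers' => [], 'error' => $error['message'] ?? 'unknown error'];
    }
    // $http_response_header is only populated for HTTP(S) requests.
    return ['body' => $body, 'headers' => $http_response_header ?? [], 'error' => null];
}

// Usage (left commented out; the second call performs a real request):
// var_dump(readProcSetting('/proc/sys/net/ipv4/tcp_timestamps'));
// var_dump(fetchWithTimeout('https://www.scotchwhiskyauctions.com/scripting/store-scripting_frontend.js', 10.0)['error']);
```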

PHPUnit, Selenium, and Firefox work, but when I try it with PhantomJS, nothing happens

I have a set of PHPUnit tests that use WebDriver and Selenium to run the tests through a browser on my Ubuntu machine. I got everything working as expected for Firefox, and then tried to simply swap the Firefox WebDriver config for some PhantomJS (1.9.8) WebDriver config. When I use a simple site (google.com), PhantomJS works, but when I try my dev site, nothing appears to happen within PhantomJS and it eventually fails at my first assertion. The screenshot shows nothing useful (a black and white checkerboard), whereas if I use google.com, the screenshot is the Google site as expected.
$capabilities = null;
if($this->testConfig["test.browser"] == "phantomjs"){
$capabilities = array(
WebDriverCapabilityType::BROWSER_NAME => 'phantomjs'
,'phantomjs.page.settings.userAgent' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86; rv:25.0) Gecko/20100101 Firefox/25.0'
,WebDriverCapabilityType::ACCEPT_SSL_CERTS => 'true'
,WebDriverCapabilityType::PLATFORM => 'LINUX'
,WebDriverCapabilityType::JAVASCRIPT_ENABLED => 'true'
,'--web-security' => 'no'
,'--ignore-ssl-errors' => 'YES'
,'--webdriver-loglevel' => 'DEBUG'
,'phantomjs.page.settings.webSecurityEnabled' => 'false'
);
} else {
$capabilities = array(WebDriverCapabilityType::BROWSER_NAME => 'firefox');
}
$this->webDriver = RemoteWebDriver::create('127.0.0.1:8910', $capabilities, 5000);
$window = new WebDriverDimension(1440,900);
$this->webDriver->manage()->window()->setSize($window);
$this->webDriver->get($url);
$this->webDriver->takeScreenshot('/home/scc/screen1.png');
$this->webDriver->wait(45, 1000)->until(
WebDriverExpectedCondition::titleContains($this->testConfig["app.title"])
);
Prior to running the code above, I start the phantomJS server on port 8910
scc@ubuntu:/scc/libraries/phantomjs$ phantomjs --webdriver=127.0.0.1:8910
PhantomJS is launching GhostDriver...
The problem is that I can get my remote site to work (dev.example.com) through Firefox, but I cannot get it to work through PhantomJS. I can hit other sites through PhantomJS (google.com), but not my dev.example.com. I have tried many different settings to get more debug-related log messages from the PhantomJS console, but so far not much of the info has been useful.
[INFO - 2016-01-10T23:31:10.717Z] Session [3b277b40-b7f2-11e5-98cd-41d97e3b4946] - page.settings - {"XSSAuditingEnabled":false,"javascriptCanCloseWindows":true,"javascriptCanOpenWindows":true,"javascriptEnabled":true,"loadImages":true,"localToRemoteUrlAccessEnabled":false,"userAgent":"Mozilla/5.0 (X11; Ubuntu; Linux x86; rv:25.0) Gecko/20100101 Firefox/25.0","webSecurityEnabled":"false"}
[INFO - 2016-01-10T23:31:10.718Z] Session [3b277b40-b7f2-11e5-98cd-41d97e3b4946] - page.customHeaders: - {}
[INFO - 2016-01-10T23:31:10.718Z] Session [3b277b40-b7f2-11e5-98cd-41d97e3b4946] - Session.negotiatedCapabilities - {"browserName":"phantomjs","version":"1.9.8","driverName":"ghostdriver","driverVersion":"1.1.0","platform":"linux-unknown-32bit","javascriptEnabled":true,"takesScreenshot":true,"handlesAlerts":false,"databaseEnabled":false,"locationContextEnabled":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"cssSelectorsEnabled":true,"webStorageEnabled":false,"rotatable":false,"acceptSslCerts":false,"nativeEvents":true,"proxy":{"proxyType":"direct"},"phantomjs.page.settings.userAgent":"Mozilla/5.0 (X11; Ubuntu; Linux x86; rv:25.0) Gecko/20100101 Firefox/25.0","phantomjs.page.settings.webSecurityEnabled":"false"}
[INFO - 2016-01-10T23:31:10.719Z] SessionManagerReqHand - _postNewSessionCommand - New Session Created: 3b277b40-b7f2-11e5-98cd-41d97e3b4946
I thought maybe it could be an SSL issue to my dev.example.com, but so far it does the exact same thing with http vs. https. What else could I be doing wrong? What can I do to help make PhantomJS log more debug info so I can tell where it is failing?
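One thing that may be worth trying, offered as a guess rather than a confirmed fix: judging by the negotiated capabilities in the log, the entries whose keys start with `--` do not appear to be picked up (acceptSslCerts is still false). Switches such as `--ignore-ssl-errors`, `--ssl-protocol`, `--web-security` and `--webdriver-loglevel` are PhantomJS command-line options, so they can be passed when the server is started. The helper name `buildPhantomCommand` below is hypothetical. It only assembles the launch command.

```php
<?php
// Hypothetical helper: build a PhantomJS launch command with the switches
// given as real command-line arguments rather than as capabilities.
function buildPhantomCommand(string $binary, string $listen, array $switches): string
{
    $parts = [escapeshellarg($binary), escapeshellarg('--webdriver=' . $listen)];
    foreach ($switches as $name => $value) {
        $parts[] = escapeshellarg('--' . $name . '=' . $value);
    }
    return implode(' ', $parts);
}

// The switch values below are assumptions to experiment with.
$command = buildPhantomCommand('phantomjs', '127.0.0.1:8910', [
    'ignore-ssl-errors'  => 'true',
    'ssl-protocol'       => 'any',
    'web-security'       => 'false',
    'webdriver-loglevel' => 'DEBUG',
]);
echo $command, PHP_EOL;
```

With the log level raised at launch, the GhostDriver console should show more detail about the request to the dev site.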
Thanks,
Sean

wget fails on a local domain

I have a Red Hat linux box with apache running several domains, including a.com and b.com.
I have a php script a.com/wget.php, which makes an exec() call to download a file on the local domain b.com. Running the php script from the command line is successful.
But running this script from a web page results in a 404 error. The command is:
/usr/bin/wget -k -S --save-headers --keep-session-cookies
-O <local-file-name> -o <local-log-file-name> -U \"Mozilla/5.0
(Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101
Firefox/24.0\" --max-redirect=100 "http://b.com/page.php"
No log messages are written to the Apache access log file for domain b.com for this call.
BUT the server access log file (/var/log/httpd/access_log) is NOT empty, it shows that there was an attempt made to open page "/page.php" on the server (the link in access log has no domain).
xx.xx.xx.xx - - [19/May/2014:12:02:49 +0100] "GET /page.php
HTTP/1.0" 404 285 "-" "Mozilla/5.0 (Macintosh;
Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Firefox/24.0"
Server error log (/var/log/httpd/error_log) gives this error:
[Mon May 19 12:02:49 2014] [error] [client xx.xx.xx.xx]
File does not exist: /var/www/vhosts/default/htdocs
So it would seem that something is stripping the domain name from "http://b.com/page.php" and the resulting URL that wget is trying to connect to is "/page.php". This will not work, given that the server has many domains on it.
Has anyone come across this? Is there some setting in wget or php or apache that would cause this to not happen? I tried different things based on suggestions regarding similar problems, but nothing has worked so far.
Thanks.
The problem turned out to be not in wget, but in firewall settings. The wget call, executed from behind the firewall, was resolving the domain to an external IP address, and connections to the external IP address were failing. Correcting this in the firewall fixed the wget problem.
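A quick way to spot this kind of problem from PHP is to check what the domain resolves to on the server itself. This is a minimal sketch with hypothetical helper names (`resolveHost`, `isPublicIpv4`). It only reports whether the resolved address is public, which on a box behind a firewall can hint that local requests are being sent out to the external IP.

```php
<?php
// Hypothetical helper: resolve a host name to an IPv4 address.
// gethostbyname() returns its input unchanged when resolution fails.
function resolveHost(string $host): ?string
{
    $ip = gethostbyname($host);
    return filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false ? null : $ip;
}

// Hypothetical helper: true if the address is outside the private and
// reserved (including loopback) IPv4 ranges.
function isPublicIpv4(string $ip): bool
{
    return filter_var(
        $ip,
        FILTER_VALIDATE_IP,
        FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
    ) !== false;
}

// Usage (depends on the server's DNS, so it is left commented out):
// $ip = resolveHost('b.com');
// echo $ip === null ? 'unresolved' : $ip . (isPublicIpv4($ip) ? ' (public)' : ' (internal)'), PHP_EOL;
```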

jQuery - GET News.html 404 (Not Found)

For some reason, this doesn't work:
$.ajax({
url: "News.html",
cache: false,
}).done(function(data) {
$("#content").load(data);
});
It gives me:
GET http://127.0.0.1/News.html 404 (Not Found)
But for whatever reason, opening that url manually (copy paste the url) works just fine.
And I thought it had something to do with the browser cache at first, so I added the cache: false option to the ajax call, but even then... argh.
Also, it does not show up as a requested URL in my access.log file.
For information i guess, i'm running:
lighttpd
php as fast-cgi via localhost:port
mapped .html => .php
Running OpenBSD 5.3
and uncommented (in /etc/php.ini):
cgi.fix_pathinfo=1
Also:
# ls *.html
News.html index.html
And here's the request headers for News.html:
Request URL:http://127.0.0.1/News.html
Request Method:GET
Status Code:404 Not Found
Request Headers
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Host:127.0.0.1
Referer:http://127.0.0.1/index.php
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36
X-Requested-With:XMLHttpRequest
Response Headers
Content-type:text/html
Date:Tue, 16 Jul 2013 21:55:05 GMT
Server:lighttpd/1.4.32
Transfer-Encoding:chunked
X-Powered-By:PHP/5.3.21
Checkpoint
Conclusion from the comments so far is that this might not be a jQuery issue at all.
The server responds with all the data (I've checked the raw data sent, and it contains everything), but the response header says 404.
Meaning the data is found, but the header says 404... it's odd, to say the least.
curl test
curl 'http://127.0.0.1/News.html' -H 'Accept-Encoding: gzip,deflate,sdch' -H 'Host: 127.0.0.1' -H 'Accept-Language: en-US,en;q=0.8' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36' -H 'Accept: */*' -H 'Referer: http://127.0.0.1/' -H 'X-Requested-With: XMLHttpRequest' -H 'Connection: keep-alive' -H 'Cache-Control: max-age=0' --compressed
Here you'll soon find a facebook feed, among other things :)
Zerkms test
# echo "wham bam" > zerkms_doesnt_believe.html
#
Config files
lighttpd.conf
php-5.3.ini
Error logs and what not
lighttpd-error.log
cURL test
Manual FastCGI test via a Python client:
# python fcgi_app.py
{'FCGI_MAX_CONNS': '1', 'FCGI_MPXS_CONNS': '0', 'FCGI_MAX_REQS': '1'}
After some tinkering, I figured out how the FastCGI protocol works and found a client that matched my needs. Funnily enough, it matched the name of my script, so here's the output:
# python fcgi_app.py
('404 Not Found', [('x-powered-by', 'PHP/5.3.21'), ('content-type', 'text/html')], '<html>\n\t<head>\n\t\t<title>test php</title>\n\t</head>\n<body>\nChecking</body>\n</html>', '')
And Here's the source
This leads me to the conclusion that this is in fact a PHP issue (even though I've hated on lighttpd for not honoring the 200 code PHP should respond with... and for that I'm sorry. I should go bash on PHP a little and see if that helps me come to a conclusion).
Temporary Solution
Placing the following in the top part of your .php page will work around this issue.
Note that this is only a workaround: it will work, but it's not a long-term fix.
<?php
header("HTTP/1.0 200 Found");
?>
This smells a bit like a same-origin policy issue.
The path you are specifying may be causing the issue.
Try
$.ajax({
url: "/News.html",
cache: false,
}).done(function(data) {
$("#content").load(data);
});
And let me (us) know if that helps.
This one had me stymied for a bit. Feeling some compulsive urges, I installed lighttpd and php5 on a fresh Ubuntu 12.10 VM (didn't have a BSD one handy). I had to change the event handler from kqueue to poll, but other than that I used your lighttpd.conf. And everything worked fine.
So then I installed your php.ini file, and BAM http status 404 while returning proper content. So that narrowed it down to php-cgi.
Turns out that when the service started, it would log
PHP Warning: PHP Startup: Unable to load dynamic library '/usr/local/lib/php-5.3/modules/pdo.so' - /usr/local/lib/php-5.3/modules/pdo.so: cannot open shared object file: No such file or directory in Unknown on line 0
So I did a quick search and changed one line in the php.ini from
extension_dir = "/usr/local/lib/php-5.3/modules"
to
extension_dir = "/usr/lib/php5/20100525"
restarted php-cgi, and voila status 200 to go along with the content.
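To check for this kind of mismatch quickly, something like the following could be run through the same PHP binary that serves the site. It is a small sketch, and the function name `extensionDirStatus` is made up. It reports the configured extension_dir and whether that directory exists from PHP's point of view.

```php
<?php
// Hypothetical helper: report the configured extension_dir and whether
// the directory is actually present on this machine.
function extensionDirStatus(): array
{
    $dir = (string) ini_get('extension_dir');
    return ['path' => $dir, 'exists' => $dir !== '' && is_dir($dir)];
}

$status = extensionDirStatus();
printf("extension_dir = %s (%s)\n", $status['path'], $status['exists'] ? 'exists' : 'missing');
```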
After setting up a fresh OpenBSD 5.3 server, and installing with your config files, I was able to narrow down the root cause.
In the lighttpd.conf you have server.chroot = "/var/www/" so all of its path names exclude the /var/www from the front. The php-fastcgi process is not chrooted, so it has a slightly different view of the file system.
Solution #1:
Don't chroot lighttpd and change the server.document-root, accesslog.filename, and server.errorlog to absolute paths.
Solution #2:
Use php-fpm or similar to make PHP chroot aware/capable
Use simple jQuery .load() method:
$(document).ready(function () {
$("#content").load('News.html');
});

wkhtmltopdf integrated with php doesn't work on Centos (access deny)

I installed wkhtmltopdf on my Centos server.
Everything works fine in the shell. If I try to send the command in the shell:
/usr/local/bin/wkhtmltopdf http://www.google.it /var/www/html/test_report.pdf
or simply
wkhtmltopdf ... /var/www/html/test_report.pdf
everything goes well, but the same command does not work if I use exec() in a PHP script:
exec("/usr/local/bin/wkhtmltopdf http://www.google.it /var/www/html/test_report.pdf");
I changed the permissions of the html folder to 0777 with chmod, but in the access.log I have the following response:
[08/Oct/2012:17:11:18 +0200] "GET test_report.php HTTP/1.1"
200 311 "-" "Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101
Firefox/15.0.1"
The same script works fine on a windows 2003 server.
Is there a way to get around this error?
Thank you.
Most likely SELinux is blocking it, I had the same issue once.
Don't disable SELinux (that's just a bad idea/the lazy man's way to "fix" it), but use the audit2allow tool instead to figure out what context/SELinux booleans need to be altered.
See http://wiki.centos.org/HowTos/SELinux#head-faa96b3fdd922004cdb988c1989e56191c257c01 for more details.
In my case the problem was SELinux (as @Oldskool mentioned in his answer). The output of exec() contained only the message PROT_EXEC|PROT_WRITE failed.
To resolve the problem I ran:
setsebool httpd_execmem on
I found this solution at groups.google.com
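To see messages like that one, it helps to capture both the output and the exit code of the command, since exec() on its own discards stderr. A minimal sketch, with the helper name `runCommand` made up for illustration:

```php
<?php
// Hypothetical helper: run a shell command, merging stderr into stdout,
// and return the exit code together with everything the command printed.
function runCommand(string $command): array
{
    $output = [];
    $exitCode = 0;
    exec($command . ' 2>&1', $output, $exitCode);
    return ['exitCode' => $exitCode, 'output' => implode("\n", $output)];
}

// Usage with the command from the question (left commented out):
// $result = runCommand('/usr/local/bin/wkhtmltopdf http://www.google.it /var/www/html/test_report.pdf');
// error_log("exit {$result['exitCode']}: {$result['output']}");
```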
