Force HTTP while fetching page source with PHP


How would I force HTTP (Not HTTPS), while getting the source code of: http://www.youtube.com/watch?v=2YqEDdzf-nY?
I've tried using file_get_contents, but it goes to HTTPS.

There is no way, because Google forces you to use HTTPS. It no longer accepts insecure connections.
They have even started to downrank websites that are not on SSL.
As for your comment, I have done a bit more research.
It may depend on the user agent, though I have not had time to confirm this.
Try cURL with this user agent:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101
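
A minimal sketch of that suggestion, assuming the cURL extension is available. The helper names (probeOptions, probeUrl, redirectsToHttps) are hypothetical. Redirects are deliberately not followed, so you can see whether the server itself answers the plain HTTP request with a redirect to HTTPS:

```php
<?php
// Hypothetical helper: cURL options for a single request that sends the
// chosen User-Agent and does NOT follow redirects.
function probeOptions(string $userAgent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => false,  // stay on the URL we asked for
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_TIMEOUT        => 10,
    ];
}

// Hypothetical helper: perform the request and report status, redirect
// target (empty string if none) and body (null on failure).
function probeUrl(string $url, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, probeOptions($userAgent));
    $body = curl_exec($ch);

    return [
        'status'   => curl_getinfo($ch, CURLINFO_RESPONSE_CODE),
        'redirect' => (string) curl_getinfo($ch, CURLINFO_REDIRECT_URL),
        'body'     => $body === false ? null : $body,
    ];
}

// True when the response is a 3xx redirect pointing at an https:// URL.
function redirectsToHttps(array $result): bool
{
    return $result['status'] >= 300 && $result['status'] < 400
        && stripos($result['redirect'], 'https://') === 0;
}
```

Calling probeUrl() with the question's URL and the user agent above, then passing the result to redirectsToHttps(), would show whether the server insists on HTTPS for that user agent. Whether it does is exactly the part the answer could not confirm.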

Related

404 Bot Attack on My Website (DDoS of Sorts)

Over the last few days I noticed that my WordPress website had been running quite slowly, so I decided to investigate. After checking my database I saw that a table responsible for tracking 404 errors was over 1 GB in size. At this point it was evident I was being targeted by bots.
After checking my access log I could see a pattern of sorts: each bot lands on a legitimate page that lists my categories, moves into a category page, and then requests seemingly random page numbers. Many of those pages do not exist, which causes the issue.
Example:
/watch-online/ - Landing Page
/category/evolution/page/7 - 404
/category/evolution/page/1
/category/evolution/page/3
/category/evolution/page/5 - 404
/category/evolution/page/8 - 404
/category/evolution/page/4 - 404
/category/evolution/page/2
/category/evolution/page/6 - 404
/category/evolution/page/9 - 404
/category/evolution/page/10 - 404
This is the actual order of requests, and they all happen within a second. At that point the IP is blocked because it has thrown too many 404s, but this seems to have no effect due to the sheer number of bots all doing the same thing.
Also, the category changes with each bot, so they are all attacking random categories and generating 404 pages.
At the moment there are 2037 unique IPs that have thrown similar 404s in the last 24 hours.
I also use Cloudflare and have manually blocked many IPs from ever reaching my box, but this attack is relentless and new IPs keep appearing. Here is a list of some offending IPs:
77.101.138.202
81.149.196.188
109.255.127.90
75.19.16.214
47.187.231.144
70.190.53.222
62.251.17.234
184.155.42.206
74.138.227.150
98.184.129.57
151.224.41.144
94.29.229.186
64.231.243.218
109.160.110.135
222.127.118.145
92.22.14.143
92.14.176.174
50.48.216.145
58.179.196.182
Other than automatically blocking IPs for too many 404 errors I can think of no other real solution, and that in itself is quite ineffective due to the sheer number of IPs.
Any suggestions on how to deal with this would be greatly appreciated, as there appears to be no end to this attack and my website's performance really is taking a hit.
Some User Agents Include:
Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36
Mozilla/5.0 (Windows NT 6.2; rv:26.0) Gecko/20100101 Firefox/26.0
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 7.0; WOW64; Trident/6.0)
Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:22.0) Gecko/20100101 Firefox/22.0
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36

If it's your personal website, you can try Cloudflare, which is free and can also provide protection against DDoS attacks. It may be worth a try.

Okay, so after much searching, experimentation and head banging I have finally mitigated the attack.
The solution was to install the Apache module 'mod_evasive'; see:
https://www.digitalocean.com/community/tutorials/how-to-protect-against-dos-and-ddos-with-mod_evasive-for-apache-on-centos-7
So for any other poor soul who gets slammed as severely as I did, have a look at that and get your thresholds finely tuned. This is a simple, cheap and very effective means of drastically reducing the impact of any attack similar to the one I suffered.
My server is still getting bombarded by bots, but this really does limit their damage.
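
Before tuning thresholds, it may help to measure how many 404s each address actually produces. This is a rough sketch, not part of mod_evasive. It assumes the access log uses Apache's "combined" format, and count404sByIp is a hypothetical helper name:

```php
<?php
// Count 404 responses per client IP from access log lines.
// Assumed line layout (Apache "combined" format):
//   IP ident user [time] "request" status size "referer" "user-agent"
function count404sByIp(array $logLines): array
{
    $counts = [];
    foreach ($logLines as $line) {
        // Capture the client IP (1) and the three-digit status code (2).
        if (preg_match('/^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) /', $line, $m)
            && $m[2] === '404') {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
        }
    }
    arsort($counts); // highest 404 count first
    return $counts;
}
```

Feeding it the output of file('access.log', FILE_IGNORE_NEW_LINES) (the path is an assumption) gives a per-IP ranking that can inform where the blocking threshold should sit.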

PHP opening link from Excel runs page three times

I'm having a strange issue, which I find difficult to summarize in a title.
First:
I have a webpage, where people need to be logged in.
I have an Excel document, with links to the webpage.
The problem:
When people are logged in and click on the link in the Excel document, the webpage tells them that they are not logged in.
What I found so far:
I'm using Office on Mac and I don't have any issues.
People using Office on Windows do have issues.
I think the issue is due to SESSIONS, that might be the reason why users aren't logged in while they should be.
I did some tests.
Every URL goes through index.php
index.php
<?php
session_start();
file_put_contents('log.txt', microtime().': SERVER '.print_r($_SERVER, true).PHP_EOL, FILE_APPEND);
exit;
Now when I click the link from Office on Mac (NO ISSUES!!!), I get a dump of the variable $_SERVER. Two important variables:
[HTTP_USER_AGENT] => Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
[HTTP_COOKIE] => PHPSESSID=77lpqmdmvskv33d2ddsdlfs5q7; rememberme=1%3Ae79e92271e7e05a5ee5679b659b3cb5cbb61e60d96c158f4648960136b175164%3Accdee80c3e42705fcd7e8c234525beda86d27394653dfdfb42bdd3ec98592ca1
You can see the browser (Chrome) and the cookie, which contains a rememberme cookie for login.
Now, when I do the same by clicking on a link in Excel on Windows, I get the $_SERVER variable printed three times in the log file!
First:
[HTTP_USER_AGENT] => Microsoft Office Excel 2014
[HTTP_COOKIE] => PHPSESSID=0ivlfjf49j4b82858tstc2lmm3; PHPSESSID=tv6gs33j721d0tmm3rrjdoho45
Notice the user agent and no rememberme cookie.
Second:
[HTTP_USER_AGENT] => Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; ms-office)
[HTTP_COOKIE] => PHPSESSID=0ivlfjf49j4b82858tstc2lmm3
Notice, still no Chrome browser and no rememberme cookie.
Third:
[HTTP_USER_AGENT] => Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
[HTTP_COOKIE] => PHPSESSID=3s0hvtssghk7uomvkpb5k70tc2; rememberme=1%3Aa9bd74ad58a0d7075c27108be1adbd26ba6d18f6e8b39073152d6780131ffe70%3A643852f8636c76c0bfc4017ec7fe3eab98dd57f5bcfdf86f0e37b5ec28a0c0ef
Finally, the user agent is Chrome and the rememberme cookie is set.
So, this is turning into a long story, but clicking on the link in Excel on Windows does strange things. Does anyone have an idea what is happening?

Okay, I found the problem. Below is an answer from superuser.com:
The URL you're using needs some more information from a cookie to
display the search results rather than the search page. Paste the URL
into a different browser (or remove your cookies) and you'll get the
same results.
Clicking a URL in Excel seems to open it in your default browser. But
that's not really true. Before opening it in your browser, Excel first
runs Microsoft Office Protocol Discovery. This uses a Windows/Internet
Explorer component to determine if the URL works. (It does not
identify itself as Internet Explorer, but as "User Agent: Microsoft
Office Existence Discovery".) And if the results are (somehow) okay
then it will open the result of that check in your default browser.
Lacking the cookies (more precisely: lacking a session), GoDaddy gives
that Internet Explorer component some redirect. And the result of that
is opened in your default browser. That's the URL you're seeing.
Most likely your default browser is not Internet Explorer? Then
pasting the URL into IE directly and clicking it, to get the cookies,
might then also make the link work from Excel. (Just for testing; it's
not a permanent solution.)
You will have more luck using a URL that does not rely on some hidden
information from a cookie, like
http://www.godaddy.com/domains/search.aspx?domainToCheck=superuser.com
Source: https://superuser.com/a/445431
So to solve this issue:
When Excel checks the link, it gets redirected to '/login' because it is not logged in. That final URL is the one Excel then opens in the real browser.
So I changed the login script: a user is no longer redirected to '/login', but stays on the same URL and is shown the login form if not logged in. Excel now opens the original URL, and if the user is logged in, they see the page. If they are not logged in, the login form is shown.
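
The idea can be sketched roughly as follows. This is not the actual login script; chooseView and the view names are hypothetical, and answering with status 200 for the login form is an assumption:

```php
<?php
// Hypothetical front-controller decision: which view to render for a request.
// Nothing here redirects to '/login', so a link checker that requests the
// URL without a session still ends up with the original URL.
function chooseView(bool $loggedIn, string $requestedPath): array
{
    if ($loggedIn) {
        // Logged-in users get the page they asked for.
        return ['status' => 200, 'view' => 'page', 'path' => $requestedPath];
    }

    // Not logged in: render the login form in place, on the same URL.
    return ['status' => 200, 'view' => 'login_form', 'path' => $requestedPath];
}
```

In index.php, the first argument would come from the session or the rememberme cookie check, and the second from the request URI. After a successful login the form can post back to the same path, so no '/login' URL ever appears.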

Changing the user-agent

Well, I have the following problem. I made a tool that checks the status of a website.
For example if I enter www.youtube.com it will say
http://www.youtube.com HTTP/1.0 200 OK
and for a website with a redirect it will say:
http://www.imgur.com HTTP/1.1 302 Moved Temporarily
http://imgur.com/ HTTP/1.1 200 OK
Alright, this works just as it should. However, I would like to make it so that you can select the user agent, for example Android, because YouTube on Android will redirect to m.youtube.com.
I have already made a dropdown list with different user agents, but what I don't know is how to change the user agent. When I search Google it just gives me browser plugins or add-ons.
I hope someone knows of a way to do this.
Thanks in advance!

You can send a cURL request and change the user agent like this:
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
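
For a status checker like the one described in the question, that option could be combined with manual redirect handling so that every hop is reported. A sketch under stated assumptions: traceRedirects and formatHops are hypothetical names, the cURL extension is available, and the target servers answer HEAD requests (some do not):

```php
<?php
// Follow redirects by hand so each hop can be listed, sending the chosen
// User-Agent on every request.
function traceRedirects(string $url, string $userAgent, int $maxHops = 5): array
{
    $hops = [];
    for ($i = 0; $i < $maxHops && $url !== ''; $i++) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_NOBODY         => true,   // HEAD request: we only need the status
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => false,  // we follow redirects ourselves
            CURLOPT_USERAGENT      => $userAgent,
            CURLOPT_TIMEOUT        => 10,
        ]);
        curl_exec($ch);
        $hops[] = [
            'url'    => $url,
            'status' => curl_getinfo($ch, CURLINFO_RESPONSE_CODE),
        ];
        // Empty string when the response is not a redirect, which ends the loop.
        $url = (string) curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    }
    return $hops;
}

// Render the hops one per line, e.g. "http://imgur.com/ 200".
function formatHops(array $hops): string
{
    $lines = [];
    foreach ($hops as $hop) {
        $lines[] = $hop['url'] . ' ' . $hop['status'];
    }
    return implode(PHP_EOL, $lines);
}
```

The value selected in the dropdown list would be passed as $userAgent, for example echo formatHops(traceRedirects('http://www.youtube.com', $selectedAgent)), where $selectedAgent is whatever string the form submitted.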

OS detection if useragent value is altered

Using PHP, is it possible to detect the exact OS even if the browser's user agent value has been altered?
Here is the case:
If someone overrides the Firefox user agent value using "general.useragent.override"
Actual: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:12.0) Gecko/20100101 Firefox/12.0
Override: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.55.3 (KHTML, like Gecko) Version/5.1.3 Safari/534.53.10
the $_SERVER['HTTP_USER_AGENT'] value will be totally fake. It is not even useful for detecting the correct operating system.
Is there any PHP solution in this situation?
Thanks.

No, it is not possible. The only information you have is that supplied by the User-agent header, and if a user wants to send false information there is nothing you can do to detect it.
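
To illustrate why, here is a deliberately naive sketch of user-agent-based OS detection (guessOsFromUserAgent is a hypothetical helper, and the patterns are far from complete). It can only repeat whatever the client claims:

```php
<?php
// Naive OS guess from a User-Agent string. The result is only as
// trustworthy as the header the client chose to send.
function guessOsFromUserAgent(string $userAgent): string
{
    $patterns = [
        'Windows' => '/Windows NT/i',
        'macOS'   => '/Mac OS X/i',
        'Linux'   => '/Linux/i',
    ];
    foreach ($patterns as $os => $pattern) {
        if (preg_match($pattern, $userAgent)) {
            return $os;
        }
    }
    return 'unknown';
}
```

With the two strings from the question, the actual user agent yields 'Linux' and the overridden one yields 'macOS', although both come from the same Ubuntu machine. The server has no way to tell which one is true.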

You can still use JavaScript to find the screen size, but not the OS. This is how:
<script type="text/javascript">
document.write(screen.width+'x'+screen.height);
</script>
But this can be changed by the client anyway, as it runs on the client side. On iOS there is one way, by temporarily setting up a mobile management profile to verify the device, but that is a lot of work for the client, so only do that if you have to.
But in most cases you cannot verify that it has not been modified.

file_get_contents() and cURL failed to get page contents; I need alternative code

Some sites are blocking file_get_contents() and my cURL code as well. I need PHP code that gets around that problem. I only need to get the page contents so I can extract the title.

You probably need to set the user agent string to emulate a "real" browser:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20110319 Firefox/4.0');
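
The same idea also works without cURL, through the user_agent option of the HTTP stream context. A minimal sketch: fetchPage and extractTitle are hypothetical helpers, and the regular expression is a simplification that assumes a single, well-formed <title> element. Sites that block on criteria other than the user agent will still refuse the request:

```php
<?php
// Fetch a page through the HTTP stream wrapper, sending a browser-like
// User-Agent. Returns null if the request fails.
function fetchPage(string $url, string $userAgent): ?string
{
    $context = stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 10,
        ],
    ]);
    // @ suppresses the warning on failure; the false return is checked instead.
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}

// Pull the text of the first <title> element, or null if there is none.
function extractTitle(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;
}
```

Usage would be along the lines of $html = fetchPage($url, $agent); followed by extractTitle($html) when $html is not null, with $agent set to a string such as the Firefox one above.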
