file_get_contents gets different file from google than shown in browser - php

I use file_get_contents to find out if there is an URL of the search I look at:
http://www.google.com/search?q=*a*+site:www.reddit.com/r/+-inurl:(/shirt/|/related/|/domain/|/new/|/top/|/controversial/|/widget/|/buttons/|/about/|/duplicates/|dest=|/i18n)&num=1&sort=date-sdate
If I go to this URL in my browser, a different file is displayed then what I see when I echo file_get_contents
$url = "http://www.google.com/search?q=*a*+site:www.reddit.com/r/+-inurl:(/shirt/|/related/|/domain/|/new/|/top/|/controversial/|/widget/|/buttons/|/about/|/duplicates/|dest=|/i18n)&num=1&sort=date-sdate";
$google_search = file_get_contents($url);
What's wrong with my code?

Nothing really. The problem is that the page uses javascript and ajax to get contents. So, in order to get a "snapshot" of the page, you need to "run it". That is, you need to parse the javascript code, which php doesn't do.
Your best bet is to use an headless browser such as phantomjs. If you search, you find some tutorials explaining how to do it
NOTE
If all you're looking for is a way to retrieve raw data from the search, you might want to try to use google's search api.

I assume Google is definitely checking the user agent to avoid any kind of automated searches.
So you should at least use CURL and define a proper user agent string (i.e. the same as a common browser) to "trick" Google.
Somehow I fear it will not be so easy to trick Google, but maybe I'm just paranoic and at least you may learn something about CURL.

Related

Taking information from one site and using it on my analyser/script made with PHP

I need to extract the "Toner Cartridges" levels from This site and "send it" to the one im working on. Im guessing i can use GET or something similar but im new to this so i dont know how it could be done.
Then the information needs to be run through a if/else sequence which checks for 4 possible states. 100% -> 50%, 50%->25%, 25%->5%, 5%->0%.
I have the if/else written down but i cant seem to find any good command for extracting the infromation from the index.php file.
EDIT: just need someone to poin me in the right the direction
To read the page you can use file_get_contents
$page = file_get_contents("http://example.com");
But in order to make the function work with URLs, allow_url_fopen must be set to true in the config file of your php (php.ini)
Then you can use a regular expression to filter the text and get data.
The php function to perform a regular expression is preg_match
Example:
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
will output
domain name is: php.net
I imagine you are reading from a Printer Status Page. In which case, to give your self the flexibility to use sessions and login, I would look into Curl. Nice thing about Curl is, you can use the PHP library for code, but you can also test at the command-line rather quickly.
After you are retrieving the HTML contents, I would look into using an XML parser, like SimpleXML or DOMDocument. Either one will get you to the information you need. SimpleXML is a little easier to use for people new to traversing XML (this is, at the same time, like and very not like jQuery).
Although, that said, you could also hack to the data just as quick (if you are just now jumping in) with Regulair Expressions (it is seriously like that once you get the hang of it).
Best of luck!

How to interact with page elements while crawling a website with PHP?

I need to go to http://butlercountyclerk.org/bcc-11112005/ForeclosureSearch.aspx, enter data in the fields, then click the button to get results. When taken to the result page, I'm given a table of data but it's paginated into 5 different pages.
I'm able to do the above using cURL, but it's at this point that I get stuck.
Once I'm on the result page, I need to click the "date" header twice to make the data order by decreasing date, then skim off the current day's results.
Any idea how to do this, advanced detail or in concept? Either way should help.
Thanks!
The problem is that the click is actually performing a postback using javascript, with the limitations of PHP and cURL you will need to inspect the HTTP headers (GET, POST and COOKIES) being sent by the browser, and emulate them. Taking in mind that some values might be session dependent. Right now I don't have time to do this for you but I know it can be quite tricky with ASP.Net websites in some cases. There might be easier ways to do it, but that's what it will always come down to, because that's what happens.
If you weren't tied to PHP a whole world of options open - for example, the aggregator in the project I'm working on is actually capable of executing (controlled) javascript specifically for these kinds of tasks/pages (albeit on a grander scale).
I can't get a working set of results - if you could post some dummy data that gives results, that would help.
As a generic answer, you need something that can manipulate the DOM. You can go server-side with something like PHP and Webdriver, or purely client-side with Selenium. Simulate the click, get the resulting HTML and parse that.
This should work. try this.
$url ='http://butlercountyclerk.org/bcc-11112005/ForeclosureSearch.aspx';
## do curl , with cookies enabled.
## after do this.
$url =$url.'?'.'__EVENTTARGET=Search%3AdgSearch%3A_ctl2%3A_ctl1&__EVENTARGUMENT=&__VIEWSTATE=dDwtMjk2Mjk5NzczO3Q8O2w8aTwxPjs%2BO2w8dDw7bDxpPDE%2BOz47bDx0PDtsPGk8Mz47aTwxNz47aTwxOT47PjtsPHQ8dDw7cDxsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0PjtpPDU%2BOz47bDxwPDIwMDY7MjAwNj47cDwyMDA3OzIwMDc%2BO3A8MjAwODsyMDA4PjtwPDIwMDk7MjAwOT47cDwyMDEwOzIwMTA%2BO3A8MjAxMTsyMDExPjs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VmlzaWJsZTs%2BO2w8bzx0Pjs%2BPjs%2BOzs%2BO3Q8QDA8cDxwPGw8Q3VycmVudFBhZ2VJbmRleDtQYWdlQ291bnQ7XyFJdGVtQ291bnQ7XyFEYXRhU291cmNlSXRlbUNvdW50O0RhdGFLZXlzOz47bDxpPDA%2BO2k8ND47aTwxMD47aTw0MD47bDw%2BOz4%2BOz47Ozs7Ozs7Ozs7PjtsPGk8MD47PjtsPHQ8O2w8aTwyPjtpPDM%2BO2k8ND47aTw1PjtpPDY%2BO2k8Nz47aTw4PjtpPDk%2BO2k8MTA%2BO2k8MTE%2BOz47bDx0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQzNjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8xNjE3NzE0OSAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS8zLzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFNVTlRSVVNUIE1PUlRHQUdFIElOQzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8TkFUSEFOSUVMIEdBQkJBUkQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDEzNTQgVkFOREVSVkVFUiBBVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47dDw7bDxpPDA%2BO2k8MT47aTwyPjtpPDM%2BO2k8ND47PjtsPHQ8O2w8aTwwPjs%2BO2w8dDxwPHA8bDxUZXh0O05hdmlnYXRlVXJsOz47bDxDViAyMDExIDA1IDE0MTU7aHR0cDovL3d3dy5idXRsZXJjb3VudHljbGVyay5vcmcvcGEvcGEudXJkL3BhbXcyMDAwLW9fY2FzZV9zdW0%2FMTk2MzQ4ODUgICAgICAgICAgICA7Pj47Pjs7Pjs%2BPjt0PHA8cDxsPFRleHQ7PjtsPDUvMi8yMDExOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxUSElSRCBGRURFUkFMIFNBVklOR1MgQU5EIExPQU4gQVNTTiBPRiBDTEVWRUxBTkQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEdBWUxFIE5BU0g7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDg5NDEgQ09YIFJEIFdFU1QgQ0hFU1RFUiwgT0ggNDUwNjk7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTUwMztodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8yMjY1MTYxMiAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS85LzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFUgUyBCQU5LIE4gQTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8TE9VSVMgTUlSTUFOOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDw2OTkxIEdBUlkgTEVFIERSIFdFU1QgQ0hFU1RFUiwgT0ggNDUwNjk7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQ5MjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8yMzk3NTc5MiAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS82LzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEZJRlRIIFRISVJEIE1PUlRHQUdFIENPOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxSQVlNT05EIFNURUlOOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDwyMzU5IFRIUlVTSCBBVkUgRkFJUkZJRUxELCBPSCA0NTAxNDs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDM4O2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI0NzgyOTYzICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzMvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8V0VMTFMgRkFSR08gQkFOSyBOIEE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEpBTkVUIEJPRUhNOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDw4NjA4IEdPTERGSU5DSCBXQVkgV0VTVCBDSEVTVEVSLCBPSCA0NTA2OTs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDQwO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI1NTkwMjAzICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzQvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8RklGVEggVEhJUkQgQkFOSzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8VEhFT0RPUkUgQ09PSzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8UE8gQk9YIDE3MTEgV0VTVCBDSEVTVEVSLCBPSCA0NTA3MTs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDkwO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI2ODY3MDkxICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzYvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8Q0lUSUZJTkFOQ0lBTCBJTkM7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPERPTk5BIE1BUkRJUzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NjU0OSBDQU5BU1RPVEEgRFJJVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47dDw7bDxpPDA%2BO2k8MT47aTwyPjtpPDM%2BO2k8ND47PjtsPHQ8O2w8aTwwPjs%2BO2w8dDxwPHA8bDxUZXh0O05hdmlnYXRlVXJsOz47bDxDViAyMDExIDA1IDE0Njg7aHR0cDovL3d3dy5idXRsZXJjb3VudHljbGVyay5vcmcvcGEvcGEudXJkL3BhbXcyMDAwLW9fY2FzZV9zdW0%2FMjk4NzU2MDIgICAgICAgICAgICA7Pj47Pjs7Pjs%2BPjt0PHA8cDxsPFRleHQ7PjtsPDUvNS8yMDExOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxDSVRJTU9SVEdBR0UgSU5DOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxNQVRUSEVXIEJMVU5ERUxMOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDwxNDEyIEhFTE1BIEFWRSBIQU1JTFRPTiwgT0ggNDUwMTM7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQzMjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8zMjI0MzYxNyAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS8zLzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFdFTExTIEZBUkdPIEJBTksgTiBBOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxKT0hOIEJPV01BTjs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8Jm5ic3BcOzs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDYzO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzQyMjcwMTE5ICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzQvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8VSBTIEJBTksgTkFUSU9OQUwgQVNTT0NJQVRJT047Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEJSWUFOIFNDSE1JRFQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDI4OTUgV0VFUElORyBXSUxMT1cgRFJJVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47Pj47Pj47Pj47Pj47Pj47PtVTse1TdIXrxq%2FXrY%2Fp22QQ7pAh&Search%3AddlMonth=5&Search%3AddlYear=2011&Search%3AtxtCompanyName=&Search%3AtxtLastName=&Search%3AtxtCaseNumber=';
## DO curl with cookies on again

Parsing FIND MY IPHONE?

As you probably know there's that think called "FIND MY IPHONE" ( www.me.com ), that enables you to see your phones location on a google map in real time.
Yet I was thinking if there would be any way to make a PHP file that would
1. run automatically every, say, 15 minutes.
2. pass cookies with my username/password to me.com
3. retrieve the lat/long coordinates from it and save them into a database.
that way I would have a history of my phone's location.
how would you suggest doing it, can a php file pass cookies or is there something else needed? and how to retrieve those lat/long coordinates that are already shown on a google map?
Thanks!
If there aren't any limitations to this service you might be interested in reading about php cURL. Link: http://php.net/manual/en/book.curl.php
You can use cookies and login just like a normal browser.
cUrl is the best solution for your request, you can find some useful stuff here
link text
I'm sure there is an app for that ;)

how do you take a snapshot of your current browser window using php

I've tried searching everywhere but there's seems to be no implementation available other than having the client use a file (batch/exe of some sort).
You just can't do it. PHP is server side scripting language, maybe you can do that using JavaScript, but I'm not even sure about that.
I know someone implemented such service, but actually he had to use Mozilla browser, which opened, a script (I think it was not JS, maybe perl, c/c++) made a screenshot and uploaded it.
I'm assuming you mean "your" in the general sense. If you mean "how does one take a screenshot...", you generally hit the print screen key. If you're trying to capture your users' browser output, I'd say that it's probably not possible. If it were, the best you could get is the output of what you wrote yourself.
Google Gears might be hackable to do something close, if you can simulate the print screen key press with JS and get the file to save somewhere gears can access.
You can't do that in PHP, as PHP is running on the server, and not the client.
To get screenshots of the browser, you can take a look at, for instance, this list.
If you are look for an automated solution to take screenshot of web pages opened in a browser window, you could also look at this question : How to capture x screen using PHP, shell_exe and scrot and it's answers.
And, finally, and without selecting any particular post, you can try a search on SO ; something like screenshot browser, sorted by relevance, seems to get some interesting posts :-)
Good luck !

How to invoke js without displaying the code

I was wondering, I want to plant a JS tracking code (analytics) in a few websites to track their traffic. But I don't want that when viewing the site's source code people will be able to see that I've embedded the JS tracking code there.
Is it possible? Maybe by using an Apache/PHP trick?
Thanks,
Roy.
Nope it's not possible, for the browser to execute any code at least some of it must be initially visible, even if that code is to then retrieve the tracking code itself.
In addition all the modern web developer tools provide access to any code that is loaded so anyone can use those to see anything you've attempted to load discretely.
The more important question is why you want to hide that you're tracking people?
It's not absolutely possible with Javascript. Javascript always runs in the context of the user's browser, so it always means that the user will have access to see the script. You can obfuscate it, or try some tricks similar to anti-hotlinking on the JS code, but it will still be relatively easy for someone to figure out what the code does with a simple tool like Firebug.
You can, however, track your traffic without JavaScript. Analytics uses JavaScript for portability, and because some of the data it accesses can only be accessed with JavaScript. However, there are more passive ways of tracking your traffic which don't require JavaScript, such as any log analyzer like AWstats. You just don't get some of the cool features of Analytics.
It's not possible, but you could just name your script file something innocent like "mouseover.js".
It's not possible: JS code has to be run by the web browser, which means that -- which ever way you try put it -- it has to be readable by the browser and thus by anyone that inspects the page.
You could try obfuscating the JS, but that won't stop anyone that is determined to see what's happening.
You could ask yourself what the odds are that more than a few people will check whether you're tracking them -- I wouldn't expect it.
You can't technically hide the code... But you can scramble it so it's not readable to anybody. I used http://hivelogic.com/enkoder/form by Dan Benjamin to scrable some JS on my page (in this case I scrambled my email address). It scrambles it so the browser can execute it, but it's not humanly readable...
Then you can just call it as a function like I did in from this script http://www.jamischarles.com/css_js/email_encoder.js. Give it a try.

Categories