Lately I have been working on a PHP browser-like program. The goal of this program is to use this PHP browser platform to browse only 'safe' web sites: it should detect an adult site and refuse to display it.
Unfortunately, there are two major problems:
Cookies - users can't log in to their accounts on different sites while using this platform.
Security redirects - some sites check the URL, either in PHP or JS, and then redirect to their own page.
So I simply thought about plan B:
I was thinking about using an iframe and building the whole program in JavaScript and Ajax! But unfortunately, iframes are locked down and I can't touch anything inside them!
- and there goes plan B.
My question is: can you think of anything, or offer any advice, that would help in building a PHP/JavaScript+Ajax browser-like program?
For the PHP side you'll need to use cURL. You'd probably also want to change the HTML on the server side. Take a look at this: Is there a PHP HTML tag library?.
For checking whether a site is adult, you can simply pass the domain through a database of known adult sites.
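A minimal sketch of that check, assuming a hypothetical blocked_domains table and an existing PDO connection in $pdo:

// extract the host part of the requested URL and look it up in the blacklist
$host = parse_url($url, PHP_URL_HOST);
$stmt = $pdo->prepare('SELECT COUNT(*) FROM blocked_domains WHERE domain = ?');
$stmt->execute(array($host));
$isBlocked = $stmt->fetchColumn() > 0;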
For the JavaScript side I don't know of any pre-made browsers. You'll probably have to build it in yourself; it shouldn't be too hard.
Update
Basic structure (a rough sketch of the PHP side follows the list):
JS client makes an Ajax request to the PHP server using GET or POST (e.g. "url=site.com/page/foo.html")
PHP gets the URL from GET or POST
PHP uses cURL to get the page contents
PHP parses the HTML and rewrites the URLs, or JS prevents the link click and sends the href="" back to the server via Ajax (back to the first step): Is it possible to stop redirection to another link page?
PHP echoes out the page
JavaScript places it in the display
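Here is that sketch, assuming a hypothetical proxy.php endpoint; error handling and the adult-site check are left out:

// proxy.php - fetch the requested page server-side with cURL
$url = isset($_GET['url']) ? $_GET['url'] : '';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0'); // some sites block the default cURL agent
$html = curl_exec($ch);
curl_close($ch);

// rewrite links so they point back at this proxy instead of the original site
$dom = new DOMDocument();
@$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    $link->setAttribute('href', 'proxy.php?url=' . urlencode($href));
}

echo $dom->saveHTML();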
I know my answer is very late; I'm posting it so that anyone else can get help. There is a simple solution for creating a complete PHP browser. Here is the link: http://sourceforge.net/projects/snoopy/
Related
I'm thinking of a cron job which executes one PHP script daily.
This script will do a file_get_contents() on one URL I assign.
Can I do this to simulate a user's visit?
Does it work like a visit?
$page = file_get_contents('http://www.example.com/');
echo $page;
You can "simulate" this kind of action, but it's better to be done with curl. Also to do this I would recommend going through this stackoverflow post, it explains all the variables which are needed to be supplied by doing a server side request, rather than opening the page through a browser and loading the analytics js.
If your visit is counted by Google Analytics implemented through JavaScript, then no, it won't work - file_get_contents() doesn't run any JavaScript, it just downloads the file. However, you could do it by sending the page view through PHP: https://developers.google.com/analytics/devguides/collection/protocol/v1/devguide
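For example, a minimal sketch of a pageview hit sent with cURL through the Measurement Protocol; the tracking ID, client ID and page path below are placeholders:

// report a pageview to Google Analytics (Measurement Protocol v1)
$hit = http_build_query(array(
    'v'   => 1,             // protocol version
    'tid' => 'UA-XXXXX-Y',  // placeholder tracking ID
    'cid' => '555',         // placeholder anonymous client ID
    't'   => 'pageview',
    'dp'  => '/index.php',  // page path being reported
));

$ch = curl_init('https://www.google-analytics.com/collect');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $hit);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
curl_close($ch);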
Note that JavaScript gathers more information about the user than PHP can, so use this with caution.
If your intention is just to check whether the script ran, there are better and easier ways, such as logging every request (or requests from a specific IP) to your database or a text file.
Everything I suggested here presumes you can edit the opened file - your http://www.example.com/index.php. If you need to trigger Google Analytics on a site you can't edit, you need something far more robust, like a web crawler or scraper that can execute the JavaScript. For inspiration: http://www.jacobward.co.uk/using-php-to-scrape-javascript-jquery-json-websites/
I used cURL to log in to a website. The natural question is how to perform clicks on buttons and then eventually log out. For example, JavaScript uses the click() function. What does PHP use? Thanks for any clues.
I am following a book on web scraping. In it the author logs in to its publisher's website. The book is old and out of date. Moreover, it says nothing about logging out. This is the publisher: https://www.packtpub.com/
You can't click a button using PHP alone. PHP doesn't work like that. PHP can download the HTML of a webpage, but it can't perform actions like a browser can.
If you want to do that, you will need a headless browser. A headless browser is essentially an invisible browser that can do most things a regular browser can do. There's PhantomJS, and CasperJS, for this.
There are also PHP libraries that use PhantomJS, for example PHP PhantomJS. Personally, I've never done this with PHP, but I do use PhantomJS and CasperJS on a regular basis.
Alternatively, what you can do with PHP is parse the DOM for links or buttons and replicate the HTTP request that's made when clicking them.
For example, if there's a link that goes to /contactus, you simply create a GET request to that page using cURL. The response will be the source code and/or headers.
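A rough sketch of replicating such a click, assuming the login step saved its cookies to a placeholder cookie jar:

// follow the /contactus link with the same session cookies used for the login
$ch = curl_init('https://www.example.com/contactus');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookies.txt'); // cookie jar written during login
$response = curl_exec($ch); // the page source (add CURLOPT_HEADER for headers too)
curl_close($ch);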
I am currently working on a project that uses CasperJS, PHP and Redis to create a rather complex scraper/automation/analysis tool for a large social network.
As a side note, some sites rely heavily on JavaScript, and using cURL alone may not be enough. You can get around this by parsing the JavaScript file(s) and some other advanced tricks, but believe me, you do not want to go down this route. That is why I use CasperJS on occasion. It's slower, but that's all we've got at the moment.
As for logging out... delete your cookies file. Done.
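With cURL that just means removing the cookie jar file you pointed CURLOPT_COOKIEJAR at during login (the path is a placeholder):

// forget the session by deleting the cookie jar
if (file_exists('/tmp/cookies.txt')) {
    unlink('/tmp/cookies.txt');
}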
I recently published a project that gives PHP access to a browser. Get it here: https://github.com/merlinthemagic/MTS. Under the hood is an instance of PhantomJS, as others have suggested; this project just wraps that functionality.
After downloading and setting it up you would simply use the following code:
$myUrl = "http://www.example.com";
$windowObj = \MTS\Factories::getDevices()->getLocalHost()->getBrowser('phantomjs')->getNewWindow($myUrl);
//select the username input field, in this case it has id=username
$windowObj->mouseEventOnElement("[id=username]", 'leftclick');
//type your username
$windowObj->sendKeyPresses("yourUsername");
//select the password input field, in this case it has id=passwd
$windowObj->mouseEventOnElement("[id=passwd]", 'leftclick');
//type your password
$windowObj->sendKeyPresses("yourPassword");
//click on the login button, in this case it has id=login
$windowObj->mouseEventOnElement("[id=login]", 'leftclick');
//click on all the buttons you need with this function
$windowObj->clickElement("[id=someButtonId]");
$windowObj->clickElement("[id=someOtherButtonId]");
//if you want the DOM or maybe a screenshot at any point, run:
$dom = $windowObj->getDom();
$imageData = $windowObj->screenshot();
I'm developing a website but I'm stuck at one point: I need to detect outgoing links on my website and either forbid them or allow them. I don't know how Facebook does this, but they do it through facebook.com/l.php - if a link is marked as spam, users get notified about it.
I don't know whether that is done in PHP or .htaccess. I got it working in PHP using DOMDocument, but that's not a real solution for this.
This is not something that you solve on the Apache or .htaccess level. Basically, whenever you're outputting a link, check if it's external, and if it is, change the destination to your redirector.
The redirector can then just check the URL passed, and if it's marked as malicious, it can show a message, and if it's not, then it can either automatically redirect or display some kind of notice that you're leaving the website.
I'm not 100% sure how Facebook implements it, but what I would recommend is to use jQuery (or another JavaScript library) to rewrite all external links to point at a validating PHP script (e.g. Facebook's l.php), with the intended URL passed as a GET parameter.
Using jQuery, it might look like:
$('a[href]').each(function(){
var safe_href = 'http://yourdomain.com/yourscript.php?url=' + encodeURIComponent($(this).attr('href'));
$(this).attr('href', safe_href);
});
You can then do a database lookup in yourscript.php based on $_GET['url'] and redirect to that URL if it's safe, or display a message if it isn't.
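A minimal sketch of what yourscript.php could do; is_safe_url() stands in for whatever database lookup you implement:

// yourscript.php - validate an outgoing link before redirecting
$url = isset($_GET['url']) ? $_GET['url'] : '';

if (is_safe_url($url)) {        // hypothetical helper wrapping your blacklist query
    header('Location: ' . $url);
    exit;
}

echo 'Warning: the link ' . htmlspecialchars($url) . ' has been reported as unsafe.';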
l.php is a script that reads links via $_GET['u']. With the URL in your hands you decide where you want the client to be redirected.
So, as it looks, you want the users to teach your application what is spam and what is not. For that you will need a "report spam" button beside the URL.
So I am very new to this concept.
So why not dive in headfirst :) Some things I don't understand:
What happens if JS is disabled?
If using MySQL databases (i.e. checking forms and such), why not just use PHP?
To confirm what others have said, disabling JavaScript will also disable the AJAX call. After all, AJAX stands for "Asynchronous JavaScript and XML".
To address why you can't just use PHP: there are some things that just can't be done without AJAX. PHP is great for loading the page with the initial information, but after that, it requires a full page reload to load anything else. AJAX allows you to get around this hassle.
For your example of form validation, AJAX can be used to validate the information while the person is filling it out. Otherwise, you would have to reload the page each time someone fills out another field in the form.
Another example is from a project I worked on. The form required a zip code and would load the appropriate city and county based on the zip that was entered. Using strict PHP, I would need the client to download the entire zip table embedded in the HTML/JS (which would add at least another 100 KB to the download).
Using AJAX, I can get around this. The user enters the zip code, which triggers an AJAX call that downloads just the few rows I need (less than a few hundred bytes, for comparison).
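The server side of such an AJAX call can stay tiny. A sketch, assuming a hypothetical zip_codes table and a PDO connection already in $pdo:

// zip_lookup.php - return the city and county for a zip code as JSON
$zip = isset($_GET['zip']) ? $_GET['zip'] : '';

$stmt = $pdo->prepare('SELECT city, county FROM zip_codes WHERE zip = ?');
$stmt->execute(array($zip));
$row = $stmt->fetch(PDO::FETCH_ASSOC);

header('Content-Type: application/json');
echo json_encode($row ? $row : array('error' => 'unknown zip'));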
[Edit:] Also, a tip since you said you were new to AJAX: if you're dealing with some form of authentication (logging in, etc.), remember to validate the user on the AJAX pages themselves. Otherwise, tricky users will be able to access sensitive information from your database.
Ajax just adds to the user experience and allows a web application to feel more like a desktop application. Users can delete a record and stay on the same page without reloading; you just let the record disappear.
And remember to validate on the server side, even if you validate on the client side. You're weakest at the client side, since someone can easily submit values straight to your script, so ALWAYS check on the server side and add client-side validation if you would like some nice effects etc.
But you will always need to keep in mind that there are people out there who have JavaScript disabled, be it because of a security policy or just because they're paranoid. When JS is not enabled, your JavaScript and AJAX requests won't work. So while developing you will need to make sure that, if JavaScript is not there to do the operation, the form is submitted just like a normal HTTP form; this will allow all those paranoid people to use your application too :D.
OR you could always just deny access to those who don't have JavaScript enabled, but that's not very nice... If you want to check whether they have JavaScript enabled, see http://www.w3schools.com/TAGS/tag_noscript.asp for an example.
AJAX is a JavaScript, client-based technology. If JS is disabled it simply doesn't work.
PHP is a server-based technology.
In PHP you write pages that are dynamically built by the server. Once built they are sent as HTML to the client.
Using JavaScript (and Ajax) you can call the server just to request some data (hint: look at JSON) or just a little HTML snippet, which is plugged into the current page directly by the browser without requesting a full refresh from the server.
With JS and AJAX you can achieve a very rich client experience without reloading a full page every time.
I believe nothing will happen if JS is disabled. You need JS to grab the data.
If you want to use MySQL databases, you can use JS to call a PHP script, which can then return any data gathered from the database, rather than doing it in the page.
AJAX is a way for JavaScript (client side) to access PHP/ASP/whatever server-side language you are using. This means that if you have a PHP script for getting some data from your MySQL database, and want to run that script when the user clicks some button, AJAX can do that asynchronously, and you won't have to reload your page to execute the PHP script.
If JavaScript is disabled, AJAX won't work.
I have a Google calendar embedded on a webpage, with events related to activities the site is organizing. Some calendar events have links that redirect the user to a page, within the same website, which has more information and the option to enroll in the event.
The problem, however, is that since the end of last month Google has imposed a redirect notice that doesn't even redirect automatically. The links I create on events are changed by Google, and once a user clicks a link, a new tab opens leading to a page with a redirect warning that the user must click through. Since I am providing the users with a link within the same website, this is very inconvenient and makes no sense at all.
I want the users to be able to click a link on the calendar and go straight through to the webpage with the relevant data.
Do you guys know how I can get around this warning?
My thought process:
Initially, I thought of using JS to rewrite the links, but since the calendar's iframe is on a different domain, the browser won't allow it because of cross-site scripting protections (AFAIK).
I could build my own AJAX calendar and sync it with Google's using the API, but that's a hell of a lot of work because of a stupid "feature" that makes no sense. I like Google's calendar and I'd like to keep using it.
The third thing I thought of was that, instead of having an iframe with the calendar, I could use AJAX to fetch the entire code at the frame's URL. Then I'd just rewrite the links in that code with JS. Could this work?
I would be REALLY thankful for any help. This is driving me insane!
Using Jon Cram's input I created a PHP script that parses the code and makes the adjustments. However, I could only get that working for the HTML version. No AJAX for me. =(
The same origin policy will prevent JavaScript served from your domain from interacting with data served from a different domain.
You are therefore right in saying that option 1 won't work.
The same origin policy also applies to option 3 as you have stated it. JavaScript served from your domain won't be able to make a direct HTTP request to whichever domain serves the calendar code.
You will need to acquire and modify the calendar code, neither of which can be achieved with JavaScript in today's most commonly used browsers. When Firefox 3.1 and IE8 are in common use and Google serves the correct HTTP Access Control headers, this could be achieved with JavaScript alone.
To modify code served from another domain, you will need to utilise some form of server-side process.
A server-side script will be able to request the calendar code. The same script can then modify the code as needed and output it in whatever form you require.
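As a rough illustration, a server-side script could fetch the embed HTML with cURL and unwrap Google's redirect links; the embed URL and the redirect format assumed here (google.com/url?q=...) are guesses you would adjust to what you actually see:

// calendar_proxy.php - fetch the embed HTML and unwrap Google's redirect wrapper
$embedUrl = 'https://www.google.com/calendar/embed?src=YOUR_CALENDAR_ID'; // placeholder

$ch = curl_init($embedUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);

// replace wrapped links of the assumed form .../url?q=<real url>&... with the real URL
$html = preg_replace_callback(
    '#https?://www\.google\.com/url\?q=([^"&]+)[^"]*#i',
    function ($m) { return urldecode($m[1]); },
    $html
);

echo $html;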
If it is a private internal site you could install Greasemonkey on all clients (if they use Firefox) and write a short script that fixes the URLs. That only works if the original URL is contained within Google's redirect URL, though.
If I had this problem I would change the calendar provider; that's probably the easiest solution. I did a Google search and found Kiko - it looks like they might have what you need.
Simply remove the "http://" part of the URL. I am not sure why this works but it does!