A couple of years ago, before I knew about Stack Overflow, I was working in an office with a lot of competition between the programmers. There, I had to code a web page in PHP with Drupal that needed to get data from another site by RSS. The problem was that there was no way to get the data beforehand: the data depended on the content of the page, which was itself dynamic, so the page stopped loading for a couple of seconds while PHP went to get the RSS data. That was bad. The page depended on a couple of parameters out of a huge list, so fetching all possible combinations in advance was out of the question. It was some sort of search page that included the results of a sister site, I think.
The first thing I did to improve that was to set up a caching system. When the page was loaded, it launched a JavaScript method that saved the RSS data back into the database for this specific page, using AJAX. That meant that if the same page was requested again, the old data would be sent immediately, and the AJAX script would get the cache updated with the new data if needed. The JavaScript pretty much opened a hidden page on the site with a GET request that matched the current page's parameters. It was only a couple of days later that I realised I could have cached the data without the AJAX. (Trust me, it's easier to spot in hindsight.) But that's not the issue I'm asking about.
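For illustration, the hidden-page refresh described above might look something like this sketch. The endpoint name /rss-cache-refresh and the parameter names are made up, not the original code:

```javascript
// Hypothetical sketch of the cache-refresh call described above.
// Builds the hidden page's URL from the current page's query
// parameters and fires a fire-and-forget GET; the server-side
// handler is assumed to store the fresh RSS data in the database.
function buildRefreshUrl(baseUrl, params) {
  const query = Object.entries(params)
    .map(([k, v]) => encodeURIComponent(k) + '=' + encodeURIComponent(v))
    .join('&');
  return baseUrl + '?' + query;
}

function refreshRssCache(params) {
  // fire-and-forget: the response body is ignored
  const xhr = new XMLHttpRequest();
  xhr.open('GET', buildRefreshUrl('/rss-cache-refresh', params));
  xhr.send();
}
```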
But I was told not to do any caching at all. I was told that my AJAX page "exposed the API". That a malicious user could hit the hidden page again and again to do a Denial of Service attack. I thought my AJAX was a temporary solution anyway, but that caching was needed. But mostly: wasn't the DoS argument true of ANY page on the site? Did the fact that my hidden page did not appear in the menus and returned no content make it worse?
As I said, there was a lot of competition between programmers, so the people around me, who were unanimous, might have been right, or they might have been trying to stop me from doing something simply because they were not the ones doing it. (That happened a lot.) But I'm still curious. I was fully aware that my AJAX thing was a hack. I wanted to change that system as soon as I found something better, but I thought that no caching at all was even worse. Which was true? Doesn't ALL AJAX, by that logic, expose the API? If we look past the fact that my AJAX was an ugly hack, was it really that dangerous?
I'll admit again and again that it was an ugly, temporary fix, but my question is about having a "hidden" page that returns no content that makes the server do something. How horrible is that?
Both sides are right. Yes, it does "expose" the API, but AJAX requests can only access publicly accessible documents/scripts in the first place, so yes, all AJAX requests "expose" their target script in the same way. DoS attacks are not script-specific, they are server-specific, so one can perform a DoS using anything pointing to the server, not just the script your AJAX calls. I would tell your buddies their argument is weak and grasping at straws, and don't be jealous :P
If I read your post correctly, it seems as if the AJAX requested version of the page would know to invalidate the cache each time?
If that's the case, then I suppose your co-worker might have been saying that the hidden page would be susceptible to a DDoS attack in a way that the full page load wasn't, i.e. the full page load would get a cached version on each load after the first, whereas the AJAX version would get fresh content each time. If that's the case, then s/he's right.
By "expose the API", your co-worker was saying that you were exposing the URL of a page that was doing work that should be done in the background. The outside world should not know about a URL whose sole purpose is to do some heavy lifting task. As you even said, you found a backend solution that didn't require the user's browser knowing about your worker process at all.
Yes, having no cache at all when the page relies on heavy content is worse than having an AJAX version of the page do the caching, but I think the warning from your co-worker was that no page, EVEN if it's AJAX, should have the power to break the cache in a way you didn't expect or intend.
The only way this would be a problem is if said "hidden page that returns no content that makes the server do something" had a different authentication scheme or permissions from the rest of the pages, or if what it made the back-end do were inordinately heavy compared to any other page on the site that posted something.
Sorry for this, but I really concluded that it's better to ask directly than to browse tons of pages in vain.
I've already looked through plenty of resources, but haven't found a decent explanation that could satisfy my curiosity about this seemingly simple question.
Assume there's a URI located at http://example.com/example (backed by a PHP script and queries to the database).
Let's imagine I've loaded it in my browser, clicked some link, and hit "back" to return to http://example.com/example.
As far as I understand it, what happens behind the scenes looks something like this:
After "back" is clicked, the browser checks its cache for an entry that exactly matches http://example.com/example, finds that it wasn't changed within the short period of time since it was first loaded, and returns it from its cache.
Wait!!!!
The file contains server-side scripts, database queries, and so forth.
So it should reach the web server again, request the same data from MySQL, and output it in a file.
So what's the best strategy for caching dynamic content, client-side versus server-side?
In which cases is it useful to cache content server-side, and what practice is best?
Can someone please provide some resources covering this subject that can be understood by dummies like me, and refute or adjust the scheme above about what actually happens?
While browsing the issue I ran into one service, http://gtmetrix.com/, which I liked very much.
It mentioned something about making AJAX requests cacheable. I assume that this can be used for client-side caching of dynamic content retrieved from the database. Can someone please confirm or refute that?
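To make the "cacheable AJAX request" idea concrete, here is a minimal sketch of a client-side cache for AJAX responses, keyed by URL with a time-to-live. This is an illustration only: it assumes a fetchFn that returns a promise, and a real app would lean on the server's Cache-Control/ETag headers rather than reinventing them:

```javascript
// Minimal client-side cache for AJAX responses, keyed by URL, with a
// time-to-live. Fresh entries are served from memory; expired entries
// trigger a new fetch that refreshes the cache.
function makeCachedFetcher(fetchFn, ttlMs, now = Date.now) {
  const cache = new Map();
  return function cachedFetch(url) {
    const hit = cache.get(url);
    if (hit && now() - hit.time < ttlMs) {
      return Promise.resolve(hit.data);      // serve from cache
    }
    return fetchFn(url).then(data => {
      cache.set(url, { data, time: now() }); // refresh the cache
      return data;
    });
  };
}
```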
Looking at other questions, none seem to be specific enough for my case. I am creating a blog-like website and have created a user-authenticated page that allows me to add a title, main content, and image using a form that is sent to a PHP file, which stores the data in the MySQL database. Then the PHP redirects me to the index page, where I want to load the latest blog post along with all previous posts and put them in a div with styling. I do not think I need AJAX for this; I only need the data to load on each visit. Is the best thing for me to do, therefore, to call a JavaScript function on $(document).ready() that will access the data in the database? If so, how can I use PHP in my JavaScript to work with the database and then store the info in JavaScript variables?
Thanks
You certainly don't need AJAX for this. Just use your index.php page to look up the blog posts in the db and then loop through and echo them out.
You could later incorporate AJAX to call a php page which would provide the blog posts to you. This way you could update the page after it is loaded when new blog posts are created.
Hope that helps.
You do not seem to have a firm grasp on the role of each language. If you're building something on your own, follow the advice of someone who has built ALL of it and can show it off to boot (see my profile; my site contains a blog, forums, private messaging, a chat room, a CMS, etc., all built entirely by myself using no one else's code).
(X)HTML - It's the noun language, an image, a paragraph, a division element used by CSS to style the page, etc.
CSS - The adjective language, describes how the (X)HTML noun language is displayed.
JavaScript - The verb language, event driven; when the user does (onmouseover, onclick, onload, etc) action execute this code (usually a function). AJAX is simply loading content after the page has finished loading. You can worry about the fancy stuff once you have your basics working.
PHP - Server side language, prepares code (mostly XHTML) to be sent to the client's computer.
Database - Where your content is stored.
"$(document).ready()" is not JavaScript, that's jQuery. If you want to learn, stay as far away from JavaScript libraries as you can and learn REAL JavaScript, otherwise you're going to run into the nightmares associated with them (crap performance versus native JavaScript, updating libraries changes how you must code against them, etc). Feel free to look at my site's source code as it's all written for XHTML served as application/xhtml+xml, which means it WILL work in regular HTML, though the vast majority of sites do NOT work if you switch them to XHTML. In other words, when you code right the first time you'll have much more confidence that it will JUST WORK, end of story. People don't care about how you did it but that it works, and if it ALWAYS works then they simply can't get any happier with what they have.
If you're building the main blog page you simply need a single SQL query to pull all the content. I programmed my blog to display the eight latest entries, though with my pagination it's exactly like a book: the first page (on the left side) starts with the first eight, so if the count isn't divisible by eight you might see six entries on the latest page.
When using SQL you want to CONSTRUCT your query, NEVER stick it inside of a loop! The fewer queries you execute the better your code is and the better your performance. I recommend downloading MySQL Workbench and setting up a MySQL query log and then use Tail for Win32 to view queries in real time to see what your code is doing.
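The "construct one query instead of querying in a loop" advice can be sketched like this (JavaScript for illustration; buildInQuery is a made-up helper, and the ? placeholders assume a MySQL-style parameterized driver):

```javascript
// Build a single parameterized IN (...) query for a batch of ids,
// instead of executing one query per id inside a loop.
function buildInQuery(table, column, ids) {
  if (ids.length === 0) throw new Error('no ids given');
  const placeholders = ids.map(() => '?').join(', ');
  return {
    sql: `SELECT * FROM ${table} WHERE ${column} IN (${placeholders})`,
    params: ids, // values are bound by the driver, never concatenated
  };
}
```

One round trip to the database instead of N is the whole point; the driver binds the values, which also keeps the query safe from injection.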
Apache also has logs. You are building this LOCALLY at http://localhost/ first, correct? You should never test something live until you've exhausted testing it locally first. See my base element blog entry about how to best do that...
http://www.jabcreations.com/blog/streamlining-local-and-live-development-with-the-base-element
If you're talking about redirects, keep the technical stuff hidden from users and take advantage of $_SESSION in PHP. Record what the current page URL is (relative to the base, which is different for local/local network/live environments), have a second URL to fall back to, and if that too matches the redirect page then have a safe URL that is statically defined. If you're constantly falling back to the static URL, check to make sure you haven't goofed up how your other two variables are being updated on each page load (such as not updating them when you're on the redirect page, obviously).
When you solidify your basic understanding you will want to ask very specific questions, as your question is wildly subjective and to most programmers not really worth answering. Make sure you use correct terminology, and stick to core languages, not libraries, as doing so will help ensure your working code lasts that much longer. The stricter your coding practices, the better off you'll be. Maximize the sensitivity of your error reporting for HTTP, JavaScript, PHP and SQL errors. Getting PHP "variable is not set" errors? What if a hacker is trying to pry error messages from your code? Make sure those variables are set before you even begin working with them. Log your errors and fix them fanatically. Don't try to add every feature in the world; concentrate on the critical functionality first and make sure it's incontestably solid before you expand upon it. Do these things and, while it may take more time up front, you will be rocking harder than the vast majority of people drowning in live environments built on shaky code.
gizmovation is right, you don't need AJAX, but to answer the question "how can I implement PHP in my javascript to..."
You're looking to use AJAX. Use jQuery's .ajax to call the PHP page, and when it returns the result, put it into JavaScript variables or directly into the DOM. AJAX example or jQuery example
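A minimal sketch of that approach, assuming a made-up get_posts.php endpoint that echoes the posts as JSON objects of the shape {title, content}:

```javascript
// Render one post object into an HTML fragment.
// NB: real code should HTML-escape user-supplied fields first.
function renderPost(post) {
  return '<div class="post"><h2>' + post.title + '</h2>' +
         '<p>' + post.content + '</p></div>';
}

// Fetch the posts from the (hypothetical) PHP endpoint via jQuery
// and inject them into a #posts container.
function loadPosts() {
  $.ajax({
    url: 'get_posts.php',   // PHP script that echoes posts as JSON
    dataType: 'json',
    success: function (posts) {
      $('#posts').html(posts.map(renderPost).join(''));
    }
  });
}
```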
I've done web scraping before but it was never this complex. I want to grab course information from a school website. However all the course information is displayed in a web scraper's nightmare.
First off, when you click the "Schedule of Classes" url, it directs you through several other pages first (I believe to set cookies and check other crap).
Then it finally loads a page with an iframe that apparently only likes to load when it's loaded from within the institution's webpage (i.e. arizona.edu).
From there the form submissions have to be made via buttons that don't actually reload the page but merely submit an AJAX query, and I think it just manipulates the iframe.
This query is particularly hard for me to replicate. I've been using PHP and cURL to simulate a browser visiting the initial page and gathering the proper cookies and such. But I think I have a problem with the headers that my cURL function is sending, because it never lets me execute any sort of query after the initial "search form" loads.
Any help would be awesome...
http://www.arizona.edu/students/registering-classes -> "Schedule of Classes"
Or just here:
http://schedule.arizona.edu/
If you need to scrape a site with heavy JS / AJAX usage, you need something more powerful than PHP ;)
First, it must be a full browser with the capability to execute JS, and second, there must be some API for auto-browsing.
Assuming that you are a kid (who else would need to parse a school?), try Firefox with iMacros. If you are a more seasoned veteran, look towards Selenium.
I used to scrape a lot of pages with JS, iframes and all kinds of stuff like that. I used PhantomJS as a headless browser, which I later wrapped with the PhantomCurl wrapper. The wrapper is a Python script that can be run from the command line or imported as a module.
Are you sure you are allowed to scrape the site?
If yes, then maybe they could just give you a simple REST API?
In the rare case where they would allow you to get to the data but would not provide an API, my advice would be to install some software to record your HTTP interaction with the web site, maybe Wireshark or some HTTP proxy, but it is important that you get all the details of the HTTP requests recorded. After you have that, analyze it, and try to replay it down to the last bit.
Among the possible chores, it might be that at some point the server sends you generated JavaScript that needs to be executed by the client browser in order to get to the next step. In this case you would need to figure out how to parse the received JavaScript and figure out how to move on.
It is also a good idea not to fire all your HTTP requests in burst mode, but to put some random delays between them so that your traffic appears more "human" to the server.
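That random-delay suggestion might be sketched like this (the function names are made up for illustration):

```javascript
// Pick a uniformly random delay in [minMs, maxMs); the rng parameter
// is injectable so the behaviour can be tested deterministically.
function randomDelayMs(minMs, maxMs, rng = Math.random) {
  return minMs + Math.floor(rng() * (maxMs - minMs));
}

// Promise-based sleep, for use between scraping requests.
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// usage sketch: await sleep(randomDelayMs(1000, 5000)) between fetches
```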
But in the end you need to figure out whether all this is worth the trouble. Almost any road block to scraping can be worked around, but it can get quite involved and time-consuming.
We're in the first steps of what will be an AJAX-based webapp where information and generated HTML will be sent backwards and forwards with the help of JSON/POST techniques.
We're able to get the data out quickly without putting too much load on the database, with the help of a cache layer that features memcached as well as a disc-based cache. Besides that, what's essential to have in mind when designing AJAX-heavy webapps?
Thanks a lot,
Probably the best thing to have in mind is that your app shouldn't be AJAX-based. It should work fine if the user's browser has scripts disabled. Only then should you start layering on AJAX. Stack Overflow is a great example of this. AJAX really improves the experience, but it works when it's disabled.
Another thing I like to do is to use the same PHP validation functions for both server-side and client-side validation (as in sending an AJAX request to a script containing the same PHP function) to keep the amount of cross-language code duplication to a minimum.
Read up on Degradable AJAX.
Security for one. JavaScript has a pretty notoriously bad security profile.
These are the two that always get me:
What happens when the user clicks multiple items that may trigger multiple requests that may return out of order?
What happens when a request doesn't come back for one reason or another (timeout, server problem, etc.)? It always happens eventually, and the more gracefully your system fails the better.
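One common guard against the out-of-order problem above is to tag each request with a sequence number and drop any response that is no longer the latest. A minimal sketch (makeLatestOnly is a made-up name; requestFn is assumed to return a promise):

```javascript
// Returns a send() function that only delivers the response belonging
// to the most recent request; responses from superseded requests are
// silently dropped, so out-of-order arrivals can't clobber the UI.
function makeLatestOnly() {
  let latest = 0;
  return function send(requestFn, onResult) {
    const seq = ++latest;          // this request's sequence number
    requestFn().then(result => {
      if (seq === latest) onResult(result); // stale responses dropped
    });
  };
}
```

The same structure also covers the timeout case: a retry simply bumps the sequence number, so a late first response can no longer overwrite the retry's result.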
When is it appropriate to use AJAX?
What are the pros and cons of using AJAX?
In response to my last question: some people seemed very adamant that I should only use AJAX if the situation was appropriate:
Should I add AJAX logic to my PHP classes/scripts?
In response to Chad Birch's answer:
Yes, I'm referring to developing a "standard" site that would employ AJAX for its benefits, and wouldn't be crippled by its application. Using AJAX in a way that would kill search rankings would not be acceptable. So if "keeping the site intact" requires more work, then that would be a "con".
It's a pretty large subject, but you should be using AJAX to enhance the user experience, without making the site totally dependent on it. Remember that search engines and some other visitors won't be able to execute the AJAX, so if you rely on it to load your content, that will not work in your favor.
For example, you might think that it would be nice to have users visit your blog, and then have the page dynamically load the newest article(s) with AJAX once they're already there. However, when Google tries to index your blog, it's just going to get the blank site.
A good search term to find resources related to this subject is "progressive enhancement". There's plenty of good stuff out there, spend some time following the links around. Here's one to start you off:
http://www.alistapart.com/articles/progressiveenhancementwithjavascript/
When you are only updating part of a page or perhaps performing an action that doesn't update the page at all AJAX can be a very good tool. It's much more lightweight than an entire page refresh for something like this. Conversely, if your entire page reloads or you change to a different view, you really should just link (or post) to the new page rather than download it via AJAX and replace the entire contents.
One downside to using AJAX is that it requires JavaScript to be working, OR requires you to construct your view in such a way that the UI still works without it. This is more complicated than doing it just via normal links/posts.
AJAX is usually used to perform an HTTP request while the page is already loaded (without loading another page).
The most common use is to update part of the view. Note that this does not include refreshing the whole view since you could just navigate to a new page.
Another common use is to submit forms. In all cases, but especially for forms, it is important to have good ways of handling browsers that do not have javascript or where it is disabled.
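A sketch of that graceful-degradation idea for forms: the form posts normally when JavaScript is unavailable, and when it is available, a handler intercepts the submit and hands it to an AJAX sender instead (enhanceForm and sendFn are made-up names for illustration):

```javascript
// Progressive enhancement for a form: without JS the browser performs
// the normal full-page POST to form.action; with JS this handler
// intercepts the submit and delegates to an AJAX sender instead.
function enhanceForm(form, sendFn) {
  form.addEventListener('submit', function (event) {
    event.preventDefault();      // suppress the normal full-page post
    sendFn(form.action, form);   // hand off to the AJAX sender
  });
}
```

Because the plain POST path still exists, browsers without scripts (and search engines) keep working, and the AJAX path is purely an enhancement.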
I think the advantage of using AJAX technologies isn't only creating better user experiences; the ability to make server calls for only specific data is a huge performance benefit.
Imagine a huge bandwidth-eating site (like Stack Overflow): most of the navigation done by users is done through page reloads, and data is continuously sent over HTTP.
Of course caching and other techniques help with this bandwidth-overhead problem, but personally I think that sending huge chunks of HTML every time is really a waste.
Cons are SEO (which doesn't work well with heavily AJAX-based sites) and people who have JavaScript disabled.
When your application (or your users) demand a richer user experience than a traditional webpage is able to provide.
Ajax gives you two big things:
Responsiveness - you can update only parts of a web page at a time if need be (saving the time to re-load a page). It also makes it easier to page data that is presented in a table for instance.
User Experience - This goes along with responsiveness. With AJAX you can add animations, cooler popups and special effects to give your web pages a newer, cleaner and cooler look and feel. If no one thinks this is important, then look to the iPhone. User experience draws people into an application and makes them want to use it, one of the key steps in ensuring an application's success.
For a good case study, look at this site. AJAX effects like animating your new Answer when posted, popups to tell you you can't do certain things and hints that new answers have been posted since you started your own answer are all part of drawing people into this site and making it successful.
Javascript should always just be an addition to the functionality of your website. You should be able to use and navigate the site without any Javascript involved. You can use Javascript as an addition to existing functionality, for example to avoid full-page reloads. This is an important factor for accessibility. Javascript should never be used as the only possibility to reach or complete a request on your site.
As AJAX makes use of Javascript, the same applies here.
Ajax is primarily used when you want to reload part of a page without reposting all the information to the server.
Cons:
More complicated than doing a normal post (working with different browsers, writing server-side code to handle partial postbacks)
Introduces potential security vulnerabilities. You are introducing additional code that interacts with the server, which can be a problem on both the client and the server. On the client, you need ways of sending requests and receiving responses; it's another way of interacting with the browser, which means another point of entry that has to be guarded (executing arbitrary code, posting data to a non-intended source, etc.). Several exploits for AJAX apps have been plugged over time, but there will always be more.
Pros:
It looks flashier to end users
Allows a lot of information to be displayed on the page without having to load it all at the same time
Page is more interactive.