Many HTTP requests (API) Vs Everything in single php

Many HTTP requests (API) Vs Everything in single php - php

I need some advice on website design.
Lets take example of twitter for my question. Lets say I am making twitter. Now on the home_page.php ,I need both, Data about tweets (Tweet id , who tweeted , tweet time etc. etc) and Data about the user( userId , username , user profile pic).
Now to display all this, I have two option in mind..
1) Making separate php files like tweets.php and userDetails.php. By using AJAX queries, I can get the data on the home_page.php.
2) Adding all the php code (connecting to db, fetching data ) in the home_page.php itself.
In option one, I need to make many HTTP requests, which (i think) will be load to the network. So it might slow down the website.
But option two, I will have a defined REST API. Which will be good of adding more features in the future.
Please give me some advice on picking the best. Also I am still a learner, so if there are more options of implementing this, please share.

In number 1 you're reliant on java-script which doesn't follow progressive enhancement or graceful degradation; if a user doesn't have JS they will see zero content which is obviously bad.
Split your code into manageable php files to make it easier to read and require them all in one main php file; this wont take any extra http requests because all the includes are done server side and 1 page is sent back.
You can add additional javascript to grab more "tweets" like twitter does, but dont make the main functionality rely on javascript.

Don't think of PHP applications as a collection of PHP files that map to different URLs. A single PHP file should handle all your requests and include functionality as needed.
In network programming, it's usually good to minimize the number of network requests, because each request introduces an overhead beyond the time it takes for the raw data to be transmitted (due to protocol-specific information being transmitted and the time it takes to establish a connection for example).
Don't rely on JavaScript. JavaScript can be used for usability enhancements, but must not be used to provide essential functionality of your application.

Adding to Kiee's answer:
It can also depend on the size of your content. If your tweets and user info is very large, the response the single PHP file will take considerable time to prepare and deliver. Then you should go for a "minimal viable response" (i.e. last 10 tweets + 10 most popular users, or similar).
But what you definitely will have to do: create an API to bring your page to life. No matter which approach you will use...

Related

How to efficiently construct HTML templates when all the datas come from an internal API?

Here's the context : we're actually using the basic web stack, and our website builds HTML templates with datas it gets from the database directly.
For tons of reasons, we're splitting this into two projects, one will be responsible for talking with the database directly, the other one will be responsible for displaying the datas.
To make it simple, one is the API, the other one is the client.
Now we're wondering about how we should ask our API for datas. To us, there are 2 totally different options :
One request, one route, for one page. So we would get a huge object to use which would contain everything needed to build the corresponding page.
One request for one little chunk of data. For example on a listing page, we'd make one request to get datas about the current logged user and display its name along with its avatar, then another request to get every articles, another request to get datas about the current page category...
Some like the first option, I don't like it at all. I feel like we're going to have a lot of redundance. I'm also not sure one huge request is that much faster than X tiny requests. I also don't like binding data to a specific page, as I feel like the API should be (somewhat) independant from our front website.
Some also don't like the second option, they fear we overcharge the server by making too many calls, and I can understand this fear. It also looks like it'll be hard to properly define the scope of what to send, what to not send without any redundancy. If we're sending only what's needed to display a page, isn't that the first option in the end ? But isn't sending unneeded information a waste ?
What do you guys think ?

The first approach will be good if getting all data is fast enough. The less requests - the faster app. Redundancy I think you mean code redundancy because sending the same amount of data in one request will be definitely faster than in 10 small non-parallel ones (network overhead). If you send a few parallel requests from UI you can get performance gain of cause. And you should take into account that browsers have some limitations for parallel requests.
Another case if getting some data is fast but another is slow you can return the first data and on UI show loading image and load the second data when it will come. It will improve user experience showing the page as fast as possible.
The second approach is more flexible as you can use some requests from other pages. But it comes with price - logic with making these requests (gathering information) you need to move to UI code making it more complex. And if you need the same data on another app like mobile you have to copy this logic. As a rule creating such code on backend side is easier.
You can also take a look at this pattern which allow you to locate business/domain logic inside one service and “frontend friendly” logic to another service (orchistration service).

When POSTing via AJAX / jQuery, one big request or a lot of little ones?

Alright I'm going to create a fairly complex form to post via AJAX a lot of different types of information PHP page which will then parse the data and CURL the various datatypes into the correct tables in another database.
Usually I just send a HUGE POST request and then parse the information in the PHP page, making multiple CURL requests along the way to post the various elements.
The downside is the form isn't super responsive, it generally takes 5-10 seconds for the PHP page to give the a-ok. I'd like this new form to be more snappy for better data entry, allowing the user to move onto the next entry without a hitch.
So just looking for some professional advice: Make 5 AJAX requests to 5 PHP pages or stick with the large load?
In terms of scale, there generally wont be many users it at the same time (it's internal for an organization), I'm trying to optimize for a single person submitting many entries over and over again with the same form.
Any and all advice is very welcome and appreciated.

PHP page which will then parse the data and CURL the various datatypes into the correct tables in another database.
Are you making an HTTP Request then? If yes, you should stick to one big AJAX call because otherwise you'd have to establish an HTTP Connection 5 times and establishing a connection takes time!

On a webpage, where there are various fields, I tend to find it best to just send the data over as the users enter it, and jquery makes it easy with the event handling, so that if there is a problem the users can be informed quickly.
But, if the data must be processed as a group, for example, you have a 100x100 matrix and only when it is filled in can you do the mathematics, then one large post works.
So, it depends, if you can't validate until you have all the related information, for example, you don't know if an address is valid until you have street, city, state, then wait until all three are entered then submit the information.
Without more information as what is being done it is hard to really answer the question, but, as a rule of thumb, submit the smallest amount of information that is useful to give the fastest response back to the user if there is an error.

One thing to keep in mind: If you're using file-based sessions and those "5 posts" are all handled by the same site/server, you won't be able to have those 5 POSTs running in parallel. PHP will slap exclusive locks on the session file for each request, so they'll effectively be processed in a serial manner rather than parallel.
Unless you do a session_write_close() in each one, you'd be better off doing one big POST instead and save the extra overhead of establishing/tearing down 5 connections.

How would you protect a database of links from being scraped?

I have a large database of links, which are all sorted in specific ways and are attached to other information, which is valuable (to some people).
Currently my setup (which seems to work) simply calls a php file like link.php?id=123, it logs the request with a timestamp into the DB. Before it spits out the link, it checks how many requests were made from that IP in the last 5 minutes. If its greater than x, it redirects you to a captcha page.
That all works fine and dandy, but the site has been getting really popular (as well as been getting DDOsed for about 6 weeks), so php has been getting floored, so Im trying to minimize the times I have to hit up php to do something. I wanted to show links in plain text instead of thru link.php?id= and have an onclick function to simply add 1 to the view count. Im still hitting up php, but at least if it lags, it does so in the background, and the user can see the link they requested right away.
Problem is, that makes the site REALLY scrapable. Is there anything I can do to prevent this, but still not rely on php to do the check before spitting out the link?

It seems that the bottleneck is at the database. Each request performs an insert (logs the request), then a select (determine the number of requests from the IP in the last 5 minutes), and then whatever database operations are necessary to perform the core function of the application.
Consider maintaining the request throttling data (IP, request time) in server memory rather than burdening the database. Two solutions are memcache (http://www.php.net/manual/en/book.memcache.php) and memcached (http://php.net/manual/en/book.memcached.php).
As others have noted, ensure that indexes exist for whatever keys are queried (fields such as the link id). If indexes are in place and the database still suffers from the load, try an HTTP accelerator such as Varnish (http://varnish-cache.org/).

You could do the ip throttling at the web server level. Maybe a module exists for your webserver, or as an example, using apache you can write your own rewritemap and have it consult a daemon program so you can do more complex things. Have the daemon program query a memory database. It will be fast.

Check your database. Are you indexing everything properly? A table with this many entries will get big very fast and slow things down. You might also want to run a nightly process that deletes entries older than 1 hour etc.
If none of this works, you are looking at upgrading/load balancing your server. Linking directly to the pages will only buy you so much time before you have to upgrade anyway.

Every thing you do on the client side can't be protected, Why not just use AJAX ?
Have a onClick event that call's an ajax function, that returns just the link and fill it in a DIV on your page, beacause the size of the request an answer is small, it will work fast enougth for what you need. Just make sure in the function you call to check the timestamp, It is easy to make a script that call that function many times to steel you links.
You can check out jQuery, or other AJAX libraries (i use jQuery and sAjax). And I have lots of page that dinamicly change content very fast, The client doesn't even know is not pure JS.

Most scrapers just analyze static HTML so encode your links and then decode them dynamically in the client's web browser with JavaScript.
Determined scrapers can still get around this, but they can get around any technique if the data is valuable enough.

Top techniques to avoid 'data scraping' from a website database

I am setting up a site using PHP and MySQL that is essentially just a web front-end to an existing database. Understandably my client is very keen to prevent anyone from being able to make a copy of the data in the database yet at the same time wants everything publicly available and even a "view all" link to display every record in the db.
Whilst I have put everything in place to prevent attacks such as SQL injection attacks, there is nothing to prevent anyone from viewing all the records as html and running some sort of script to parse this data back into another database. Even if I was to remove the "view all" link, someone could still, in theory, use an automated process to go through each record one by one and compile these into a new database, essentially pinching all the information.
Does anyone have any good tactics for preventing or even just detering this that they could share.

While there's nothing to stop a determined person from scraping publically available content, you can do a few basic things to mitigate the client's concerns:
Rate limit by user account, IP address, user agent, etc... - this means you restrict the amount of data a particular user group can download in a certain period of time. If you detect a large amount of data being transferred, you shut down the account or IP address.
Require JavaScript - to ensure the client has some resemblance of an interactive browser, rather than a barebones spider...
RIA - make your data available through a Rich Internet Application interface. JavaScript-based grids include ExtJs, YUI, Dojo, etc. Richer environments include Flash and Silverlight as 1kevgriff mentions.
Encode data as images. This is pretty intrusive to regular users, but you could encode some of your data tables or values as images instead of text, which would defeat most text parsers, but isn't foolproof of course.
robots.txt - to deny obvious web spiders, known robot user agents.
User-agent: *
Disallow: /
Use robot metatags. This would stop conforming spiders. This will prevent Google from indexing you for instance:
<meta name="robots" content="noindex,follow,noarchive">
There are different levels of deterrence and the first option is probably the least intrusive.

If the data is published, it's visible and accessible to everyone on the Internet. This includes the people you want to see it and the people you don't.
You can't have it both ways. You can make it so that data can only be visible with an account, and people will make accounts to slurp the data. You can make it so that the data can only be visible from approved IP addresses, and people will go through the steps to acquire approval before slurping it.
Yes, you can make it hard to get, but if you want it to be convenient for typical users you need to make it convenient for malicious ones as well.

There are few ways you can do it, although none are ideal.
Present the data as an image instead of HTML. This requires extra processing on the server side, but wouldn't be hard with the graphics libs in PHP. Alternatively, you could do this just for requests over a certain size (i.e. all).
Load a page shell, then retrieve the data through an AJAX call and insert it into the DOM. Use sessions to set a hash that must be passed back with the AJAX call as verification. The hash would only be valid for a certain length of time (i.e. 10 seconds). This is really just adding an extra step someone would have to jump through to get the data, but would prevent simple page scraping.

Try using Flash or Silverlight for your frontend.
While this can't stop someone if they're really determined, it would be more difficult. If you're loading your data through services, you can always use a secure connection to prevent middleman scraping.

force a reCAPTCHA every 10 page loads for each unique IP

There is really nothing you can do. You can try to look for an automated process going through your site, but they will win in the end.
Rule of thumb: If you want to keep something to yourself, keep it off the Internet.

Take your hands away from the keyboard and ask your client the reason why he wants the data to be visible but not be able to be scraped?
He's asking for two incongruent things and maybe having a discussion as to his reasoning will yield some fruit.
It may be that he really doesn't want it publicly accessible and you need to add authentication / authorization. Or he may decide that there is value in actually opening up an API. But you won't know until you ask.

I don't know why you'd deter this. The customer's offering the data.
Presumably they create value in some unique way that's not trivially reflected in the data.
Anyway.
You can check the browser, screen resolution and IP address to see if it's likely some kind of automated scraper.
Most things like cURL and wget -- unless carefully configured -- are pretty obviously not browsers.

Using something like Adobe Flex - a Flash application front end - would fix this.
Other than that, if you want it to be easy for users to access, it's easy for users to copy.

There's no easy solution for this. If the data is available publicly, then it can be scraped. The only thing you can do is make life more difficult for the scraper by making each entry slightly unique by adding/changing the HTML without affecting the layout. This would possibly make it more difficult for someone to harvest the data using regular expressions but it's still not a real solution and I would say that anyone determined enough would find a way to deal with it.
I would suggest telling your client that this is an unachievable task and getting on with the important parts of your work.

What about creating something akin to the bulletin board's troll protection... If a scrape is detected (perhaps a certain amount of accesses per minute from one IP, or a directed crawl that looks like a sitemap crawl), you can then start to present garbage data, like changing a couple of digits of the phone number or adding silly names to name fields.
Turn this off for google IPs!

Normally to screen-scrape a decent amount one has to make hundreds, thousands (and more) requests to your server. I suggest you read this related Stack Overflow question:
How do you stop scripters from slamming your website hundreds of times a second?

Use the fact that scrapers tend to load many pages in quick succession to detect scraping behaviours. Display a CAPTCHA for every n page loads over x seconds, and/or include an exponentially growing delay for each page load that becomes quite long when say tens of pages are being loaded each minute.
This way normal users will probably never see your CAPTCHA but scrapers will quickly hit the limit that forces them to solve CAPTCHAs.

My suggestion would be that this is illegal anyways so at least you have legal recourse if someone does scrape the website. So maybe the best thing to do would just to include a link to the original site and let people scrape away. The more they scrape the more of your links will appear around the Internet building up your pagerank more and more.
People who scrape usually aren't opposed to including a link to the original site since it builds a sort of rapport with the original author.
So my advice is to ask your boss whether this could actually be the best thing possible for the website's health.

When is it appropriate to use AJAX?

When is it appropriate to use AJAX?
what are the pros and cons of using AJAX?
In response to my last question: some people seemed very adamant that I should only use AJAX if the situation was appropriate:
Should I add AJAX logic to my PHP classes/scripts?
In response to Chad Birch's answer:
Yes, I'm referring to when developing a "standard" site that would employ AJAX for its benefits, and wouldn't be crippled by its application. Using AJAX in a way that would kill search rankings would not be acceptable. So if "keeping the site intact" requires more work, than that would be a "con".

It's a pretty large subject, but you should be using AJAX to enhance the user experience, without making the site totally dependent on it. Remember that search engines and some other visitors won't be able to execute the AJAX, so if you rely on it to load your content, that will not work in your favor.
For example, you might think that it would be nice to have users visit your blog, and then have the page dynamically load the newest article(s) with AJAX once they're already there. However, when Google tries to index your blog, it's just going to get the blank site.
A good search term to find resources related to this subject is "progressive enhancement". There's plenty of good stuff out there, spend some time following the links around. Here's one to start you off:
http://www.alistapart.com/articles/progressiveenhancementwithjavascript/

When you are only updating part of a page or perhaps performing an action that doesn't update the page at all AJAX can be a very good tool. It's much more lightweight than an entire page refresh for something like this. Conversely, if your entire page reloads or you change to a different view, you really should just link (or post) to the new page rather than download it via AJAX and replace the entire contents.
One downside to using AJAX is that it requires javascript to be working OR you to construct your view in such a way that the UI still works without it. This is more complicated than doing it just via normal links/posts.

AJAX is usually used to perform an HTTP request while the page is already loaded (without loading another page).
The most common use is to update part of the view. Note that this does not include refreshing the whole view since you could just navigate to a new page.
Another common use is to submit forms. In all cases, but especially for forms, it is important to have good ways of handling browsers that do not have javascript or where it is disabled.

I think the advantage of using ajax technologies isn't only for creating better user-experiences, the ability to make server calls for only specific data is a huge performance benefit.
Imagine having a huge bandwidth eater site (like stackoverflow), most of the navigation done by users is done through page reloads, and data that is continuously sent over HTTP.
Of course caching and other techniques help this bandwidth over-head problem, but personally I think that sending huge chunks of HTML everytime is really a waste.
Cons are SEO (which doesn't work with highly based ajax sites) and people that have JavaScript disabled.

When your application (or your users) demand a richer user experience than a traditional webpage is able to provide.

Ajax gives you two big things:
Responsiveness - you can update only parts of a web page at a time if need be (saving the time to re-load a page). It also makes it easier to page data that is presented in a table for instance.
User Experience - This goes along with responsiveness. With AJAX you can add animations, cooler popups and special effects to give your web pages a newer, cleaner and cooler look and feel. If no one thinks this is important then look to the iPhone. User Experience draws people into an application and make them want to use it, one of the key steps in ensuring an application's success.
For a good case study, look at this site. AJAX effects like animating your new Answer when posted, popups to tell you you can't do certain things and hints that new answers have been posted since you started your own answer are all part of drawing people into this site and making it successful.

Javascript should always just be an addition to the functionality of your website. You should be able to use and navigate the site without any Javascript involved. You can use Javascript as an addition to existing functionality, for example to avoid full-page reloads. This is an important factor for accessibility. Javascript should never be used as the only possibility to reach or complete a request on your site.
As AJAX makes use of Javascript, the same applies here.

Ajax is primarily used when you want to reload part of a page without reposting all the information to the server.
Cons:
More complicated than doing a normal post (working with different browsers, writing server side code to hadle partial postbacks)
Introduces potential security vulnerabilities (
You are introducing additional code that interacts with the server. This can be a problem on both the client and server.
On the client, you need ways of sending and receiving responses. It's another way of interacting with the browser which means there is another point of entry that has to be guarded. Executing arbritary code, posting data to a non-intended source etc. There are several exploits for Ajax apps that have been plugged over time, but there will always be more.
)
Pros:
It looks flashier to end users
Allows a lot of information to be displayed on the page without having to load all at the same time
Page is more interactive.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.