TL;DR: Is it possible to get an approximation of whether a user's hardware is arbitrarily "good" or "bad", in order to serve them more or less resource-intensive content, using PHP alone, i.e. NOT by serving JavaScript first and relaying the info back?
I can't find anything online about it, but is there any way to get some basic information about the user's hardware using only the information PHP has access to, i.e. request headers like the user-agent string etc.?
I don't mean detailed information like RAM, HDD capacity, CPU, GPU etc., just an approximation to plug into a boolean value, $good_hardware = true for example.
Why? JavaScript and CSS effects and animations can massively improve user engagement if not overused and properly placed, but some effects and scripts in particular can be extremely resource intensive.
It would be nice to have an idea of the user's hardware, so that users with high-performance machines can benefit from the increased engagement, while users with low-end machines can be served slightly different content, with effects/scripts simplified or turned off altogether, in order to improve page speed.
I am NOT looking for javascript solutions, as this should work from the landing page and not delay the user's first engagement with the site
There is a Browser Capabilities Project for PHP that will give you all kinds of info about the Browser and a little bit about the machine. See PHP's get_browser().
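A minimal sketch of how get_browser() could feed a rough flag like the $good_hardware boolean from the question; this assumes the browscap directive in php.ini points at a browscap.ini file, and the heuristic itself is purely illustrative:

<?php
// Requires the "browscap" directive in php.ini to point at a browscap.ini file.
$info = get_browser(null, true); // true => return an array instead of an object

$good_hardware = true; // default to the full experience

if ($info !== false) {
    // Purely illustrative heuristic, not a real hardware measurement:
    // mobile/tablet devices and very old browsers get the lightweight version.
    if (!empty($info['ismobiledevice']) || !empty($info['istablet'])) {
        $good_hardware = false;
    }
    if (($info['browser'] ?? '') === 'IE' && (int) ($info['majorver'] ?? 0) < 9) {
        $good_hardware = false;
    }
}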
The people who say you can spoof the UA are not the sharpest knives in the drawer: 99.99% of the population does not know what a UA is. And so what if someone does?
The bigger question is: Why? Why do you want to rank visitors? That would give us some insight and possibly a solution. Basically, you have tied our hands.
Obviously JavaScript is the correct tool for this job. Why no JavaScript? And I'm not talking about extensive analysis, just JavaScript's ability to tell you about the visitor's screen size and so on.
It's very simple. At the top of the HTML you have a simple script that stores the time. At onload, you record the time again. The difference between the two times gives you an excellent indicator of the user's machine performance. But since, for whatever reason, you want to avoid JavaScript, you can get some very limited info from the user agent: the OS, whether the machine is 16-, 32-, or 64-bit, and mobile vs. desktop. I cannot help you further because I have no idea why you are doing this.
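If you do stay PHP-only, a rough sketch of pulling those limited hints straight from the User-Agent header might look like this; these are simple substring checks, and the header can be missing or spoofed, so treat the result as a hint only:

<?php
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';

// Rough hints only: the UA reveals the OS family, sometimes bitness, and mobile vs. desktop.
$is_mobile = (bool) preg_match('/Mobile|Android|iPhone|iPad/i', $ua);
$is_64bit  = (bool) preg_match('/x86_64|Win64|WOW64|x64|amd64|aarch64/i', $ua);

if (stripos($ua, 'Windows') !== false) {
    $os = 'Windows';
} elseif (stripos($ua, 'Mac OS X') !== false) {
    $os = 'macOS';
} elseif (stripos($ua, 'Linux') !== false) {
    $os = 'Linux';
} else {
    $os = 'unknown';
}

// Crude guess: desktop + 64-bit leans toward "good" hardware.
$good_hardware = !$is_mobile && $is_64bit;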
Related
I'm creating a simple browser game with online transactions, but I'm thinking... "How can I guarantee that my site won't go down with too many players accessing it?"
I'm asking because I'll pay digital influencers to do the marketing, so I suppose many people will access it...
Should I rent a VPS and run the backend with Node.js, or will pure PHP do a good job of keeping the site up?
Site stability has a lot of different factors. Two main points to consider:
If your site is static HTML and JS files, using a CDN like Cloudflare will provide very strong protection against the site ever going down.
Assuming there's a heavier lift than static files (like DB calls and server-side processing), this ultimately comes down to two factors:
The specs of your server (e.g. RAM, CPUs)
The efficiency of your code
Books can be written about how hardware and code can be improved. Ultimately releasing it in the wild will show you how they handle the load. Great monitoring software (like AppOptics) can give you insights into when you're getting close to any limits and need to upgrade hardware or optimize code.
Practically speaking, if you're not expecting a giant load on day one (which, unless you have a fantastic marketing channel or a lot of followers, you likely won't have), you should be more concerned with building something of value than optimizing it. Optimizing comes later.
I wish to test, like many others I'm sure, "how many simultaneous requests can my web server handle".
By using tools like ab or siege, and hitting your Apache web server / MySQL database / PHP script with queries that represent real-life usage, how representative are the results you get back compared to real-life usage by actual users?
I mean, for instance, when testing with a utility, all the traffic comes from a single IP, while actual usage comes from many different IP addresses. Does this make a world of difference?
If ab says my web server can handle 1000 requests per second, is this directly transferable to saying that the web server would handle 1000 requests per second from actual users?
I know this is a fluffy area, so the more concrete and direct replies I can get, the better. The old "it depends" won't help much :)
Sorry, but "it depends" is the best answer here.
Firstly, the most valuable tool in answering this question is not ab or siege or JMeter (my favourite open source tool), it's a spreadsheet.
The number of requests your system can handle is determined by which bottleneck you hit first. Some of those bottlenecks will be hardware/infrastructure (bandwidth, CPU, the effectiveness of your load balancing scheme), some will be "off the shelf" software and the way it's configured (Apache's ability to serve static files, for instance), and some will be your own software (how efficiently your PHP scripts and database queries run). Some of the bottleneck resources may not be under your control - most sites hosted in Europe or the US are slow when accessed from China, for instance.
I've used a spreadsheet to model user journeys - this depends entirely on your particular case, but a user journey might be:
visit homepage
click "register/log in" link
register as new user
click "verify" link from email
access restricted content
Most sites support many user journeys - and at any one time, the mixture between those user journeys is likely to vary significantly.
For each user journey, I then assess the nature of the visitor requests - "visit homepage", for instance, might be "download 20 static files and 1 PHP script", while "register as new user" might require "1 PHP script", but with a fairly complex set of database scripts.
This process ends up as a set of rows in the spreadsheet showing the number of requests per type. For precision, it may be necessary to treat each dynamic page (PHP script) as its own request, but I usually lump all the static assets together.
That gives you a baseline to test, based on a whole bunch of assumptions. You can now create load testing scripts representing "20 percent new users, 50 percent returning users, 10 percent homepage only, 20 percent complete purchase route, 20 percent abandon basket" or whatever user journeys you come up with.
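To make the arithmetic behind that baseline concrete, here is a small illustrative calculation; the journey names, request counts and traffic figure are invented, not recommendations:

<?php
// Hypothetical journey mix: [share of visitors, PHP requests, static requests] per journey.
$journeys = [
    'homepage only'     => [0.10, 1,  20],
    'new registration'  => [0.20, 4,  35],
    'returning user'    => [0.50, 6,  40],
    'complete purchase' => [0.20, 10, 60],
];

$visitorsPerHour = 2000; // assumed peak traffic

$phpPerHour = $staticPerHour = 0;
foreach ($journeys as [$share, $php, $static]) {
    $phpPerHour    += $visitorsPerHour * $share * $php;
    $staticPerHour += $visitorsPerHour * $share * $static;
}

printf("Peak load: %.1f dynamic req/s, %.1f static req/s\n",
    $phpPerHour / 3600, $staticPerHour / 3600);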
Create a load testing script including the journeys and run it, ideally from multiple locations (there are several cheap ways to run JMeter from cloud providers). Measure response times, and see where the response time of your slowest request exceeds your quality threshold (I usually recommend 3 seconds) in more than 10% of cases.
Try varying the split between user journeys - an advertising campaign might drive a lot of new registrations, for instance. I'd usually recommend at least 3 or 4 different mixtures.
If any of the variations in user journeys gives results that are significantly below the average (15% or more), that's probably your worst case scenario.
Otherwise, average the results, and you will know, with a reasonable degree of certainty, that this is the minimum number of requests you can support. The more variations in user journey you can test, the more certain it is that the number is accurate. By "minimum", I mean that you can be reasonably sure that you can manage at least this many users. It does not mean you can handle at most this many users - a subtle difference, but an important one!
In most web applications, the bottleneck is the dynamic page generation - there's relatively little point testing Apache's ability to serve static files, or your hosting provider's bandwidth. It's good as a "have we forgotten anything" test, but you'll get far more value out of testing your PHP scripts.
Before you even do this, I'd recommend playing "hunt the bottleneck" with just the PHP files - the process I've outlined above doesn't tell you where the bottleneck is, only that there is one. As it's most likely to be the PHP (and of course all the stuff you do from PHP, like calling a database), instrumenting the solution to test for performance is usually a good idea.
You should also use a tool like YSlow to make sure your HTTP/HTML set-up is optimized - setting cache headers for your static assets will have a big impact on your bandwidth bill, and may help with performance as perceived by the end user.
The short answer is no, probably not.
ab and friends, when run from the local machine, are not subject to network lag/bandwidth chokes.
Plus every real-life request requires different levels of processing - DB access/load, file includes etc etc.
Plus none of this takes into account the server load from other running background processes.
To get near-real results, I suggest you analyze typical user behaviour, create a siege URLs file with the URLs users actually visit, and run it with random delays. These results can't be directly transferred to a production environment, but they're the closest you can get on your own. You can also try web services that test web app performance, but they are usually paid if you need a complex test.
But saying "it depends" doesn't help much, doesn't mean that the only valid answer isn't "it depends". Because it sort-of is.
Fact: Testing is not real-life usage.
Fact: Testing can come really close to real-life usage.
Problem: how do you know if it does?
It depends on what you do with the requests.
Your single IP won't be a problem for many applications, so that would not be the first thing I'd worry about. But it could be: if you do complicated statistics once for every IP (saving some information in a table you didn't design very well, for instance), you will only do that once in testing, so you'll have a bad time when the real users come along with their annoyingly different IPs.
It depends on your test-system.
If all your requests come from a slow line (maybe it is slow because you are making all these requests), you won't get a serious test. Basically, if you expect the incoming traffic to be more than your test system's connection can handle... you get the drift. The same is true for CPU usage and the like.
It depends on how good your tests are.
If your requests hit all pages, for instance, but your users only hit one specific page, you will obviously get different results. The same is true of frequency. If you hit the pages in an order that lets you take full advantage of things like caches (the query cache is a tricky one here, but also layers like memcached, Varnish, etc.), again, you will have a bad time. The simplest thing you can look at is the delay you can set on a siege test, but there are loads of other things you might want to take into account.
Writing good tests is hard, and the better your tests are, the closer you can get. But you need to know your system, know your users and know your tests. There really isn't much more to say than "it depends".
PHP - Apache with CodeIgniter
JS - typical with jQuery and an in-house lib
The problem: determining (without forcing a download) a user's PC ability and/or virus issues.
The why: We put out software that is mostly used in clinics, but can be used from home. However, we need to know, before they go to our main site, whether their PC can handle the demands of our web-based, browser-served software.
Progress: So far, we've come up with a decent way to test download speed, but that's about it.
What we've done: In PHP we create about a 2.5Gb array of data to send to the user in a view; from there the view calculates the time it took to receive the data and then subtracts the PHP benchmark from this time in order to get a point of reference for upload/download time. This is not enough.
Some of our (local) users have been found to have "crappy" PCs or are virus-infected, and this can lead to two problems: (1) they crash in the middle of performing a task in our program, or (2) their viruses could be trying to inject into our JS, creating a bad experience that may make us look bad to the average (uneducated on how this stuff works) user, thus hurting "our" integrity.
I've done some googling around, but most plug-ins or advice forums/blogs I've found simply give ways to benchmark the speed of your JS, and that is simply not enough. I need a simple bit of code (no visual interface included; one otherwise nice JS lib I found did this, but it would take days to remove all of the author's personal visual code) that will allow me to test the following three things:
The user's data transfer rate (I think we have this covered, but if a better method is presented I won't rule it out)
The user's processing speed, i.e. how fast the computer is in general
A possible test for infection via malware, adware, or whatever may be harmful to the user's experience
What we are not looking to do: repair their PC! We don't care if they have problems; we just don't want to lead them into our site if they have too many problems. If they can't do it from home, then they will be recommended to go to their nearest local office to use this software "in house", so to speak.
Further Explanation
We know you can't test the user-side stuff with PHP; we're not that stupid. PHP is mentioned because it can still be useful in either determining connection speed or in delivering a script that may do what we want. Also, this is not software for just anyone on the net to sign up for and use: if you find it online, unless you are affiliated with a specific clinic and have a login name and what not, you're not meant to use the site, and if you get in otherwise, it's illegal. I can't really reveal a whole lot of information yet as the site is not live yet. What I can say is that it is mostly used by clinics/offices for customers to perform a certain task. If they don't have the time/transport/or otherwise and need to do it from home, then the option is available. However, if their home PC is not "up to snuff", it will be nothing but a problem for them and make the two-hour task they are meant to perform become a 4-6 hour nightmare. That's why I'm at one of my favourite question sites asking whether anyone has had experience with this before and may know a good way to test the user's PC, so they can have the best possible resolution: either do it from home (as their PC is suitable) or be told they need to go to their local office. Hopefully this clears things up enough that we can refrain from the "sillier" answers. I need a REAL viable solution and/or suggestions, please.
PHP has (virtually) no access to information about the client's computer. Data transfer can just as easily be limited by network speed as by computer speed, though if you don't care which is the limiter, it might work.
JavaScript can reliably check how quickly a set of operations are run, and send them back to the server... but that's about it. It has no access to the file system, for security reasons.
EDIT: Okay, with that revision, I think I can offer a real suggestion - basically, compromise. You are not going to be able to gather enough information to absolutely guarantee one way or another that the user's computer and connection are adequate, but you can get a general idea.
As someone suggested, use a 10MB-20MB file and several smaller ones to test actual transfer rate; this will give you a reasonable estimate. Then, use JavaScript to test their system speed. But don't just stick with one test, because that can be heavily dependent on browser. Do the research on what tests will best give an accurate representation of capability across browsers; things like looping over arrays, manipulating (invisible) elements, and complex math. If there is a significant discrepancy between browsers, then use different thresholds; PHP does know what browser they're using, so you can give the system different "good enough" ratings depending on that. Limiting by version (like, completely rejecting IE6) may help in that.
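As a sketch of that last idea, the PHP side could map the reported browser to a different "good enough" threshold for the JavaScript benchmark result; the browser names, the numbers and the benchmark_ms field are all made up:

<?php
// Hypothetical "good enough" thresholds in milliseconds for the JS benchmark, per browser family.
$thresholds = [
    'Chrome'  => 400,
    'Firefox' => 500,
    'Safari'  => 500,
    'IE'      => 900, // older engines get more slack
];

$info    = get_browser(null, true); // needs browscap.ini configured
$browser = ($info !== false && isset($info['browser'])) ? $info['browser'] : 'unknown';
$limit   = $thresholds[$browser] ?? 700; // fallback threshold

// The JS benchmark would POST its elapsed time back to this script.
$elapsedMs   = (int) ($_POST['benchmark_ms'] ?? PHP_INT_MAX);
$fast_enough = $elapsedMs <= $limit;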
Finally... inform the user. Gently. First let them know, "Hey, this is going to run a test to see if your network connection and computer are fast enough to use our system." And if it fails, tell them which part, and give them a warning. "Hey, this really isn't as fast as we recommend. You really ought to go down to the local clinic to perform this task; if you choose to proceed, it may take a lot longer than intended." Hopefully, at that point, the user will realize that any issues are on them, not on you.
What you've heard is correct: there's no way to effectively benchmark a machine based on JavaScript, especially because the JavaScript engine mostly depends on the actual browser the user is using, amongst numerous other variables, and there are no file system permissions. A computer is hardly going to let a browser's sub-process stress it anyway; the browser would simply crash first. PHP is obviously out as it's server-side.
Sites like System Requirements Lab have the user download a Java applet to run in its own scope.
I am setting up a site using PHP and MySQL that is essentially just a web front-end to an existing database. Understandably my client is very keen to prevent anyone from being able to make a copy of the data in the database yet at the same time wants everything publicly available and even a "view all" link to display every record in the db.
Whilst I have put everything in place to prevent attacks such as SQL injection attacks, there is nothing to prevent anyone from viewing all the records as html and running some sort of script to parse this data back into another database. Even if I was to remove the "view all" link, someone could still, in theory, use an automated process to go through each record one by one and compile these into a new database, essentially pinching all the information.
Does anyone have any good tactics for preventing, or even just deterring, this that they could share?
While there's nothing to stop a determined person from scraping publicly available content, you can do a few basic things to mitigate the client's concerns:
Rate limit by user account, IP address, user agent, etc. - this means you restrict the amount of data a particular user group can download in a certain period of time. If you detect a large amount of data being transferred, you shut down the account or IP address (a minimal sketch of this follows after the list of options).
Require JavaScript - to ensure the client has some resemblance of an interactive browser, rather than a barebones spider...
RIA - make your data available through a Rich Internet Application interface. JavaScript-based grids include ExtJs, YUI, Dojo, etc. Richer environments include Flash and Silverlight as 1kevgriff mentions.
Encode data as images. This is pretty intrusive to regular users, but you could encode some of your data tables or values as images instead of text, which would defeat most text parsers, but isn't foolproof of course.
robots.txt - to deny obvious web spiders, known robot user agents.
User-agent: *
Disallow: /
Use robot metatags. This would stop conforming spiders. This will prevent Google from indexing you for instance:
<meta name="robots" content="noindex,follow,noarchive">
There are different levels of deterrence and the first option is probably the least intrusive.
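For that first option, a minimal per-IP rate-limit sketch might look like the following; it assumes the APCu extension is available, and the limit and window values are arbitrary examples:

<?php
// Minimal per-IP rate limiter using APCu as the counter store.
function too_many_requests(string $ip, int $limit = 300, int $window = 3600): bool
{
    $key   = 'hits_' . $ip;
    $count = apcu_fetch($key);

    if ($count === false) {
        apcu_store($key, 1, $window); // first hit in this window
        return false;
    }

    apcu_inc($key);
    return $count >= $limit;
}

if (too_many_requests($_SERVER['REMOTE_ADDR'])) {
    http_response_code(429);
    exit('Too many requests, slow down.');
}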
If the data is published, it's visible and accessible to everyone on the Internet. This includes the people you want to see it and the people you don't.
You can't have it both ways. You can make it so that data can only be visible with an account, and people will make accounts to slurp the data. You can make it so that the data can only be visible from approved IP addresses, and people will go through the steps to acquire approval before slurping it.
Yes, you can make it hard to get, but if you want it to be convenient for typical users you need to make it convenient for malicious ones as well.
There are a few ways you can do it, although none are ideal.
Present the data as an image instead of HTML. This requires extra processing on the server side, but wouldn't be hard with the graphics libs in PHP. Alternatively, you could do this just for requests over a certain size (i.e. all).
Load a page shell, then retrieve the data through an AJAX call and insert it into the DOM. Use sessions to set a hash that must be passed back with the AJAX call as verification. The hash would only be valid for a certain length of time (e.g. 10 seconds). This is really just adding an extra step someone would have to jump through to get the data, but it would prevent simple page scraping (see the sketch below, after this list).
Try using Flash or Silverlight for your frontend.
While this can't stop someone if they're really determined, it would be more difficult. If you're loading your data through services, you can always use a secure connection to prevent middleman scraping.
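An illustrative version of the short-lived hash check from the AJAX option above; the field names, the hypothetical page-shell/endpoint split and the 10-second window are just examples:

<?php
session_start();

// In the page shell (first request): issue a short-lived token for the AJAX call.
$_SESSION['data_token']    = bin2hex(random_bytes(16));
$_SESSION['data_token_at'] = time();

// In the AJAX endpoint (separate request): only serve the data if the token matches and is fresh.
function token_is_valid(?string $sent, int $maxAge = 10): bool
{
    return isset($_SESSION['data_token'], $_SESSION['data_token_at'])
        && is_string($sent)
        && hash_equals($_SESSION['data_token'], $sent)
        && (time() - $_SESSION['data_token_at']) <= $maxAge;
}

if (!token_is_valid($_POST['token'] ?? null)) {
    http_response_code(403);
    exit('Invalid or expired token.');
}
// ...otherwise fetch the records and return them as JSON.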
force a reCAPTCHA every 10 page loads for each unique IP
There is really nothing you can do. You can try to look for an automated process going through your site, but they will win in the end.
Rule of thumb: If you want to keep something to yourself, keep it off the Internet.
Take your hands away from the keyboard and ask your client why he wants the data to be visible but not scrapable.
He's asking for two incongruent things and maybe having a discussion as to his reasoning will yield some fruit.
It may be that he really doesn't want it publicly accessible and you need to add authentication / authorization. Or he may decide that there is value in actually opening up an API. But you won't know until you ask.
I don't know why you'd deter this. The customer's offering the data.
Presumably they create value in some unique way that's not trivially reflected in the data.
Anyway.
You can check the browser, screen resolution and IP address to see if it's likely some kind of automated scraper.
Most things like cURL and wget -- unless carefully configured -- are pretty obviously not browsers.
Using something like Adobe Flex - a Flash application front end - would fix this.
Other than that, if you want it to be easy for users to access, it's easy for users to copy.
There's no easy solution for this. If the data is available publicly, then it can be scraped. The only thing you can do is make life more difficult for the scraper by making each entry slightly unique by adding/changing the HTML without affecting the layout. This would possibly make it more difficult for someone to harvest the data using regular expressions but it's still not a real solution and I would say that anyone determined enough would find a way to deal with it.
I would suggest telling your client that this is an unachievable task and getting on with the important parts of your work.
What about creating something akin to a bulletin board's troll protection? If a scrape is detected (perhaps a certain number of accesses per minute from one IP, or a directed crawl that looks like a sitemap crawl), you can then start to present garbage data, like changing a couple of digits of the phone number or adding silly names to name fields.
Turn this off for Google IPs!
Normally to screen-scrape a decent amount one has to make hundreds, thousands (and more) requests to your server. I suggest you read this related Stack Overflow question:
How do you stop scripters from slamming your website hundreds of times a second?
Use the fact that scrapers tend to load many pages in quick succession to detect scraping behaviours. Display a CAPTCHA for every n page loads over x seconds, and/or include an exponentially growing delay for each page load that becomes quite long when say tens of pages are being loaded each minute.
This way normal users will probably never see your CAPTCHA but scrapers will quickly hit the limit that forces them to solve CAPTCHAs.
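A rough, session-based sketch of that counting logic; the thresholds are invented and show_captcha_challenge() is a hypothetical helper, not a real library call:

<?php
session_start();

$now    = time();
$window = 60; // look at the last 60 seconds

// Keep only the timestamps of recent page loads for this session.
$_SESSION['hits'] = array_filter(
    $_SESSION['hits'] ?? [],
    function ($t) use ($now, $window) { return $t > $now - $window; }
);
$_SESSION['hits'][] = $now;
$recent = count($_SESSION['hits']);

if ($recent > 60) {
    // Roughly more than one page per second for a minute: demand a CAPTCHA.
    show_captcha_challenge(); // hypothetical helper that renders the challenge
    exit;
} elseif ($recent > 20) {
    // Exponentially growing delay for heavy users, capped at 5 seconds.
    usleep((int) min(5000000, 100000 * 2 ** ($recent - 20)));
}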
My suggestion would be that this is illegal anyway, so at least you have legal recourse if someone does scrape the website. So maybe the best thing to do would be just to include a link to the original site and let people scrape away. The more they scrape, the more of your links will appear around the Internet, building up your PageRank more and more.
People who scrape usually aren't opposed to including a link to the original site since it builds a sort of rapport with the original author.
So my advice is to ask your boss whether this could actually be the best thing possible for the website's health.
I have little knowledge of Flash, but for a little Flash game I have to store users' scores and successful tries in a database using PHP. The Flash runs locally on the user's computer and connects to a remote server. How can I secure it against manipulation of game scores? Is there any best practice for this use case?
You might want to check these other questions:
Q46415 Passing untampered data from Flash app to server?
Q73947 What is the best way to stop people hacking the PHP-based highscore table of a Flash game.
Q25999 Secure Online Highscore Lists for Non-Web Games
What you are asking is inherently impossible. The game runs on the client and is therefore completely at the user's mercy. The only way to be sure is running a real-time simulation of the game on the server based on the user's input (mouse movement, keypresses), which is absolutely ridiculous.
This topic has been covered here at Stack Overflow, at least in part:
What is the best way to stop people hacking the PHP-based highscore table of a Flash game
As ssddw pointed out, this is fundamentally impossible. The code to send the score is running on the user's computer, and they have control over it and everything that runs there.
The best you can do is to periodically alter the encryption mechanism so that it takes score-manipulators a while to figure it out again. You can only minimize the damage, never eliminate it, but on a site like the one I work for, if we've got only a hundred people sending fake scores, out of the hundreds of thousands we see every day, we consider that well within the realm of acceptable. (We still crush those we catch cheating, but we don't consider it much of a problem.)
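This isn't the exact mechanism described above, but one common flavour of it, signing the score with a shared secret plus a per-session nonce, might look like the sketch below; the secret, field names and flow are all invented, and a determined attacker can still pull the secret out of the SWF, which is why rotating it only limits the damage:

<?php
session_start();

$secret = 'rotate-this-secret-regularly'; // hypothetical shared secret, also compiled into the SWF

// When the game page is served: hand the client a one-time nonce.
if (!isset($_POST['score'])) {
    $_SESSION['score_nonce'] = bin2hex(random_bytes(16));
    // ...embed the nonce in the Flash embed parameters here, then render the page.
    exit;
}

// When the score is submitted: expect "score|nonce" signed with the shared secret.
$score = (int) $_POST['score'];
$sig   = $_POST['sig'] ?? '';
$nonce = $_SESSION['score_nonce'] ?? '';

$expected = hash_hmac('sha256', $score . '|' . $nonce, $secret);

if ($nonce === '' || !hash_equals($expected, $sig)) {
    http_response_code(403);
    exit('Score rejected.');
}

unset($_SESSION['score_nonce']); // single-use nonce blocks naive replays
// ...store $score in the database here.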
You could at least throw out scores that are above some threshold you would deem legitimate. It still leaves room for more subtle manipulation of a high-scores list, but it will at least help relieve the obvious frustration of seeing an impossible-to-achieve score topping the charts.