I'm using functions to log a user in. When they fail to log in, whether because no CAPTCHA was sent, the CAPTCHA failed, or the login itself failed, their IP gets a "try". When they reach 5 tries they are blocked from the login page for approximately 1 hour. I have a function that updates the MySQL columns to increment their try count and record the last try date. But PHP's documentation states:
Note: The increment/decrement operators only affect numbers and strings. Arrays, objects and resources are not affected.
My function gets the try count from the database and then tries to update it. The result of fetching the try count is an array by default, because of how PDO works. So how can I efficiently increment an array?
I was thinking of doing a foreach loop and using the .= operator to save it to a string and incrementing from there. But is that really the most efficient way?
Thank you.
P.S.: I'm not showing any example code etc. because this question is simple enough. I have searched around on here and couldn't find a proper answer.
To understand why your question is wrong, you have to understand what an array is.
An array is just a "bag" that holds other variables. So your question sounds like "How can I pay for two beers with my pocket?" The thing is, you can't pay with a pocket. You have to take the cash out of the pocket and then use that cash.
Exactly the same thing goes for arrays: you have to extract the returned data from the array, and then you are free to perform any operation on its contents. On the contents, remember, not on the bag.
But for the efficient solution, go for the other answer, which solves your initial problem the right way - without the need to select any arrays at all.
And just a side note
MySQL result for fetching the Try count is by default an Array because of how PDO works.
As a matter of fact, PDO can work in many different ways. For example, it can return scalar values all right.
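For illustration, a minimal sketch of getting the try count back as a scalar with PDO (the table and column names here are made up, not from the question):

<?php
// $pdo is an existing PDO connection; login_attempts(ip, tries) is an assumed schema
$stmt = $pdo->prepare('SELECT tries FROM login_attempts WHERE ip = ?');
$stmt->execute([$ip]);
$tries = (int) $stmt->fetchColumn(); // a plain integer now, not an array
$tries++;                            // the increment operator works on this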
You can increment it in an update query directly. When you want to add a try, simply:
UPDATE `tries` SET `tries` = `tries` + 1 WHERE `ip` = '127.0.0.1';
Just replace the IP with the actual IP.
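If you are on PDO, a rough sketch of the same idea with a prepared statement (the last_try column is an assumption based on the question's "last try date"):

<?php
// $pdo is an existing PDO connection; tries(ip, tries, last_try) is an assumed schema
$stmt = $pdo->prepare(
    'UPDATE tries SET tries = tries + 1, last_try = NOW() WHERE ip = ?'
);
$stmt->execute([$_SERVER['REMOTE_ADDR']]);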
Just to add..
IMO you should be using a separate table for incorrect login attempts. There are many reasons for this, but one of the most important is that an attack is likely to rotate usernames, not only passwords.
Having a separate table that records all incorrect logins lets you query much more easily for the number of incorrect logins in a given time window. Tying incorrect logins to a user limits your ability to detect DoS and brute-force attacks coming from scripted sources, because you can only look at the attempted username if it actually existed in the first place.
However, you can relate a field in the table to the user's ID so that you can track users independently; then, on successful login, the records that relate to that user can be deleted.
To give you a working example, I have built the following functionality into the commercial Symfony project that I work on on a daily basis.
table example
userID --- foreign key (not mandatory)
IP --- mandatory
timestamp --- mandatory
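Expressed as a DDL sketch (assuming MySQL; the table and column names are illustrative, not the ones from our system):

-- Illustrative only; adjust names, types and engine to your setup
CREATE TABLE failed_logins (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id   INT UNSIGNED NULL,        -- foreign key, not mandatory
    ip        VARCHAR(45)  NOT NULL,    -- long enough for IPv6
    attempted TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    KEY idx_ip_time (ip, attempted)
);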
we query the data like this:
Overall failed attempts for a particular subdomain (we have lots of them in use on the same system)
The system is used in schools, so we have to cater for naughty students!
Overall failed attempts in the last minute
The system sleeps for a random time based on a base value multiplied by the attempt count (a bit of a hacky way to try to trip up scripted attacks).
Overall attempts for a particular user
Similar to your example: compares against preconfigured thresholds, then warns/disables users accordingly. If it blocks, it sends an email to the helpdesk team.
This is by no means a suggested list, or an example of what should be done; it's merely what we decided on given our application's circumstances.
The point is, without a separate table much of this wouldn't be possible.
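As an illustration, the "failed attempts in the last minute" check boils down to a query like this (following the assumed schema sketched above):

-- Overall failed attempts in the last minute
SELECT COUNT(*) AS recent_failures
FROM failed_logins
WHERE attempted >= NOW() - INTERVAL 1 MINUTE;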
I have a webshop where users buy things.
When a user visits my site, their data is saved temporarily in a variable $user as an array.
The balance is checked when the user buys something, to make sure it is greater than or equal to the price of what they want to buy. The problem arises when a user opens two browsers and buys things at the same time: their balance is only deducted once (it should be twice, since they bought twice, once in each browser).
I know I can just update the $user variable before checking, but that means running another query against MySQL, and there are many orders...
Is there any SQL syntax that can be used to prevent this kind of attack - that is, to check their balance and make sure it's correct?
Based on your current setup (ie using a variable):
Someone using two browsers on the same site trying to use up their balance is going to be fairly rare, and in most cases it will be someone trying to game your system.
Just check their balance one final time at the point of processing the order; if it's OK, allow it, otherwise don't. Those doing it by accident (which is rare) will soon realise the error.
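One way to make that final check atomic is a conditional UPDATE, so the balance is only deducted if it is still sufficient. A sketch, with assumed table/column names ($pdo is an existing PDO connection):

<?php
// Assumed users(id, balance) table; deduct only if the balance still covers the order
$stmt = $pdo->prepare(
    'UPDATE users SET balance = balance - ? WHERE id = ? AND balance >= ?'
);
$stmt->execute([$orderTotal, $userId, $orderTotal]);

if ($stmt->rowCount() === 0) {
    // Balance was no longer sufficient (e.g. already spent in the other browser) - reject the order
}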
An alternative:
It'd be better to check the real data rather than a variable, which isn't reliable, has to be forced to be persistent, and, as you know, isn't available across different sessions.
I think a better way would be to use some fast, centralised, persistent storage like Redis (fairly easy to learn; essentially it's an array stored in memory). You can then key their credit on their username (or whatever uniquely identifies them): while the two browsers may have different session keys, they share that unique ID, so you can update the credit value seen in both (all) sessions by looking it up.
Then whatever browser that user is logged in to will be updated the same as the other browsers.
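A rough sketch of that idea using the phpredis extension (the key layout and variable names are made up for the example):

<?php
// Assumes the phpredis extension; balance is stored as an integer amount
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$key = 'balance:' . $uniqueUserId;        // same key whichever browser/session is used
$remaining = $redis->decrBy($key, $orderTotal);
if ($remaining < 0) {
    $redis->incrBy($key, $orderTotal);    // put it back - not enough credit
    // reject the order
}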
Maybe a better idea:
Unless your application needs it, don't let people log in from different browsers/devices. When they try to log in, state "already logged in in another place - do you want to log that one out and log in here?" etc.
So I have nearly finished my notification system, and just before I am about to implement reCAPTCHA, I test what happens if I spam the notifications.
To give you some background on my notification system: I determine the newest content by its timestamp. I retrieve the rows from the database with ORDER BY timestamp. The timestamp value is an integer in Unix time. When notifications are shown, they are hyperlinks that follow this URL format:
http://test.com/article/id
Where id is the id in the table; each time a new article is submitted, the id increments. I noticed after spamming my notifications that the URLs of the spammed notifications are in reverse order. After further investigation, I found that if I spam quickly enough, the timestamp is not accurate enough and records multiple submissions with the same timestamp.
Since my website is low-traffic for now and there aren't many submissions, this is currently not an issue. But if a piece of content is submitted at the same time as another - a very small chance - the notifications will be ranked in the wrong order relative to when they were submitted: a small but annoying bug.
So I'm wondering what I should do. Should I fix the issue, or is the chance of this happening extremely small? Thanks to reCAPTCHA, spamming is not an issue, but there is still a chance this could happen by accident.
I have come up with 3 possible solutions. My question is: which would be the most efficient?
Create a global id for all 4 types of content, which increments every time a comment, article or update is created.
Use a more accurate PHP time function, such as microtime
Add some sort of secondary ranking variable
Given that there can be multiple threads, and even multiple nodes in a cluster, inserting data into different tables, there is always a possibility that any clock-based value you use will get duplicated across multiple tables, or even within the same one.
So my first thought is to use a global id table. You could use a common content table with an auto-incrementing primary key that all other tables foreign key into and use that for ordering.
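As a sketch, that could be as simple as a shared table whose auto-increment key every content row points at (names are illustrative; the question mentions 4 content types, so the list below is a placeholder):

-- One row per piece of content, whatever its type; its id gives a global ordering
CREATE TABLE content (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    kind ENUM('article', 'comment', 'update', 'other') NOT NULL
);

-- Each concrete table (articles, comments, ...) then stores content.id as a foreign key,
-- and notifications are ordered by that id instead of by timestamp.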
On the other hand, by the same logic, how well can you ensure a fixed order between submissions at all? It is quite possible that two submissions will be committed to the database in the reverse order of their arrival at the server. The only way to solve that problem, I think, is to have a global gatekeeper that all requests pass through. If you are using such a gatekeeper, that is also the best place to assign the ordering value.
All in all, I think you should not insist on complete ordering, because it does not exist unless yours is a highly ordering-sensitive system such as trading or betting. Otherwise microsecond precision should be good enough, as long as the notification for a comment on an article does not come before the article itself.
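If you do go the microsecond route, a minimal sketch of capturing and storing a higher-resolution timestamp (the column name and type are assumptions):

<?php
// microtime(true) returns a float with microsecond precision, e.g. 1700000000.123456
$submittedAt = microtime(true);

// Stored in a DECIMAL(17,6) column, ORDER BY keeps the sub-second ordering
$stmt = $pdo->prepare('INSERT INTO notifications (submitted_at) VALUES (?)');
$stmt->execute([sprintf('%.6F', $submittedAt)]);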
I am creating an application to help our team manage a twitter competition. So far I've managed to interact with the API fine, and return a set of tweets that I need.
I'm struggling to decide on the best way to handle the storage of the tweets in the database, how often to check for them and how to ensure there are no overlaps or gaps.
You can get a maximum of 100 tweets per page. At the moment, my current idea is to run a cron script, say, once every 5 minutes or so, grab a full 100 tweets at a time, and loop through them, looking in the db to see whether they're already there before adding them.
This has the obvious drawback of running 100 queries against the db every 5 minutes, plus however many INSERTs there are. Which I really don't like. I'd also much rather have something a little more real-time: as Twitter is a live service, it stands to reason that we should update our list of entrants as soon as they enter.
This in turn throws up the drawback of having to repeatedly poll Twitter, which, although it might be necessary, means hammering their API in a way I'm not sure I want to.
Does anyone have any ideas for an elegant solution? I need to make sure I capture all the tweets, not leave anyone out, and keep each db user unique. I have considered just adding everything and then grouping the resulting table by username, but that's not tidy.
I'm happy to deal with the display side of things separately, as that's just a pull from MySQL and display. But the backend design is giving me a headache, as I can't see an efficient way to keep it ticking over without hammering either the API or the db.
100 queries in 5 minutes is nothing. Especially since a tweet has essentially only four pieces of data associated with it: user ID, timestamp, tweet text, tweet ID - say, about 170 characters' worth of data per tweet. Unless you're running your database on a 4.77MHz 8088, it won't even blink at that kind of "load".
The Twitter API offers a streaming API that is probably what you want to do to ensure you capture everything:
http://dev.twitter.com/pages/streaming_api_methods
If I understand what you're looking for, you'll probably want a statuses/filter, using the track parameter with whatever distinguishing characteristics (hashtags, words, phrases, locations, users) you're looking for.
Many Twitter API libraries have this built in, but basically you keep an HTTP connection open and Twitter continuously sends you tweets as they happen. See the streaming API overview for details on this. If your library doesn't do it for you, you'll have to check for dropped connections and reconnect, check the error codes, etc - it's all in the overview. But adding them as they come in will allow you to completely eliminate duplicates in the first place (unless you only allow one entry per user - but that's client-side restrictions you'll deal with later).
As far as not hammering your DB, once you have Twitter just sending you stuff, you're in control on your end - you could easily have your client cache up the tweets as they come in, and then write them to the db at given time or count intervals - write whatever it has gathered every 5 minutes, or write once it has 100 tweets, or both (obviously these numbers are just placeholders). This is when you could check for existing usernames if you need to - writing a cached-up list would allow you the best chance to make things efficient however you want to.
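A rough sketch of that cache-then-flush idea (the function, table and field names below are placeholders, not anything from the Twitter libraries):

<?php
// Illustrative only: buffer tweets as the stream hands them over, then flush in batches
function flushTweets(array $buffer, PDO $pdo): void
{
    if ($buffer === []) {
        return;
    }
    // One multi-row INSERT instead of one query per tweet;
    // INSERT IGNORE assumes a UNIQUE key on tweet_id so duplicates are skipped
    $placeholders = [];
    $params = [];
    foreach ($buffer as $t) {
        $placeholders[] = '(?, ?, ?, ?)';
        array_push($params, $t['id'], $t['user'], $t['text'], $t['created_at']);
    }
    $sql = 'INSERT IGNORE INTO tweets (tweet_id, username, tweet, created_at) VALUES '
         . implode(', ', $placeholders);
    $pdo->prepare($sql)->execute($params);
}

// As each tweet arrives from the stream: $buffer[] = $tweet;
// then call flushTweets($buffer, $pdo) every N tweets or every few minutes and reset $buffer.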
Update:
My solution above is probably the best way to do it if you want to get live results (which it seems like you do). But as is mentioned in another answer, it may well be possible to just use the Search API to gather entries after the contest is over, and not worry about storing them at all - you can specify pages when you ask for results (as outlined in the Search API link), but there are limits as to how many results you can fetch overall, which may cause you to miss some entries. What solution works best for your application is up to you.
I read over your question and it seems to me that you want to duplicate data already stored by Twitter. Without more specifics on the competition you're running - how users enter, for example, or the estimated number of entries - it's impossible to know whether storing this information locally in a database is the best way to approach the problem.
Might a better solution be to skip storing duplicate data locally and pull the entrants directly from Twitter, i.e. when you're attempting to find a winner?
You could then eliminate duplicate entries on the fly while the code is running. You would just need to call "the next page" once it has finished processing the 100 entries it has already fetched. Although I'm not sure whether this is possible directly through the Twitter API.
I think running a cron every X minutes and basing it on the tweets' creation date may work. You can query your database to find the date/time of the last recorded tweet, then only store tweets newer than that to prevent duplicates. Then, when you do your inserts into the database, use one or two INSERT statements containing all the entries you want to record, to keep performance up.
INSERT INTO `tweets` (id, date, ...) VALUES (..., ..., ...), (..., ..., ...), ...;
This doesn't seem too intensive... though it also depends on the number of tweets you expect to record. Also make sure to index the table properly.
Background: I'm working on a system where the developers seem to be using a function that executes a MySQL query like "SELECT MAX(id) AS id FROM TABLE" whenever they need the id of the last inserted row (the table having an auto_increment column).
I know this is a horrible practice (because concurrent requests will mess up the records), and I'm trying to communicate that to the non-tech/management team, whose response is...
"Oh okay, we'll only face this problem when we have
(a) a lot of users, or
(b) it'll only happen when two people try doing something
at _exactly_ the same time"
I don't disagree with either point, but I think we'll run into this problem much sooner than we plan. However, I'm trying to calculate (or find a mechanism to calculate) how many users would need to be using the system before we start seeing messed-up links.
Any mathematical insights into that? Again, I KNOW it's a horrible practice; I just want to understand the variables in this situation...
Update: Thanks for the comments folks - we're moving in the right direction and getting the code fixed!
The point is not whether potential bad situations are likely; the point is whether they are possible. As long as there's a non-trivial probability of the issue occurring, and it's a known issue, it should be avoided.
It's not like we're talking about changing a one line function call into a 5000 line monster to deal with a remotely possible edge case. We're talking about actually shortening the call to a more readable, and more correct usage.
I kind of agree with @Mark Baker that there is some performance consideration, but since id is a primary key, the MAX query will be very quick. Sure, LAST_INSERT_ID() will be faster (since it's just reading a session variable), but only by a trivial amount.
And you don't need a lot of users for this to occur. All you need is a lot of concurrent requests (and not even that many). If the time between the start of the insert and the start of the select is 50 milliseconds (assuming a transaction-safe DB engine), then you only need 20 requests per second to start hitting this issue consistently. The point is that the window for error is non-trivial. At 20 requests per second (which in reality is not a lot), and assuming the average person visits one page per minute, you're only talking about 1,200 users. And that's for it to happen regularly. It could happen once with only 2 users.
And right from the MySQL documentation on the subject:
You can generate sequences without calling LAST_INSERT_ID(), but the utility of using the function this way is that the ID value is maintained in the server as the last automatically generated value. It is multi-user safe because multiple clients can issue the UPDATE statement and get their own sequence value with the SELECT statement (or mysql_insert_id()), without affecting or being affected by other clients that generate their own sequence values.
Instead of using SELECT MAX(id) you should do as the documentation says:
Instead, use the internal MySQL SQL function LAST_INSERT_ID() in an SQL query
Even so, neither SELECT MAX(id) nor mysql_insert_id() is truly "thread-safe" and you could still have a race condition. The best option you have is to lock the tables before and after your requests, or better yet, use transactions.
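For completeness, a sketch of what the fix looks like with PDO rather than SELECT MAX(id) (the table and column names are only an example):

<?php
// PDO: the id generated by the last INSERT on *this* connection
$pdo->prepare('INSERT INTO orders (customer_id) VALUES (?)')->execute([$customerId]);
$newId = (int) $pdo->lastInsertId();

// The old mysql_* / mysqli equivalents are mysql_insert_id() and $mysqli->insert_id.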
I don't have the math for it, but I would point out that response (a) is a little silly. Doesn't the company want a lot of users? Isn't that a goal? That response implies that they'd rather solve the problem twice, possibly at great expense the second time, instead of solving it once correctly the first time.
This will happen whenever someone adds something to the table between your insert and that query running. So to answer your question: two people using the system is enough for things to potentially go wrong.
At least using LAST_INSERT_ID() will get the last ID for your particular connection, so it won't matter how many new entries have been added in between.
In addition to the risk of getting the wrong ID value returned, there's also the additional database query overhead of SELECT MAX(id), and it's more PHP code to actually execute than a simple mysql_insert_id(). Why deliberately code something to be slow?
I'm designing a very simple (in terms of functionality) but difficult (in terms of scalability) system where users can message each other. Think of it as a very simple chatting service. A user can insert a message through a php page. The message is short and has a recipient name.
On another php page, the user can view all the messages that were sent to him all at once and then deletes them on the database. That's it. That's all the functionality needed for this system. How should I go about designing this (from a database/php point of view)?
So far I have the table like this:
field1 -> message (varchar)
field2 -> recipient (varchar)
Now, for SQL inserts, I find that the time they take is constant regardless of the number of rows in the database, so my send.php has a guaranteed response time, which is good.
But for pulling down messages, my pull.php takes longer as the number of rows increases! I find the SQL SELECT (and DELETE) take longer as the rows grow, and this is true even after I have added an index on the recipient field.
Now, if it were simply the case that users had to wait longer for their messages to be pulled into PHP, that would be OK. But what I'm worried about is that when each pull.php request takes really long, the PHP server will start refusing connections for some requests, or worse, might just die.
So the question is, how to design this such that it scales? Any tips/hints?
P.S. Some estimates on numbers:
number of users starts with 50,000 and goes up.
each user on average has around 10 messages stored before the other end pulls them down.
each user sends around 10-20 messages a day.
UPDATE from reading the answers so far:
I just want to clarify that pulling down fewer messages in pull.php does not help. Even pulling just one message takes a long time when the table is huge, because the table holds all the messages, so you have to do a select like this:
select message from DB where recipient = 'John'
Even if you change it to this, it doesn't help much:
select top 1 message from DB where recipient = 'John'
From the answers so far it seems that the longer the table, the slower the select will be - O(n) or slightly better - and there's no way around it. If that's the case, how should I handle this on the PHP side? I don't want the page to fail at the HTTP level, because the user will be confused and end up refreshing like mad, which makes it even worse.
The database design for this is simple, as you suggest. As far as it taking longer once a user has more messages, what you can do is just paginate the results: show the first 10/50/100 or whatever makes sense and only pull those records. Generally speaking, your times shouldn't increase very much unless the volume of messages increases by an order of magnitude or more. You should be able to pull back 1000 short messages in well under a second. It may take more time for the page to display at that point, but that's where the pagination helps.
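A sketch of what that pagination looks like in the query itself (this assumes an auto-increment id column, which your current two-field table doesn't have yet but a later answer also suggests adding):

-- Newest page of messages for one recipient; OFFSET 50 would be page two, and so on
SELECT message
FROM messages
WHERE recipient = 'John'
ORDER BY id DESC
LIMIT 50 OFFSET 0;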
I would suggest, though, going through and thinking about future features and building your database out a little more based on that. Adding more features to the software is easy; changing the database is comparatively harder.
Follow the rules of normalization and try to reach third normal form. Going further than that for this type of application probably isn't worth it. Keep your tables thin.
Don't actually delete rows; just mark them as deleted with a bit flag. If you really need to remove them for some kind of maintenance/cleanup to reduce size, mark them as deleted and then create a cleanup process to archive or remove the records during low-usage hours.
Integer values are easier for the SQL server to deal with than character values. So instead of WHERE recipient = 'John', use WHERE recipient_id = 23. You will get this kind of structure when you normalize your database.
Don't use a VARCHAR for your recipient. It's best to make a Recipient table with a primary key that is an integer (or BIGINT if you are expecting extremely large numbers of people).
Then when you do your select statements:
SELECT message FROM DB WHERE recipient = 52;
Retrieving rows will be much faster.
Plus, I believe MySQL indexes are B-Trees, which is O(log n) for most cases.
A database table without an index is called a heap; querying a heap means every row of the table is evaluated, even with a WHERE clause, so the big-O complexity for a heap is O(n), with n being the number of rows in the table. Adding an index (and this really depends on the underlying aspects of your database engine) brings the complexity of finding the matching row down to O(log(n)), because the index is almost certainly implemented as some kind of B-tree. Adding rows to the table, even with an index present, is an O(1) operation.
> But for pulling down messages, my pull.php will take longer as the number of rows increase! I find the sql select (and delete) will take longer as the rows grow and this is true even after I have added an index for the recipient field.
UNLESS you are inserting into the middle of an index, at which point the database engine will need to shift rows to accommodate the new one. The same occurs when you delete from the index. Remember, there is more than one kind of index: be sure that the index you are using is not a clustered index, as more data must be sifted through and moved on inserts and deletes.
FlySwat has given the best option available to you: do not use an RDBMS, because your messages are not relational in a formal sense. You will get much better performance from a file system.
dbarker has also given correct answers. I do not know why he has been voted down 3 times, but I will vote him up at the risk of losing points. dbarker is referring to "vertical partitioning" and his suggestion is both acceptable and good. This isn't rocket surgery, people.
My suggestion is not to implement this kind of functionality in your RDBMS. If you do, remember that SELECT, UPDATE, INSERT and DELETE all place locks on pages in your table. If you do go forward with putting this functionality into a database, run your selects with a NOLOCK locking hint, if it is available on your platform, to increase concurrency. Additionally, if you have that many concurrent users, partition your tables vertically as dbarker suggested and place the database files on separate drives (not just separate volumes but separate hardware) to increase I/O concurrency.
So the question is, how to design this such that it scales? Any tips/hints?
Yes - you don't want to use a relational database for message queuing. What you are trying to do is not what a relational database is best designed for, and while you can do it, it's kinda like driving in a nail with a screwdriver.
Instead, look at one of the many open source message queues out there; the guys at Second Life have a neat wiki where they reviewed a lot of them.
http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes
This is an unavoidable problem - more messages means more time to find the requested ones. The only thing you can do is what you already did: add an index, turning the O(n) lookup time of a complete table scan into O(log(u) + m) for a clustered index lookup, where n is the total number of messages, u the number of users, and m the number of messages per user.
Limit the number of rows that your pull.php will display at any one time.
The more data you transfer, the longer it will take to display the page, regardless of how great your DB is.
You must limit your data in the SQL: return only the most recent N rows.
EDIT
Put an index on Recipient and it will speed things up. You'll also need another column to distinguish rows if you want to take the top 50 or something - possibly a SendDate or an auto-incrementing field. A clustered index will slow down inserts, so use a regular index there.
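As a sketch (SendDate is the assumed ordering column mentioned above), a regular secondary index pairing the recipient with that column lets a "most recent 50" query, like the pagination example earlier, walk just that slice of the index:

-- Illustrative: a regular (non-clustered) index on the lookup column plus the ordering column
CREATE INDEX idx_recipient_senddate ON messages (Recipient, SendDate);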
You could always have only one row per user and just concatenate messages together into one long record. If you're keeping messages for a long period of time that isn't the best way to go, but it reduces your problem to a single find-and-concatenate at storage time and a single find at retrieval time. It's hard to say without more detail - part of what makes DB design hard is meeting all the goals of the system in a well-compromised way. Without all the details, it's hard to give advice on the best compromise.
EDIT: I thought I was fairly clear on this, but evidently not: you would not do this unless you were blanking a reader's queue when he reads it. This is why I prompted for clarification.