preventing multiple simultaneous queries with php/mysql live search - php

I have a live search system that on the whole works very well. However, if users type faster than the results can be returned, many versions of the search query end up running on the server simultaneously.
I am aborting the ajax request on receipt of a new one, but that of course does not affect the query already in progress on the server, so you end up with a severe bottleneck and a long wait to get your final results. I am using MySQL with MyISAM tables for this, and there does not seem to be any advantage in converting to InnoDB, as the result sets will be the same rows.
I tried using a session variable to make PHP wait if this session already has a query in progress, but that seems to stop it working altogether.
The problem is solved if I make the ajax requests synchronous, but that would rather defeat the object here.
I was wondering if anyone had any suggestions as to how to make this work properly.
Best regards
John

Before doing anything more complicated, have you considered not sending the request until the user has stopped typing for at least a certain time interval (say, 1 second)? That should dramatically cut the number of requests being made with little effort on your part.
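A minimal sketch of that idea, usually called debouncing. The `debounce` helper and the `runSearch` name in the commented wiring are assumptions for illustration, not part of the original code; `runSearch` stands in for whatever function currently fires the AJAX request.

```javascript
// Returns a wrapper that delays calling `fn` until `waitMs` milliseconds
// have passed without another call. Only the last call's arguments are used.
function debounce(fn, waitMs) {
  let timerId = null;
  return function debounced(...args) {
    if (timerId !== null) {
      clearTimeout(timerId); // the user is still typing: drop the pending call
    }
    timerId = setTimeout(() => {
      timerId = null;
      fn(...args);
    }, waitMs);
  };
}

// Hypothetical wiring in the page (names are placeholders):
// const input = document.querySelector("#search");
// input.addEventListener("input", debounce(() => runSearch(input.value), 1000));
```

With this in place, a burst of keystrokes produces a single request once typing pauses, so the server only ever sees the query the user actually settled on.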

Related

How to store/access data in mariadb which is updated every 15 seconds

I'd like to download some json-data (which gets updated every 15 seconds) and store it in my maria-db with a PHP-Script.
Unfortunately, the database-update queries take between 1 second and sometimes up to 60 seconds, depending on the json-data-size.
So sometimes I'm deadlocking myself with write queries that take longer than 15 seconds, and as soon as I read/process the data I'm blocking all the write queries as well.
Obviously, I do have the wrong approach and it's more complicated than I thought.
Does anyone have a good idea how such a job can be done professionally, with a continuous update possibility and not blocking the updates itself when I read the data?
Thanks for any hints!
PS: Currently I'm using an InnoDB-Table, and to speed up the inserts I've set the auto-commit to 0 and update everything in a transaction.
I had the fastest results with LOCK TABLES for WRITE, but of course this blocks the read access as well.
Simply updating some data into MariaDB shouldn't take that long unless the update you're doing is complex. What you could consider is inserting the raw JSON (maybe even in a document database instead) and have a background process triggered by a cronjob to read from the stored raw JSON to update MariaDB.
Additionally you could consider inserting data rather than updating. Because new rows do not compete for locks on existing rows, this makes deadlocks much less likely. Doing so might require you to change your data model, so it might not be the solution you're looking for.
Other than the above, I'd recommend you look into the process you've set up and split it into multiple steps which can be run individually. Doing so gives you fine-grained control over the timing and triggers for each step, which will prevent deadlocks if set up properly.

Limiting mysql use per process

I have Debian VPS configured with a standard LAMP.
On this server, there is only one site (shop) which has a few cron jobs - mostly PHP scripts. One of them is update script executed by Lynx browser, which sends tons of queries.
When this script runs (it takes 3-4 minutes to complete) it consumes all MySQL resources, and the site almost doesn't work (page generates in 30-60 seconds instead of 1-2s).
How can I limit this script (i.e. extending its execution time limiting available resources) to allow other services to run properly? I believe there is a simple solution to the problem but can't find it. Seems my Google superpowers are limited last two days.
You don't have access to modify the offending script, so fixing this requires database administrator work, not programming work. Your task is called tuning the MySQL database.
(I guess you already asked your vendor for help with this, and they said no.)
Run top or htop while the script runs. Is CPU pinned at 100%? Is RAM exhausted?
1) Just live with it, and run the update script at a time of day when your web site doesn't have many visitors. Fairly easy, but not a real solution.
2) As an experiment, add RAM to your VPS instance. It may let MySQL do things all-in-RAM that it's presently putting on the hard drive in temporary tables. If it helps, that may be a way to solve your problem with a small amount of work, and a larger server rental fee.
3) Add some indexes to speed up the queries in your script, so each query gets done faster. The question is, what indexes will help? (Just adding indexes randomly generally doesn't help much.)
First, figure out which queries are slow. Give the command SHOW FULL PROCESSLIST repeatedly while your script runs. The Info column in that result shows all the running queries. Copy them into a text file to keep them. (Or you can use MySQL's slow query log, about which you can read online.)
Second, analyze the worst offending queries to see whether there's an obvious index to add. Telling you how to do that generally is beyond the scope of a Stack Overflow answer. You might ask another question about a specific query. Before you do, please read this note about asking good SQL questions, and pay attention to the section on query performance.
4) It's possible your script is SELECTing many rows, or using SELECT to summarize many rows, from tables that also need to be updated when users visit your web site. In that case your visitors may be waiting for those SELECTs to finish. If you could change the script, you could put this statement right before long-running SELECTs.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
This allows the SELECT statement after it to do a "dirty read", in which it might see row changes that other transactions have not yet committed. See here.
Or, if you can figure out how to insert one statement into your obscured script, put this one right after it opens a database session.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Without access to the source code, though, you only have one way to see if this is the problem. That is, access the MySQL server from a privileged account, right before your script runs, and give these SQL commands. (On MySQL 8.0 the variable is named transaction_isolation instead of tx_isolation.)
SHOW VARIABLES LIKE 'tx_isolation';
SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Then see if the performance problem is improved. Set it back after your script finishes, probably like this (depending on the tx_isolation value retrieved above).
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
Warning: a permanent global change to the isolation level might foul up your application if it relies on transaction consistency. This is just an experiment.
5) Harass the script's author to fix this problem.
Slow queries? High CPU? High I/O? Then you must look at the queries. You cannot "tune your way out of a performance problem". Tuning might give you a few percent improvement; fixing the indexes and queries is likely to give you a lot more improvement.
See this for finding the 'worst' queries; then come back with SELECTs, EXPLAINs, and SHOW CREATE TABLEs for help.

CakePHP and connection pooling

I am working on a website that needs to serve multiple requests from the same table simultaneously. We made a simple index page in CakePHP which draws some data from the database (10 rows, to be precise), and a colleague executed a test simulating 1000 users viewing the same page at the same time, meaning that 1000 identical requests would be issued to the database. The thing is that at around 500 requests, the database stopped being responsive, everything just froze and we had to kill the processes.
What comes to mind is that each and every request is executed on its own connection, and this would explain why the MySQL server was overwhelmed. From a few searches online, and on SO, I can see that PHP does not support connection pooling natively, as can be done in a Java application, for instance. Having based our app on CakePHP 2.5.3, however, I would like to think that there is some underlying mechanism that overcomes these limitations. Perhaps I am not doing something right?
Any suggestion is welcome, I just want to make sure to exhaust every possible solution.
If the results are going to be the same for each query, you can cache the query result so that identical requests are not all sent to the database.
Try this plugin:
https://github.com/ndejong/CakephpAutocachePlugin
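To illustrate what caching buys here, the following is a small language-neutral sketch written in JavaScript; it is not how the plugin above works internally. The names `cachedLoader`, `loader`, and `ttlMs` are made up for this example. In a PHP application each request runs separately, so the shared store would have to live outside the request (for example in APC or memcached) rather than in an in-process map as shown.

```javascript
// Wraps an async loader so that identical requests share one result.
// `loader` is whatever runs the real query; `ttlMs` is how long a result stays fresh.
function cachedLoader(loader, ttlMs, now = () => Date.now()) {
  const entries = new Map(); // key -> { promise, expiresAt }
  return function load(key) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) {
      return hit.promise; // a fresh value, or a query that is already in flight
    }
    const promise = Promise.resolve()
      .then(() => loader(key))
      .catch((err) => {
        // Do not cache failures; drop the entry only if it is still ours.
        const current = entries.get(key);
        if (current && current.promise === promise) entries.delete(key);
        throw err;
      });
    entries.set(key, { promise, expiresAt: now() + ttlMs });
    return promise;
  };
}
```

Because the pending promise itself is stored, requests that arrive while the first query is still running wait for that one result instead of each issuing their own query.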

How to deal with External API latency

I have an application that is fetching several e-commerce websites using Curl, looking for the best price.
This process returns a table comparing the prices of all searched websites.
But now we have a problem: the number of stores is starting to increase, and the loading time is currently unacceptable from a user-experience standpoint (currently a 10s page load).
So we decided to create a database and store all the filtered Curl results in it, in order to reduce the DNS calls and improve page load time.
I want to know: despite all our efforts, is there still an advantage in implementing a Memcache module?
I mean, will it help even more or it is just a waste of time?
The Memcache idea was inspired by this topic, of a guy that had a similar problem: Memcache to deal with high latency web services APIs - good idea?
Memcache could be helpful, but (in my opinion) it's kind of a weird way to approach the issue. If it was me, I'd go about it this way:
Firstly, I would indeed cache everything I could in my database. When the user searches, or whatever interaction triggers this, I'd show them a "searching" page with whatever results the server currently has, and a progress bar that fills up as the asynchronous searches complete.
I'd use AJAX to add additional results as they become available. I'm imagining that the search takes about ten seconds - it might take longer, and that's fine. As long as you've got a progress bar, your users will appreciate and understand that Stuff Is Going On.
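As a rough sketch of the "add results as they become available" part, the helper below runs all store lookups in parallel and reports progress after each one settles. `searchProgressively`, `lookups`, and `onProgress` are hypothetical names; each lookup is assumed to be a function returning a promise for one store's prices.

```javascript
// Runs every store lookup in parallel and reports each one as it settles.
// `onProgress` receives { done, total, outcome } and can update a progress bar.
function searchProgressively(lookups, onProgress) {
  let done = 0;
  const total = lookups.length;
  const tasks = lookups.map((lookup) =>
    Promise.resolve()
      .then(() => lookup())
      .then(
        (result) => ({ ok: true, result }),
        (error) => ({ ok: false, error }) // one broken store must not sink the rest
      )
      .then((outcome) => {
        done += 1;
        onProgress({ done, total, outcome });
        return outcome;
      })
  );
  return Promise.all(tasks);
}
```

The page can render cached results immediately, then append each `outcome.result` and set the progress bar to `done / total` inside `onProgress`.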
Obviously, the more searches go through your system, the more up-to-date data you'll have in your database. I'd use cached results that are under a half-hour old, and I'd also record search terms and make sure I kept the top 100 (or so) searches cached at all times.
Know your customers and have what they want available. This doesn't have much to do with any specific technology, but it is all about your ability to predict what they want (or write software that predicts for you!)
Oh, and there's absolutely no reason why PHP can't handle the job. Tying together a bunch of unrelated interfaces is one of the things PHP is best at.
The solution here lies outside PHP alone. Do not bother hacking together a result in PHP when a cronjob could easily be used to populate your database and your PHP script can simply query your database.
If you plan to only stick with PHP then I suggest you change your script to index your database from the results you have populated it with. To populate the results, have a cronjob ping a PHP script that is not accessible to the users which will perform all of your curl functionality.

How to make a javascript/php chatroom more efficient in terms of load time and sql communication

Right now the setup for my javascript chat works so it's like
function getNewMessages()
{
    // code would go here to get new messages,
    // then the function calls itself to fetch the next batch
    getNewMessages();
}
getNewMessages();
And within the function I would use jQuery to make a GET request to retrieve the messages from a PHP script which would
1. Start SQL connection
2. Validate that it's a legit user through SQL
3. retrieve only new message since the last user visit
4. close SQL
This works great and the chat works perfectly. My concern is that this is opening and closing a LOT of SQL connections. It's quite fast, but I'd like to make a small javascript multiplayer game now, and transferring user coordinates as well as the tens of other variables 3 times a second in which I'm opening and closing the sql connection each time and pulling information from numerous tables each time might not be efficient enough to run smoothly, and might be too much strain on the server too.
Is there any better more efficient way of communicating all these variables that I should know about which isn't so hard on my server/database?
Don't use persistent connections unless it's the only solution available to you!
When MySQL detects the connection has been dropped, any temporary tables are dropped, any active transaction is rolled back, and any locked tables are unlocked. Persistent connections only drop when the Apache child exits, not when your script ends, even if the script crashes! You could inherit a connection in the middle of a transaction. Worse, other requests could block, waiting for those tables to unlock, which may take quite a long time.
Unless you have measured how long it takes to connect and identified it as a very large percentage of your script's run time, you should not consider using persistent connections. In fact, that should be what you do here, if you're worried about performance. Check out xhprof or xdebug, profile your code, then start optimizing.
Maybe try to use a different approach to get the new messages from the server: Comet.
Using this technique you do not have to open that many new connections.
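One common Comet variant is long polling: the client keeps a single request open, the server answers only when there is something new, and the client reissues the request as soon as it returns. The client side might look roughly like this. `pollMessages` and `fetchSince` are hypothetical names; `fetchSince(lastId)` is assumed to resolve with an array of messages newer than `lastId`, each carrying a numeric `id`.

```javascript
// Long-polling loop: keep exactly one request open at a time and
// reissue it as soon as the previous one returns.
async function pollMessages(fetchSince, onMessages, shouldContinue, retryDelayMs = 2000) {
  let lastId = 0;
  while (shouldContinue()) {
    try {
      const messages = await fetchSince(lastId);
      if (messages.length > 0) {
        lastId = messages[messages.length - 1].id; // remember where we got to
        onMessages(messages);
      }
    } catch (err) {
      // Back off briefly on errors so a failing server is not hammered.
      await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
    }
  }
  return lastId;
}
```

Since the next request only starts after the previous one has finished, requests never pile up the way a fixed-interval timer can when the server is slow.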
http://www.php.net/manual/en/features.persistent-connections.php
and
http://www.php.net/manual/en/function.mysql-pconnect.php
A couple of dozen players at the same time won't hurt the database or cause noticeable lag if you have efficient SQL statements. Likely your database will be hosted on the same server or at least the same network as your game or site, so no worries. If your DB happens to be hosted on a separate server running an 8-bit 16 MHz board loaded with MS-DOS, located in the remote Amazon, connected by radio waves hooked up to a crank-powered generator operated by a drunk monkey, you're on your own with this one.
Otherwise, really you should be more worried about exactly how much data you're passing back and forth to your players. If you're passing back and forth coordinates for all objects in an entire world, page load could take a painfully long time, even though the DB query takes a fraction of a second. This is sometimes overcome in games by a "fog of war" feature which doesn't bother notifying the user of every single object in the entire map, only those which are in immediate range of the player. This can easily be done with a single SQL query where object coordinates are in proximity to a player. Though, if you have a stingy host, they will care about the number of connects and queries.
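The proximity test behind such a "fog of war" filter is simple. Here is a sketch of it in JavaScript, assuming each object and the player carry numeric `x` and `y` fields (that shape, and the `objectsInRange` name, are assumptions for this example). On the server the same comparison would go into the WHERE clause of the query.

```javascript
// Returns only the objects within `range` units of the player.
function objectsInRange(player, objects, range) {
  const rangeSquared = range * range; // compare squared distances, no sqrt needed
  return objects.filter((obj) => {
    const dx = obj.x - player.x;
    const dy = obj.y - player.y;
    return dx * dx + dy * dy <= rangeSquared;
  });
}
```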
If you're concerned about attracting even more players than that, consider exploring cache methods like pre-building short files storing commonly fetched records or values using fopen(), fgets(), fclose(), etc. Or, use php extensions like apc to store values in memory which persist from page load to page load. memcache or memcached also act similarly, but in a way which acts like a separate server you can connect to, store values which can be shared with other page hits, and query.
To update cached pages or values when you think they might become stale, you can run a cron job every so often to update these files or values. If your host doesn't allow cron jobs, consider making your guests do that legwork: a line of script on a certain page will refresh the cache with new values from a database query after a certain number of page hits. Or cache a date value to check against on every page hit, and if so much time has passed, refresh the cache.
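The "make your guests do the legwork" approach can be sketched as a cache that refreshes itself lazily, either after a number of reads or once the value is too old. The example is in JavaScript for brevity; `lazyCache`, `loadFresh`, `maxHits`, and `maxAgeMs` are illustrative names, and in PHP the counter and timestamp would need to be kept in a file, APC, or memcached so they survive between page hits.

```javascript
// Cache that refreshes lazily: after `maxHits` reads or once the value
// is older than `maxAgeMs`, whichever comes first.
function lazyCache(loadFresh, { maxHits, maxAgeMs }, now = () => Date.now()) {
  let value;
  let loadedAt = null; // null means nothing has been loaded yet
  let hits = 0;
  return function get() {
    const stale =
      loadedAt === null || hits >= maxHits || now() - loadedAt >= maxAgeMs;
    if (stale) {
      value = loadFresh(); // e.g. the database query
      loadedAt = now();
      hits = 0;
    }
    hits += 1;
    return value;
  };
}
```

Only the unlucky visitor who triggers the refresh pays for the database query; everyone else is served the stored value.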
Again, unless you're under the oppressive thumb of a stingy host, or unless you're getting a hundred or more page hits at a time, no need to even be concerned about your database. Databases are not that fragile. If they crashed in a hysterical fit of tears anytime more than one query came their way, the engineers who made it wouldn't have a job for very long.
I know this is quite an annoying "answer", but perhaps you should be thinking about this a different way; after all, this is really not the strongest use of a relational database. Have you considered an XMPP solution? IMO this would be the best tool for the job, and both ejabberd and openfire are trivial to set up these days. The excellent Strophe library can make the front end story easy, and as an added bonus you get HTTP binding (like Comet), so you won't need to poll the server, your latency will go down, and you'll be generating less HTTP traffic.
I know it's highly unlikely you're going to change your whole approach just cos I said so, but wanted to provide an alternative perspective.
http://www.ejabberd.im/
http://code.stanziq.com/strophe/
