MySQL Query Not Inserting All Records (using PHP)

I have a fairly large amount of data that I'm trying to insert into MySQL. It's a data dump from a provider that is about 47,500 records. Right now I'm simply testing the insert method through a PHP script just to get things dialed in.
What I'm seeing is that, first, the inserts continue to happen long after the PHP script "finishes". Even after the browser no longer shows an "X" to cancel the request and shows "reload" instead (indicating the script is done from the browser's perspective), I can see inserts still occurring for a good 10+ minutes. I assume this is MySQL caching the queries. Is there any way to keep the script "alive" until all queries have completed? I put a 15-minute timeout on my script.
Second, and more disturbing, is that I won't get every insert. Of the 47,500 records I'll get anywhere between 28,000 and 38,000 records but never more - and that number is random each time I run the script. Anything I can do about that?
Lastly, I have a couple of simple echo statements at the end of my script for debugging, and these never fire, leading me to believe that a timeout might be happening (although I don't get any errors about timeouts or memory exhaustion). I'm thinking this has something to do with the problem but am not sure.
I tried changing my table to an ARCHIVE table, but not only did that not help, it also means I lose the ability to update the records in the table when I want to. I did it only as a test.
Right now the insert is in a simple loop: it loops over each record in the JSON data I get from the source and runs an insert statement, then moves on to the next iteration. Should I instead use the loop to build one massive insert and run a single insert statement at the end? My concern is that I would exceed the max_allowed_packet setting, which is hard-coded by my hosting provider.
So I guess the real question is: what is the best method to insert nearly 50,000 records into MySQL using PHP, given what I've explained here?
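For illustration, here is a minimal sketch of a middle ground between one insert per row and one enormous statement: chunked multi-row inserts, each chunk wrapped in its own transaction. The table and column names (records, id, name, value), the file name, and the PDO connection details are assumptions, not from the question; the chunk size is something to tune so each statement stays comfortably below max_allowed_packet.

<?php
// Sketch only: chunked multi-row inserts, each chunk in its own transaction.
// Table records(id, name, value) is hypothetical; adjust to your schema.
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$records = json_decode(file_get_contents('dump.json'), true); // ~47,500 rows
$chunkSize = 500; // keep each statement well under max_allowed_packet

foreach (array_chunk($records, $chunkSize) as $chunk) {
    // Build one multi-row INSERT with placeholders for this chunk.
    $placeholders = rtrim(str_repeat('(?,?,?),', count($chunk)), ',');
    $params = [];
    foreach ($chunk as $row) {
        array_push($params, $row['id'], $row['name'], $row['value']);
    }

    $pdo->beginTransaction();
    $stmt = $pdo->prepare("INSERT INTO records (id, name, value) VALUES $placeholders");
    $stmt->execute($params);
    $pdo->commit();
}

A few hundred rows per statement usually cuts round trips dramatically while keeping each packet small, and committing per chunk makes it easier to see exactly where inserts stop if the count still comes up short.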

Related

How to store/access data in mariadb which is updated every 15 seconds

I'd like to download some JSON data (which gets updated every 15 seconds) and store it in my MariaDB database with a PHP script.
Unfortunately, the database update queries take between 1 second and sometimes up to 60 seconds, depending on the size of the JSON data.
So sometimes I'm deadlocking myself with write queries that take longer than 15 seconds, and as soon as I read/process the data I'm blocking all the write queries as well.
Obviously, I do have the wrong approach and it's more complicated than I thought.
Does anyone have a good idea how such a job can be done properly, with continuous updates that aren't blocked while I read the data?
Thanks for any hints!
PS: Currently I'm using an InnoDB table, and to speed up the inserts I've set autocommit to 0 and run everything in a transaction.
I had the fastest results with LOCK TABLES ... WRITE, but of course this blocks read access as well.
Simply updating data in MariaDB shouldn't take that long unless the update you're doing is complex. What you could consider is inserting the raw JSON (maybe even in a document database instead) and having a background process triggered by a cron job read from the stored raw JSON to update MariaDB.
Additionally you could consider inserting data rather than updating. This will prevent deadlocks from happening. Doing so might require you to change your data model, so it might not be the solution you're looking for.
Other than the above, I'd recommend you look into the process you've set up and split it into multiple steps which can be run individually. Doing so gives you fine-grained control over the timing and triggers for each step, which will prevent deadlocks if set up properly.
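A minimal sketch of the staging idea described above, assuming a hypothetical raw_json table (id, payload, processed, fetched_at), a hypothetical feed URL, and the usual PDO connection details; the fetcher only stores the payload, and a separate cron-driven worker applies it to the real tables:

<?php
// Step 1 (runs every 15 seconds): store the raw JSON only - a single cheap insert.
// Table raw_json(id, payload, processed, fetched_at) is an assumed staging table.
$pdo = new PDO('mysql:host=localhost;dbname=feed;charset=utf8mb4', 'user', 'pass');
$json = file_get_contents('https://example.com/feed.json'); // hypothetical source
$pdo->prepare('INSERT INTO raw_json (payload, processed, fetched_at) VALUES (?, 0, NOW())')
    ->execute([$json]);

// Step 2 (separate script, triggered by cron): process unhandled payloads
// into the real tables, each in its own short transaction.
$rows = $pdo->query('SELECT id, payload FROM raw_json WHERE processed = 0 ORDER BY id')->fetchAll();
foreach ($rows as $row) {
    $data = json_decode($row['payload'], true);
    $pdo->beginTransaction();
    // ... insert/update the real tables from $data here ...
    $pdo->prepare('UPDATE raw_json SET processed = 1 WHERE id = ?')->execute([$row['id']]);
    $pdo->commit();
}

Because the 15-second fetcher never touches the real tables, reading and processing the data no longer blocks the incoming updates.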

Persist mongodb cursor between page requests in php

I have a very large dataset that I am exporting using a batch process to keep the page from timing out. The whole process can take over an hour, and I'm using Drupal's batch system, which basically reloads the page with a status on how far the process has completed. Each page request essentially runs the query again, including a sort, which takes a while. Then it exports the data to a temp file. The next page load runs the full Mongo query, sorts, skips the entries already exported, and exports more to the temp file. The problem is that each page load makes Mongo rerun the entire query and sort. I'd like the next batch page to just pick up the same cursor where it left off and continue pulling the next set of results.
The MongoDB Manual entry for cursor.skip() gives some advice:
Consider using range-based pagination for these kinds of tasks. That is, query for a range of objects, using logic within the application to determine the pagination rather than the database itself. This approach features better index utilization, if you do not need to easily jump to a specific page.
E.g. if your nightly batch process runs over the data accumulated in the last 24 hours, perhaps you can run date-range-based queries (maybe one per hour of the day) and process your data that way. I'm assuming your data contains some sort of usable timestamp per document, but you get the idea.
Although cursors live on the server and only time out after roughly 10 minutes of inactivity, the PHP driver does not support persisting cursors between requests.
At the end of each request the driver will kill all cursors created during that request that have not been exhausted.
This also happens when all references to the MongoCursor object are removed (eg $cursor = null).
This is done because it's unfortunately fairly common for applications not to iterate over the entire cursor, and we don't want to leave unused cursors around on the server, as that could cause performance problems.
For your specific case, the best way to work around this problem is to improve your indexes so loading the cursor is faster.
You may also want to select only a subset of the data so you have fixed points to request data between.
Say, for reports, your first request may ask for all data from 1am to 2am.
Then your next request asks for all data from 2am to 3am, and so on, like Saftschleck explains.
You may also want to look into the aggregation framework, which is designed to do "online reporting": http://docs.mongodb.org/manual/aggregation/
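A minimal sketch of that range-based approach with the legacy PHP Mongo driver (the one that exposes MongoCursor), assuming a hypothetical created_at MongoDate field on each document and a collection named events:

<?php
// Sketch only: each batch page covers one hour instead of re-sorting the full set.
// Field created_at and collection events are assumptions, not from the question.
$client = new MongoClient();
$collection = $client->selectDB('mydb')->selectCollection('events');

$start = new MongoDate(strtotime('2014-01-01 01:00:00'));
$end   = new MongoDate(strtotime('2014-01-01 02:00:00'));

$cursor = $collection->find(
    array('created_at' => array('$gte' => $start, '$lt' => $end))
)->sort(array('created_at' => 1));

foreach ($cursor as $doc) {
    // append $doc to the temp export file
}
// The next page request simply advances $start/$end by one hour,
// so Mongo never re-sorts or skips over rows it has already returned.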

mysql query timeout - "Default" outputted to browser

I have an application written in PHP which contains a function to perform a complex MySQL query to gather statistics and export it as CSV. Usually the process takes a good 20-30 seconds to complete due to the complexity of the query but I can live with this as it's just one query once a week.
The issue I have is that now and again the server just appears to time out, with the word 'Default' output to the browser and nothing else.
I'm sure this isn't being set/printed in the application logic because I wrote it myself; after looking at the database class, I searched every single file in the application for the word 'Default' with no results.
I'm also pretty sure it can't be output from the MySQL server, because it can't directly print output without going through PHP, can it?
What could be causing this? I'm thinking the only function that could be printing it is my mysql_query() function. Obviously my aim is to optimize the query to stop the timeout, but I'd like to find out what is outputting that text, as I don't like errors/messages like that being displayed to our users.

import script with many queries causes a slow website

We built a link from our offline program to our website. In the offline program we have 50,000 records that we want to push to the website. What we do now is the following:
In the offline program we build an XML file with 1,500 records and post it to a PHP file on our web server. On the web server we read the XML and push it to the MySQL database; before we do that, we first check whether the record already exists and then either update it or insert it as a new record.
When that's done, we send a message back to the offline program that the batch is completed. The offline program then builds a new XML file with the next 1,500 records. This process repeats until the last 1,500 records have been sent.
The problem is that the web server becomes very slow while pushing the records to the database. That's probably because we first check whether the record already exists (that's one query) and then write it to the database (that's a second query). So for each batch we have to run 3,000 queries.
I hope you guys have some tips to speed up this process.
Thanks in advance!
Before starting the import, read all the data IDs you already have; don't run a checking query for every item you insert, but check against an existing PHP array instead (see the sketch below).
Make sure your database tables have proper keys.
Do all inserts in one request, or use transactions.
There is no problem importing a lot of data this way; I have a lot of experience with it.
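A minimal sketch of that "load the existing IDs first" idea, assuming a hypothetical items table with a unique external_id column, an open PDO handle $pdo, and $batch holding the 1,500 decoded records:

<?php
// Sketch only: one SELECT up front replaces 1,500 per-row existence checks.
$existing = array_flip(
    $pdo->query('SELECT external_id FROM items')->fetchAll(PDO::FETCH_COLUMN)
);

$pdo->beginTransaction();
$insert = $pdo->prepare('INSERT INTO items (external_id, name) VALUES (?, ?)');
$update = $pdo->prepare('UPDATE items SET name = ? WHERE external_id = ?');

foreach ($batch as $record) {
    if (isset($existing[$record['id']])) {
        $update->execute([$record['name'], $record['id']]);
    } else {
        $insert->execute([$record['id'], $record['name']]);
        $existing[$record['id']] = true; // keep the in-memory index current
    }
}
$pdo->commit();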
A good thing to do is to write a single query composed of all of the insert statements concatenated and separated by semicolons:
INSERT INTO table_name
(a,b,c)
VALUES
(1,2,3)
ON DUPLICATE KEY
UPDATE a = 1, b = 2, c = 3;
INSERT INTO table_name
...
You could concatenate 100-500 insert statements and wrap them in a transaction.
Wrapping many statements in a transaction helps because the data isn't committed to disk after each inserted row; the whole 100-500 row batch is kept in memory, and when they are all finished it is written to disk in one go, which means less intermittent disk I/O.
You need to find a good batch size; I used 100-500 as an example, but depending on your server configuration, the amount of data per statement, and the ratio of inserts to updates, you'll have to fine-tune it.
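A minimal sketch of that batched INSERT ... ON DUPLICATE KEY UPDATE approach, assuming a hypothetical table table_name(a, b, c) with a unique key on a, an open PDO handle $pdo, and $rows holding the decoded batch:

<?php
// Sketch only: multi-row upserts, 250 rows per statement, each chunk in a transaction.
foreach (array_chunk($rows, 250) as $chunk) {
    $placeholders = rtrim(str_repeat('(?,?,?),', count($chunk)), ',');
    $params = [];
    foreach ($chunk as $r) {
        array_push($params, $r['a'], $r['b'], $r['c']);
    }

    $pdo->beginTransaction();
    $pdo->prepare(
        "INSERT INTO table_name (a, b, c) VALUES $placeholders
         ON DUPLICATE KEY UPDATE b = VALUES(b), c = VALUES(c)"
    )->execute($params);
    $pdo->commit();
}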
Read some information about MySQL unique index constraints. This should help:
Mysql Index Tutorial
I had the same problem 4 months ago, and I got better performance by coding in Java rather than PHP and avoiding XML documents.
My tip: read the whole table once (which is faster than making many queries one by one) and keep it in memory (in a HashMap, for example). Before inserting a record, you can check whether it exists in your local structure, without touching the DB.
You can improve your performance this way.

preventing multiple simultaneous queries with php/mysql live search

I have a working live search system that on the whole works very well. However, it often runs into the problem that many versions of the search query are running on the server simultaneously if users type faster than the results can be returned.
I am aborting the AJAX request on receipt of a new one, but that of course does not affect the query already in progress on the server, and you end up with a severe bottleneck and a long wait for your final results. I am using MySQL with MyISAM tables for this, and there does not seem to be any advantage in converting to InnoDB, as the result sets will be the same rows.
I tried using a session variable to make PHP wait if this session already has a query in progress, but that seems to stop it working altogether.
The problem is solved if I make the AJAX requests synchronous, but that would rather defeat the object here.
I was wondering if anyone had any suggestions as to how to make this work properly.
Best regards
John
Before doing anything more complicated, have you considered not sending the request until the user has stopped typing for at least a certain time interval (say, 1 second)? That should dramatically cut the number of requests being made with little effort on your part.
