I have a curious issue that is happening on my system when I run a PHP program I created. I am taking data from two tables and doing a comparison of data from Table A with Table B. If the record is not in Table A, then write that record to Table B. I discovered that the program works fine but, at least one record has to be in Table B first. After my first successful run Table A will have 16000+ records in it, now Table B has 15000+ or so records. I understand that there is going to be a bit of time for this to process. The curious thing is I noticed my hard drive is losing free space as the program runs. I have tried manually running the garbage collection. I also looked where the session files are being stored, only to find a few files that are rather small in size. I also tried adjusting the length of time that session files are stored from 1440 seconds to 30 secs. When I say that I am "losing" free space, there is something that is filling up my hard drive. I have gone from having 6GB to 5.75GB, if I allow the program to run longer, I only lose more space. I have also tried just simply restarting my system and I only regain a small portion of the space I lost. At this point I am unsure what I need to do to stop this from happening. Here is a sample of my code below:
<?php
include('./connect_local_pdo.php'); //Includes DB Connection Script
ini_set('max_execution_time', 5400); //5400 seconds = 90 minutes
gc_disable();
try {
$tbl_a_data = $conn->prepare('SELECT col_a, col_b, col_c from table_a');
$tbl_a_data->execute();
$tbl_b_data = $conn->prepare('SELECT col_a, col_b, col_c from table_b');
$tbl_b_data->execute();
$tbl_b_array = $tbl_b_data->fetchAll(PDO::FETCH_ASSOC);
while($tbl_a_array = $tbl_a_data->fetch(PDO::FETCH_ASSOC)){
foreach ($tbl_b_array as $tbl_b_array2){
if ($tbl_a_array['col_a'] !== $tbl_b_array2['col_a']){
$stmt = $conn->prepare("INSERT INTO table_b
(col_a, col_b, col_c)
VALUES
(:col_a, :col_b, :col_c)");
$stmt->bindParam(':col_a', $tbl_a_array['col_a']);
$stmt->bindParam(':col_b', $tbl_a_array['col_b']);
$stmt->bindParam(':col_c', $tbl_a_array['col_c']);
$stmt->execute();
} else {
$stmt = $conn->prepare("update table_b
set
col_b = table_a.col_b,
col_c = table_a.col_c
from table_a
where table_b.col_a = table_a.col_a ");
$stmt -> execute();
}
}
}
gc_collect_cycles();
gc_mem_caches();
clearstatcache();
} catch (PDOException $a) {
echo $a->getMessage();//Remove or change message in production code
}
Any assistance with this will be greatly appreciated! As of this post I have lost 2 gigs of space running this program.
With your line
include('./connect_local_pdo.php'); //Includes DB Connection Script
I assume it is a Database on localhost.
The Database is growing with your entries. It will take space to add rows.
Related
I developed a webservice (PHP/MySQL) that simply output a coupon code through a JSON string.
How it works: the application receives 1 parameter (email), it then makes a request to the database table to get a coupon code that has not yet been assigned. Then a request is made to update the row of this coupon code and put "1" in the assigned column. (SELECT / UPDATE routine)
After that, the JSON is outputed like this:
echo '{"couponCode": "'. $coupon_code . '"}';
That's all.
The problem is that the webservice receives 10000 requests in approx 1 minute. This occurs only one time a day. If I look in the raw logs of apache I can see that it has received exactly 10000 requests each time but in my table there's only 984 rows that has been updated (i.e.: 984 coupon codes given). I tested it multiple time and it varies between 980 and 986 each time. The log file created by the webservice doesn't show any errors and reflects exactly what has been updated in the database, between 980 to 986 new lines each time.
My question is: what happened with the missing requests? Is it the server that has not enough memory to handle such multiple requests in this short period of time? (When I test with 5000 requests it work OK)
If it can help, here's the function that get the a new coupon codes:
function getNewCouponCode($email){
$stmt = $this->connector->prepare("SELECT * FROM coupon_code WHERE email = '' ORDER BY id ASC LIMIT 1");
$stmt2 = $this->connector->prepare("UPDATE coupon_code SET email = :email WHERE id = :id");
try{
$this->connector->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$this->connector->beginTransaction();
$this->connector->exec("LOCK TABLES coupon_code WRITE");
/*TRANSACTION 1*/
$stmt->execute();
$result["select"] = $stmt->fetch(PDO::FETCH_ASSOC);
/*TRANSACTION 1*/
/*TRANSACTION 2*/
$stmt2->bindParam(":email", $email);
$stmt2->bindParam(":id", $result["select"]["id"]);
$result["update"] = $stmt2->execute();
/*TRANSACTION 2*/
$this->connector->commit();
$this->connector->exec('UNLOCK TABLES');
return $result;
}catch(Exception $e) {
$this->connector->rollBack();
$this->connector->exec('UNLOCK TABLES');
$result["error"] = $e->getMessage();
return $result;
}
}
Thanks in advance.
986 requests per minute is a pretty significant load for a PHP application the way you've designed it, and an Apache web server. It sounds like you're running this all on a single server.
First off, whatever is slamming you 10k times per minute should know to re-try later on if it gets a failure. Why isn't that happening? If that remote system is under your control, see if you can fix that.
Next, you'll find that the threading model of Nginx is much more efficient than Apache's for what you're doing.
Now, on to your application... it doesn't look like you actually need a SELECT and then UPDATE. Why not just an update, and check the result? Then it's atomic on its own and you don't have to do this table locking stuff (which is really going to slow you down).
This is more of a logic question than language question, though the approach might vary depending on the language. In this instance I'm using Actionscript and PHP.
I have a flash graphic that is getting data stored in a mysql database served from a PHP script. This part is working fine. It cycles through database entries every time it is fired.
The graphic is not on a website, but is being used at 5 locations, set to load and run at regular intervals (all 5 locations fire at the same time, or at least within <500ms of each-other). This is real-time info, so time is of the essence, currently the script loads and parses at all 5 locations between 30ms-300ms (depending on the distance from the server)
I was originally having a pagination problem, where each of the 5 locations would pull a different database entry since i was moving to the next entry every time the script runs. I solved this by setting the script to only move to the next entry after a certain amount of time passed, solving the problem.
However, I also need the script to send an email every time it displays a new entry, I only want it to send one email. I've attempted to solve this by adding a "has been emailed" boolean to the database. But, since all the scripts run at the same time, this rarely works (it does sometimes). Most of the time I get 5 emails sent. The timeliness of sending this email doesn't have to be as fast as the graphic gets info from the script, 5-10 second delay is fine.
I've been trying to come up with a solution for this. Currently I'm thinking of spawning a python script through PHP, that has a random delay (between 2 and 5 seconds) hopefully alleviating the problem. However, I'm not quite sure how to run exec() command from php without the script waiting for the command to finish. Or, is there a better way to accomplish this?
UPDATE: here is my current logic (relevant code only):
//get the top "unread" information from the database
$query="SELECT * FROM database WHERE Read = '0' ORDER BY Entry ASC LIMIT 1";
//DATA
$emailed = $row["emailed"];
$Entry = $row["databaseEntryID"];
if($emailed == 0)
{
**CODE TO SEND EMAIL**
$EmailSent="UPDATE database SET emailed = '1' WHERE databaseEntryID = '$Entry'";
$mysqli->query($EmailSent);
}
Thanks!
You need to use some kind of locking. E.g. database locking
function send_email_sync($message)
{
sql_query("UPDATE email_table SET email_sent=1 WHERE email_sent=0");
$result = FALSE;
if(number_of_affacted_rows() == 1) {
send_email_now($message);
$result = TRUE;
}
return $result;
}
The functions sql_query and number_of_affected_rows need to be adapted to your particular database.
Old answer:
Use file-based locking: (only works if the script only runs on a single server)
function send_email_sync($message)
{
$fd = fopen(__FILE__, "r");
if(!$fd) {
die("something bad happened in ".__FILE__.":".__LINE__);
}
$result = FALSE;
if(flock($fd, LOCK_EX | LOCK_NB)) {
if(!email_has_already_been_sent()) {
actually_send_email($message);
mark_email_as_sent();
$result = TRUE; //email has been sent
}
flock($fd, LOCK_UN);
}
fclose($fd);
return $result;
}
You will need to lock the row in your database by using a transaction.
psuedo code:
Start transaction
select row .. for update
update row
commit
if (mysqli_affected_rows ( $connection )) >1
send_email();
If two users execute the same php file, will it be executed parallel or sequential? Example:
If I have a database data which only has one column id would it be possible that the following code produces for two different users the same outcome?
1. $db=startConnection();
2. $query="SELECT id FROM data";
3. $result=$db->query($query)or die($db->error);
4. $zeile=mysqli_fetch_row($result);
5. $number=$zeile['id'];
6. $newnumber=$number+1;
7. echo $number;
8. $update = "UPDATE data Set id = '$newnumber' WHERE id = '$number'";
9. $db->query($query)or die($db->error);
10. mysqli_close($db);
If it is not executed parallel, does it mean when 100 people are loading a php file that has a loading time of 1 second, then one of them has to wait 99 seconds?
Edit: In the comments it is stated that I could messup my database, I guess this is how it could mess up:
User A executes the file from 1.-7. in this moment user B executes the file from 1.-7. then A loads 8.-10. and B loads 8.-10. In this scenario both users would have the same number on the screen.
Now lets take the following example:
1. $db=startConnection();
2. $query=" INSERT INTO data VALUES ()";
3. $result=$db->query($query)or die($db->error);
4. echo $db->insert_id;
5. mysqli_close($db);
Lets say A executes the file from 1.-3. in this moment user B executes the file from 1.-5., after that user A loads the file from 4.-5. I guess in this scenario also both would have the same number on the screen right? Does transaction prevent both scenarios?
You can say that php files executed parallel (for most cases it is so, but this depends on web server).
Yes, it is possible that the following code produces for two different users the same outcome.
How to avoid this possibility?
1) If you are using MySQL, you can use transactions and "SELECT ... UPDATE FOR" to avoid this possibility. Just using transaction wouldn't help!
2) Be sure that you are using InnoDB or any other database engine that support transactions. For example MyISAM doesn't support transactions. Also you can have problems if any form of snapshotting is enabled in the database to handle reading locked records.
3) Example of using "SELECT ... UPDATE FOR":
$db = startConnection();
// Start transaction
$db->query("START TRANSACTION") or die($db->error);
// Your SELECT request but with "FOR UPDATE" lock
$query = "SELECT id FROM data FOR UPDATE";
$result = $db->query($query);
// Rollback changes if there is error
if (!$result)
{
mysql_query("ROLLBACK");
die($db->error);
}
$zeile = mysqli_fetch_row($result);
$number = $zeile['id'];
$newnumber = $number + 1;
echo $number;
$update = "UPDATE data Set id = '$newnumber' WHERE id = '$number'";
$result = $db->query($query);
// Rollback changes if there is error
if (!$result)
{
mysql_query("ROLLBACK");
die($db->error);
}
// Commit changes in database after requests sucessfully executed
mysql_query("COMMIT");
mysqli_close($db);
Why just using transaction wouldn't help?
Just transaction will lock only for write. You can test examples bellow by running two mysql console clients in two separate terminal windows. I did so and that's how it works.
We have client#1 and client#2 that executed parallel.
Example #1. Without "SELECT ... FOR UPDATE":
client#1: BEGIN
client#2: BEGIN
client#1: SELECT id FROM data // fetched id = 3
client#2: SELECT id FROM data // fetched id = 3
client#1: UPDATE data Set id = 4 WHERE id = 3
client#2: UPDATE data Set id = 4 WHERE id = 3
client#1: COMMIT
client#2: COMMIT
Both clients fetched the same id (3).
Example #2. With "SELECT ... FOR UPDATE":
client#1: BEGIN
client#2: BEGIN
client#1: SELECT id FROM data FOR UPDATE // fetched id = 3
client#2: SELECT id FROM data FOR UPDATE // here! client#2 will wait for end of transaction started by client#1
client#1: UPDATE data Set id = 4 WHERE id = 3
client#1: COMMIT
client#2: client#1 ended transaction and client#2 fetched id = 4
client#1: UPDATE data Set id = 5 WHERE id = 4
client#2: COMMIT
Hey, I think such read-locks reduce performance!
"SELECT ... FOR UPDATE" do read-lock only for clients that use "SELECT ... FOR UPDATE". That's good, cause it means that such read-lock wouldn't affect on standart "SELECT" requests without "FOR UPDATE".
Links
MySQL documentation: "SELECT ... FOR UPDATE" and other read-locks
Parallel or Sequential?
Part of your question was about PHP running either parallel or sequential. As I have read everything and its opposite about that topic, I decided to test it myself.
Field testing:
On a LAMP stack running PHP 5.5 w/ Apache 2, I made a script with a very expensive loop:
function fibo($n)
{
return ($n > 1) ? fibo($n - 1) + fibo($n - 2) : 1;
}
$start = microtime(true);
print "result: ".fibo(38);
$end = microtime(true);
print " - took ".round(($end - $start), 3).' s';
Result with 1 script running:
result: 63245986 - took 19.871 s
Result with 2 scripts running at the same time in two different browser windows:
result: 63245986 - took 20.753 s
result: 63245986 - took 20.847 s
Result with 3 scripts running at the same time in three different browser windows:
result: 63245986 - took 26.172 s
result: 63245986 - took 28.302 s
result: 63245986 - took 28.422 s
CPU usage while running 2 instances of the script:
CPU usage while running 3 instances of the script:
So, it's parallel!
Althoug inside a PHP script, you can't easily use multithreading (while it's possible), Apache takes benefit from your servers having multiple cores to dispatch the load.
So if your 1-second script is run by 100 users at the same time, well if you have 100 CPU cores the 100th user will hardly notice anything. If you have 8 CPU cores (which is more common), then the 100th user will theoritically have to wait something like 100 / 8 = 12.5 seconds for his instance of the script to begin. In practice, as the "benchmark" puts in evidence, each thread's performance diminishes when other threads are running at the same time on other cores. So it could be a lot more. But not 100 seconds more.
The High Level Idea:
I have a micro controller that can connect to my site via a http request...I want to feed the device a response as soon as a change is noted on the database...
Due to the the end device being a client ie micro controller...Im unaware of a method to pass the data to the client without having to set up port forwarding...which is heavily undesired ...The problem arise when trying send data from an external network to an internal one...Either A. port forwarding or B have the client device initiate the request which leads me to the idea of having the device send an http request to file that polls for changes
Update:
Much Thanks to Ollie Jones. I have implimented some of his
suggestions here.
Jason McCreary suggested having a modified column which is a big
improvement as it should increase speed and reliability ...Great
suggestion! :)
if the database being overworked is in question in this example
maybe the following would work where...when the data is inserted into
the database the changes are wrote to a file...then have the loop
that continuously checks that file for an update....thoughts?
I have table1 and i want to see if a specific row(based on a UID/key) has been updated since the last time i checked as well as continuously check for 60 seconds if the record bets updated...
I'm thinking i can do this using the INFORMATION_SCHEMA database.
This database contains information about tables, views, columns, etc.
attempt at a solution:
<?php
$timer = time() + (10);//add 60 seconds
$KEY=$_POST['KEY'];
$done=0;
if(isset($KEY)){
//loign stuff
require_once('Connections/check.php');
$mysqli = mysqli_connect($hostname_check, $username_check, $password_check,$database_check);
if (mysqli_connect_errno($mysqli))
{ echo "Failed to connect to MySQL: " . mysqli_connect_error(); }
//end login
$query = "SELECT data1, data2
FROM station
WHERE client = $KEY
AND noted = 0;";
$update=" UPDATE station
SET noted=1
WHERE client = $KEY
AND noted = 0;";
while($done==0) {
$result = mysqli_query($mysqli, $query);
$update = mysqli_query($mysqli, $update);
$row_cnt = mysqli_num_rows($result);
if ($row_cnt > 0) {
$row = mysqli_fetch_array($result);
echo 'data1:'.$row['data1'].'/';
echo 'data2:'.$row['data2'].'/';
print $row[0];
$done=1;
}
else {
$current = time();
if($timer > $current){ $done=0; sleep(1); } //so if I haven't had a result update i want to loop back an check again for 60seconds
else { $done=1; echo 'done:nochange';}//60seconds pass end loop
}}
mysqli_close($mysqli);
echo 'time:'.time();
}
else {echo 'error:nokey';}
?>
Is this an adequate method and suggestions to improve the speed as well as improve the reliability
If I understand your application correctly, your client is a microcontroller. It issues an HTTP request to your php / mysql web app once in a while. The frequency of that request is up to the microcontroller, but but seems to be once a minute or so.
The request basically asks, "dude, got anything new for me?"
Your web app needs to send the answer, "not now" or "here's what I have."
Another part of your app is providing the information in question. And it's doing so asynchronously with your microcontroller (that is, whenever it wants to).
To make the microcontroller query efficient is your present objective.
(Note, if I have any of these assumptions wrong, please correct me.)
Your table will need a last_update column, a which_microcontroller column or the equivalent, and a notified column. Just for grins, let's also put in value1 and value2 columns. You haven't told us what kind of data you're keeping in the table.
Your software which updates the table needs to do this:
UPDATE theTable
SET notified=0, last_update = now(),
value1=?data,
value2?=data
WHERE which_microcontroller = ?microid
It can do this as often as it needs to. The new data values replace and overwrite the old ones.
Your software which handles the microcontroller request needs to do this sequence of queries:
START TRANSACTION;
SELECT value1, value2
FROM theTable
WHERE notified = 0
AND microcontroller_id = ?microid
FOR UPDATE;
UPDATE theTable
SET notified=1
WHERE microcontroller_id = ?microid;
COMMIT;
This will retrieve the latest value1 and value2 items (your application's data, whatever it is) from the database, if it has been updated since last queried. Your php program which handles that request from the microcontroller can respond with that data.
If the SELECT statement returns no rows, your php code responds to the microcontroller with "no changes."
This all assumes microcontroller_id is a unique key. If it isn't, you can still do this, but it's a little more complicated.
Notice we didn't use last_update in this example. We just used the notified flag.
If you want to wait until sixty seconds after the last update, it's possible to do that. That is, if you want to wait until value1 and value2 stop changing, you could do this instead.
START TRANSACTION;
SELECT value1, value2
FROM theTable
WHERE notified = 0
AND last_update <= NOW() - INTERVAL 60 SECOND
AND microcontroller_id = ?microid
FOR UPDATE;
UPDATE theTable
SET notified=1
WHERE microcontroller_id = ?microid;
COMMIT;
For these queries to be efficient, you'll need this index:
(microcontroller_id, notified, last_update)
In this design, you don't need to have your PHP code poll the database in a loop. Rather, you query the database when your microcontroller checks in for an update/
If all table1 changes are handled by PHP, then there's no reason to poll the database. Add the logic you need at the PHP level when you're updating table1.
For example (assuming OOP):
public function update() {
if ($row->modified > (time() - 60)) {
// perform code for modified in last 60 seconds
}
// run mysql queries
}
I am running 10 PHP scripts at the same time and it processing at the background on Linux.
For Example:
while ($i <=10) {
exec("/usr/bin/php-cli run-process.php > /dev/null 2>&1 & echo $!");
sleep(10);
$i++;
}
In the run-process.php, I am having problem with database loop. One of the process might already updated the status field to 1, it seem other php script processes is not seeing it. For Example:
$SQL = "SELECT * FROM data WHERE status = 0";
$query = $db->prepare($SQL);
$query->execute();
while ($row = $query->fetch(PDO::FETCH_ASSOC)) {
$SQL2 = "SELECT status from data WHERE number = " . $row['number'];
$qCheckAgain = $db->prepare($SQL2);
$qCheckAgain->execute();
$tempRow = $qCheckAgain->fetch(PDO::FETCH_ASSOC);
//already updated from other processs?
if ($tempRow['status'] == 1) {
continue;
}
doCheck($row)
sleep(2)
}
How do I ensure processes is not re-doing same data again?
When you have multiple processes, you need to have each process take "ownership" of a certain set of records. Usually you do this by doing an update with a limit clause, then selecting the records that were just "owned" by the script.
For example, have a field that specifies if the record is available for processing (i.e. a value of 0 means it is available). Then your update would set the value of the field to the scripts process ID, or some other unique number to the process. Then you select on the process ID. When your done processing, you can set it to a "finished" number, like 1. Update, Select, Update, repeat.
The reason why your script executeds the same query multiple times is because of the parallelisation you are creating. Process 1 reads from the database, Process 2 reads from the database and both start to process their data.
Databases provide transactions in order to get rid of such race conditions. Have a look at what PDO provides for handling database transactions.
i am not entirely sure of how/what you are processing.
You can introduce limit clause and pass that as a parameter. So first process does first 10, the second does the next 10 and so on.
you need lock such as "SELECT ... FOR UPDATE".
innodb support row level lock.
see http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html for details.