Race Condition in PHP File Writing - php

So I have a script that accepts and processes requests from other scripts and/or applications. However, one of the tasks my script has to do is assign a unique, sequential "ID" to each request.
For example, let's say application A sends 1000 requests to my script and, at the same time, application B sends 500 requests. I have to give them 1500 unique, sequential numbers, like 2001~3500.
The order between them, however, does not matter, so I can give them numbers like this:
#2001 for 1st request from A (henceforth, A1)
#2002 for A2
#2003 for B1
#2004 for A3
#2005 for B2
...and so on...
I've tried creating a file that stores that number and a separate lock file, with a function like this:
private function get_last_id()
{
    // Check if the lock file exists...
    while (file_exists("LAST_ID_LOCKED")) {
        // Wait a little bit before checking again
        usleep(1000);
    }
    // Create the lock file
    touch("LAST_ID_LOCKED");
    // Create the ID file for the first time if required
    if (!file_exists("LAST_ID_INDICATOR")) {
        file_put_contents("LAST_ID_INDICATOR", 0);
    }
    // Get the last ID
    $last_id = file_get_contents("LAST_ID_INDICATOR");
    // Update the last ID
    file_put_contents("LAST_ID_INDICATOR", $last_id + 1);
    // Delete the lock file
    unlink("LAST_ID_LOCKED");
    return $last_id;
}
This code, however, creates a race condition: if I send those 1500 requests, quite a few numbers end up missing (e.g., the last ID only reaches 3211 instead of 3500).
I've also tried using flock like this, but to no avail:
private function get_last_id()
{
    $f = fopen("LAST_ID_INDICATOR", "rw");
    while (true) {
        if (flock($f, LOCK_SH)) {
            $last_id = fread($f, 8192);
            flock($f, LOCK_UN);
            fclose($f);
            break;
        }
        usleep($this->config["waiting_time"]);
    }
    $f = fopen("LAST_ID_INDICATOR", "rw");
    while (true) {
        if (flock($f, LOCK_SH)) {
            $last_id = fread($f, 8192);
            $last_id++;
            ftruncate($f, 0);
            fwrite($f, $last_id);
            flock($f, LOCK_UN);
            fclose($f);
            break;
        }
        usleep($this->config["waiting_time"]);
    }
    return $last_id;
}
So, what else can I do to solve this situation?
Note: Due to server limitations, I'm stuck with PHP 5.2, without things like semaphores and such.

Since no-one seems to be giving an answer, I'll give you a possible solution.
Use Lamport's bakery algorithm as part of your solution.
Edit: The filter lock would work even better if you don't need the order preserved.
Obviously this will have its own challenges to implement, but it's worth a try, and if you get it right it might just do the trick for what you want to do.
Since you mentioned semaphores, I assume you know enough to understand the concept.
This can be found in chapter 2 of "The Art of Multiprocessor Programming".
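To make the idea concrete, here is a very rough sketch of the bakery lock built on plain lock files. It assumes each worker process is started with a fixed slot number 0..N-1; the slot scheme, the file names, and N_SLOTS are all inventions of mine, and file reads/writes are not truly atomic, so treat this strictly as an illustration of the algorithm, not a drop-in implementation:
<?php
define('N_SLOTS', 8); // assumed fixed number of competing processes

function bakery_lock($me) {
    // "I am choosing a ticket number"
    touch("choosing_$me");
    // number[me] = 1 + max(number[0..N-1])
    $max = 0;
    for ($j = 0; $j < N_SLOTS; $j++) {
        $n = (int) @file_get_contents("number_$j");
        if ($n > $max) {
            $max = $n;
        }
    }
    file_put_contents("number_$me", $max + 1);
    unlink("choosing_$me");

    for ($j = 0; $j < N_SLOTS; $j++) {
        if ($j == $me) {
            continue;
        }
        // wait until slot j has finished choosing its ticket
        while (file_exists("choosing_$j")) {
            usleep(1000);
        }
        // wait while slot j holds a smaller ticket (ties broken by slot id)
        while (true) {
            $nj = (int) @file_get_contents("number_$j");
            $nm = (int) file_get_contents("number_$me");
            if ($nj == 0 || $nj > $nm || ($nj == $nm && $j > $me)) {
                break;
            }
            usleep(1000);
        }
    }
}

function bakery_unlock($me) {
    file_put_contents("number_$me", 0);
}

// Usage: protect the ID file exactly like the lock-file version in the question.
$me = (int) $argv[1]; // slot number for this process (assumption: handed out by whatever launches it)
bakery_lock($me);
$last_id = (int) @file_get_contents("LAST_ID_INDICATOR");
file_put_contents("LAST_ID_INDICATOR", $last_id + 1);
bakery_unlock($me);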

If you have access to a database with locking capabilities, you can use that, e.g. MySQL, with skeleton PHP code:
Create a table with one row and one column (if you do not want to "dual-use" an already existing table):
$sql = 'CREATE TABLE TABLENAME (COLUMNNAME INTEGER) ENGINE=MyISAM';
executeSql($sql); ...
Create a PHP function to (re)set the counter/ID value:
$sql = 'UPDATE TABLENAME SET COLUMNNAME=0';
executeSql($sql); ...
Create a PHP function to get a unique, successive id:
$sql = "SELECT GET_LOCK('numberLock',10)";
executeSql($sql); ...
$sql = 'SELECT * FROM TABLENAME';
if ($result = mysqli_query($link, $sql)) {
    $row = mysqli_fetch_row($result);
    $wantedId = $row[0];
    // do something with the id ...
    mysqli_free_result($result);
}
$sql = 'UPDATE TABLENAME SET COLUMNNAME=COLUMNNAME+1';
executeSql($sql); ...
$sql = "SELECT RELEASE_LOCK('numberLock')";
executeSql($sql); ...
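To round the skeleton out, here is a rough but runnable version using mysqli. The table and column names (id_counter, last_id) and the connection variable $link are placeholders of mine, not part of the original:
<?php
// Assumes a one-row table created as: CREATE TABLE id_counter (last_id INTEGER)
// and an existing mysqli connection in $link.
function get_next_id(mysqli $link) {
    // Acquire the named lock, waiting up to 10 seconds.
    $res = mysqli_query($link, "SELECT GET_LOCK('numberLock', 10)");
    $row = mysqli_fetch_row($res);
    if ($row[0] != 1) {
        throw new Exception('Could not obtain numberLock');
    }

    // Read the current value while holding the lock...
    $res = mysqli_query($link, 'SELECT last_id FROM id_counter');
    $row = mysqli_fetch_row($res);
    $wantedId = (int) $row[0];
    mysqli_free_result($res);

    // ...then advance the counter for the next caller.
    mysqli_query($link, 'UPDATE id_counter SET last_id = last_id + 1');

    // Release the lock so other requests can get their number.
    mysqli_query($link, "SELECT RELEASE_LOCK('numberLock')");
    return $wantedId;
}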

Related

Always create UNIQUE ID without using database - PHP

I have a static HTML / JS page without any database (I don't need one). However, I've started to fight with an issue: I need to generate a random ID which has to be unique (used once and never ever again).
Standard MySQL way:
In a standard application I would do it by storing all used IDs in the database,
and when a new one is needed I would simply:
// I generate ID here, let's say it's 123
SELECT COUNT(id) FROM table WHERE id = 123
My solution, which may not be the best one:
I am thinking an alternative may be some kind of storage in a file
// Let's generate ID here, and make it 123 again
$handle = fopen("listOfIDs.txt", "r");
if ($handle) {
    $used = false;
    while (($line = fgets($handle)) !== false) {
        if (trim($line) == 123) {
            $used = true;
            break;
        }
    }
    fclose($handle);
    return $used;
} else {
    // error opening the file.
}
Despite the fact that I can imagine my option may work, it could be super slow when the file becomes bigger.
Question:
What's the best way to keep simple unique IDs without using a database?
Edit:
I forgot to mention that the unique ID has to be numbers only.
You could use uniqid(), str_rand(something), a timestamp, or some hash of random data that you will probably never get twice.
Or, doing it your way, you could keep a single line of information in this file, just the last used ID:
$fp = fopen("lastID.txt", "r+");
if (flock($fp, LOCK_EX)) { // lock the file
    $lastId = fgets($fp); // get the last id
    $newId = $lastId + 1; // update the id
    ftruncate($fp, 0); // truncate the file
    rewind($fp); // move the pointer back to the start before writing
    fwrite($fp, $newId); // write the new id
    fflush($fp); // flush output
    flock($fp, LOCK_UN); // release the lock
} else {
    throw new Exception("Could not get the lock");
}
fclose($fp);
return $newId;
Maybe you can use a mix of the current timestamp, the request IP, and a random number:
$unique = microtime().$_SERVER['REMOTE_HOST'].rand(0,9999);
Sure, theoretically you can get a request that executes this operation in the same microtime, with the same remote host, and matching the random number, but it's almost impossible, I think.
You can use PHP uniqid with $more_entropy = true, which increases the likelihood that the result will be unique.
$id = uniqid("", true);
You can also work some random number out using the current date and time, etc. You should consider how fast users may generate these numbers; if it's super fast, there might be a possibility of two different users generating the same $id because it happened at the same time.

Getting faster results from 2 databases to form 1 resultset

So here is my scenario...
The bug_tracker table is on one server and task_tracker is on another.
I want to show a combined result but can't, since they are in two separate databases on remote servers.
So I am calling the task tracker first and then getting the bug details per iteration.
$task = oci_parse($task_conn, "select * from task_table where ....");
oci_execute($task);
while ($task_row = oci_fetch_array($task, OCI_ASSOC+OCI_RETURN_NULLS)) {
    $bug = oci_parse($bug_conn, "select * from bug_table where id = " . $task_row['BUGID']);
    oci_execute($bug);
    while ($bug_row = oci_fetch_array($bug, OCI_ASSOC+OCI_RETURN_NULLS)) {
        ... //output
    }
    ... //output
}
But this entire process is making it very slow, since there are a large number of records and columns.
Is there any way to make it even slightly faster? Note: I don't have access, so I can't set up Oracle DB links.
You could improve it using the IN statement:
<?php
$task = oci_parse($task_conn, "select * from task_table where ....");
oci_execute($task);
while ($task_row = oci_fetch_array($task, OCI_ASSOC+OCI_RETURN_NULLS)) {
    $bugs[] = $task_row['BUGID'];
    $users[] = $task_row['USER'];
    $status[] = $task_row['TASK_STATUS'];
}
$bug = oci_parse($bug_conn, "select * from bug_table where id IN (" . implode(',', $bugs) . ")");
oci_execute($bug);
while ($bug_row = oci_fetch_array($bug, OCI_ASSOC+OCI_RETURN_NULLS)) {
    // ...
}
?>
On a sidenote, why are you not using PDO? I believe using it will already give you a performance boost.
PHP is not meant for this kind of operation, nor should you try to write your own join function.
One proper way of solving this issue is to dump the data from both databases into a local database and do the join there.
You do not need anything fancy for the local database; SQLite3 is probably enough.
Just dump the data from each database into CSV files using a bash script that you put into cron. After the dump, (re)create each table in your SQLite3 database and load the CSVs into those tables. After that you can do the join once and push the result into a new table, which you are then free to query.
This is what the data-warehouse world often refers to as an ETL process, just, in this case, very very simplified.
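As a hedged sketch of that last step in PHP (assuming the pdo_sqlite extension is available; the file paths, table layouts, and column names below are invented for illustration):
<?php
// Load the two CSV dumps into a throwaway SQLite database and join them there.
$db = new PDO('sqlite:/tmp/etl.sqlite3');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('DROP TABLE IF EXISTS tasks');
$db->exec('CREATE TABLE tasks (id INTEGER, bugid INTEGER, status TEXT)');
$db->exec('DROP TABLE IF EXISTS bugs');
$db->exec('CREATE TABLE bugs (id INTEGER, title TEXT)');

function load_csv(PDO $db, $table, $file, $columns) {
    $placeholders = implode(',', array_fill(0, count($columns), '?'));
    $stmt = $db->prepare("INSERT INTO $table (" . implode(',', $columns) . ") VALUES ($placeholders)");
    $fh = fopen($file, 'r');
    $db->beginTransaction(); // one transaction keeps the load fast
    while (($row = fgetcsv($fh)) !== false) {
        if (count($row) < count($columns)) {
            continue; // skip blank or short lines
        }
        $stmt->execute(array_slice($row, 0, count($columns)));
    }
    $db->commit();
    fclose($fh);
}

load_csv($db, 'tasks', '/tmp/tasks.csv', array('id', 'bugid', 'status'));
load_csv($db, 'bugs',  '/tmp/bugs.csv',  array('id', 'title'));

// The join that was too expensive to do row-by-row across two remote servers.
$result = $db->query('SELECT t.id, t.status, b.title FROM tasks t JOIN bugs b ON b.id = t.bugid');
foreach ($result as $row) {
    // ... output ...
}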

Splitting a string of values like 1030:0,1031:1,1032:2 and storing data in database

I have a bunch of photos on a page and using jQuery UI's Sortable plugin, to allow for them to be reordered.
When my sortable function fires, it writes a new order sequence:
1030:0,1031:1,1032:2,1040:3,1033:4
Each item of the comma delimited string, consists of the photo ID and the order position, separated by a colon. When the user has completely finished their reordering, I'm posting this order sequence to a PHP page via AJAX, to store the changes in the database. Here's where I get into trouble.
I have no problem getting my script to work, but I'm pretty sure it's the incorrect way to achieve what I want and will suffer hugely in performance and resources. I'm hoping somebody could advise me as to what would be the best approach.
This is my PHP script that deals with the sequence:
if ($sorted_order) {
    $exploded_order = explode(',', $sorted_order);
    foreach ($exploded_order as $order_part) {
        $exploded_part = explode(':', $order_part);
        $part_count = 0;
        foreach ($exploded_part as $part) {
            $part_count++;
            if ($part_count == 1) {
                $photo_id = $part;
            } elseif ($part_count == 2) {
                $order = $part;
            }
            $SQL = "UPDATE article_photos ";
            $SQL .= "SET order_pos = :order_pos ";
            $SQL .= "WHERE photo_id = :photo_id;";
            ... rest of PDO stuff ...
        }
    }
}
My concerns arise from the nested foreach loops and from running so many database updates. If a given sequence contained 150 items, would this script cry for help? If so, how could I improve it?
** This is for an admin page, so it won't be heavily abused **
You can use one UPDATE with some clever code, like so:
Create the array $data['order'] in the loop, then:
$q = "UPDATE article_photos SET order_pos = (CASE photo_id ";
foreach($data['order'] as $sort => $id){
$q .= " WHEN {$id} THEN {$sort}";
}
$q .= " END ) WHERE photo_id IN (".implode(",",$data['order']).")";
A little clearer, perhaps:
UPDATE article_photos SET order_pos = (CASE photo_id
    WHEN 1 THEN 999
    WHEN 2 THEN 1000
    WHEN 3 THEN 1001
END)
WHERE photo_id IN (1,2,3)
I use this approach for exactly what you're doing: updating sort orders.
No need for the second foreach: you know it's going to be two parts if your data passes validation (I'm assuming you validated this. If not: you should =) so just do:
if (count($exploded_part) == 2) {
    $id = $exploded_part[0];
    $seq = $exploded_part[1];
    /* rest of code */
} else {
    /* error - data does not conform despite validation */
}
As for update hammering: do your DB updates in a transaction. Your db will queue the ops, but not commit them to the main DB until you commit the transaction, at which point it'll happily do the update "for real" at lightning speed.
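A minimal sketch of that advice, assuming $pdo is the connection behind the question's "rest of PDO stuff" and $sorted_order is the posted sequence:
<?php
$stmt = $pdo->prepare('UPDATE article_photos SET order_pos = :order_pos WHERE photo_id = :photo_id');

$pdo->beginTransaction();
try {
    foreach (explode(',', $sorted_order) as $pair) {
        list($photo_id, $order_pos) = explode(':', $pair);
        $stmt->execute(array(
            ':order_pos' => (int) $order_pos,
            ':photo_id'  => (int) $photo_id,
        ));
    }
    $pdo->commit();   // all updates hit the table in one go
} catch (Exception $e) {
    $pdo->rollBack(); // nothing is half-applied if one update fails
    throw $e;
}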
I suggest making your script even simpler and changing the names of the variables, so the code will be much more readable.
$parts = explode(',', $sorted_order);
foreach ($parts as $part) {
    list($id, $position) = explode(':', $part);
    // Now you can work with $id and $position
}
More info about list: http://php.net/manual/en/function.list.php
Also, about performance and your data structure:
The way you store your data is not perfect, but you will not suffer any performance issues that way; you send less data, so there is less overhead overall.
However, the drawback of your data structure is that you will most probably be unable to establish relationships between tables, make joins, or alter the table structure in a correct way.

Copy/duplicate/backup database tables effectively - mysql

Reason: I was assigned to run some script that advances a website. It's a fantasy football site and there are several instances of the site located on different domains. Some have more than 80k users, and each user is supposed to have a team that consists of 15 players. Hence some tables have (No. of users) x (No. of players) rows.
However, sometimes the script fails and the result gets corrupted, therefore I must back up the 10 tables in question before I execute the script. Nevertheless, I still need to back up the tables to keep a historical record of users' actions, because football matches may last for 50+ game weeks.
Task: To duplicate DB tables using a PHP script. When I started, I used to back up the tables using SQLyog. It works, but it's time consuming since I have to wait for each table to be duplicated. Besides, the SQLyog application crashes while duplicating large tables, which can be very annoying.
Current solution: I have created a simple application with an interface that does the job, and it works great. It consists of three files: one for the DB connection, a 2nd for DB manipulation, and a 3rd for the user interface and to use the 2nd file's code.
The thing is, sometimes it gets stuck in the middle of the table-duplication process.
Objective: To create an application to be used by the admin to facilitate database backups using MySQL + PHP.
My question: How can I ensure that the duplicating script will definitely back up the table completely, without hanging the server or interrupting the script?
Below I will include my code for the duplicating function, but basically these are the two crucial lines that I think the problem lies in:
//duplicate tables structure
$query = "CREATE TABLE $this->dbName.`$newTableName` LIKE $this->dbName.`$oldTable`";
//duplicate tables data
$query = "INSERT INTO $this->dbName.`$newTableName` SELECT * FROM $this->dbName.`$oldTable`";
The rest of the code is solely for validation in case errors occur. If you wish to take a look at the whole code, be my guest. Here's the function:
private function duplicateTable($oldTable, $newTableName) {
    if ($this->isExistingTable($oldTable))
    {
        $this->printLogger("Original table is valid -table exists- : $oldTable ");
    }
    else
    {
        $this->printrR("Original table is invalid -table does not exist- : $oldTable ");
        return false;
    }
    if (!$this->isExistingTable($newTableName)) // make sure the new table does not exist already
    {
        $this->printLogger("Destination table name is valid -no table with this name- : $newTableName");
        $query = "CREATE TABLE $this->dbName.`$newTableName` LIKE $this->dbName.`$oldTable`";
        $result = mysql_query($query) or $this->printrR("Error in query. Query:\n $query\n Error: " . mysql_error());
    }
    else
    {
        $this->printrR("Destination table is invalid -table already exists- : $newTableName");
        $this->printr("Now checking if tables actually match: $oldTable => $newTableName \n");
        $varifyStatus = $this->varifyDuplicatedTables($oldTable, $newTableName);
        if ($varifyStatus >= 0)
        {
            $this->printrG("Tables match, it seems they were duplicated before $oldTable => $newTableName");
        }
        else
        {
            $this->printrR("The duplicate table exists, yet it doesn't match the original! $oldTable => $newTableName");
        }
        return false;
    }
    if ($result)
    {
        $this->printLogger("Query executed 1/2");
    }
    else
    {
        $this->printrR("Something went wrong in duplicateTable\nQuery: $query\n\n\nMySql_Error: " . mysql_error());
        return false;
    }
    if (!$this->isExistingTable($newTableName)) // validate the table has been created
    {
        $this->printrR("Attempt to duplicate table structure failed: $newTableName table was not found after creating!");
        return false;
    }
    else
    {
        $this->printLogger("Table created successfully: $newTableName");
        // Now checking table structure
        $this->printLogger("Now comparing indexes ... ");
        $autoInc = $this->checkAutoInc($oldTable, $newTableName);
        if ($autoInc == 1)
        {
            $this->printLogger("Auto inc seems ok");
        }
        elseif ($autoInc == 0)
        {
            $this->printLogger("No inc key for either table. Continue anyway");
        }
        elseif ($autoInc == -1)
        {
            $this->printLogger("Inc keys do not match!");
        }
        $time = $oldTable == 'team_details' ? 5 : 2;
        $msg = $oldTable == 'team_details' ? "This may take a while for team_details. Please wait." : "Please wait.";
        $this->printLogger("Sleep for $time ...\n");
        sleep($time);
        $this->printLogger("Preparing for copying data ...\n");
        $query = "INSERT INTO $this->dbName.`$newTableName` SELECT * FROM $this->dbName.`$oldTable`";
        $this->printLogger("Processing copying data query. $msg...\n\n\n");
        $result = mysql_query($query) or $this->printrR("Error in query. Query:\n $query\n Error: " . mysql_error());
        // ERROR usually happens here for large tables
        sleep($time); // to let the db process the current request
        $this->printLogger("Query executed 2/2");
        sleep($time); // to let the db process the current request
        if ($result)
        {
            $this->printLogger("Table created ($newTableName) and data has been copied!");
            $this->printLogger("Confirming number of rows ... ");
            /////////////////////////////////
            // start checking count
            $numRows = $this->checkCountRows($oldTable, $newTableName);
            if ($numRows)
            {
                $this->printLogger("Table duplicated successfully ");
                return true;
            }
            else
            {
                $this->printLogger("Table duplicated, but please check num rows of $newTableName");
                return -3;
            }
            // end of checking count
            /////////////////////////////////
        } // end of if ($result) query 2/2
        else
        {
            $this->printrR("Something went wrong in duplicateTable\nINSERT INTO $oldTable -> $newTableName\n\n$query\n mysql_error() \n " . mysql_error());
            return false;
        }
    }
}
As you noticed, the function only duplicates one table; that's why there is another function that takes an array of tables from the user and passes the table names one by one to duplicateTable().
If any other function should be included for this question, please let me know.
One solution pops into my mind: would duplicating tables part by part add any improvement? I'm not sure how INSERT INTO ... SELECT works, but maybe if I could insert, let's say, 25% at a time it may help?
However Sometimes the script fails and the result gets corrupted,
therefore I must backup 10 tables in question before i execute the
script.
Probably you need to use another solution here: transactions. You need to wrap all the queries used by the failing script into a transaction. If the transaction fails, all data will be the same as at the beginning of the operation. If the queries get executed correctly, you are OK.
Why are you duplicating the table every time?
CLUSTERS are a good option which can make duplicate copies of your table in a distributed manner, and they are much more reliable and secure.

This code needs to loop over 3.5 million rows, how can I make it more efficient?

I have a csv file that has 3.5 million codes in it.
I should point out that this is only EVER going to be run once.
The csv looks like
age9tlg,
rigfh34,
...
Here is my code:
ini_set('max_execution_time', 600);
ini_set("memory_limit", "512M");
$file_handle = fopen("Weekly.csv", "r");
while (!feof($file_handle)) {
    $line_of_text = fgetcsv($file_handle);
    if (is_array($line_of_text)) {
        foreach ($line_of_text as $col) {
            if (!empty($col)) {
                mysql_query("insert into `action_6_weekly` Values('$col', '')") or die(mysql_error());
            }
        }
    } else {
        if (!empty($line_of_text)) {
            mysql_query("insert into `action_6_weekly` Values('$line_of_text', '')") or die(mysql_error());
        }
    }
}
fclose($file_handle);
Is this code going to die part way through on me?
Will my memory and max execution time be high enough?
NB:
This code will be run on my localhost, and the database is on the same PC, so latency is not an issue.
Update:
Here is another possible implementation.
This one does it in bulk inserts of 2000 records:
$file_handle = fopen("Weekly.csv", "r");
$i = 0;
$vals = array();
while (!feof($file_handle)) {
    $line_of_text = fgetcsv($file_handle);
    if (is_array($line_of_text)) {
        foreach ($line_of_text as $col) {
            if (!empty($col)) {
                if ($i < 2000) {
                    $vals[] = "('$col', '')";
                    $i++;
                } else {
                    $vals = implode(', ', $vals);
                    mysql_query("insert into `action_6_weekly` Values $vals") or die(mysql_error());
                    $vals = array();
                    $i = 0;
                }
            }
        }
    } else {
        if (!empty($line_of_text)) {
            if ($i < 2000) {
                $vals[] = "('$line_of_text', '')";
                $i++;
            } else {
                $vals = implode(', ', $vals);
                mysql_query("insert into `action_6_weekly` Values $vals") or die(mysql_error());
                $vals = array();
                $i = 0;
            }
        }
    }
}
fclose($file_handle);
If I were to use this method, what is the highest value I could set it to insert at once?
Update 2
So, I've found I can use
LOAD DATA LOCAL INFILE 'C:\\xampp\\htdocs\\weekly.csv' INTO TABLE `action_6_weekly` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY ','(`code`)
but the issue now is that I was wrong about the CSV format:
it is actually 4 codes and then a line break,
so
fhroflg,qporlfg,vcalpfx,rplfigc,
vapworf,flofigx,apqoeei,clxosrc,
...
so I need to be able to specify two LINES TERMINATED BY values.
This question has been branched out here.
Update 3
Setting it to do bulk inserts of 20k rows, using
while (!feof($file_handle)) {
    $val[] = fgetcsv($file_handle);
    $i++;
    if ($i == 20000) {
        // do insert
        // set $i = 0;
        // $val = array();
    }
}
// do insert (for the last few rows that don't reach 20k)
but it dies at this point because, for some reason, $val contains 75k rows. Any idea why?
Note: the above code is simplified.
I doubt this will be the popular answer, but I would have your PHP application run mysqlimport on the CSV file. Surely it is optimized far beyond what you will do in PHP.
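A rough sketch of what that could look like from PHP; the credentials, database name, and path are placeholders, mysqlimport loads the file into the table whose name matches the file's base name (here action_6_weekly), and this assumes the simple one-code-per-line format from the original question:
<?php
$cmd = 'mysqlimport --local --fields-terminated-by=, '
     . '-u root -psecret mydb /path/to/action_6_weekly.csv 2>&1';
exec($cmd, $output, $exitCode);
if ($exitCode !== 0) {
    die("mysqlimport failed:\n" . implode("\n", $output));
}
echo "Import finished.\n";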
Is this code going to die part way through on me? Will my memory and max execution time be high enough?
Why don't you try and find out?
You can adjust both the memory (memory_limit) and execution time (max_execution_time) limits, so if you really have to use that, it shouldn't be a problem.
Note that MySQL supports delayed and multiple row insertion:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
http://dev.mysql.com/doc/refman/5.1/en/insert.html
Make sure there are no indexes on your table, as indexes will slow down inserts (add the indexes after you've done all the inserts).
Rather than creating a new SQL statement on each iteration of the loop, try to prepare the SQL statement outside the loop and execute that prepared statement with parameters inside the loop. Depending on the database this can be heaps faster.
I've done the above when importing a large Access database into Postgres using Perl and got the insert time down to 30 seconds. I would have used an importer tool, but I wanted Perl to enforce some rules when inserting.
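A short sketch of the prepare-once / execute-many pattern with PDO (the question's code uses the old mysql_* API, which has no prepared statements, so switching the extension is an assumption on my part; the DSN and credentials are placeholders):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Prepare the statement once, outside the loop.
$stmt = $pdo->prepare("INSERT INTO `action_6_weekly` VALUES (?, '')");

$fh = fopen('Weekly.csv', 'r');
$pdo->beginTransaction(); // batching everything in one transaction also helps
while (($line = fgetcsv($fh)) !== false) {
    foreach ($line as $col) {
        if ($col !== null && $col !== '') {
            $stmt->execute(array($col)); // execute many times with parameters
        }
    }
}
$pdo->commit();
fclose($fh);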
You should accumulate the values and insert them into the database all at once at the end, or in batches every x records. Doing a single query for each row means 3.5 million SQL queries, each carrying quite some overhead.
Also, you should run this on the command line, where you won't need to worry about execution time limits.
The real answer though is evilclown's answer, importing to MySQL from CSV is already a solved problem.
I hope there is not a web client waiting for a response on this. Other than calling the import utility already referenced, I would start this as a job and return feedback to the client almost immediately. Have the insert loop update a percentage-complete somewhere so the end user can check the status, if you absolutely must do it this way.
2 possible ways.
1) Batch the process, then have a scheduled job import the file while updating a status. This way, you can have a page that keeps checking the status and refreshes itself if the status is not yet 100%. Users will have a live update of how much has been done. But for this you need access to the OS to be able to set up the scheduled task. And the task will be running idle when there is nothing to import.
2) Have the page handle 1000 rows (or any N number of rows... you decide), then send JavaScript to the browser to refresh itself with a new parameter telling the script to handle the next 1000 rows. You can also display a status to the user while this is happening. The only problem is that if the page somehow does not refresh, the import stops.
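A bare-bones sketch of option 2, with the resume offset carried in the query string; the file name, batch size, and the omitted insert are placeholders:
<?php
$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$batch  = 1000;

$fh = fopen('Weekly.csv', 'r');
// Skip the lines that earlier requests already processed.
for ($i = 0; $i < $offset && fgets($fh) !== false; $i++);

$done = 0;
while ($done < $batch && ($line = fgetcsv($fh)) !== false) {
    // ... insert $line here ...
    $done++;
}
fclose($fh);

if ($done == $batch) {
    // More to do: show progress and tell the browser to call us again.
    $next = $offset + $batch;
    echo "<p>Processed $next rows so far...</p>";
    echo "<script>location.href = '?offset=$next';</script>";
} else {
    echo '<p>Import finished.</p>';
}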
