I am writing a PHP script that runs on a cron and pulls JSON data from an API [ title (text), path (text), visitors (integer) ] and stores it in a SQLite database. Every time it runs, if it sees an existing title, it should add the new visitors count to the existing visitors. If not, it should add the new data as a new row.
Here's a simplified look at my loop:
foreach ($results as $printresults) {
    // this iterates through $title, $path and $visitors
    $existing_visitors = $db->query("SELECT SUM(visitors) FROM topten WHERE title='$title'");
    while ($existing_vis_row = $existing_visitors->fetch(SQLITE_NUM)) {
        $dupe_enter = $db->query("UPDATE topten SET title='$title', path='$path',
                                  visitors='$existing_vis_row[0]' WHERE title='$title'");
    }
    $db->query("INSERT INTO topten (id, title, path, visitors, time) VALUES
                (NULL, '$title', '$path', '$visitors', '$time');");
}
Then I'll do a SELECT to pull DISTINCT rows ordered by visitors and write this to a file. Since the UPDATE query adds all the visitors to these rows, it doesn't matter that there will be dupes. After a certain timeout I'll drop the whole table and start collecting again, so the file doesn't get too unwieldy.
The problem is that it adds the summed visitor counts on every pass of the loop, throwing the visitor counts totally out of whack. But I couldn't find a better way to simply add the data together every time the script runs.
pseudo-code:
foreach($json_records as $rec){
    $row = SELECT visitors FROM topten WHERE title = $rec['title']
    if($row)
        //record exists, add visitors and update
        $sum_visitors = $row['visitors'] + $rec['visitors']
        UPDATE topten SET visitors = $sum_visitors WHERE title = $rec['title']
    else
        //record doesn't exist, insert new
        INSERT INTO topten (title, visitors) VALUES ($rec['title'], $rec['visitors'])
}
Maybe?
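Rendered as runnable PHP, that pseudo-code might look like the following sketch (assuming $db is a PDO connection to the SQLite database):

foreach ($json_records as $rec) {
    // look for an existing row with this title
    $sel = $db->prepare("SELECT visitors FROM topten WHERE title = ?");
    $sel->execute(array($rec['title']));
    $row = $sel->fetch(PDO::FETCH_ASSOC);

    if ($row) {
        // record exists: add the new visitors to the stored count
        $upd = $db->prepare("UPDATE topten SET visitors = ? WHERE title = ?");
        $upd->execute(array($row['visitors'] + $rec['visitors'], $rec['title']));
    } else {
        // record doesn't exist: insert a new row
        $ins = $db->prepare("INSERT INTO topten (title, visitors) VALUES (?, ?)");
        $ins->execute(array($rec['title'], $rec['visitors']));
    }
}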
Avoid dupes: set a unique key and use INSERT OR REPLACE ... instead of doing it yourself.
Something like CREATE UNIQUE INDEX 'title_path' ON topten (title, path). This will make it impossible to have two records with the same title and path fields, so if you just do a blind INSERT ...., you'd get a conflict error if it's a dupe.
So just use INSERT OR REPLACE ....: this first checks any unique index, and if there's already a matching record it is erased, then the insert is done. Of course it's all atomic (so another process checking won't see the record disappear and reappear).
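A minimal sketch of that approach (assuming PDO and SQLite 3.24+, whose ON CONFLICT upsert clause can add to the existing count in one atomic statement; note that a plain INSERT OR REPLACE would discard the old visitors value, so you'd have to compute the sum yourself first):

// one-time: enforce uniqueness on (title, path)
$db->exec("CREATE UNIQUE INDEX IF NOT EXISTS title_path ON topten (title, path)");

// upsert: insert a new row, or add to the existing count on conflict
$stmt = $db->prepare(
    "INSERT INTO topten (title, path, visitors, time)
     VALUES (:title, :path, :visitors, :time)
     ON CONFLICT(title, path) DO UPDATE SET visitors = visitors + excluded.visitors"
);
foreach ($results as $rec) {
    $stmt->execute(array(
        ':title'    => $rec['title'],
        ':path'     => $rec['path'],
        ':visitors' => $rec['visitors'],
        ':time'     => time(),
    ));
}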
Related
I want to run the update query only if the row exists (and was inserted). I tried several different things, but this could be a problem with how I am looping. The insert works OK and creates the record; the update should take the existing value and add to it each time (10 exists + 15 added, 25 exists + 15 added, 40 exists...). I tried this in the loop, but it ran for every item in the list and produced a huge number each time. Also, the page is run each time a link is clicked, so the user exits and comes back.
while ($store = $SQL->fetch_array($res_sh))
{
    $pm_row = $SQL->query("SELECT * FROM `wishlist` WHERE shopping_id='".$store['id']."'");
    $myprice = $store['shprice'];
    $sql1 = "INSERT INTO posted (uid, price) SELECT '$uid', '$myprice'
             FROM posted WHERE NOT EXISTS (SELECT * FROM `posted` WHERE `uid` = '$namearray[id]') LIMIT 1";
    $query = mysqli_query($connection, $sql1);
}
$sql2 = "UPDATE posted SET `price` = price + '$myprice' WHERE shopping_id='".$_GET['id']."'";
$query = mysqli_query($connection, $sql2);
By utilizing mysqli_affected_rows on the insert query to verify that it managed to insert a row, you can create a conditional for the update query.
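A minimal sketch of that conditional, reusing the $sql1 and $sql2 strings from the question:

// run the insert, then only update if a row was actually added
mysqli_query($connection, $sql1);
if (mysqli_affected_rows($connection) > 0) {
    // the insert created the record, so the follow-up update is safe
    mysqli_query($connection, $sql2);
}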
However, if you're running an update immediately after an insert, one is led to believe it could be accomplished in the same go. In this case, with no more context, you could just multiply $myprice by 2 before inserting - though you may want to look into whether you can avoid doing this.
Additionally, and somewhat more complex, you could utilize SQL transactions for this, making sure you reference exactly the row you want to update. If the insert failed, your update would not happen.
Granted, if you referenced the inserted row perfectly for your update, the update would not happen anyway when the insert fails. For example, with a primary auto-increment key on these rows, you can use mysqli_insert_id to get the last inserted ID and update the row with that ID. But this methodology can break in a high-volume system, or just in a random race event, which leads us right back to single queries or transaction utilization.
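For illustration, the insert-id variant described above might look like this (hypothetical; it assumes posted has an auto-increment primary key named id):

mysqli_query($connection, $sql1);
if (mysqli_affected_rows($connection) > 0) {
    // reference the row we just inserted by its auto-increment id
    $newId = mysqli_insert_id($connection);
    mysqli_query($connection, "UPDATE posted SET `price` = price + '$myprice' WHERE id = '$newId'");
}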
What is the appropriate and most efficient way to store a view count each time a database record is accessed?
I have table ITEMS containing the following fields: id, item_name
Each item has its own permalink: http://domain.com/item_name
I would like to be able to use this data and display a "Views: 2,938" on the page. Which method is best?
Method A
Create an additional field view_count in the ITEMS table and update it to increment the view count:
UPDATE items SET view_count = view_count + 1 WHERE id = :id;
Method B
Create a new table VIEWS containing the following fields:
id, item_id, timestamp, ip
And add a new record to the VIEWS table each time a page is viewed.
Method C
Is there another method altogether?
I have seen Method A used in a variety of PHP forum software; however, my university instructors have informed me that Method B is better because an INSERT requires less strain than an UPDATE.
It's somewhat true that an INSERT operation will often consume fewer resources than an UPDATE statement; but a well-tuned UPDATE statement isn't necessarily a "strain".
There are other considerations. If the only known and foreseen requirement is for a "total" page count, then a single row will consume much less storage than storing individual rows for each "view" event. Also, the queries to retrieve the view count will be much more efficient.
(Where the "strain" is in the real world is in storage, not in terms of just disk space, but the number of tapes and the amount of clock time required for backups, time required for restore, etc.)
If you need to be able to report on views by hour, or by day, then having the more granular data will provide you the ability to do that, which you can't get from just a total.
You could actually do both.
If I needed a "summary" count to put on the page, I wouldn't want to have to run a query against a boatload of rows in the historical event record just to come up with a new value of 2,939 the next time the page is viewed. It would be much less of a "strain" on database resources to retrieve a single column value from a single row than to churn through thousands of rows of event data to aggregate them into a count every time a page is viewed.
If I also needed to be able to administratively report "views" by country, or by IP, by hour, by day or week or month, I would also store the individual event rows, to have them available for slice and dice analytics.
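A hypothetical sketch of that combined approach, using the tables from the question (illustrative item id and IP; the event insert and the counter bump kept atomic in one transaction):

START TRANSACTION;
-- granular event row for later slice-and-dice analytics
INSERT INTO views (item_id, `timestamp`, ip) VALUES (42, NOW(), '203.0.113.7');
-- cheap running total for the page display
UPDATE items SET view_count = view_count + 1 WHERE id = 42;
COMMIT;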
There is another way, so you can count unique IPs; this may help you: http://codebase.eu/source/code-php/ip-counter/
But if you want to store it in a database, let me know if this works for you:
CREATE TABLE `counter` (
  `file` VARCHAR(200),
  `time` INT UNSIGNED NOT NULL DEFAULT 0, -- hit count for this page
  PRIMARY KEY (`file`)
);
and create counter.php
<?php
// read the name of the current page
$file = $_SERVER['PHP_SELF'];

// connect to the database
mysql_connect("dbhost", "dbuser", "dbpass");
mysql_select_db("dbname");
$file = mysql_real_escape_string($file);

// check whether this page already has a row
$sql = mysql_fetch_array(mysql_query("SELECT * FROM `counter` WHERE `file` = '$file'"));
if ($sql) {
    // a row exists - add one to its counter
    $add_counter = mysql_query("UPDATE `counter` SET `time` = `time` + 1 WHERE `file` = '$file' LIMIT 1");
} else {
    // no row yet - create one starting at 1
    $add_data = mysql_query("INSERT INTO `counter` (`file`, `time`) VALUES ('$file', 1)");
}
?>
Then put this where you want to show your counter:
<?php
// connect to the database
mysql_connect("dbhost", "dbuser", "dbpass");
mysql_select_db("dbname");

// fetch all counters and print them
$sql = mysql_query("SELECT * FROM `counter` ORDER BY `file` ASC");
while ($data = mysql_fetch_array($sql)) {
    echo "This " . $data['file'] . " got " . $data['time'] . " views";
}
?>
As the title says,
I have always wanted to know how Sphinx works: how does it realize that the data has been updated, and how does it know which rows have been updated? Especially the last two questions.
Sphinx can't guess when data changes; you need to tell it to check for new data, with a cron job for instance. Let me show you an example:
Let's say you use Sphinx to index users. Every day, a cron job reindexes all user data from scratch (meaning the index is destroyed and entirely recreated), and then every 5 minutes another cron job indexes new data only (the delta).
So in your users table, you need a column recording the last time each row was updated; let's call it updated_at.
Your delta index will then check for users that have been updated since the last check.
source user
{
    ...
    sql_query_pre = REPLACE INTO sph_counter SELECT 'user', @max_stamp := UNIX_TIMESTAMP(MAX(updated_at)) FROM users
    sql_query = SELECT user_id, email FROM users WHERE updated_at <= FROM_UNIXTIME(@max_stamp)
    ...
}
source user_delta : user
{
    ...
    sql_query = SELECT user_id, email FROM users WHERE updated_at >= (SELECT FROM_UNIXTIME(max_stamp) FROM sph_counter WHERE index_name = 'user')
    ...
}
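And a hypothetical crontab wiring up the two jobs described above (assuming the indexes are named user and user_delta, and Sphinx's indexer tool is on the PATH):

# full rebuild once a day at 3 am
0 3 * * * indexer user --rotate
# delta reindex every 5 minutes
*/5 * * * * indexer user_delta --rotate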
Does it sound clear to you?
I'm trying to lock a row in a table as being "in use" so that I don't process the data twice when my cron runs every minute. Because of the length of time it takes for my script to run, the cron will cause multiple instances of the script to run at once (usually around 5 or 6 at a time). For some reason, my "in use" method is not always working.
I do not want to LOCK the tables because I need them available for simultaneous processing; that is why I went the route of pseudo-locking individual rows with an 'inuse' field. I don't know of a better way to do this.
Here is an illustration of my dilemma:
<?php
// get the first row from table_1 that is not in use
$result = mysqli_query($connect, "SELECT * FROM `table_1` WHERE inuse='no'");
$rows = mysqli_fetch_array($result, MYSQLI_ASSOC);
$data1 = $rows['field1'];

// "lock" our row by setting inuse='yes'
mysqli_query($connect, "UPDATE `table_1` SET inuse='yes' WHERE field1 = '$data1'");

// insert a new row into table_2 with our data if it doesn't already exist
$result2 = mysqli_query($connect, "SELECT * FROM `table_2` WHERE field='$data2'");
$numrows = mysqli_num_rows($result2);
if ($numrows >= 1) {
    // do nothing
} else {
    // run some unrelated script to get data
    $data2 = unrelatedFunction();
    // insert our data into table_2
    mysqli_query($connect, "INSERT INTO `table_2` (field) VALUES ('$data2')");
}

// "unlock" our row in table_1
mysqli_query($connect, "UPDATE `table_1` SET inuse='no' WHERE field1 = '$data1'");
?>
You'll see here that $data2 won't be collected and inserted if a row already exists with $data2, but that part is for error-checking and does not answer my question as the error still occurs. I'm trying to understand why (if I don't have that error-check in there) my 'inuse' method is sometimes being ignored and I'm getting duplicate rows in table_2 with $data2 in them.
There's a lot of time between your first SELECT and the first UPDATE during which another process can do the same operation. You're not using a transaction either, so you're not guaranteeing any order in which the changes become visible to others.
You can either move everything into a transaction with the isolation level you need and use SELECT ... FOR UPDATE syntax, or you can try doing the copy in a different way. For example, update the N rows you want to process with SET inuse = your_current_pid WHERE inuse IS NULL, then read back the rows you marked for processing. After you finish, reset inuse to NULL.
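A minimal sketch of that claim-by-pid pattern (hypothetical; it assumes the inuse column is changed to a nullable integer holding the claiming process's pid):

// atomically claim one unprocessed row for this process
$pid = getmypid();
mysqli_query($connect, "UPDATE `table_1` SET inuse = $pid WHERE inuse IS NULL LIMIT 1");

// read back the row this process just claimed
$result = mysqli_query($connect, "SELECT * FROM `table_1` WHERE inuse = $pid");
while ($row = mysqli_fetch_array($result, MYSQLI_ASSOC)) {
    // ... process the row ...
}

// release the claim when done
mysqli_query($connect, "UPDATE `table_1` SET inuse = NULL WHERE inuse = $pid");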
Storing randomly picked rows in session?
Hi!
I’m building a PHP script that outputs a random word from a MySQL table. Each time the script is refreshed, I want a new word to display. (It is connected to jQuery, so the data output of the PHP file is displayed directly on my page.)
However, I want to display each word only once. Once all the words have been picked, I want the script to reset and start picking over again.
Right now I have done this by setting up an additional table called “visited” and putting all the picked rows from the “wordlist” table in there, along with the user’s unique session id to prevent results from multiple users from interfering with each other.
So the query goes like this:
session_start();
$id = session_id();

$random_sql = "SELECT *
               FROM wordlist AS a
               LEFT JOIN visited AS b ON a.word = b.word AND b.sessionid = '$id'
               WHERE b.word IS NULL
               ORDER BY a.weight * RAND() DESC -- weighted random
               LIMIT 1";
$random_row = mysql_query($random_sql);
if (mysql_num_rows($random_row) > 0)
{
    while ($row = mysql_fetch_row($random_row))
    {
        $insert_query = "INSERT INTO visited (ID, word, sessionid, date) VALUES ('$row[0]', '$row[1]', '$id', CURDATE())";
        $insert = mysql_query($insert_query) or die(mysql_error());
        echo $row[1];
    }
}
This works perfectly fine, but I reckon it would be hard for the database to handle many visitors at the same time?
So my question is:
How can I store the information of “visited” words in a session and exclude them from the query?
One more thing: I’m estimating that the wordlist table will have around 8000 rows. Will this be too many for the ORDER BY RAND() function, and turn out to be noticeably slow?
Thanks!
This depends on how persistent the data must be. If you don't need persistence, then the session is of course much more efficient in this case. You can store any PHP data structure in it; I guess you'd use an associative array in this case.
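A minimal sketch of the session approach (hypothetical; it assumes wordlist has an integer id primary key, as the INSERT into visited above suggests):

session_start();

// word ids already shown to this user
if (!isset($_SESSION['visited'])) {
    $_SESSION['visited'] = array();
}

// exclude already-visited ids from the weighted-random pick
$exclude = implode(',', array_map('intval', $_SESSION['visited']));
$where = $exclude ? "WHERE id NOT IN ($exclude)" : "";
$res = mysql_query("SELECT id, word FROM wordlist $where
                    ORDER BY weight * RAND() DESC LIMIT 1");

if ($row = mysql_fetch_assoc($res)) {
    $_SESSION['visited'][] = (int)$row['id'];
    echo $row['word'];
} else {
    // every word has been shown: reset and start over
    $_SESSION['visited'] = array();
}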
Regarding performance: if the words have sequential ids, you can generate the random number directly as an id and just retrieve that particular row. ORDER BY RAND() must generate a random number for every row and sort them all, which is much less efficient than generating a single id like RAND() * MAX(ID).
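A sketch of that technique in SQL (assuming ids are roughly sequential; the join takes the first id at or above the random point, so small gaps are skipped over):

SELECT w.*
FROM wordlist AS w
JOIN (SELECT FLOOR(1 + RAND() * (SELECT MAX(id) FROM wordlist)) AS rid) AS r
  ON w.id >= r.rid
ORDER BY w.id
LIMIT 1;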