I am using MongoDB to store large sets of data, with hundreds of records inserted within a millisecond. The system has been running for a couple of years and works fine. But as per a business need, I now have to add a new index to the MongoDB collections.
I am using the PHP Shanty library to create the index. Here is the snippet of code:
$indexArray[] = array(
    "index" => array(
        "category" => -1,
        "sub_category" => 1,
        "name" => 1,
        "product_name" => 1,
        "category_id" => 1,
        "value" => -1,
        "begin_dt_tm" => -1
    ),
    "options" => array(
        "background" => true,
        "name" => "Index_CSNPCIdVBdt"
    )
);
foreach ($indexArray as $columnIndexData) {
    $newCollectionObject->ensureIndex($columnIndexData["index"], $columnIndexData["options"]);
}
The above creates the indexes fine. The only problem I am facing is that during the index creation process my system goes down and MongoDB doesn't respond. I have set the 'background: true' option, which should do this job in the background, but it still keeps my server unresponsive until the indexes are created.
Is there any alternative so that MongoDB remains responsive?
With a replica set you could do rolling maintenance (basically, create the indexes on your secondaries while they are running as standalone instances), and that would not affect your clients. Since you have a standalone instance, that is not an option for you.
I suspect that the load on your server is rather high and/or your hardware is the bottleneck (the usual suspects: not enough RAM, slow disks, ...).
I'm currently working on scanning a folder in my S3 bucket and removing files that are no longer in my database. The problem is that I have millions of files, so there is no way of scanning this in one go.
// get files
$files = $s3->getIterator('ListObjects', array(
    "Bucket" => $S3Bucket,
    "Prefix" => 'collections/items/',
    "Delimiter" => '/'
), array(
    'return_prefixes' => true,
    'names_only' => true,
    'limit' => 10
));
The documentation included something about limiting results, but I can't find anything about offsetting. I want to be able to start from 0, scan 500 items, remove them, stop, save the last scanned index and then run the script again, start from the saved index (501), scan 500 items, and so on.
Does the SDK offer some sort of offset option? Is it called something else? Or can you recommend a different method of scanning such a large folder?
Remember the last key you processed and use it as the Marker parameter.
$files = $s3->getIterator('ListObjects', array(
    "Bucket" => "mybucket",
    "Marker" => "last/key"
));
BTW, don't set Limit; it slows things down. A limit of 10 will cause a request to the API every 10 objects, whereas the API can return up to 1000 objects per request.
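If it helps, here is a rough, untested sketch of the batch approach described in the question, using the Marker parameter and persisting the last processed key between runs. The marker file, the batch size of 500, and the orphan check itself are assumptions/placeholders, not part of the SDK:
// Resume from the last processed key, handle up to 500 objects, then save the new marker.
$markerFile = __DIR__ . '/s3-scan-marker.txt';
$marker     = file_exists($markerFile) ? trim(file_get_contents($markerFile)) : null;

$params = array(
    "Bucket" => $S3Bucket,
    "Prefix" => 'collections/items/',
);
if ($marker) {
    $params["Marker"] = $marker; // continue after the last key processed in the previous run
}

$files     = $s3->getIterator('ListObjects', $params);
$processed = 0;
$lastKey   = null;

foreach ($files as $file) {
    // ... check $file['Key'] against the database and delete the object if it is orphaned ...
    $lastKey = $file['Key'];
    if (++$processed >= 500) {
        break; // stop after this batch
    }
}

if ($lastKey !== null) {
    file_put_contents($markerFile, $lastKey); // the next run resumes after this key
}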
One of our SilverStripe sites is on shared hosting and having major performance issues. The issues seem to be caused by the shared SQL server throttling the number of queries that can be made.
The pages that are running the slowest get 200+ pages to place on a Google Map:
$DirectoryItems = DirectoryItem::get()->where("\"Latitude\" IS NOT NULL AND \"Longitude\" IS NOT NULL ")->sort('Title ASC');
$MapItems = new ArrayList();
foreach ($DirectoryItems as $DirectoryItem) {
    $MapItems->push(new ArrayData(array(
        "Latitude" => $DirectoryItem->Latitude,
        "Longitude" => $DirectoryItem->Longitude,
        "MapMarkerURL" => $DirectoryItem->MapMarkerURL,
        "Title" => addslashes($DirectoryItem->Title),
        "Link" => $DirectoryItem->Link()
    )));
}
Each of the 200+ MapItems generates its own SQL query, which is overloading the shared SQL server.
I started off trying to get the same information with a single query:
$DirectoryItems = DB::query('SELECT `DirectoryItem`.`Latitude`, `DirectoryItem`.`Longitude`, `DirectoryItem`.`MapMarkerURL`, `SiteTree_Live`.`Title`
FROM `DirectoryItem`, `SiteTree_Live`
WHERE `DirectoryItem`.`ID` = `SiteTree_Live`.`ID`
AND `DirectoryItem`.`Latitude` IS NOT NULL AND `DirectoryItem`.`Longitude` IS NOT NULL
ORDER BY `SiteTree_Live`.`Title`');
$MapItems = new ArrayList();
foreach ($DirectoryItems as $DirectoryItem) {
    $MapItems->push(new ArrayData(array(
        "Latitude" => $DirectoryItem['Latitude'],
        "Longitude" => $DirectoryItem['Longitude'],
        "MapMarkerURL" => $DirectoryItem['MapMarkerURL'],
        "Title" => addslashes($DirectoryItem['Title']),
        "Link" => ??????
    )));
}
But this falls over when it comes to getting the link to the DirectoryItem.
I thought about adding the Link as a DB field in DirectoryItem, but that's beginning to feel needlessly complicated for what should be a straightforward operation.
What is the best way of getting the information for 200+ DirectoryItems in a single query?
Did you have a look into caching? If you show the same items on the map all the time you don't need to hit the db on every request.
See
Silverstripe Docs for caching
partial caching of elements of your site
Static publisher module for real fast, static pages managed with SilverStripe
Static publish queue module, another approach for generating static pages
It takes a huge load off your server if you cache properly.
If you still have problems when caching you should think about a better server.
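As a rough illustration (assuming SilverStripe 3.x and its SS_Cache API; the cache name, key, lifetime and serialisation are arbitrary choices here), the map data could be built once and served from cache on subsequent requests:
$cache = SS_Cache::factory('MapItems'); // returns a Zend_Cache frontend

if (!($serialised = $cache->load('mapitems'))) {
    $DirectoryItems = DirectoryItem::get()
        ->where("\"Latitude\" IS NOT NULL AND \"Longitude\" IS NOT NULL")
        ->sort('Title ASC');

    $rows = array();
    foreach ($DirectoryItems as $DirectoryItem) {
        $rows[] = array(
            "Latitude" => $DirectoryItem->Latitude,
            "Longitude" => $DirectoryItem->Longitude,
            "MapMarkerURL" => $DirectoryItem->MapMarkerURL,
            "Title" => addslashes($DirectoryItem->Title),
            "Link" => $DirectoryItem->Link()
        );
    }

    $serialised = serialize($rows);
    $cache->save($serialised, 'mapitems'); // later requests skip the 200+ queries entirely
}

$MapItems = new ArrayList();
foreach (unserialize($serialised) as $row) {
    $MapItems->push(new ArrayData($row));
}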
The SiteTree class has a static function which is used in the CMS to get the link for a certain SiteTree ID. So you just need to extend your SQL query to also fetch the ID, and then you can get the link to any page by ID by calling:
$link = SiteTree::link_shortcode_handler(array('id' => $id), false);
Edit: wmk suggested a different and probably more future-proof approach:
$page = SiteTree::get()->byID($id);
if ($page instanceof SiteTree) $link = $page->Link();
Untested; src: http://api.silverstripe.org/master/class-SiteTree.html
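Putting the single query and the link lookup together, an untested sketch (keeping the table and column names from the question) might look like this:
$rows = DB::query('SELECT `DirectoryItem`.`ID`, `DirectoryItem`.`Latitude`, `DirectoryItem`.`Longitude`,
        `DirectoryItem`.`MapMarkerURL`, `SiteTree_Live`.`Title`
    FROM `DirectoryItem`, `SiteTree_Live`
    WHERE `DirectoryItem`.`ID` = `SiteTree_Live`.`ID`
    AND `DirectoryItem`.`Latitude` IS NOT NULL AND `DirectoryItem`.`Longitude` IS NOT NULL
    ORDER BY `SiteTree_Live`.`Title`');

$MapItems = new ArrayList();
foreach ($rows as $row) {
    $MapItems->push(new ArrayData(array(
        "Latitude" => $row['Latitude'],
        "Longitude" => $row['Longitude'],
        "MapMarkerURL" => $row['MapMarkerURL'],
        "Title" => addslashes($row['Title']),
        // note: the shortcode handler still looks the page up by ID internally
        "Link" => SiteTree::link_shortcode_handler(array('id' => $row['ID']), false)
    )));
}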
I'm not sure if this is even possible, but what the heck :) I have two URLs, both trying to insert different data into the same table. Example:
We have a table "food" and two URLs whose functionality inserts some values into the FOOD table:
http://example.com/insert_food_1
http://example.com/insert_food_2
When loading both URLs at the same time, each one waits for the other to finish first and only afterwards inserts its specific values into the DB.
I know this is called multithreading or something like that, but I'm not sure if it can be done with PHP (or Laravel).
Any help would be much appreciated. My code looks like this:
$postToInsert = Post::create(
    array(
        'service_id' => 1,
        'custom_id' => $post->body->items[0]->id,
        'url' => $post->body->items[0]->share_url,
        'title' => $post->body->items[0]->title,
        'title_tags' => $tags,
        'media_type' => $mediaType,
        'image_url' => $imageURL,
        'video_url' => $videoURL
    ));
$postToInsert->save();
I kind of fixed it. Opening them in separate browsers, or running them via cURL from the terminal, solves the problem.
Thanks for all the help
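For what it's worth, here is a rough sketch of firing both URLs at the same time from a PHP CLI script using curl_multi; the example.com URLs are the placeholders from the question:
$urls = array(
    'http://example.com/insert_food_1',
    'http://example.com/insert_food_2',
);

$mh      = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

$running = null;
do {
    curl_multi_exec($mh, $running);       // progress all transfers
    if (curl_multi_select($mh) === -1) {
        usleep(100000);                   // avoid busy-waiting if select fails
    }
} while ($running > 0);

foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);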
I've got an XML feed of data about sports events. There are multiple events in the feed, and multiple attributes containing information for each event.
The feed I'm using is http://whdn.williamhill.com/pricefeed/openbet_cdn?action=template&template=getHierarchyByMarketType&classId=5&marketSort=HH&filterBIR=N
I want to be able to build a page using this information, so I thought the first step would be to turn each event into an array, with nested arrays inside it holding the information.
Something like this:
$events = array();
$events[101] = array(
    "id" => "Logo Shirt, Red",
    "name" => "img/shirts/shirt-101.jpg",
    "url" => 18,
    "date" => "2013-03-17",
    "time" => "09:00:00",
    "participant[101]" => array(
        "name" => "Mark Webber",
        "id" => "212770049",
        "odds" => "4/1",
        "oddsDecimal" => "5.00"
    ),
    "participant[102]" => array(
        "name" => "Sebastian Vettel",
        "id" => "212770048",
        "odds" => "1/7",
        "oddsDecimal" => "1.14"
    )
);
Then go on to the next event and have that listed as $events[102].
How can I convert the data into a set of working variables like this?
Also, is this the best way to get the data I need from the XML feed to work with, or would I be better off pulling it from the feed as and when I need a particular piece of data (although I presume that would mean calling the feed each time)? If so, what would be the syntax for that?
You can use SimpleXML (http://php.net/simplexml) to parse the XML file. If you are calling the feed on each request it will slow down the page; you can cache the feed locally as an XML file and parse that file instead.
You can set up a cron job to update the cache file periodically according to your needs.
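As a rough, untested sketch of that approach: the element names and XPath below are placeholders, so inspect the real feed for the actual tags, and this assumes allow_url_fopen is enabled for fetching the feed:
$feedUrl   = 'http://whdn.williamhill.com/pricefeed/openbet_cdn?action=template&template=getHierarchyByMarketType&classId=5&marketSort=HH&filterBIR=N';
$cacheFile = __DIR__ . '/feed-cache.xml';
$maxAge    = 300; // refresh the local copy every 5 minutes

if (!file_exists($cacheFile) || time() - filemtime($cacheFile) > $maxAge) {
    file_put_contents($cacheFile, file_get_contents($feedUrl)); // refresh the local cache
}

$xml    = simplexml_load_file($cacheFile);
$events = array();

foreach ($xml->xpath('//event') as $eventNode) {              // placeholder XPath
    $event = array(
        'id'           => (string) $eventNode['id'],
        'name'         => (string) $eventNode['name'],
        'date'         => (string) $eventNode['date'],
        'time'         => (string) $eventNode['time'],
        'participants' => array(),
    );
    foreach ($eventNode->participant as $p) {                 // placeholder child element
        $event['participants'][] = array(
            'id'          => (string) $p['id'],
            'name'        => (string) $p['name'],
            'odds'        => (string) $p['odds'],
            'oddsDecimal' => (string) $p['oddsDecimal'],
        );
    }
    $events[$event['id']] = $event;
}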
I am trying to modify a script that was developed to import article records from a Joomla (1.5.x) database into a WordPress 3.2.1 posts table. It is a script that migrates content from Joomla to WordPress.
The issue I had with the script is that it did not maintain the unique identifier ('id' in Joomla, and 'ID' in WordPress). Based on my understanding, this makes it a lot more complicated (much more work) to deal with redirecting all the Joomla permalinks over to the new (and obviously different) WordPress permalinks. If the ID were the same in WP as it was in Joomla, then some basic rewrite rules in .htaccess would be enough to perform the redirections.
So I want to see if I can modify the code to force the ID rather than it being generated in consecutive order as records are inserted into the table.
The script I am modifying is available here: http://wordpress.org/extend/plugins/joomla-to-wordpress-migrator/
The file in question is called: joomla2wp-mig.php
The array is being created at around lines 1049 and 1081.
At line 1049 it is:
$wp_posts[] = array(
    'ID' => $R->id, // I ADDED THIS IN
    'post_author' => $user_id,
    'post_category' => array($wp_cat_id),
    'post_content' => $post_content,
    'post_date' => $R->created,
    'post_date_gmt' => $R->created,
    'post_modified' => $R->modified,
    'post_modified_gmt' => $R->modified,
    'post_title' => $R->title,
    'post_status' => 'publish',
    'comment_status' => 'open',
    'ping_status' => 'open',
    'post_name' => $R->alias,
    'tags_input' => $R->metakey,
    'post_type' => 'post'
);
And at line 1081 it is:
$array = array(
    "ID" => $item['ID'], // I ADDED THIS IN
    "post_author" => $user_id,
    "post_parent" => intval($wp_cat_id),
    "post_content" => $item['post_content'],
    "post_date" => $item['post_date'],
    "post_date_gmt" => $item['post_date_gmt'],
    "post_modified" => $item['post_modified'],
    "post_modified_gmt" => $item['post_modified_gmt'],
    "post_title" => $item['post_title'],
    "post_status" => $item['post_status'],
    "comment_status" => $item['comment_status'],
    "ping_status" => $item['ping_status'],
    "post_name" => $item['post_name'],
    "post_type" => $item['post_type']
);
I have commented the ID line which I added at the top of each of these bits of array code.
The INSERT command is implemented around line 1097.
The INSERT command is put together like this:
$insert_sql = "INSERT INTO " . $j2wp_wp_tb_prefix . "posts" . " set ";
$inserted = 0;
foreach ($array as $k => $v)
{
    if ($k AND $v)
    {
        if ($inserted > 0)
            $insert_sql .= ",";
        $insert_sql .= " " . $k . " = '" . mysql_escape_string(str_replace("`", "", $v)) . "'";
        ++$inserted;
    }
}
$sql_query[] = $insert_sql;
}
It uses the MySQL INSERT INTO ... SET syntax (as opposed to INSERT INTO ... VALUES).
The challenge I have is this:
The array did not include the ID, so I have added this in.
Having made this change, when I run the script it appears (at the WordPress UI end) to run fine, but no records are inserted, even though it says it was successful.
I found I could get around that issue by setting up a fresh wp_posts table with X number of blank records. Let's say I am importing 100 articles; then I would put 100 records into the table, and they would have IDs 1 to 100. When I run my modified code it will happily update and populate these existing records. What I don't understand is why it will not create new records when I force the unique identifier (ID) to the value I want.
I am wondering if I need to use the INSERT INTO ... VALUES command instead of INSERT INTO ... SET.
I was going to test that out, but to be honest I am not a programmer and am just winging it as I go along, so I had no idea how to rewrite the PHP in order to implement the structure required for the VALUES command in place of SET.
Any help or suggestions would be greatly appreciated.
I gather I have to rewrite the last bit of code I provided above.
There is some discussion on this matter at the WordPress support forums. One user (spiff06) very kindly helped troubleshoot the issue with me. We came unstuck around getting the code to insert new records with a forced identifier, and I went with what we referred to as the "messy" option (which is the method I mentioned above: setting up a table with the required number of blank records).
Even though I've used that "messy" method for my own site, it is my wish to make this process work cleanly for other users who are on Joomla 1.5.x and are switching to WP instead of upgrading to a newer Joomla release (which is a big process, so many are just jumping to WP, like me).
With much thanks...
Jonathan
You can try the following:
Change the structure of the imported MySQL table (the posts table in WordPress). Change the ID field so that it is no longer an auto-increment field; let it be just an integer field.
Import the values. This way you can put any values into the ID field without any limitations.
After the import, change the structure of the table again so that the ID field is once more an auto-increment field.
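A rough sketch of those three steps, written in the same mysql_* style the migrator script already uses (the column definition assumes the standard wp_posts schema, and $j2wp_wp_tb_prefix plus the open connection come from the plugin):
// 1. Temporarily drop AUTO_INCREMENT from the ID column (standard wp_posts definition assumed).
mysql_query("ALTER TABLE " . $j2wp_wp_tb_prefix . "posts MODIFY ID BIGINT(20) UNSIGNED NOT NULL");

// 2. Run the import: the forced Joomla IDs are now written exactly as supplied.

// 3. Restore AUTO_INCREMENT once the import has finished.
mysql_query("ALTER TABLE " . $j2wp_wp_tb_prefix . "posts MODIFY ID BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT");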
I never found a truly automated / scripted way of doing this. I ended up doing a workaround:
For now I've imported all my posts the "messy" way, by prepopulating the table.
THE WORKAROUND METHOD
Prepopulate the wp_posts table in the WP database with as many records as you require (look in Joomla to see how many records you have). I had 398, so I added 398 records to wp_posts.
HOW? I did it by exporting the empty wp_posts table to a .csv file. I then opened this in Excel (Numbers or OpenOffice would also do). In the spreadsheet application it was easy to autofill 1 to 398 in the ID column.
I then reimported that .csv file into wp_posts. This gave me a wp_posts table with 398 rows, with 1 to 398 in the ID field.
I then ran version 1.5.4 of Mambo/Joomla to WordPress migrator, which can be installed from within WordPress.
End result?
All posts have the same ID as the original Joomla articles.