I am building a custom module for Drupal 7 which deletes all nodes of a content type. I need to load all nodes of that content type, and for this I have the following code:
$type = "apunte";
$nodes = node_load_multiple(array(), array('type' => $type));
My problem is that I have a lot of nodes of this type (almost 100,000) and I always get an error. If I try it with another type that has only 2 or 3 nodes, it works fine.
When I run my module locally (Windows 8.1) I get a "time exceeded" error (it never finishes), and when I run it on my server (Debian 6) I get a 500 error. I use Apache both locally and on the server.
How can I do this when there are so many nodes?
Thank you.
If you do a node_load_multiple() of 100,000 nodes, you will get an array of 100,000 node objects plus their custom fields, which means millions of MySQL queries and a huge amount of RAM.
To delete a huge number of nodes, query your database to extract all the nids, split your array of nids into chunks of 50 or 100, and loop over each chunk to call node_load_multiple() (why don't you use node_delete_multiple() instead?).
If this still takes longer than the max_execution_time set in your php.ini and you cannot change it, you can use Drupal's Batch API so that each chunk is handled in a separate HTTP request; the max execution time then only applies to the deletion of each chunk of 50-100 nodes (a Batch API sketch follows the code below).
Edit:
Try this:
// Use {node} so Drupal applies any configured table prefix.
$sql = 'SELECT nid FROM {node} n WHERE n.type = :type';
$result = db_query($sql, array(':type' => 'apunte'))->fetchCol();
foreach (array_chunk($result, 100) as $chunk) {
  node_delete_multiple($chunk);
}
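For reference, a rough Batch API version of the same idea might look like this (a sketch only; the mymodule_* function names are placeholders for your own module):
// Build a batch where each chunk of nids is deleted in its own HTTP request.
function mymodule_delete_apuntes() {
  $nids = db_query('SELECT nid FROM {node} WHERE type = :type', array(':type' => 'apunte'))->fetchCol();

  $operations = array();
  foreach (array_chunk($nids, 100) as $chunk) {
    $operations[] = array('mymodule_delete_nodes_op', array($chunk));
  }

  batch_set(array(
    'title' => t('Deleting apunte nodes'),
    'operations' => $operations,
    'finished' => 'mymodule_delete_nodes_finished',
  ));
  // batch_process() is only needed when not running inside a form submit handler.
  batch_process('admin/content');
}

// Operation callback: deletes one chunk of nodes per request.
function mymodule_delete_nodes_op($nids, &$context) {
  node_delete_multiple($nids);
  $context['results'][] = count($nids);
}

// Finished callback: report how many chunks were processed.
function mymodule_delete_nodes_finished($success, $results, $operations) {
  drupal_set_message(t('Deleted @count chunks of nodes.', array('@count' => count($results))));
}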
The previous provider killed several kittens by changing the core of a Magento 1.6.2.0 distribution many times.
So, in order to solve an issue in record time, we had to screw and hit the kittens' corpses: we still maintain the modified API catalog_category.assignProduct.
So far, we have this code for the method:
public function assignProduct($categoryId, $productId, $position = null, $identifierType = null)
{
    $datos = explode('|', $categoryId);
    $productId = array_shift($datos);
    $categorias_actuales = Mage::getModel('catalog/category')
        ->getCollection()
        ->addIsActiveFilter()
        ->addIdFilter($datos)
        ->addLevelFilter(3)
        ->getAllIds();
    $incat = function($cat) use ($categorias_actuales) {
        return in_array($cat, $categorias_actuales);
    };
    $result = array_combine($datos, array_map($incat, $datos));
    try {
        $product = Mage::helper('catalog/product')->getProduct($productId, null, $identifierType);
        $product->setCategoryIds($categorias_actuales);
        $product->save();
    } catch (Mage_Core_Exception $e) {
        $this->_fault('data_invalid', $e->getMessage());
    } catch (Exception $e) {
        $this->_fault('internal_error', $e->getMessage());
    }
    return $result;
}
The intention of the API was to assign many categories at once to a product. However, this is a modified version, because the previous one was prohibitively slow. The curious fact is that this worked until recently. Its intention was to:
only the first parameter is used, and instead of an integer value, a kitten was killed here and the API now expects a string like "productId|catgId1|catgId2|catgId3|..." where the first value is a product id and the second to last values are category ids, each of them an integer. The parameter is split on the pipe character ("|") and the product id is shifted off. In this way, we have the product id in one var and the categories array in another var.
get the categories collection, keeping only the active ones in level 3 (i.e. the category's depth in the tree) whose ids are among the specified ones. This means: if the provided categories are array(3, 5, 7, 8) (e.g. from a "245|3|5|7|8" parameter, where "245" is the product id) and one of them does not exist or is not active (e.g. 7 is not active and 8 does not exist), the value returned in $categorias_actuales is [3, 5].
for debugging purposes, map each input category to its existence and validity. This means: existent && active && level-3 categories will have true as their value, while non-existent or inactive categories will have false. For the given example, the returned array would be array(3 => true, 5 => true, 7 => false, 8 => false), since 7 is not active and 8 does not exist.
in the try-catch block: a. retrieve the product; b. set the filtered ids (in the example: [3, 5]) as product categories in the just-fetched product; c. save the product.
However, I have the following issues:
The API call returns true, so returning $result does not give me the $result array as the return value (my intention was to know into which categories the product was actually inserted).
Changing the return statement to return false instead of $result had no effect at all: the value obtained/displayed from the API call was still true.
Throwing a new Exception("something's going on here") had no effect at all. Still true in the output.
Dying (die("something's going on here")) had no effect either. Still seeing (guess what?) true in the output.
Edit: I also tried a syntax error!! (Guess what? Nothing happened.)
Not only did I try these steps, I also refreshed the cache (System > Cache Management > select all and refresh, and even clicked the "Flush Magento Cache" button).
Questions:
1. Given the issues above: how can I debug that API call?
2. What could be causing the product not to be saved with its categories? By calling category.level I can see that the given ids (i.e. those given in the first parameter to category.assignProduct) exist.
I'm a n00b at the Magento API and perhaps I'm missing something that is usually obvious. Any help would be appreciated.
Did you disable compilation? You can do it in one of these two ways.
System -> Tools -> Compilation in admin
Use SSH/shell and navigate to your Magento root. Run the following commands:
cd shell
php -f compiler.php disable
Also, if your web server runs PHP APC with apc.stat set to 0 (you can check this by running php -i | grep apc.stat or looking for it in any phpinfo() output), please restart your Apache web server, or if you use php5-fpm, restart that one too.
Then, if all else fails, try looking for an equivalent file at app/code/local.
e.g. if your file is somewhere in app/code/core/Mage/Path/ToApiFile.php, look for the local version at app/code/local/Mage/Path/ToApiFile.php.
Finally, a very useful shell command for searching code instances:
$ cd app/code/
$ grep -rin "function assignProduct" *
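Additionally, to confirm which class file is actually being loaded at runtime (the core file or an app/code/local override), a quick check from a small script that bootstraps Magento might help. This is only a sketch; Mage_Catalog_Model_Category_Api is the class that normally backs catalog_category.assignProduct, so adjust the name if your install differs:
<?php
// check_api_class.php -- run from the Magento root (debugging sketch only).
require 'app/Mage.php';
Mage::app();

// Print the file that defines the category API class, to see whether the
// core file or a local override is the one actually in effect.
$reflection = new ReflectionClass('Mage_Catalog_Model_Category_Api');
echo $reflection->getFileName() . PHP_EOL;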
I have set up infinite scrolling on a Joomla-based website to load database results from a MySQL query. It mostly works, but when it is set to load 10 results at a time, it skips results 11-20 and then loads the rest of the values. Likewise, when it is set to show 20 results, it loads the first 40 without any repeats, and then proceeds to load 10 previous results and 10 new ones for each new pagination request until it reaches the end of the list. Here is the code I have for pagination:
//
jimport('joomla.html.pagination');
// prepare the pagination values
$total = $this->xyz->getTotal('posts',' and cat_id = ' . $cat->cat_id);
$limit = $mainframe->getUserStateFromRequest('global.list.limit','limit', $mainframe->getCfg('list_limit'));
$limitstart = $mainframe->getUserStateFromRequest(JRequest::getVar('option').'limitstart','limitstart', 0);
$this->items = $this->xyz->categoryItems(JRequest::getInt('cat_id'),$limitstart,$limit);
// create the pagination object
$_pagination = new JPagination($total, $limitstart,$limit);
$_pagination_footer = $_pagination->getListFooter();
//
I should mention that I set the $limit value to 10 on line 7 of the code above (the $this->xyz->categoryItems() call) to make it load 10 at a time. If it is left as $limit, it loads 20 at a time.
Preferably I would like to load 50 at a time without any repeats or omissions, but as it is now I get plenty of repeats when it is set to 50. I found that setting it to 10 gives me the best results, but it still skips 11-20.
Any suggestions or thoughts would be greatly appreciated.
I had similar problems on two different occasions:
1) SEF turned off
You might want to debug global.list.limit to check for consistency in the values it loads (see the sketch after this list).
2) SEF turned on
Look for inconsistent entries in the redirection database for the same SEF URL.
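For case 1), a quick way to check those values is to log them on every request, right after they are read in the view code shown in the question. This is a plain-PHP sketch for temporary debugging; remove it once you've found the inconsistency:
// Temporary debugging: write the pagination state of each AJAX request to
// the PHP error log, to spot where limitstart/limit jump or repeat.
error_log(sprintf(
    'infinite scroll: limitstart=%d limit=%d total=%d cat_id=%d',
    (int) $limitstart,
    (int) $limit,
    (int) $total,
    (int) JRequest::getInt('cat_id')
));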
I am basically creating an iPhone app that gets its data from WordPress. WordPress will serve audio and video links via an RSS feed to the iPhone app. I have the feed and audio player working great, but I can't seem to find anything on how to create a custom feed where I can specify pagination like start=0&items=10. A plugin would be great, but I can code something up in PHP if anyone has any ideas.
I'm going to answer this question by changing the standard RSS feed of a WordPress installation to respond to limits passed by query parameters. As you say you've already got a working feed, this should hopefully give you everything else you need.
By default, the standard feeds in WordPress are limited by the setting "Syndication feeds show the most recent X items" on the Settings→Reading page, and are unpaginated, as that wouldn't generally make sense for an RSS feed. This is controlled by WordPress's WP_Query::get_posts() method, in query.php, if you're interested in taking a look at how things work internally.
However, although the feed query's limit is set to LIMIT 0, X (where X is the above setting, 10 by default), you can override the limit by filtering the query in the right place.
For example, the filter post_limits will filter the LIMIT clause of the query between the point it's set up by the default code for feeds and the time it's run. So, the following code in a plugin -- or even in your theme's functions.php -- will completely unlimit the items returned in your RSS feeds:
function custom_rss_limits($limits) {
  if (is_feed()) {
    // If this is a feed, drop the LIMIT clause completely
    return "";
  } else {
    // It's not a feed; leave the normal LIMIT in place.
    return $limits;
  }
}
add_filter('post_limits', 'custom_rss_limits');
(At this point I should mention the obvious security implications -- if you've got 20,000 posts on your blog, you'll cause yourself a lot of server load and bandwidth if lots of people start grabbing your feed, and you send out all 20,000 items to everyone. Therefore, bear in mind that whatever you end up doing, you may still want to enforce some hard limits, in case someone figures out your feed endpoint can be asked for everything, say by analysing traffic from your iPhone app.)
Now all we've got to do is to respond to query parameters. First of all, we register your two query parameters with WordPress:
function rss_limit_queryvars( $qv ) {
  $qv[] = 'start';
  $qv[] = 'items';
  return $qv;
}
add_filter('query_vars', 'rss_limit_queryvars' );
That allows us to pass in the start and items variables you're suggesting for your URL parameters.
All we have to do then is to adjust our original LIMIT changing function to respond to them:
function custom_rss_limits($limits) {
  if (is_feed()) {
    global $wp_query;
    if (isset($wp_query->query_vars['start']) &&
        isset($wp_query->query_vars['items'])) {
      // We're a feed, and we got pagination parameters. Override our
      // standard limit.
      // First convert to ints in case anyone's put something hinky
      // in the query string.
      $start = intval($wp_query->query_vars['start']);
      $items = intval($wp_query->query_vars['items']);
      $limits = "LIMIT $start, $items";
    } else {
      // We weren't passed pagination parameters, so just
      // leave the default limits alone.
    }
  }
  return $limits;
}
add_filter('post_limits', 'custom_rss_limits');
And there you go. Throw those last two blocks of code at WordPress, and you can now use a URL like this on any of your existing feeds:
http://example.com/feed/?start=30&items=25
For this example, you'll get the normal RSS feed, but with 25 items starting from item number 30.
...and if you don't pass the query parameters, everything will work like normal.
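On the hard-limits point mentioned earlier: one simple safeguard is to clamp the values inside custom_rss_limits() before building the LIMIT clause. This is just a sketch; the cap of 100 items is an arbitrary choice:
// Replace the two intval() lines in custom_rss_limits() with clamped versions.
$start = max(0, intval($wp_query->query_vars['start']));
// Never hand out more than 100 items per request, whatever the caller asks for.
$items = min(100, max(1, intval($wp_query->query_vars['items'])));
$limits = "LIMIT $start, $items";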
INSERT INTO events (venue_id, artist_id, name, description)
SELECT e.id, e.artist_id, d.a_song, d.a_lyrics
FROM dump_sql AS d
INNER JOIN events AS e
ON d.a_album = e.name
Above is the MySQL query I am using; it works fine. The problem is that I have way too much data (150k records), which appears to be more than the amount of memory the server or MySQL will allow.
I think at a minimum I need a PHP script to insert the data in chunks, and perhaps to increase the memory allowance in PHP, MySQL and ???
Any and all help here would be most appreciated... I am a PHP newb and could use some help coming up with a script or any other pointers.
Thank you!
Error:
Node 0 DMA32 free:2776kB min:2788kB low:3484kB high:4180kB active_anon:211288kB inactive_anon:211276kB active_file:16kB inactive_file:0kB unevictable:0kB isolated(anon):128kB isolated(file):0kB present:500960kB mlocked:0kB dirty:0kB writeback:0kB mapped:116kB shmem:12kB slab_reclaimable:11372kB slab_unreclaimable:32752kB kernel_stack:904kB pagetables:10656kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:640 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 12*4kB 22*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
Node 0 DMA32: 676*4kB 12*8kB 4*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2864kB
5001 total pagecache pages
4940 pages in swap cache
Swap cache stats: add 1565880, delete 1560940, find 743932/825587
Free swap = 0kB
Total swap = 1044216kB
131071 pages RAM
5577 pages reserved
2405 pages shared
118768 pages non-shared
Out of memory: kill process 24373 (httpd) score 410236 or a child
Killed process 24373 (httpd) vsz:1640944kB, anon-rss:345220kB, file-rss:28kB
Try changing the default value of
max_allowed_packet
in my.ini.
Change it to something like:
max_allowed_packet = 100M
and see if that helps.
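If raising max_allowed_packet alone doesn't help, the chunked approach the asker mentions could look roughly like this. This is a sketch only; it assumes dump_sql has an auto-increment id column and that the PDO connection details are replaced with real values:
<?php
// Hypothetical chunked version of the INSERT ... SELECT above: process
// dump_sql in ranges of ids so no single statement has to handle all
// 150k rows at once.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'dbuser', 'dbpass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$maxId = (int) $pdo->query('SELECT MAX(id) FROM dump_sql')->fetchColumn();
$chunkSize = 5000;

$stmt = $pdo->prepare(
    'INSERT INTO events (venue_id, artist_id, name, description)
     SELECT e.id, e.artist_id, d.a_song, d.a_lyrics
     FROM dump_sql AS d
     INNER JOIN events AS e ON d.a_album = e.name
     WHERE d.id BETWEEN :start AND :end'
);

for ($start = 1; $start <= $maxId; $start += $chunkSize) {
    $stmt->execute(array(':start' => $start, ':end' => $start + $chunkSize - 1));
}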
I'm doing some integrations with MS-based web applications, which forces me to fetch the data into my PHP application via SOAP, which is fine.
I get the structure of a file system in an XML document, which I convert to an object. Every document has an ID and its path. To be able to place the documents in a tree view, I've built some methods to calculate each document's whereabouts in the file and folder structure. This worked fine until I started trying it with large file lists.
What I need is a faster method (or way to do things) than a foreach loop.
The method below is the troublemaker.
/**
 * Find parent id based on path
 * @param array $documents
 * @param string $parentPath
 * @return int
 */
private function getParentId($documents, $parentPath) {
    $parentId = 0;
    foreach ($documents as $document) {
        if ($parentPath == $document->ServerUrl) {
            $parentId = $document->ID;
            break;
        }
    }
    return $parentId;
}
// With 20 documents nested in different folders this method renders in 0.00033712387084961
// With 9000 documents nested in different folders it takes 60 seconds
The array passed to the method looks like this:
Array
(
[0] => testprojectDocumentLibraryObject Object
(
[ParentID] => 0
[Level] => 1
[ParentPath] => /Shared Documents
[ID] => 163
[GUID] => 505d70ea-51d7-4ef0-bf79-8e912553249e
[DocIcon] =>
[FileType] =>
[Title] => Folder1
[BaseName] => Folder1
[LinkFilename] => Folder1
[ContentType] => Folder
[FileSizeDisplay] =>
[_UIVersionString] => 1.0
[ServerUrl] => /Shared Documents/Folder1
[EncodedAbsUrl] => http://dev1.example.com/Shared%20Documents/Folder1
[Created] => 2011-10-08 20:57:47
[Modified] => 2011-10-08 20:57:47
[ModifiedBy] =>
[CreatedBy] =>
[_ModerationStatus] => 0
[WorkflowVersion] => 1
)
...
A bit bigger example of the data array is available here
http://www.trikks.com/files/testprojectDocumentLibraryObject.txt
Thanks for any help!
=== UPDATE ===
To illustrate how long the different steps take, I've added this part.
Packet downloaded in 8.5031080245972 seconds
Packet decoded in 1.2838368415833 seconds
Packet unpacked in 0.051079988479614 seconds
List data organized in 3.8216209411621 seconds
Standard properties filled in 0.46236896514893 seconds
Custom properties filled in 40.856066942215 seconds
TOTAL: This page was created in 55.231353998184 seconds!
Now, it is the custom properties step that I'm describing; the other stuff is already somewhat optimized. The data sent from the WCF service is compressed and encoded at a ratio of 10:1 (roughly 10 MB uncompressed : 1 MB compressed).
The current priority is to optimize the custom properties part, where the getParentId method takes 99% of the execution time!
You may see faster results by using XMLReader or expat instead of simplexml. Both of these read the XML sequentially and won't store the entire document in memory.
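For illustration, a rough XMLReader sketch; the <Document> element name and the $xmlString variable are assumptions, so substitute whatever actually wraps each file entry in your SOAP response:
// Stream the XML one <Document> element at a time instead of loading the
// whole response with simplexml. $xmlString holds the XML from the SOAP call.
$reader = new XMLReader();
$reader->xml($xmlString);
$dom = new DOMDocument();

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'Document') {
        // Only this one element is expanded into memory at a time.
        $node = simplexml_import_dom($dom->importNode($reader->expand(), true));
        // e.g. build a lightweight index as you go:
        // $idsByUrl[(string) $node->ServerUrl] = (int) $node->ID;
    }
}
$reader->close();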
Also make sure you have the APC extension enabled; for the actual loop it makes a big difference. Some benchmarks on the actual loop would be nice.
Lastly, if you cannot make it faster, then rather than trying to optimize reading the large XML document, you should look into ways of making this slowness a non-issue. Some ideas include an asynchronous process, proper caching, etc.
Edit
Are you actually calling getParentId for every document? This just occurred to me. If you have 1000 documents, then this already implies 1000*1000 iterations. If this is truly the case, you need to rewrite your code so it becomes a single loop.
How are you populating the array in the first place? Perhaps you could arrange the items in a hierarchy of nested arrays, where each key relates to one part of the path.
e.g.
['Shared Documents']
    ['Folder1']
        ['Yet another folder']
    ['folderA']
    ['folderB']
Then in your getParentId() method, extract the various parts of the path and just search that section of data:
private function getParentId($documents, $parentPath) {
    $keys = explode('/', $parentPath);
    $docs = $documents;
    foreach ($keys as $key) {
        if (isset($docs[$key])) {
            $docs = $docs[$key];
        } else {
            return 0;
        }
    }
    foreach ($docs as $document) {
        if ($parentPath == $document->ServerUrl) {
            return $document->ID;
        }
    }
    return 0;
}
I haven't fully checked that will do what you're after, but it might help set you on a helpful path.
Edit: I missed that you're not populating the array yourself initially; but doing some sort of indexing ahead of time might still save you time overall, especially if getParentId is called on the same data multiple times.
As usual, this was a matter of program design, and there are a few lessons to be learned from it.
In a file system the parent is always a folder. To speed up such a process in PHP, you can put all the folders in a separate array, with each folder's ID as its key, and search that array when you want to find the parent of a file, instead of searching the entire file structure array! A minimal sketch of that idea is below.
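The sketch assumes ContentType === 'Folder' marks folders, as in the data dump above:
// Build the folder-only lookup once, right after the document list is unpacked.
$folders = array();
foreach ($documents as $document) {
    if ($document->ContentType === 'Folder') {
        $folders[$document->ID] = $document;
    }
}

// getParentId() now only scans folders instead of the whole file structure.
private function getParentId($folders, $parentPath) {
    foreach ($folders as $id => $folder) {
        if ($parentPath == $folder->ServerUrl) {
            return $id;
        }
    }
    return 0;
}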
Packet downloaded in 6.9351849555969 seconds
Packet decoded in 1.2411289215088 seconds
Packet unpacked in 0.04874587059021 seconds
List data organized in 3.7993721961975 seconds
Standard properties filled in 0.4488160610199 seconds
Custom properties filled in 0.15889382362366 seconds
This page was created in 11.578738212585 seconds!
Compare the "Custom properties" line with the one from my original post.
Cheers