PHP: using strpos() before str_replace()

I have a loop which, in each iteration, takes a large amount of text and replaces a specific placeholder ('token') with some other content, like so:
$string = $pageContent;
foreach ($categories as $row) {
    $images = $mdlGallery->getByCategory($row['id']);
    if (!empty($images)) {
        $plug = Plugin::get('includes/gallery', array('rows' => $images));
        $string = str_replace($row['token'], $plug, $string);
    }
}
The Plugin class and its get() method simply take the right file from a specific directory and return the output buffer as a string.
There might be a large number of categories, so I wonder whether it would be better to first check the input string for an occurrence of the specific 'token' with the strpos() function before populating all the images from a given category, like so:
foreach ($categories as $row) {
    if (strpos($string, $row['token']) !== false) {
        $images = $mdlGallery->getByCategory($row['id']);
        if (!empty($images)) {
            $plug = Plugin::get('includes/gallery', array('rows' => $images));
            $string = str_replace($row['token'], $plug, $string);
        }
    }
}
My concern is performance: would this help? Consider that $string may contain a very large number of characters (it comes from a TEXT column in MySQL).

To solve your problem
As per your example code, it seems the files used in Plugin::get() are small, so including or reading them should not incur large performance costs; but if there are a lot of them, you may still need to consider those costs due to OS queuing mechanisms, even if the data they contain is not big.
The getByCategory() method, however, is likely to incur large performance costs, because it implies many connect/query/read/close communication sequences with the database, and each implies the transfer of a large amount of data (the TEXT fields you mentioned).
You should consider fetching the data as a batch operation with one single SQL query and storing it in a cache variable indexed by the row id, so that getByCategory() can fetch it from the cache.
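A minimal sketch of that batching idea (the table and column names below are assumptions, not taken from the question):

```php
<?php
// Sketch: group the result of one batched SELECT by category id, so the
// later lookup is a plain array read instead of a query per iteration.
// The column names (category_id, id, path) are assumptions.
function groupByCategory(array $rows)
{
    $cache = array();
    foreach ($rows as $img) {
        $cache[$img['category_id']][] = $img;
    }
    return $cache;
}

// The single batched query might look like:
//   SELECT category_id, id, path FROM gallery_images WHERE category_id IN (...)
// Stand-in rows for illustration:
$rows = array(
    array('category_id' => 1, 'id' => 10, 'path' => 'a.jpg'),
    array('category_id' => 2, 'id' => 11, 'path' => 'b.jpg'),
    array('category_id' => 1, 'id' => 12, 'path' => 'c.jpg'),
);
$cache = groupByCategory($rows);
// Inside the loop: $images = isset($cache[$row['id']]) ? $cache[$row['id']] : array();
```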
Your current problem is not a matter of simple code review; it's a matter of approach. You have used a technique typical for small datasets as an approach to handling large datasets. The notion of "wrap a foreach around the simple script" works if you have medium datasets and don't feel a performance decay; once you do, you need a separate approach designed to handle the large dataset.
To answer your question
Using strpos() means running through the entire haystack once to check whether it contains the needle, and after that running through it again to do the replacement with str_replace().
If the haystack does not contain the needle, strpos() costs the same as str_replace() (in terms of computational complexity), because both of them have to run through the entire string to the end to make sure no needles are there.
Using both functions adds 100% more work for any haystack that does not contain the needle, and anywhere from about 1% to 100% more work for any haystack that does contain it, because strpos() returns right after the first needle found, which may be at the start of the string, the middle, or the end.
In short, don't use strpos(); it does not help you here. If you were using preg_replace(), the regex engine might incur more computational cost than strpos() for haystacks that do not contain the needle.

Thanks Mihai, that makes a lot of sense. However, in this particular scenario, even if I fetch all of the records from the database first (all of the images with their associated categories), it would be rare for $string to contain more than one or two 'tokens', which means that using strpos() could actually save time if there were many categories ('tokens') to compare against.
Imagine we don't call getByCategory() in each iteration because we already have all possible records in a previously built array: we still have to go through the output buffering inside Plugin::get() and the str_replace() call. With, say, 20 categories, that would happen 20 times even if no 'token' is actually present in $string.
So your suggestion would win if a large number of 'tokens' were found in $string relative to the number of categories we loop through; but for a small number of 'tokens' I think strpos() would still be beneficial, since it would be the only function executed for most categories, rather than the two that follow whenever strpos() returns true. In that case it's a small price to pay: strpos() alone, versus output buffering and str_replace() together on every pass of the loop. Don't you think?
I very much appreciate your explanation though.

I think it's better to benchmark this yourself if you are looking for optimization (especially micro-optimization). Any implementation usually has more than one variation, so it's best to benchmark the variation you actually use. With that said, you can see the benchmark results here:
with strpos: http://3v4l.org/pb4hY#v533
without strpos: http://3v4l.org/v35gT
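If those links go stale, the same comparison can be sketched locally like this (the token and haystack are made up; absolute timings will vary by machine and PHP version):

```php
<?php
// Made-up worst case: a long haystack that does NOT contain the token,
// where the extra strpos() pass costs the most.
$haystack = str_repeat('lorem ipsum dolor sit amet ', 10000);
$token    = '{{gallery}}';
$runs     = 1000;

$t0 = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    $out = str_replace($token, 'replacement', $haystack);
}
$plain = microtime(true) - $t0;

$t0 = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    if (strpos($haystack, $token) !== false) {
        $out = str_replace($token, 'replacement', $haystack);
    }
}
$guarded = microtime(true) - $t0;

printf("str_replace only:     %.4fs\n", $plain);
printf("strpos + str_replace: %.4fs\n", $guarded);
```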

Related

Search and replace links by correspondence array

I want to search and replace links based on a correspondence array.
I wrote this solution, but I find it a bit simplistic and maybe not efficient enough to handle 2000 pages and 15000 links. What do you think? Would DOMDocument or regex be more effective? Thank you for your answers.
$correspondences = array(
    "old/exercise-2017.aspx" => "/new/exercise2017.aspx",
    "old/exercise-2016.aspx" => "/new/exercise2016.aspx",
    "old/Pages/index.aspx" => "/new/en/previous-exercises/index.aspx"
);
$html = '<ul><li><a href="old/exercise-2017.aspx">Appraisal exercise 2017</a></li><li><a href="old/exercise-2016.aspx">Appraisal exercise 2016</a></li><li><a href="old/Pages/index.aspx">Previous appraisal exercises</a></li></ul>';
foreach ($correspondences as $key => $value) {
    if (strpos($html, $key) !== false) {
        $html = str_replace($key, $value, $html);
    }
}
echo $html;
This approach is not the most efficient, but it should be fine as long as you do it only once and store the result. Given that you have already implemented it this way, you should just go with it unless you run into an actual performance problem.
If you are trying to do this at runtime (i.e. modify the page every single time it is served) then, yes, this is likely to be problematic. 15000 string searches per page is likely to be slow.
In that case, the most obvious change would be the one implied by this answer: do it once and save the result, instead of calculating it at run time.
If you must do it at runtime, then the optimal solution would use DOMDocument to get each URL. You could then replace it based on a set of rules where possible (e.g. if /old/Pages/ always gets translated to /new/en/previous-exercises, implement logic for that). Or you could use a dictionary keyed by the old URL to look up the new URL, if you must individually map each path.
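A rough sketch of that DOMDocument-plus-dictionary idea, assuming the old paths appear as href attributes (the markup below is a cut-down stand-in, not the asker's full page):

```php
<?php
// Sketch: rewrite <a href> values via one dictionary lookup per link,
// instead of one str_replace() pass over the whole document per entry.
$correspondences = array(
    "old/exercise-2017.aspx" => "/new/exercise2017.aspx",
    "old/exercise-2016.aspx" => "/new/exercise2016.aspx",
);
$html = '<ul><li><a href="old/exercise-2017.aspx">Exercise 2017</a></li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    if (isset($correspondences[$href])) {      // O(1) lookup per link
        $link->setAttribute('href', $correspondences[$href]);
    }
}
echo $doc->saveHTML();
```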

How do I efficiently run a PHP script that doesn't take forever to execute in a WAMP environment?

I've made a script that pretty much loads a huge array of objects from a MySQL database, and then loads a huge (but smaller) list of objects from the same database.
I want to iterate over each list to check for irregular behaviour, using PHP. BUT every time I run the script it takes forever to execute (so far I haven't seen it complete). Are there any optimizations I can make so it doesn't take this long? There are roughly 64150 entries in the first list, and about 1748 entries in the second list.
This is what the code generally looks like in pseudo code.
// An array of size 64000 containing objects of the form {"id": 1, "unique_id": "kqiweyu21a)_"}
$items_list = [];
// An array of size 5000 containing objects of the form {"inventory": "a long string that might have the unique_id", "name": "SomeName", "id": 1}
$user_list = [];
Up until this point the results are instant... But when I do this it takes forever to execute, seems like it never ends...
foreach ($items_list as $item) {
    foreach ($user_list as $user) {
        if (strpos($user["inventory"], $item["unique_id"]) !== false) {
            echo("Found a version of the item");
        }
    }
}
Note that the echo should rarely happen. The issue isn't with MySQL, as $items_list and $user_list populate almost instantly; it only starts to take forever when I try to iterate over the lists.
With roughly 112 million iterations (64150 x 1748), adding a break will help somewhat, even though the match rarely happens:
foreach ($items_list as $item) {
    foreach ($user_list as $user) {
        if (strpos($user["inventory"], $item["unique_id"]) !== false) {
            echo("Found a version of the item");
            break;
        }
    }
}
alternate solution 1, with PHP 5.6: you could use pthreads and split your big array into chunks to pool them into threads; combined with break, this will certainly improve things.
alternate solution 2: use PHP 7; the performance improvements for array manipulation and loops are BIG.
Also try to sort your arrays before the loop. It depends on what you are looking for, but very often, sorting the arrays beforehand will shorten the loop when the condition is found early.
Your example is almost impossible to reproduce. You need to provide an example that can be replicated: the two loops as given, if they only access arrays, will complete extremely quickly (1 to 2 seconds). This means that either the strings you are searching are kilobytes or larger (not stated in the question), or something else is happening while the loops run, such as database access.
You can let SQL do the searching for you. Since you don't share which columns you need, I'll only pull the ones I can see.
SELECT i.unique_id, u.inventory
FROM items i, users u
WHERE LOCATE(i.unique_id, u.inventory)
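A different angle, resting on an assumption not stated in the question: if each user's inventory is actually a delimited list of unique_ids rather than free text, the O(items x users) strpos() scan can be replaced with hash-set lookups:

```php
<?php
// Assumption: each "inventory" is a comma-delimited list of unique_ids.
// If so, index the wanted ids once and scan each inventory a single time.
$items_list = array(
    array('id' => 1, 'unique_id' => 'kqiweyu21a)_'),
    array('id' => 2, 'unique_id' => 'zz_other_id'),
);
$user_list = array(
    array('id' => 1, 'name' => 'SomeName', 'inventory' => 'aaa,kqiweyu21a)_,bbb'),
);

// Index the wanted ids once: O(items).
$wanted = array();
foreach ($items_list as $item) {
    $wanted[$item['unique_id']] = true;
}

// Scan each inventory once: O(users x inventory entries).
$found = array();
foreach ($user_list as $user) {
    foreach (explode(',', $user['inventory']) as $entry) {
        if (isset($wanted[$entry])) {
            $found[] = $entry;
        }
    }
}
print_r($found); // the matching unique_ids
```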

PHP Questions. Loops or If statement?

I am trying to learn PHP while writing a basic application. I want a process whereby used words get put into an array $oldWords = array();, so every $word that has been used gets inserted with array_push($oldWords, $word).
Every time the code is executed, I want it to find a new word from $wordList = array(...). However, I don't want it to select any words that have already been used, i.e. that are in $oldWords.
Right now I'm thinking about how to go about this. I've considered finding a new word via $wordChooser = rand(1, $totalWords); and using an if/else statement, but the problem is that if array_search($word, $doneWords) finds the word, I would need to pick a new word and check it again.
This process seems extremely inefficient, and I'm considering a loop, but which one, and what would be a good way to solve the issue?
Thanks
I'm a bit confused: PHP dies at the end of the script's execution. However you are generating this array, could you not at the same time generate the words that haven't been used from the word list (the array_diff of all words and used words)?
Or, if there's another reason I'm missing, why can't you just use a loop and quickly find the first word in $wordList that's not in $oldWords in O(n)?
function generate_new_word(array $wordList, array $oldWords) {
    foreach ($wordList as $word) {
        if (!in_array($word, $oldWords)) {
            return $word; // Word hasn't been used
        }
    }
    return null; // All words have been used
}
Or just do an array difference (less efficient, though, since it always has to go through the entire array, while the loop above stops at the first unused word):
EDIT: For random
$newWordArray = array_diff($allWords, $oldWords); //List of all words in allWords that are not in oldWords
$randomNewWord = array_rand($newWordArray, 1);//Will get a new word, if there are any
Or, if you're interested in making your own data type, the best case for this could possibly get down to O(log n).
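A quick self-contained check of the array_diff()/array_rand() approach (the word lists below are made up):

```php
<?php
// Made-up word lists to exercise the array_diff()/array_rand() path.
$allWords = array('alpha', 'beta', 'gamma', 'delta');
$oldWords = array('alpha', 'gamma');

// All words in $allWords that are not in $oldWords; original keys are kept.
$newWordArray = array_diff($allWords, $oldWords);      // [1 => 'beta', 3 => 'delta']
$randomNewWord = $newWordArray[array_rand($newWordArray)];

var_dump(in_array($randomNewWord, $oldWords)); // bool(false): always an unused word
```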

What's the most efficient way to grab data from this particular file?

I'm trying to think of the most efficient way to parse a file that stores names, student ids and Facebook ids. I'm trying to get the fbid value, which for this particular line would be 1284556651. I thought about using regex for this, but I'm a bit lost as to where to start. I also thought about adding all this data to an array and chopping away at it, but that just seems inefficient.
{"name":"John Smith","studentid":"10358595","fbid":"1284556651"}
I apologise if the post is too brief. I'll do my best to add anything that I might have missed out. Thanks.
Well, this seems to be JSON, so the right way would be
$json = json_decode($str);
$id = $json->fbid;
The regex solution would look like this:
preg_match('/"fbid":"(\d+)"/', $str, $matches);
$id = $matches[1];
But I cannot tell you off the top of my head which of these is more efficient. You would have to profile it.
UPDATE:
I performed a very basic check of execution times (nothing too reliable; I just measured 1,000,000 executions of both snippets). For your particular input, the difference is rather negligible:
json_decode: 27s
preg_match: 24s
However, if your JSON records get larger (for example, if I add three fields to the beginning of the string, so that both solutions are affected), the difference becomes quite noticeable:
json_decode: 46s
preg_match: 30s
Now, if I add the three fields to the end of the string, the difference becomes even larger (obviously, because preg_match does not care about anything after the match):
json_decode: 45s
preg_match: 24s
Even so, before you apply optimizations like this, perform proper profiling of your application and make sure that this is actually a crucial bottleneck. If it is not, it's not worth obscuring your JSON-parsing code with regex functions.
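For anyone who wants to repeat the measurement, a rough harness along these lines should do (iteration count reduced; the absolute numbers will differ from those above):

```php
<?php
// Micro-benchmark sketch: time json_decode() vs preg_match() on the
// sample record from the question.
$str  = '{"name":"John Smith","studentid":"10358595","fbid":"1284556651"}';
$runs = 100000;

$t0 = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    $id = json_decode($str)->fbid;
}
$jsonTime = microtime(true) - $t0;

$t0 = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    preg_match('/"fbid":"(\d+)"/', $str, $m);
    $id = $m[1];
}
$regexTime = microtime(true) - $t0;

printf("json_decode: %.3fs\npreg_match:  %.3fs\n", $jsonTime, $regexTime);
```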
That's JSON, use:
$str = '{"name":"John Smith","studentid":"10358595","fbid":"1284556651"}';
$data = json_decode($str);
echo $data->fbid;
Cheers
Use json_decode
$txt='{"name":"John Smith","studentid":"10358595","fbid":"1284556651"}';
$student =json_decode($txt);
echo $student->fbid;

Need assistance matching an exact string in PHP array

I have a text file that has item numbers in it (one per line). When an item is scanned by our barcode scanner, it gets placed into this text file IF it exists in the order (which is stored in an array of item numbers only, nothing else).
What's happening is that if I have the two item numbers:
C0DB-9700-W
C0DB-9700-WP
If I scan C0DB-9700-W first, then I can scan the second item just fine; but if I scan C0DB-9700-WP first, it thinks that I've already scanned C0DB-9700-W, because that item number is a prefix of the one I've already scanned.
I know that strpos only checks for the first occurrence. I was using the following code:
if (strpos($file_array, $submitted) !== FALSE) {
I switched to using:
if (preg_match('/'.$submitted.'/', $file_array)) {
I thought that by using preg_match I could overcome the problem, but apparently not. I just want PHP to check the EXACT string I give it against the items in the array (which I'm reading from the file) to see whether it has already been scanned. This doesn't seem hard in my mind, but obviously I'm missing something. How can I coax PHP into matching the entire string instead of stopping when it finds something it thinks is good enough?
Thanks!
Just use in_array:
if (in_array($submitted, $file_array))
FYI, your regex was missing start/end anchors (and preg_match's subject, the second argument, needs to be a string, not an array). You should also pass $submitted through preg_quote() in case it contains regex metacharacters:
preg_match('/^'.preg_quote($submitted, '/').'$/', $subject)
There's nothing inexact about C0DB-9700-WP containing a match for C0DB-9700-W. What you're looking for is a regular expression that ensures the string you want is an entire word by itself:
if (preg_match('/\\b'.preg_quote($submitted, '/').'\\b/', $file_array)) {
For an array of items $file_array:
if (in_array($submitted, $file_array)) {
// Do something...
}
Although in your examples, it looks like your $file_array is a string, so you'd want to do:
$file_array = explode("\n", $file_array);
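Putting it together, an end-to-end sketch (the string below stands in for a hypothetical file_get_contents('scanned.txt') call):

```php
<?php
// End-to-end sketch: read the scanned-items "file", split and trim it,
// then test exact membership with strict in_array().
$contents = "C0DB-9700-WP\nC0DB-9700-X\n";
$file_array = array_map('trim', explode("\n", trim($contents)));

$submitted = 'C0DB-9700-W';
$alreadyScanned = in_array($submitted, $file_array, true); // strict: exact match only

var_dump($alreadyScanned); // bool(false): "C0DB-9700-WP" is not an exact match
```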
