I want to search and replace links based on a correspondence array.
I wrote this solution, but I find it a bit simplistic and perhaps not efficient enough to handle 2000 pages and 15000 links. What do you think? Would using DOMDocument or a regex be more effective? Thank you for your answers.
<?php
$correspondences = array(
"old/exercise-2017.aspx" => "/new/exercise2017.aspx",
"old/exercise-2016.aspx" => "/new/exercise2016.aspx",
"old/Pages/index.aspx" => "/new/en/previous-exercises/index.aspx"
);
$html = '<ul><li>Appraisal exercise 2017</li><li>Appraisal exercise 2016</li><li> Previous appraisal exercises</li></ul>';
foreach ($correspondences as $key => $value) {
    // strpos() returns 0 for a match at the start, so compare against false explicitly
    if (strpos($html, $key) !== false) {
        $html = str_replace($key, $value, $html);
    }
}
echo $html;
?>
This approach is not the most efficient, but it should be fine as long as you do it only once and store the result. Given that you have already implemented it this way, you should just go with it unless you run into an actual performance problem.
If you are trying to do this at runtime (i.e. modify the page every single time it is served) then, yes, this is likely to be problematic. 15000 string searches per page is likely to be slow.
In that case, the most obvious change would be the one implied by this answer: do it once and save the result, instead of calculating it at run time.
If you must do it at runtime, then a better solution would use DOMDocument to extract each URL. You could then replace it based on a set of rules if possible (e.g. if /old/Pages/ always gets translated to /new/en/previous-exercises/, then implement logic for that). Or, if you must individually map each path, you could use a dictionary keyed by the old URL to look up the new URL.
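As a rough illustration of the dictionary approach, here is a minimal sketch. The function name rewrite_links() and the anchor markup in the sample are assumptions made for the example (the links in the question's sample HTML are not shown), and it only handles href values that match a dictionary key exactly. Non-ASCII content would need extra care with character encoding.

```php
<?php
// Hypothetical helper: rewrite the href of every <a> element using a lookup table.
function rewrite_links(string $html, array $map): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        // One hash lookup per link instead of one string search per dictionary entry.
        if (isset($map[$href])) {
            $link->setAttribute('href', $map[$href]);
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$correspondences = array(
    "old/exercise-2017.aspx" => "/new/exercise2017.aspx",
    "old/exercise-2016.aspx" => "/new/exercise2016.aspx",
);
// Assumed sample markup, for illustration only.
$html = '<ul><li><a href="old/exercise-2017.aspx">Appraisal exercise 2017</a></li></ul>';
$result = rewrite_links($html, $correspondences);
echo $result;
```

With this shape, the cost per page depends on the number of links in that page rather than on the size of the correspondence array.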
Related
I am trying to learn PHP while I write a basic application. I want a process whereby old words get put into an array $oldWords = array(); so that all $words that have been used get inserted using array_push($oldWords, $words).
Every time the code is executed, I want a process that finds a new word from $wordList = array(...). However, I don't want to select any words that have already been used or are in $oldWords.
Right now I'm thinking about how I would go about this. I've been considering finding a new word via $wordChooser = rand (1, $totalWords); I've been thinking of using an if/else statement, but the problem is that if array_search($word, $oldWords) finds the word, then I would need to pick a new word and check it again.
This process seems extremely inefficient. I'm considering a loop, but which one, and what would be a good way to solve the issue?
Thanks
I'm a bit confused, PHP dies at the end of the execution of the script. However you are generating this array, could you also not at the same time generate what words haven't been used from word list? (The array_diff from all words to used words).
Or else, if there's another reason I'm missing, why can't you just use a loop and quickly find the first word in $wordList that's not in $oldWords in O(n)?
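// Both arrays are passed in, because globals are not visible inside a function
function generate_new_word(array $wordList, array $oldWords) {
    foreach ($wordList as $word) {
        if (!in_array($word, $oldWords)) {
            return $word; // Word hasn't been used
        }
    }
    return null; // All words have been used
}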
Or just do an array difference (less efficient, though, since even in the best case it has to go through the entire array, whereas the loop above can stop at the first unused word).
EDIT: For random
$newWordArray = array_diff($allWords, $oldWords); // All words in $allWords that are not in $oldWords
// array_rand() returns a key, not a value, and fails on an empty array
$randomNewWord = $newWordArray ? $newWordArray[array_rand($newWordArray)] : null;
Or, if you're interested in making your own data type, the best case for this could possibly be O(log(n)).
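Short of a custom data type, one option is to lean on the fact that PHP arrays are hash tables, so checking a key with isset() is a constant-time lookup on average. The sketch below assumes the words are plain strings; pick_unused_word() is a hypothetical name, not something from the question.

```php
<?php
// Hypothetical helper: return a random word from $wordList that is not in
// $oldWords, or null when every word has been used.
function pick_unused_word(array $wordList, array $oldWords): ?string
{
    // Flip the used words into keys so each membership test is a hash lookup.
    $used = array_flip($oldWords);

    $unused = [];
    foreach ($wordList as $word) {
        if (!isset($used[$word])) {
            $unused[] = $word;
        }
    }

    if ($unused === []) {
        return null;
    }
    // array_rand() returns a key, so index back into the array.
    return $unused[array_rand($unused)];
}

$wordList = ['apple', 'banana', 'cherry'];
$oldWords = ['apple', 'cherry'];
$next = pick_unused_word($wordList, $oldWords);
```

Building the flipped array is still one pass over $oldWords, so this mainly pays off when the used list is long.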
I'm trying to think of the most efficient way to parse a file that stores names, student IDs and Facebook IDs. I'm trying to get the fbid value, so for this particular line it would be: 1284556651. I thought about using regex for this, but I'm a bit lost as to where to start. I thought about adding all this data to an array and chopping away at it, but it just seems inefficient.
{"name":"John Smith","studentid":"10358595","fbid":"1284556651"}
I apologise if the post is too brief. I'll do my best to add anything that I might have missed out. Thanks.
Well, this seems to be JSON, so the right way would be
$json = json_decode($str);
$id = $json->fbid;
The regex solution would look like this:
preg_match('/"fbid":"(\d+)"/', $str, $matches);
$id = $matches[1];
But I cannot tell you off the top of my head which of these is more efficient. You would have to profile it.
UPDATE:
I performed a very basic check on execution times (nothing too reliable, I just measured 1,000,000 executions of both codes). For your particular input, the difference is rather negligible:
json_decode: 27s
preg_match: 24s
However, if your JSON records get larger (for example, if I add 3 fields to the beginning of the string (so that both solutions are affected)), the difference becomes quite noticeable:
json_decode: 46s
preg_match: 30s
Now, if I add the three fields to the end of the string, the difference becomes even larger (obviously, because preg_match does not care about anything after the match):
json_decode: 45s
preg_match: 24s
Even so, before you apply optimizations like this, perform proper profiling of your application and make sure that this is actually a crucial bottleneck. If it is not, it's not worth obscuring your JSON-parsing code with regex functions.
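For anyone who wants to repeat a rough comparison like the one above, a minimal timing sketch could look like the following. The helper time_it() and the iteration count are assumptions for illustration. hrtime() requires PHP 7.3 or later. Absolute numbers will vary by machine and PHP version, so treat them only as a relative indication.

```php
<?php
$str = '{"name":"John Smith","studentid":"10358595","fbid":"1284556651"}';
$iterations = 100000; // assumed value, adjust to taste

// Hypothetical helper: run $fn $iterations times and return elapsed seconds.
function time_it(callable $fn, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

$viaJson = function () use ($str) {
    return json_decode($str)->fbid;
};
$viaRegex = function () use ($str) {
    preg_match('/"fbid":"(\d+)"/', $str, $matches);
    return $matches[1];
};

printf("json_decode: %.3fs\n", time_it($viaJson, $iterations));
printf("preg_match:  %.3fs\n", time_it($viaRegex, $iterations));
```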
That's JSON, use:
$str = '{"name":"John Smith","studentid":"10358595","fbid":"1284556651"}';
$data = json_decode($str);
echo $data->fbid;
Cheers
Use json_decode
$txt='{"name":"John Smith","studentid":"10358595","fbid":"1284556651"}';
$student =json_decode($txt);
echo $student->fbid;
I have a loop, which takes a large amount of text in each iteration and replaces specific placeholder ('token') with some other content like so:
$string = $pageContent;
foreach($categories as $row) {
$images = $mdlGallery->getByCategory($row['id']);
if (!empty($images)) {
$plug = Plugin::get('includes/gallery', array('rows' => $images));
$string = str_replace($row['token'], $plug, $string);
}
}
The Plugin class and its get() method simply take the right file from a specific directory and return the output buffer as a string.
There might be a large number of categories, so I wonder whether it would be better to first check the input string for an occurrence of the specific 'token', using strpos(), before fetching all the images for a given category, like so:
foreach($categories as $row) {
if (strpos($string, $row['token']) !== false) {
$images = $mdlGallery->getByCategory($row['id']);
if (!empty($images)) {
$plug = Plugin::get('includes/gallery', array('rows' => $images));
$string = str_replace($row['token'], $plug, $string);
}
}
}
My concern is the performance - would this help? - consider $string to potentially contain a large number of characters (TEXT field type in MySQL)?
To solve your problem
As per your example code it seems that the files used in Plugin::get() are small in size which means including them or reading them should not incur large performance costs, but if there are a lot of them you may need to consider those costs due to OS queuing mechanisms even if the data they contain is not big.
The getByCategory method is likely to incur large performance costs, because it implies many connect->query->read->close communication sequences with the database, and each one implies the transfer of a large amount of data (the TEXT fields you mentioned).
You should consider fetching the data as a batch operation with one single SQL query and storing it in a cache variable indexed by the row id so that getByCategory can fetch it from the cache.
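A minimal sketch of that caching idea, with the database call left out: $rows stands in for the result of one query that returns the images of all categories at once, and the column names category_id and file as well as the helper group_by_category() are assumptions made for the example.

```php
<?php
// Hypothetical helper: index the rows of one batch query by category id.
function group_by_category(array $rows): array
{
    $cache = [];
    foreach ($rows as $row) {
        $cache[$row['category_id']][] = $row;
    }
    return $cache;
}

// Stand-in for the result set of a single query covering every category.
$rows = [
    ['category_id' => 1, 'file' => 'a.jpg'],
    ['category_id' => 2, 'file' => 'b.jpg'],
    ['category_id' => 1, 'file' => 'c.jpg'],
];

$imagesByCategory = group_by_category($rows);

// Inside the loop, a lookup replaces the per-category query.
$images = $imagesByCategory[1] ?? [];
```

A category with no images simply has no entry, so the ?? [] fallback keeps the existing !empty($images) check working.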
Your current problem is not a matter of simple code review; it's a matter of approach. You have applied a technique typical for small datasets to handling large datasets. The notion of "wrap a foreach over the simple script" works if you have medium datasets and don't notice a performance decay; if you do notice one, you need a separate approach to handle the large dataset.
To answer your question
Using strpos means running through the entire haystack once to check if it contains the needle, and after that running through it again to do the replace with str_replace.
If the haystack does not contain the needle, strpos and str_replace cost the same in terms of computational complexity, because both of them have to run through the entire string to the end to make sure no needles are there.
Using both functions therefore adds 100% more work for any haystack that does not contain the needle, and anywhere from 1% to 100% more work for any haystack that does contain it, because strpos returns right after the first needle it finds, which can be at the start of the string, in the middle, or at the end.
In short, don't use strpos; it does not help you here. If you were using preg_replace, things might differ, since the regex engine might incur more cost than strpos for haystacks that do not contain the needle.
Thanks Mihai - that makes a lot of sense, however - in this particular scenario even if I get all of the records from the database first - meaning all of the images with associated categories - it would be rare that the $string would contain more than just one or two 'tokens' - meaning that using strpos() could actually save time if there were many categories ('tokens') to compare against.
Imagine we don't call the getByCategory in each iteration because we already store all possible records in earlier generated array - we still have to go through output buffering inside of the Plugin::get() method and str_replace() - meaning that if we have say 20 categories - this would occur 20 times without necessarily 'token' being included within the $string.
So your suggestion would work if a large number of 'tokens' were supposed to be found in the $string compared to the number of categories we are looping through. For a small number of 'tokens', though, I think strpos() would still be beneficial, as it would be the only call executed for each category, with the other two following only when strpos() finds the token. In that case, strpos() is a small price to pay compared to output buffering and str_replace() together on every iteration of the loop. Don't you think?
I very much appreciate your explanation though.
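To make the trade-off described in this comment concrete, here is a small sketch that counts how often the expensive step runs when the strpos() guard is in place. The closure $render is a hypothetical stand-in for getByCategory() plus Plugin::get(), and the token format is invented for the example.

```php
<?php
$renderCalls = 0;

// Hypothetical stand-in for the expensive part (query plus output buffering).
$render = function (int $id) use (&$renderCalls): string {
    $renderCalls++;
    return "<div>gallery $id</div>";
};

$categories = [
    ['id' => 1, 'token' => '{gallery-1}'],
    ['id' => 2, 'token' => '{gallery-2}'],
    ['id' => 3, 'token' => '{gallery-3}'],
];
$string = 'Intro text {gallery-2} outro text';

foreach ($categories as $row) {
    // Cheap scan first: skip the expensive work when the token is absent.
    if (strpos($string, $row['token']) === false) {
        continue;
    }
    $string = str_replace($row['token'], $render($row['id']), $string);
}

echo $renderCalls, "\n";
echo $string, "\n";
```

Here the expensive step runs once for three categories. Whether that outweighs the extra scan of $string depends on the real cost of the rendering step, which is why measuring is still worthwhile.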
I think it's better to benchmark things yourself if you are looking for optimization (especially micro-optimization). Any implementation usually has more than one variation, so it's better to benchmark the variation you actually use. With that in mind, you can see the benchmark results here:
with strpos: http://3v4l.org/pb4hY#v533
without strpos: http://3v4l.org/v35gT
I have a set of 4 HTML list items and I'd like to shuffle the order they appear in once a week. I was wondering if anyone out there had a nice, elegant solution to this?
As always, I'd be enormously grateful for any input you might have!
UPDATE:
Unfortunately, even with the necessary .htaccess overrides, I just can't get any srand() based solutions to work on this particular server sadly, but have the following which could be used instead - at the moment, it only returns one list item - how could I modify it to show the four required? Once again, any ideas would be gratefully received :)
function RandomList($TimeBase, $QuotesArray){
$TimeBase = intval($TimeBase);
$ItemCount = count($QuotesArray);
$RandomIndexPos = ($TimeBase % $ItemCount);
return $QuotesArray[$RandomIndexPos];
}
$WeekOfTheYear = date('W');
$RandomItems = array(
"<li>North</li>","<li>South</li>","<li>West</li>","<li>East</li>");
print RandomList($WeekOfTheYear, $RandomItems);
Here is a simple and - I guess - pretty elegant solution, which does not involve storing values in a database, setting up cronjobs and other boring stuff of the like.
Let's pretend you have your list elements in $array:
srand(date('W'));
shuffle($array);
srand();
Now your array is shuffled, and will be shuffled the same way until next Monday.
This has a problem, though: it does not work with the Suhosin patch (installed by default in Debian). Still, now that you know about date('W') it will be easy to come up with an alternate solution yourself.
EDIT: if you don't want to implement your own pseudorandom number generator but you have Suhosin installed, you can put the following line in your .htaccess:
php_value suhosin.srand.ignore 0
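If seeding is not an option at all, one alternative that avoids srand() entirely is to order the items by a hash of the week label combined with each item. This is a sketch under that assumption, not the method from the answer above: weekly_order() is a hypothetical name, and the result is a fixed, arbitrary-looking order per week rather than a true random shuffle.

```php
<?php
// Hypothetical helper: order items by a hash of (week label + item), so the
// order stays the same for a whole week without touching the random seed.
function weekly_order(array $items, string $week): array
{
    usort($items, function ($a, $b) use ($week) {
        return crc32($week . $a) <=> crc32($week . $b);
    });
    return $items;
}

$items = array("<li>North</li>", "<li>South</li>", "<li>West</li>", "<li>East</li>");

// 'o' is the ISO-8601 week-numbering year, 'W' the ISO week number.
echo implode("\n", weekly_order($items, date('o-W'))), "\n";
```

With only four items there are 24 possible orders, so two different weeks can occasionally produce the same order.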
I have an array
$num_list = array(42=>'0',44=>'0',46=>'0',48=>'0',50=>'0',52=>'0',54=>'0',56=>'0',58=>'0',60=>'0');
and I want to change specific values as I go through a loop
while(list($pq, $oin) = mysql_fetch_row($result2)) {
$num_list[$oin] = $pq;
}
So I want to change, for example, 58 to 403 rather than 0.
However, I always end up getting just the last change and none of the earlier ones. So it always ends up being something like
0,0,0,0,0,0,0,0,0,403
rather than
14,19,0,24,603,249,0,0,0,403
How can I do this so it doesn't overwrite it?
Thanks
Well, you explicitly coded that each entry should be replaced with the values from the database (even with "0").
You could replace the values on non-zero-values only:
while(list($pq, $oin) = mysql_fetch_row($result2)) {
if ($pq !== "0") $num_list[$oin] = $pq;
}
I'm not sure I fully understand the question, but I think this is what you're asking. Check this:
while(list($pq, $oin) = mysql_fetch_row($result2)) {
if($oin==58) {
$num_list[$oin] = $pq;
}
}
In my simulated tests (although you are very sparing with information), your code works well and produces the result that you want. Check the query column that you put into the array as the value, namely $pq; it may be that the query itself returns 0,0,0,0,0...403. Another possibility is that your $oin numbers are not present among the $num_list keys.
I tested your code with the mysqli driver, though, but fetch_row extracts rows in the same way.
Bear in mind one more thing: if your query returns more records than $num_list has elements, and the $oin numbers are not unique, your $num_list entries may easily be overwritten by the following rows. $num_list may also end up with a lot of additional unwanted elements.
Always try to provide the wider context of your problem; there could be many ways to solve it, and help would arrive sooner.
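The overwriting and extra-element effects mentioned above can be reproduced without a database. In this sketch the $rows array is an invented stand-in for the rows returned by mysql_fetch_row(), each holding [$pq, $oin]. The array_key_exists() guard is one possible way to keep unknown keys out; it is an assumption about what is wanted, not part of the original code.

```php
<?php
$num_list = array(42 => '0', 44 => '0', 46 => '0');

// Invented sample rows: key 44 appears twice, key 99 is not in $num_list.
$rows = [
    ['14', 42],
    ['19', 44],
    ['7', 99],
    ['403', 44],
];

foreach ($rows as [$pq, $oin]) {
    // Skip keys that are not already present, so no extra elements appear.
    if (!array_key_exists($oin, $num_list)) {
        continue;
    }
    // A later row with the same $oin overwrites the earlier value.
    $num_list[$oin] = $pq;
}

print_r($num_list);
```

Key 44 ends up holding the value from the last matching row, which is the overwrite behaviour to watch for when $oin is not unique.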