Large text replace array - php

I'm looking for some help with replacing text while importing an XML file. I want to replace some values during the import so that they match the categories, filter values, etc. on my website.
I'm using this function. I wrote it myself by copy-pasting from the internet (I'm not a coder), but now I need some help/advice.
<?php
// Text replace test function
function my_text_replace($x) {
for ($y = 0; $y < 2; $y = $y+1) {
$phrase = $x;
$old = array("Draaideurkast", "fout1 MRC", "Draaideurkast MRC", "Draaideurkast MRC");
$new = array("fout1", "fout2", "goed", "fout3");
$x = str_ireplace($old, $new, $phrase);
$y = $y+1;
return $x;
}
}
?>
Code Fix:
I do not want a partial-match replace; only the complete value of $x should be replaced. In the example the output should be 'goed'. It should replace only once when found (but that is fixed with the for loop, I think). The matching should be case-insensitive.
Advice question:
Is this a correct way of replacing (large amounts of) text during an import? Do you know of other best practices, plugins (WordPress) or tools?
Thanks for any response!
Harm
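One way to get a whole-value, case-insensitive replacement is to avoid str_ireplace (which replaces substrings) and look the value up in an associative array instead. The following is a minimal sketch under some assumptions: replace_whole_value is a hypothetical helper name, the map holds one entry per source value (the duplicate "Draaideurkast MRC" entry from the question is dropped, since a key can exist only once), and the values are plain ASCII (non-ASCII text would need mb_strtolower on both sides).

```php
<?php
// Hypothetical helper: returns the replacement only when the WHOLE value matches a key.
function replace_whole_value(string $value, array $map): string
{
    // Lowercase the keys so the lookup ignores case.
    $lookup = array_change_key_case($map, CASE_LOWER);
    $key = strtolower($value);
    // Unknown values are returned unchanged.
    return $lookup[$key] ?? $value;
}

$map = [
    'Draaideurkast'     => 'fout1',
    'fout1 MRC'         => 'fout2',
    'Draaideurkast MRC' => 'goed',
];

echo replace_whole_value('draaideurkast mrc', $map), "\n"; // goed
echo replace_whole_value('Draaideurkast XL', $map), "\n";  // Draaideurkast XL
```

Because each value is looked up once, no loop is needed to prevent a second replacement, and the cost per value does not grow with the size of the map.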

Related

Simple PHP code for extracting data from the HTML source code

I know I can use xpath, but in this case it wouldn't work because of the complexity of the navigation of the site.
I can only use the source code.
I have browsed all over the place and couldn't find a simple php solution that would:
Open the HTML source code page (I already have an exact source code page URL).
Select and extract the text between two codes. Not between a div. But I know the start and end variables.
So, basically, I need to extract the text between
knownhtmlcodestart> Text to extract <knownhtmlcodeend
What I'm trying to achieve in the end is this:
Go to a source code URL.
Extract the text between two codes.
Store the data temporarily (define the time manually for how long) on my web server in a simple text file.
Define the waiting time and then repeat the whole process again.
The website that I'm going to extract data from is changing dynamically. So it would always store new data into the same file.
Then I would use that data (but that's a question for another time).
I would appreciate it if anyone could lead me to a simple solution.
I'm not asking anyone to write the code for me, but maybe someone has done something similar, and sharing that code here would be helpful.
Thanks
I (shamefully) found the following function useful to extract stuff from HTML. Regexes sometimes are too complex to extract large stuff, e.g. a whole <table>
/*
$str - the string to search in
$start - string marking the start of the sequence you want to extract
$end - string marking the end of it (null means "until the end of $str")
$offset - starting position in case you need to find multiple occurrences
returns the string between `$start` and `$end`, and the indexes of start and end,
or false if a marker is not found
*/
function strExt($str, $start, $end = null, $offset = 0)
{
    $p1 = mb_strpos($str, $start, $offset);
    if ($p1 === false) return false;
    $p1 += mb_strlen($start);
    // search from $p1 (not $p1 + 1) so that an empty string between the markers is still found
    $p2 = $end === null ? mb_strlen($str) : mb_strpos($str, $end, $p1);
    if ($p2 === false) return false;
    return [
        'str' => mb_substr($str, $p1, $p2 - $p1),
        'start' => $p1,
        'end' => $p2,
    ];
}
This would assume the opening and closing tag are on the same line (as in your example). If the tags can be on separate lines, it wouldn't be difficult to adapt this.
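$html = file_get_contents('http://website.com'); // a URL needs its scheme
$start = "knownhtmlcodestart";
$end = "knownhtmlcodeend";
$text = '';
$lines = explode("\n", $html);
foreach ($lines as $line) {
    $t1 = strpos($line, $start);
    if ($t1 === false) continue; // compare with false: position 0 is a valid match
    $c1 = $t1 + strlen($start); // first character after the start marker
    $c2 = strpos($line, $end, $c1);
    if ($c2 !== false) {
        $text = substr($line, $c1, $c2 - $c1); // PHP's function is substr(), not substring()
        break;
    }
}
echo $text;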
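The remaining steps from the question (store the extracted text in a file and refresh it after a chosen waiting time) could be sketched as follows. This is only an illustration under assumptions: extract_between and cached_extract are hypothetical helpers, the cache path and the waiting time are up to you, and the page is supplied through a callable so the example runs without network access.

```php
<?php
// Hypothetical helper: returns the text between two known markers, or null.
function extract_between(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);
    $to = strpos($html, $end, $from);
    return $to === false ? null : substr($html, $from, $to - $from);
}

// Hypothetical helper: re-fetches only when the cache file is older than $ttl seconds.
// $fetch is any callable returning the page source,
// e.g. fn() => file_get_contents($url).
function cached_extract(string $cacheFile, int $ttl, callable $fetch, string $start, string $end): ?string
{
    clearstatcache(true, $cacheFile);
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
        return file_get_contents($cacheFile); // still fresh, reuse the stored text
    }
    $text = extract_between((string) $fetch(), $start, $end);
    if ($text !== null) {
        file_put_contents($cacheFile, $text, LOCK_EX); // overwrite the same file
    }
    return $text;
}

$page = 'abc knownhtmlcodestart> Text to extract <knownhtmlcodeend xyz';
echo extract_between($page, 'knownhtmlcodestart>', '<knownhtmlcodeend'), "\n";
```

Repeating the whole process could then be left to a scheduler such as cron that calls the script; with this layout the waiting time is simply the $ttl argument.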

Is it possible to use Knuth-Morris-Pratt Algorithm for string matching on text to text?

I have KMP code in PHP that can do string matching between a word and a text. I wonder whether I can use the KMP algorithm for string matching between two texts. Is it possible or not? And how can I use it to find the matching strings between two texts?
Here's the core of the KMP algorithm:
<?php
class KMP{
    function KMPSearch($p,$t){
        $result = array();
        $pattern = str_split($p);
        $text = str_split($t);
        $prefix = $this->preKMP($pattern);
        // print_r($prefix);
        // KMP String Matching
        $i = $j = 0;
        $num=0;
        while($j<count($text)){
            while($i>-1 && $pattern[$i]!=$text[$j]){
                // on a mismatch, fall back using the prefix table
                $i = $prefix[$i];
            }
            $i++;
            $j++;
            if($i>=count($pattern)){
                // full match: record the position where the match starts,
                // then use the prefix table to shift the pattern to the right
                $result[$num++]=$j-count($pattern);
                $i = $prefix[$i];
            }
        }
        return $result;
    }
    // Build the prefix table
    function preKMP($pattern){
        $i = 0;
        $j = $prefix[0] = -1;
        while($i<count($pattern)){
            while($j>-1 && $pattern[$i]!=$pattern[$j]){
                $j = $prefix[$j];
            }
            $i++;
            $j++;
            // compare the characters themselves; the bounds check covers $i == count($pattern)
            if($i<count($pattern) && $pattern[$i]==$pattern[$j]){
                $prefix[$i]=$prefix[$j];
            }else{
                $prefix[$i]=$j;
            }
        }
        return $prefix;
    }
}
?>
I call this class from my index.php when I want to find a word in a text.
These are the steps I want my code to perform:
(1). I input text 1
(2). I input text 2
(3). Text 1 becomes the set of patterns (every single word in text 1 is treated as a pattern)
(4). My code finds every pattern from text 1 in text 2
(5). Finally, my code shows me the percentage of similarity.
I hope you can help me or teach me. I've been searching for the answer everywhere but haven't found it yet.
If you just need to find all words that are present in both texts, you don't need any string search algorithm to do it. You can just add all words from the first text to a hash table, iterate over the second text and add the words that are in the hash table to the output list.
You can use a trie instead of a hash table if you want a linear time complexity in the worst case, but I'd get started with a hash table because it's easy to use and is likely to be good enough for practical purposes.
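The hash table approach could look roughly like this in PHP, where ordinary arrays already are hash tables. The sketch makes a few assumptions that are not in the question: words and word_similarity are hypothetical helpers, words are split on non-word characters and compared case-insensitively, and "similarity" means the share of distinct words of text 1 that also occur in text 2.

```php
<?php
// Hypothetical helper: splits a text into lowercase words.
function words(string $text): array
{
    $parts = preg_split('/\W+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $parts ?: []; // preg_split returns false on invalid UTF-8
}

// Percentage of the distinct words of $text1 that also occur in $text2.
// This definition of similarity is an assumption.
function word_similarity(string $text1, string $text2): float
{
    $patterns = array_flip(words($text1)); // hash table: word => index
    if (count($patterns) === 0) {
        return 0.0;
    }
    $found = [];
    foreach (words($text2) as $word) {
        if (isset($patterns[$word])) {
            $found[$word] = true; // count every shared word once
        }
    }
    return 100 * count($found) / count($patterns);
}

echo word_similarity('the quick brown fox', 'The fox is quick'), "\n"; // 75
```

Each word of both texts is handled once, so the running time grows roughly linearly with the combined length of the texts instead of running one KMP search per word.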

What is the fastest way to check amount of specific chars in a string in PHP?

I need to check whether the number of characters from a specific set in a string is higher than some number. What is the fastest way to do that?
For example, I have a long string "some text & some text & some text + a lot more + a lot more ... etc." and I need to check whether it contains more than 3 of the following symbols: [&,.,+]. So when I encounter the 4th occurrence of one of these characters, I just need to return false and stop the loop. I'm thinking of writing a simple function for that, but I wonder whether there is a native PHP method for this. I need a function that will not waste time parsing the string to the end, because the string may be pretty long. So I think regexp and functions like count_chars are not suited for that kind of job...
Any suggestions?
I don't know about a native method, I think count_chars is probably as close as you're going to get. However, rolling a custom solution would be relatively simple:
$str = 'your text here';
$chars = ['&', '.', '+'];
$count = array_fill_keys($chars, 0); // start every counter at 0 to avoid undefined-index warnings
$length = strlen($str);
$limit = 3;
for ($i = 0; $i < $length; $i++) {
    if (in_array($str[$i], $chars)) {
        $count[$str[$i]] += 1;
        if ($count[$str[$i]] > $limit) {
            break;
        }
    }
}
Where the data is actually coming from might also make a difference. For example, if it's from a file then you could take advantage of fread's 2nd parameter to only read x number of bytes at a time within a while loop.
Finding the fastest way might be too broad of a question as PHP has a lot of string related functions; other solutions might use strstr, strpos, etc...
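As one example of those other string functions, strcspn can jump straight to the next character from the set, so the PHP-level loop runs once per occurrence instead of once per character and still stops early. A minimal sketch, not benchmarked; exceeds_char_limit is a hypothetical helper and it counts all listed characters together:

```php
<?php
// Hypothetical helper: true as soon as more than $limit characters from $chars are seen.
function exceeds_char_limit(string $str, string $chars, int $limit): bool
{
    $length = strlen($str);
    $pos = 0;
    $count = 0;
    while ($pos < $length) {
        // strcspn() returns the length of the run that contains none of $chars
        $pos += strcspn($str, $chars, $pos);
        if ($pos >= $length) {
            break; // no further occurrence
        }
        if (++$count > $limit) {
            return true; // stop early; the rest of the string is never scanned
        }
        $pos++; // step over the matched character
    }
    return false;
}

var_dump(exceeds_char_limit('a & b & c + d. e', '&.+', 3)); // bool(true)
var_dump(exceeds_char_limit('a & b + c', '&.+', 3));        // bool(false)
```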
I haven't benchmarked the other solutions, but str_replace (http://php.net/manual/en/function.str-replace.php) with an array of search characters should be fast. Its optional fourth parameter receives the number of replacements performed. Check that number:
str_replace(['&', '.', '+'], '', $subject, $count);
if ($count > $number) {
    // more than $number of the characters were found
}
Well, all my thoughts were wrong and my expectations were crushed by real tests. RegExp seems to work from 2 to 7 times faster (with different strings) than self-made function with simple symbol-checking loop.
The code:
// self-made function:
function chk_occurs($str,$chrs,$limit){
$r=false;
$count = 0;
$length = strlen($str);
for($i=0; $i<$length; $i++){
if(in_array($str[$i], $chrs)){
$count++;
if($count>$limit){
$r=true;
break;
}
}
}
return $r;
}
// RegExp i've used for tests:
preg_match('/([&\\.\\+]|[&\\.\\+][^&\\.\\+]+?){3,}?/',$str);
Of course it works faster because it's a single call to native function, but even same code wrapped into function works from 2 to ~4.8 times faster.
//RegExp wrapped into the function:
function chk_occurs_preg($str,$chrs,$limit){
$chrs=preg_quote($chrs);
return preg_match('/(['.$chrs.']|['.$chrs.'][^'.$chrs.']+?){'.$limit.',}?/',$str);
}
P.S. I didn't bother to check CPU time; I just measured the wall time of a 200k-iteration loop via microtime(true), but that's enough for me.
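Note that the `{3,}` quantifier in the pattern above already matches at three occurrences, while chk_occurs only returns true for more than $limit. Assuming "strictly more than $limit" is what is wanted, a simpler anchored pattern could be sketched like this (has_more_than is a hypothetical helper, $chars is a non-empty string of characters, and the sketch has not been benchmarked against the versions above):

```php
<?php
// Hypothetical helper: true when $str holds more than $limit characters from $chars.
function has_more_than(string $str, string $chars, int $limit): bool
{
    $set = preg_quote($chars, '/');
    // ($limit + 1) repetitions of "a run of other characters, then one listed character".
    // \A pins the match to the start so the engine makes a single attempt,
    // and the possessive *+ prevents backtracking into the skipped run.
    $pattern = '/\A(?:[^' . $set . ']*+[' . $set . ']){' . ($limit + 1) . '}/';
    return preg_match($pattern, $str) === 1;
}

var_dump(has_more_than('a & b & c + d. e', '&.+', 3)); // bool(true)
var_dump(has_more_than('a & b + c', '&.+', 3));        // bool(false)
```

The match ends at the character that exceeds the limit, so the remainder of a long string is not examined.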

Insert string to PDF with For loop

I'm a junior PHP developer.
For a customer I need to place some strings into a PDF. I'm using FPDI and I like it.
I have an existing template PDF and I need to insert each character of a string into a little graphic box (see image).
Each character must be 2 millimeters (approximately 8px) apart from the next.
The strings can have different lengths, so I thought of doing it like this:
$name = 'namenamename';
$stringcount = strlen($name)-1;
$countspace = $stringcount*2;
//121 = coordinate of first box
for ($x=121; $x <= $x+$countspace; $x = $x+2) {
for ($i=0; $i <= $stringcount; $i++) {
$pdf->SetXY($x, 37);
$pdf->Write(0,$name[$i]);
}
}
That doesn't work. This is the error:
Maximum execution time of 30 seconds
Can you help me please with the correct approach and with good explanation for a newbie? :)
Try this code:
$name = 'namenamename';
$string_length = strlen($name);
$coordinate = 121; //Give to the variable coordinate the beginning value, in this case 121
for ($i=0; $i < $string_length; $i++){ //make only one loop for the string length so the loop ends when there is no more characters
$char = substr($name,$i,1); // this is "the tricky part", with substr you can grab each character with its position in the string
$pdf -> SetXY($coordinate, 37); // here you put the coordinate for the character
$pdf -> Write(0, $char); // write it
$coordinate += 2; // and increment it by two, since the character are two spaces away from each other
}
hope that will help..
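For reference, the loop in the question never ends because its condition `$x <= $x+$countspace` compares $x with itself plus a non-negative number, which is always true. The X position of a character depends only on its index, which the following sketch computes without any PDF library. char_positions is a hypothetical helper; 121 and 2 are the values from the question.

```php
<?php
// Hypothetical helper: pairs every character with the X coordinate of its box.
function char_positions(string $text, float $startX, float $step): array
{
    if ($text === '') {
        return [];
    }
    $positions = [];
    foreach (str_split($text) as $i => $char) {
        // index 0 sits at $startX, each following character is $step further right
        $positions[] = ['char' => $char, 'x' => $startX + $i * $step];
    }
    return $positions;
}

foreach (char_positions('name', 121, 2) as $p) {
    echo $p['char'], ' => ', $p['x'], "\n";
}
// n => 121
// a => 123
// m => 125
// e => 127
```

Each pair can then be passed to $pdf->SetXY($p['x'], 37) and $pdf->Write(0, $p['char']), exactly as in the loop above.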
Maybe not a great solution, but you can change the execution time limit with this line of code:
set_time_limit ( $seconds );
Give it a try, but I think the problem is more likely an error in the logic of the loop.
Can you say exactly at which coordinate you need the first two characters? Is the first one at 121, or at 121 plus something?

Regex to select div from source

A client of mine has asked for me to create a simple site that monitors files on another site. He needs to monitor the file names (unsure why?) and have them outputted to a file.
Here's the example source; http://pastebin.com/tyLUmCJr
I don't speak Russian, so I'm unaware of what the site's about. I apologize if it's anything that's 'less-than-suitable'.
Anyway, if you scroll to line 117, you will see a file name. I need to get all of the file names.
I've played around with the DOMDocument and third-party tools although I believe I could use regex to increase the speed of this. If anybody could point me in the correct direction, it would be greatly appreciated.
Note: keep in mind that the source is stored in a string variable named $content.
Cheers!
After some more detailed, extensive research, I found a way to do it. Here's how I achieved it;
<?php
require_once("phpQuery.php");
$min = isset($_GET['min']) ? $_GET['min'] : 1;
$max = isset($_GET['max']) ? $_GET['max'] : 2;
$pages = [];
foreach(range($min, $max) as $page) {
array_push($pages, iconv("CP1251", "UTF-8", file_get_contents("http://www.fayloobmennik.net/files/list/" . $page . ".html")));
}
$html = file_get_html("http://www.fayloobmennik.net/files/list/");
$elem = $html->find('div[id=info] table > tbody', 0);
$test = $elem->find('tr a');
foreach ($test as $test2) {
$regex = '/<a href=\"([^\"]*)\">(.*)<\/a>/iU';
$test2 = preg_match($regex, $test2, $match);
print_r(iconv("CP1251", "UTF-8", $match[2]));
echo "<br/>";
}
?>
The phpQuery.php file is actually the simple_html_dom library (I believe that's what it's called).
Cheers.
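For the regex-only route the question asked about, a sketch working directly on $content might look like this. It assumes the file names are the link text of plain `<a href="...">` anchors; the sample markup below is a made-up stand-in, since the real page structure is not shown here.

```php
<?php
// Stand-in for the page source; the real markup may differ (assumption).
$content = '<tr><td><a href="/1/file_one.zip">file_one.zip</a></td></tr>'
         . '<tr><td><a href="/2/file_two.rar">file_two.rar</a></td></tr>';

// Capture the href and the link text of every anchor.
// The lazy .*? keeps each match inside a single link.
preg_match_all(
    '/<a\s[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/is',
    $content,
    $matches,
    PREG_SET_ORDER
);

$names = [];
foreach ($matches as $m) {
    $names[] = trim(strip_tags($m[2])); // $m[1] holds the href, $m[2] the link text
}
print_r($names);
```

Writing the result to a file could then be a single file_put_contents call with implode("\n", $names); the target file name is up to you.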
