preg_replace with arrays

My database has a table with 1000 terms and their definitions.
I want to print those definitions and add a span tag to every word that is already a term.
I use this to create the two arrays (patterns and replacements):
while($row = mysql_fetch_array($rsd)){
$patterns[$i] = '/'.$row['term'].'/';
$patterns[$i+1] = '/<span class="linkedterm"><span class="linkedterm">'.$row['term'].'</span>/';
$replacements[$i+1] = '<span class="linkedterm">'.$row['term'].'</span>';
$replacements[$i] = '<span class="linkedterm">'.$row['term'];
$i = $i + 2;
}
And this to echo the definitions:
echo preg_replace($patterns, $replacements, $row['definition']);
With this code I get an error caused by the / character in the closing span tag. How can I pass a value that contains a / character? Any other approach I may have missed is welcome too.
Thanks

You might want to look at preg_quote
Quote regular expression characters

The / is also your delimiter character (to point out the start and end of your regex). So if you want to search for a literal /, make sure you escape it with a backslash, like so:
$patterns[$i+1] = '/<span class="linkedterm"><span class="linkedterm">'.$row['term'].'<\/span>/';

$patterns[$i] = '/'.preg_quote($row['term'], '/').'/';
$patterns[$i+1] = '/'.preg_quote('<span class="linkedterm"><span class="linkedterm">'.$row['term'].'</span>', '/').'/';
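The second pattern in the question only exists to undo double wrapping. A possible alternative, sketched here under assumptions, is to build one alternation from all quoted terms and wrap each match in a single pass. The function name linkTerms and the sample terms are hypothetical; only preg_quote() and preg_replace() come from the answers above.

```php
<?php
// Hypothetical helper: wraps every known term in a span, in a single pass.
function linkTerms(string $text, array $terms): string
{
    if ($terms === []) {
        return $text; // nothing to link; avoids building an empty pattern
    }
    // Longest terms first, so a longer term wins over a shorter one it contains.
    usort($terms, fn($a, $b) => strlen($b) <=> strlen($a));
    // Escape regex metacharacters and the "/" delimiter in every term.
    $quoted = array_map(fn($t) => preg_quote($t, '/'), $terms);
    $pattern = '/' . implode('|', $quoted) . '/';
    // $0 is the whole match, i.e. the term exactly as it appears in the text.
    return preg_replace($pattern, '<span class="linkedterm">$0</span>', $text);
}

echo linkTerms('Compare the a/b ratio in C++.', ['a/b', 'C++']), "\n";
```

Because every term is matched against the original text only once, a term can never be wrapped twice, so the clean-up pattern is not needed. Arrow functions require PHP 7.4 or later.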

Related

Create a function to find a specific word in the title

I have the following title formation on my website:
It's no use going back to yesterday, because at that time I was... Lewis Carroll
Always is: The phrase… (author).
I want to delete everything after the ellipsis (…), leaving only the sentence as the title. I thought of creating a function in php that would take the parts of the titles, throw them in an array and then I would work each part, identifying the only pattern I have in the title, which is the ellipsis… and then delete everything. But when I do that, in the X space of my array, it returns the following:
was...
In position 8 of the array comes the word and the ellipsis and I don't know how to find a pattern to delete the author of the title, my pattern was the ellipsis. Any idea?
<?php
$a = get_the_title(155571);
$search = '... ';
if(preg_match("/{$search}/i", $a)) {
echo 'true';
}
?>
I tried with the code above and found the ellipsis, but I needed to bring it into an array to delete the part I need. I tried something like this:
<?php
define('WP_USE_THEMES', false);
require('./wp-blog-header.php');
global $wpdb;
$title_array = explode(' ', get_the_title(155571));
$search = '... ';
if (array_key_exists("/{$search}/i",$title_array)) {
echo "true";
}
?>
I started doing it this way, but it doesn't work, any ideas?
Thanks,

If you use a regex, you need to escape the search string, as preg_quote() does, because a dot is a regex metacharacter.
But in your simple case, I would not use a regex at all and would just search for the three dots from the end of the string.
Note: if the ellipsis is only added by the browser when the page is rendered, there is no way to detect it in PHP.
$title = 'The phrase... (author).';
echo getPlainTitle($title);
function getPlainTitle(string $title) {
$rpos = strrpos($title, '...');
return ($rpos === false) ? $title : substr($title, 0, $rpos);
}
will output
The phrase

First of all, since you're working with regular expressions, you need to remember that . has a special meaning there: it means "any character". So /... / just means "any three characters followed by a space", which isn't what you want. To match a literal . you need to escape it as \.
Secondly, rather than searching or splitting, you could achieve what you want by replacing part of the string. For instance, you could find everything after the ellipsis, and replace it with an empty string. To do that you want a pattern of "dot dot dot followed by anything", where "anything" is spelled .*, so \.\.\..*
$title = preg_replace('/\.\.\..*/', '', $title);
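The question shows both three ASCII dots (...) and the single-character ellipsis (…). A minimal sketch that handles either form follows; the function name stripAfterEllipsis is hypothetical, and it assumes the title is valid UTF-8.

```php
<?php
// Hypothetical helper: drops the first ellipsis and everything after it.
function stripAfterEllipsis(string $title): string
{
    // \.{3} matches three literal dots, \x{2026} the single "…" character.
    // The u flag treats pattern and subject as UTF-8; s lets . match newlines.
    $result = preg_replace('/\s*(?:\.{3}|\x{2026}).*/us', '', $title);
    // preg_replace() returns null on error (e.g. invalid UTF-8); fall back.
    return $result ?? $title;
}

echo stripAfterEllipsis('The phrase... (author).'), "\n";
echo stripAfterEllipsis("The phrase\u{2026} (author)."), "\n";
```

Note that this cuts at the first ellipsis in the title, whereas the strrpos() version above cuts at the last one; which is right depends on whether the phrase itself can contain an ellipsis.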

PHP str_replace scraped content with wild card?

I'm looking for a solution to strip some HTML from a scraped HTML page. The page has some repetitive data I would like to delete so I tried with preg_replace() to delete the variable data.
Data I want to strip:
Producent:<td class="datatable__body__item" data-title="Producent">Example
Groep:<td class="datatable__body__item" data-title="Produkt groep">Example1
Type:<td class="datatable__body__item" data-title="Produkt type">Example2
....
...
Must be like this afterwards:
Producent:Example
Groep:Example1
Type:Example2
So a big piece is the same except the word within the data-title piece. How could I delete this piece of data?
I tried a few things like this one:
$pattern = '/<td class=\"datatable__body__item\"(.*?)>/';
$tech_specs = str_replace($pattern,"", $tech_specs);
But that didn't work. Is there any solution to this?

Just use a wildcard:
$newstr = preg_replace('/<td class="datatable__body__item" data-title=".*?">/', '', $str);
.*? means match anything but don't be greedy
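As a quick check, here is the wildcard pattern applied to the three sample lines from the question. The variable names are only for illustration.

```php
<?php
$str = implode("\n", [
    'Producent:<td class="datatable__body__item" data-title="Producent">Example',
    'Groep:<td class="datatable__body__item" data-title="Produkt groep">Example1',
    'Type:<td class="datatable__body__item" data-title="Produkt type">Example2',
]);

// .*? is lazy: it stops at the first "> after data-title=" instead of
// running on to the last "> in the string.
$clean = preg_replace('/<td class="datatable__body__item" data-title=".*?">/', '', $str);

echo $clean, "\n";
```

This should print the three lines in the Producent:Example form asked for in the question.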

Assuming that the string looked like this:
$string = 'Producent:<td class="datatable__body__item" data-title="Producent">Example';
You could get the beginning and the end of the string with this:
preg_match('/^(\w+:).*\>(\w+)/', $string, $matches);
echo implode([$matches[1], $matches[2]]);
Which, in this case, will print Producent:Example. You could then add this output to another variable or array you intend to use.
OR, since you mentioned replacing:
$string = preg_replace('/^(\w+:).*\>(\w+)/', '$1$2', $string);
But since the input will probably span a variable number of lines, you can process it line by line:
$string = 'Producent:<td class="datatable__body__item" data-title="Producent">Example
Groep:<td class="datatable__body__item" data-title="Produkt groep">Example1
Type:<td class="datatable__body__item" data-title="Produkt type">Example2';
$stringRows = explode(PHP_EOL, $string);
$pattern = '/^(\w+:).*\>(\w+)/';
$replacement = '$1$2';
foreach ($stringRows as &$stringRow) {
$stringRow = preg_replace($pattern, $replacement, $stringRow);
}
$string = implode(PHP_EOL, $stringRows);
Which will then output the string like you expect.
Explaining my regex:
The first group captures the first word up to and including the colon (:), and the second group captures the last word. I had previously specified anchors for both ends, but that didn't work as expected once each line was processed separately, so I kept only the one at the beginning.
^(\w+:) => the word at the beginning of the string, up to and including the colon
.*\> => everything else up to the last greater-than sign (the backslash escape is optional)
(\w+) => the word after the greater-than sign
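If splitting into rows is not otherwise needed, the same pattern could probably be applied to the whole string at once with the m (multiline) modifier, which makes ^ match at the start of every line. This is only a sketch using the sample data from the question.

```php
<?php
$string = implode("\n", [
    'Producent:<td class="datatable__body__item" data-title="Producent">Example',
    'Groep:<td class="datatable__body__item" data-title="Produkt groep">Example1',
    'Type:<td class="datatable__body__item" data-title="Produkt type">Example2',
]);

// With the m modifier, ^ anchors at each line start. The dot does not match
// a newline by default, so .* never runs past the end of the current line.
$string = preg_replace('/^(\w+:).*>(\w+)/m', '$1$2', $string);

echo $string, "\n";
```

This avoids the explode()/implode() round trip, at the cost of a slightly less obvious pattern.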

Well, maybe my question wasn't written that well. I had a table that I needed to scrape from a website. I needed the info in the table but had to clean up some parts, as mentioned. The solution I finally came up with is below, and it works. It still needs a few manual replacements, but that is because of the stupid " they use for inch. ;-)
Solution:
// find the table in the source code
foreach($techdata->find('table') as $table){
// filter out the rows
foreach($table->find('tr') as $row){
// take the innertext using simplehtmldom
$tech_specs = $row->innertext;
// strip some 'garbage'
$tech_specs = str_replace(" \t\t\t\t\t\t\t\t\t\t\t<td class=\"datatable__body__item\">","", $tech_specs);
// find the first word of the string so I can use it
$spec1 = explode('</td>', $tech_specs)[0];
// use the found string to strip down the rest of the table
$tech_specs = str_replace("<td class=\"datatable__body__item\" data-title=\"" . $spec1 . "\">",":", $tech_specs);
// manual correction because of the " used
$tech_specs = str_replace("<td class=\"datatable__body__item\" data-title=\"tbv Montage benodigde 19\">",":", $tech_specs);
// manual correction because of the " used
$tech_specs = str_replace("<td class=\"datatable__body__item\" data-title=\"19\">",":", $tech_specs);
// strip some 'garbage'
$tech_specs = str_replace("\t\t\t\t\t\t\t\t\t\t","\n", $tech_specs);
$tech_specs = str_replace("</td>","", $tech_specs);
$tech_specs = str_replace(" ","", $tech_specs);
// put the clean row in an array ready for usage
$specs[] = $tech_specs;
}
}

preg_replace does not replace the value as required

Assume we have a PHP array $row_mid, which contains strings like 'reaction_l0', 'reaction_l1', 'reaction_r0', 'reaction_r1' (in each case the number goes from 0 to 4). These strings are enclosed in <div> tags. I want to run a loop and remove these strings with preg_replace():
$i = 0;
while ($i < count ($row_mid)){
$row_mid [$i] = preg_replace ("~^reaction_.[0-9]$~", "", $row_mid [$i]);
$i++;
}
The regexp ^reaction_.[0-9]$ was developed with the help of https://regex101.com/ and tested successfully with strings <div>reaction_r1</div> (no match, I need the tags stay where they are) and reaction_r1 (match). It doesn't work, however.

Get rid of the anchors, because they only allow the regexp to match the entire string, not when it's enclosed in tags.
$row_mid [$i] = preg_replace ("~reaction_.[0-9]~", "", $row_mid [$i]);

Just remove the two anchors, ^ and $.
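To see the effect of the anchors, here is a small sketch with made-up array contents. The tighter character classes [lr] and [0-4] are an assumption based on the strings described in the question; the answers above use . and [0-9]. preg_replace() also accepts an array as the subject, so the while loop is optional.

```php
<?php
// Made-up sample data in the shape described in the question.
$row_mid = ['<div>reaction_l0</div>', '<div>reaction_r4</div>', '<div>other</div>'];

// Anchored: ^ and $ require the whole string to be "reaction_..", so nothing
// inside the <div> tags matches and every element comes back unchanged.
$anchored = preg_replace('~^reaction_.[0-9]$~', '', $row_mid);

// Unanchored: the pattern may match anywhere in each string.
$unanchored = preg_replace('~reaction_[lr][0-4]~', '', $row_mid);

var_dump($anchored === $row_mid);
print_r($unanchored);
```

The tags stay in place in the unanchored result because they are simply not part of the match.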

Stop regex splitting on whitespace

I'm writing a parser, trying to automate a way that I can pass any argument as a param like follows:
$content = '{loop for=products showPagination="true" paginationPosition="both" wrapLoop="true" returnDefaultNoResults="true" noResultsHeading="Nothing Found" noResultsHeadingSize="2" noResultsParagraph="We have not found any products in this category, please try another."}{/loop}';
preg_match_all('/([a-zA-Z]+)=([\/\.\"a-zA-Z0-9&;,_-]+)/', str_replace('&quot;', '"', $content), $attr);
if (!is_array($attr)) return array();
for ($z = 0; $z < count($attr[1]); $z++) if (isset($attr['1'][$z])) $attrs[$attr['1'][$z]] = trim($attr['2'][$z], '"');
echo json_encode($attrs);
My issue is that my loop and regex split on whitespace, and I can't figure out how to alter the regex so that it doesn't.
I've tried adding \w into the right hand side of the = sign, but no luck.
RESULT
{"for":"products","showPagination":"true","paginationPosition":"both","wrapLoop":"true","returnDefaultNoResults":"true","noResultsHeading":"Nothing","noResultsHeadingSize":"2","noResultsParagraph":"We"}
You'll notice that the last two params both stop after the first word.

I suggest you change the preg_match_all call as shown below.
preg_match_all('/([a-zA-Z]+)=("[^"]*"|\S+)/', str_replace('&quot;', '"', $content), $attr);
It first tries to match a complete double-quoted value. If the value is not double-quoted, it matches one or more non-space characters instead.
Output:
{"for":"products","showPagination":"true","paginationPosition":"both","wrapLoop":"true","returnDefaultNoResults":"true","noResultsHeading":"Nothing Found","noResultsHeadingSize":"2","noResultsParagraph":"We have not found any products in this category, please try another."}
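Put together as a runnable sketch, the suggested pattern could be wrapped like this. The function name parseAttributes and the shortened $content are hypothetical; the regex is the one from the answer.

```php
<?php
// Hypothetical helper name; the regex is the one suggested in the answer.
function parseAttributes(string $content): array
{
    // Alternative 1 takes a whole "quoted value" including spaces;
    // alternative 2 takes an unquoted run of non-space characters.
    preg_match_all('/([a-zA-Z]+)=("[^"]*"|\S+)/', $content, $m);
    // $m[1] holds the names, $m[2] the raw values; strip surrounding quotes.
    $values = array_map(fn($v) => trim($v, '"'), $m[2]);
    return array_combine($m[1], $values);
}

$content = '{loop for=products noResultsHeading="Nothing Found" noResultsHeadingSize="2"}{/loop}';
echo json_encode(parseAttributes($content)), "\n";
// {"for":"products","noResultsHeading":"Nothing Found","noResultsHeadingSize":"2"}
```

One caveat: an unquoted value that is the last attribute before the closing } would swallow the brace, because \S+ matches it too. Quoting the last value, as in the original $content, avoids that.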

Regular Expressions: how to do "option split" replaces

Those regular expressions drive me crazy. I'm stuck on this one:
test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not
Task:
Remove all [[ and ]], and if there is an option split, choose the last option, so the output should be:
test1:link test2:silver test3:out1insideout2 test4:this|not
I came up with (PHP)
$text = preg_replace("/\\[\\[|\\]\\]/",'',$text); // remove [[ or ]]
This works for part 1 of the task, but I think I should do the option split before that. My best attempt:
$text = preg_replace("/\\[\\[(.*\|)(.*?)\\]\\]/",'$2',$text);
Result:
test1:silver test3:[[out1[[inside]]out2]] this|not
I'm stuck. Could someone with a few free minutes help me? Thanks!

I think the easiest way to do this would be multiple passes. Use a regular expression like:
\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]
This will replace option strings to give you the last option from the group. If you run it repeatedly until it no longer matches, you should get the right result (the first pass will replace [[out1[[inside]]out2]] with [[out1insideout2]] and the second will ditch the brackets).
Edit 1: By way of explanation,
\[\[ # Opening [[
(?: # A non-capturing group (we don't want this bit)
[^\[\]] # Non-bracket characters
* # Zero or more of them
\| # A literal '|' character representing the end of the discarded options
)? # This group is optional: if there is only one option, it won't be present
( # The group we're actually interested in ($1)
[^\[\]] # All the non-bracket characters
+ # Must be at least one
) # End of $1
\]\] # End of the grouping.
Edit 2: Changed expression to ignore ']' as well as '[' (it works a bit better like that).
Edit 3: There is no need to know the number of nested brackets as you can do something like:
$regexp = '/\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]/';
$oldtext = "";
$newtext = $text;
while ($newtext !== $oldtext)
{
$oldtext = $newtext;
$newtext = preg_replace($regexp, '$1', $oldtext);
}
$text = $newtext;
Basically, this keeps running the regular expression replace until the output is the same as the input.
Note that I don't know PHP, so double-check the syntax above.
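The same repeat-until-stable idea can also be driven by the replacement count that preg_replace() reports through its fifth argument, instead of comparing strings. This is a sketch only; the function name resolveBrackets is hypothetical, and the pattern is the one from this answer.

```php
<?php
// Hypothetical helper: repeats the replacement until nothing matches any more.
function resolveBrackets(string $text): string
{
    $pattern = '/\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]/';
    do {
        // The fifth argument receives the number of replacements made.
        $next = preg_replace($pattern, '$1', $text, -1, $count);
        if ($next === null) {
            break; // regex error: keep the last good text
        }
        $text = $next;
    } while ($count > 0);
    return $text;
}

echo resolveBrackets('test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not'), "\n";
// test1:link test2:silver test3:out1insideout2 test4:this|not
```

Each replacement removes at least one pair of [[ ]], so the loop always terminates.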

This is impossible to do in one regular expression since you want to keep content in multiple "hierarchies" of the content. It would be possible otherwise, using a recursive regular expression.
Anyways, here's the simplest, most greedy regular expression I can think of. It should only replace if the content matches your exact requirements.
You will need to escape all backslashes when putting it into a string (\ becomes \\.)
\[\[((?:[^][|]+|(?!\[\[|]])[^|])++\|?)*]]
As others have already explained, you use this with multiple passes. Keep looping while there are matches, performing replacement (only keeping match group 1.)
The difference from the other regular expressions here is that it allows single brackets in the content without breaking:
test1:[[link]] test2:[[gold|si[lv]er]]
test3:[[out1[[in[si]de]]out2]] test4:this|not
becomes
test1:link test2:si[lv]er
test3:out1in[si]deout2 test4:this|not

Why try to do it all in one go? Remove the [[ and ]] first and then deal with the options; do it in two lines of code.
When trying to get something going, favour clarity and simplicity.
Seems like you have all the pieces.

Why not just simply remove any brackets that are left?
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$str = preg_replace('/\\[\\[(?:[^|\\]]+\\|)+([^\\]]+)\\]\\]/', '$1', $str);
$str = str_replace(array('[', ']'), '', $str);

Well, I didn't stick to just regex, because I'm of a mind that trying to do stuff like this with one big regex leads you to the old joke about "Now you have two problems". However, give something like this a shot:
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$reg = '/(.*?):(.*?)( |$)/';
preg_match_all($reg, $str, $m);
foreach($m[2] as $pos => $match) {
if (strpos($match, '|') !== FALSE && strpos($match, '[[') !== FALSE ) {
$opt = explode('|', $match);
$match = $opt[count($opt)-1];
}
$m[2][$pos] = str_replace(array('[', ']'),'', $match );
}
foreach($m[1] as $k=>$v) $result[$k] = $v.':'.$m[2][$k];

This is C# using only verbatim (non-escaped) strings, hence you will have to double the backslashes in other languages.
String input = "test1:[[link]] " +
"test2:[[gold|silver]] " +
"test3:[[out1[[inside]]out2]] " +
"test4:this|not";
String step1 = Regex.Replace(input, @"\[\[([^|]+)\|([^\]]+)\]\]", @"[[$2]]");
String step2 = Regex.Replace(step1, @"\[\[|\]\]", String.Empty);
// Prints "test1:silver test3:out1insideout2 test4:this|not"
Console.WriteLine(step2);

$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$s = preg_split("/\s+/",$str);
foreach ($s as $k=>$v){
$v = preg_replace("/\[\[|\]\]/","",$v);
$j = explode(":",$v);
$j[1]=preg_replace("/.*\|/","",$j[1]);
print implode(":",$j)."\n";
}
