Multiple answer with preg_replace - php

I have a problem with preg_replace in PHP.
My text:
[Derp] a
• [Derp] a
My regex:
$simple_search[0] = '/\[(.*?)\] (.*?)/is';
$simple_search[1] = '/\• \[(.*?)\] (.*?)/is';
My replacements:
$simple_replace[0] = "[color=#009D9D][$1][/color] $2";
$simple_replace[1] = "[color=#30BA76]• [$1][/color] [color=#92CF91]$2[/color]";
After preg_replace:
[color=#009D9D][Derp][/color] a
[color=#30BA76]#color=#009D9D][Derp][/color[/color] [color=#606090]: [/color]a
(It's a tool for coloring quotes.)
[Derp] a and
• [Derp] a must not get the same color.
The problem is that the first pattern also matches the bullet line, so that line is not replaced correctly.
How can I make each pattern match only the lines it is meant for?

Replace your first regex with:
/(?<!\• )\[(.*?)\] (.*?)/is
The negative lookbehind (?<!\• ) means the "[" must not be preceded by a "•" and a space. Also, if the • always stands at the beginning of your lines, you could put ^ in front of it (with the m modifier for multi-line input).
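A minimal sketch of the lookbehind fix applied to both patterns at once. Two assumptions beyond the answer: the trailing group is changed to a greedy (.*) without the s flag, so it captures the rest of the line instead of matching nothing, and the u flag is added because • is a multibyte UTF-8 character.

```php
<?php
// Sketch: plain "[Name] text" lines and bullet "• [Name] text" lines get
// different colors. Assumes UTF-8 input with one quote per line.
$text = "[Derp] a\n• [Derp] a";

$search = [
    // (?<!• ) : the "[" must NOT be preceded by "• ", so bullet lines are skipped.
    // No s flag, so (.*) stops at the end of the line.
    '/(?<!• )\[(.*?)\] (.*)/u',
    // Bullet lines are handled by their own pattern.
    '/• \[(.*?)\] (.*)/u',
];
$replace = [
    '[color=#009D9D][$1][/color] $2',
    '[color=#30BA76]• [$1][/color] [color=#92CF91]$2[/color]',
];

echo preg_replace($search, $replace, $text), "\n";
```

preg_replace() applies the patterns in order, so the second pattern sees the output of the first; it is the lookbehind that keeps the first pattern away from the bullet line.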

$str = '[Derp] a
• [Derp] a';
$simple_search[0] = '/(\• )?(?P<m2>\[.*?\]) (?P<m3>.*)/i';
echo $str = preg_replace_callback($simple_search[0],
function ($m) {
if (!$m[1]) return '[color=#009D9D]' . $m[2] . '[/color] ' . $m[3];
else return '[color=#30BA76]• ' . $m[2] . '[/color] [color=#92CF91]' . $m[3] . '[/color]';
}, $str
);
result
[color=#009D9D][Derp][/color] a
[color=#30BA76]• [Derp][/color] [color=#92CF91]a[/color]


Why does PHP mb_convert_case() and mb_strtoupper() convert µ (U+00B5 MICRO SIGN) to "Μ"?

I'm trying to write my own mb_ucwords() function to provide a quick wrapper around mb_convert_case so that it works with multibyte strings, since the base ucwords() function does not.
I have run into an issue where a string that starts with the µ character (U+00B5 MICRO SIGN) comes back as "Μ" (U+039C GREEK CAPITAL LETTER MU) instead of being left alone, as I would expect.
I wrote a quick test script to verify some information:
function testUtf8($letter) {
echo "CHAR: " . $letter . "\n";
echo "Detected Encoding: " . mb_detect_encoding($letter) . "\n";
echo "IS VALID UTF-8? " . (mb_check_encoding($letter, 'UTF-8') ? 'YES' : 'NO') . "\n";
$lower = mb_strtolower($letter);
$upper = mb_strtoupper($letter);
$conv = mb_convert_case($letter, MB_CASE_TITLE, 'UTF-8');
echo "mb_strtolower(): " . $lower . "(" . mb_ord($lower) . ")\n";
echo "mb_strtoupper(): " . $upper . "(" . mb_ord($upper) . ")\n";
echo "mb_convert_case(): " . $conv . "(" . mb_ord($conv) . ")\n";
echo "\n";
echo "Matches RegEx /\p{L}/u: " . (preg_match('/\p{L}/u', $letter) ? 'YES' : 'NO') . "\n";
echo "Matches RegEx /\p{N}/u: " . (preg_match('/\p{N}/u', $letter) ? 'YES' : 'NO') . "\n";
echo "Matches RegEx /\p{Xan}/u: " . (preg_match('/\p{Xan}/u', $letter) ? 'YES' : 'NO') . "\n";
}
testUtf8('µ');
And the output I get is:
CHAR: µ
Detected Encoding: UTF-8
IS VALID UTF-8? YES
mb_strtolower(): µ(181)
mb_strtoupper(): Μ(924)
mb_convert_case(): Μ(924)
Matches RegEx /\p{L}/u: YES
Matches RegEx /\p{N}/u: NO
Matches RegEx /\p{Xan}/u: YES
Can someone explain to me why PHP thinks µ is a "letter" and why the MB uppercase version is "Μ"? I was going to work around this by testing the first letter of each word and verifying that it was a valid Unicode "letter" before running the conversion, but as you can see that won't work for this character, since /\p{L}/u matches it :(
Any idea how I can work around this?
Here is the rough draft of my function:
/**
* @param string $string The string to convert
* @param string $encoding Default is UTF-8
* @param string $delim_pattern Pattern used to break $string into words
* @return string
*/
public static function mb_ucwords(
string $string,
string $encoding = 'UTF-8',
string $delim_pattern = '/([\/\-\s\v"\'\\\]+)/u'
): string {
$words = preg_split($delim_pattern, $string, -1, PREG_SPLIT_DELIM_CAPTURE);
$output = "";
foreach($words as $word) {
$output .= mb_convert_case($word, MB_CASE_TITLE, $encoding);
}
return $output;
}
Currently testing this code against PHP 7.4.
EDIT:
Apparently this is a Greek letter as well as the symbol for micro, and Μ is the capital version of that Greek letter. I'm not sure how to handle this...
In Unicode 2, µ (U+00B5 MICRO SIGN) was changed to have a compatibility decomposition of μ (U+03BC GREEK SMALL LETTER MU). At the same time, its category was changed from symbol to letter, to match μ (U+03BC GREEK SMALL LETTER MU). This means that U+00B5 should not be used in new text; it is only to be used for compatibility with non-Unicode character sets. Under certain normalization forms, these are considered to be the same character.
In Unicode 3.0, it was updated to have Μ (U+039C GREEK CAPITAL LETTER MU) as its uppercase mapping, which gives the result you see now.
Unfortunately, since µ (U+00B5 MICRO SIGN) is basically deprecated, you're on your own if you use it. You could compare the first character of the string with µ (U+00B5 MICRO SIGN) before calling mb_convert_case. However, there's no guarantee that some system won't silently convert it to μ (U+03BC GREEK SMALL LETTER MU), for example if it normalizes the string. If you will never otherwise use μ (U+03BC GREEK SMALL LETTER MU), you could special-case that character as well.
The fail-safe way to handle this without breaking support for Greek text would be to use some sort of markup language or rich text to indicate that the character is used as a symbol instead of a letter, and then parse that when performing the case conversion. But that would obviously be a larger undertaking.
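A minimal sketch of the "compare the first character with µ" workaround. The function name mb_ucwords_keep_micro() is hypothetical, and the sketch assumes the mbstring extension is loaded, the input is UTF-8, and words are split on whitespace only (a simplification of the question's $delim_pattern). As noted above, it does nothing for text that has already been normalized to U+03BC.

```php
<?php
// Hypothetical helper: title-cases each word but leaves a leading
// U+00B5 MICRO SIGN alone. Assumes mbstring is available and UTF-8 input.
function mb_ucwords_keep_micro(string $string, string $encoding = 'UTF-8'): string
{
    $micro = "\u{00B5}"; // U+00B5 MICRO SIGN, not U+03BC GREEK SMALL LETTER MU
    // PREG_SPLIT_DELIM_CAPTURE keeps the whitespace runs so they can be re-joined.
    $parts = preg_split('/(\s+)/u', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
    $output = '';
    foreach ($parts as $part) {
        if (mb_substr($part, 0, 1, $encoding) === $micro) {
            // Keep the micro sign; lower-case the rest, as MB_CASE_TITLE would.
            $output .= $micro . mb_strtolower(mb_substr($part, 1, null, $encoding), $encoding);
        } else {
            $output .= mb_convert_case($part, MB_CASE_TITLE, $encoding);
        }
    }
    return $output;
}

echo mb_ucwords_keep_micro("5 \u{00B5}m wide"), "\n"; // 5 µm Wide
```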
You could go as simple as this:
function mb_ucfirst($string)
{
// Assumes the input arrives in cp1250; it is converted to UTF-8 for processing
$main_encoding = "cp1250";
$inner_encoding = "utf-8";
$string = iconv($main_encoding, $inner_encoding, $string);
// Upper-case only the first character; the rest of the string is left as-is
$firstChar = mb_substr($string, 0, 1, $inner_encoding);
$then = mb_substr($string, 1, null, $inner_encoding);
// Convert the result back to the original encoding
return iconv($inner_encoding, $main_encoding, mb_strtoupper($firstChar, $inner_encoding) . $then);
}
It kept the µ when I tested it.

RegEx (preg_match_all in PHP) to capture series of <tags containing numbers> up to the first alphanumeric character

The problem here is the conflict between numbers and alphanumeric in the problem description.
Given the text:
<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>The
next 11 keys can change the SWING from OFF (50%) to
<19><20><21><22><23><24><25>80<26><27><28><29><30><31><32>% during
arpeggiator or sequencer operation.<33><34>
I need to extract the following four groups:
<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>
<19><20><21><22><23><24><25>
<26><27><28><29><30><31><32>
<33><34>
Reason: we want to display this in a much more user-friendly way as...
[1]The next 11 keys can change the SWING from OFF (50%) to [2]80[3]%
during arpeggiator or sequencer operation.[4]
Current code:
$pattern = '<[\d<>' . REGSTART . REGEND . REGSTARTSQ . REGENDSQ . '\{\}]+>';
$numberofsupertags = preg_match_all('/(' . $pattern . ')/', $source, $superchunks);
echo '<pre>';
print_r($superchunks);
echo '</pre><br>';
(REGSTART/REGEND/REGSTARTSQ/REGENDSQ refer to other possible pairs of symbols, like 【】 or 〖〗 etc.)
gives three groups:
<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>
<19><20><21><22><23><24><25>80<26><27><28><29><30><31><32>
<33><34>
As you can see, the RegEx fails to take into account sequences of only numbers between tags.
I've tried lots of things:
$pattern = '([<|' . REGSTART . REGSTARTSQ . '|\{]\d+?[>|' . REGEND . REGENDSQ . | \}])+';
$pattern = '<[\d<>' . REGSTART . REGEND . REGSTARTSQ . REGENDSQ . '\{\}]+[>(?=\d)|>]';
...but to no avail.
What is the correct solution and where do I go wrong? This looks really simple, but apparently it isn't.
You can use
(?:<(?:{\d+}|【\d+】|〖\d+〗|\d+)>)+
See the regex demo. Details:
(?: - start of a non-capturing group:
< - a < char
(?:{\d+}|【\d+】|〖\d+〗|\d+) - one of the alternatives: { + one or more digits + }, 【 + one or more digits + 】, 〖 + one or more digits + 〗 or one or more digits
> - a > char
)+ - one or more times.
See the PHP demo:
$source = '<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>The next 11 keys can change the SWING from OFF (50%) to <19><20><21><22><23><24><25>80<26><27><28><29><30><31><32>% during arpeggiator or sequencer operation.<33><34>';
$cnt = 0;
echo preg_replace_callback('~(?:<(?:{\d+}|【\d+】|〖\d+〗|\d+)>)+~u', function($m) use (&$cnt) {
return '['. ++$cnt .']';
}, $source);
// => [1]The next 11 keys can change the SWING from OFF (50%) to [2]80[3]% during arpeggiator or sequencer operation.[4]
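To get the four groups themselves, as the question asks, the same pattern can be passed to preg_match_all(). The question's class [\d<>…]+ accepts digits anywhere, so the 80 between <25> and <26> was absorbed into one run; requiring each <…> to be a complete tag stops the run there. A sketch covering only the three bracket styles in the answer (the REGSTART/REGEND constants are not reproduced):

```php
<?php
// Sketch: collect each run of consecutive tags with preg_match_all().
$source = '<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>The next 11 keys can change the SWING from OFF (50%) to <19><20><21><22><23><24><25>80<26><27><28><29><30><31><32>% during arpeggiator or sequencer operation.<33><34>';

// \{ and \} are escaped so PCRE never reads them as a quantifier.
$pattern = '~(?:<(?:\{\d+\}|【\d+】|〖\d+〗|\d+)>)+~u';

preg_match_all($pattern, $source, $matches);
print_r($matches[0]); // $matches[0] holds the full match of each run
```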

Replacing the SPACE at first and last

I want to remove the spaces at the start and end of a sentence.
I use this code.
$text = ' this is the test for string. ';
echo $text = str_replace(" ", "", $text);
When I use this replace code,
all spaces are deleted and replaced.
Can anybody help me?!
I want to get this:
this is the test for string.
You probably want the trim function here:
$text = ' this is the test for string. ';
echo '***' . trim($text) . '***';
***this is the test for string.***
Just to round out this answer, if you wanted to accomplish the same thing using a replacement, you could do a regex replace as follows:
$out = preg_replace("/^\s*|\s*$/", "", $text);
echo '***' . $out . '***';
***this is the test for string.***
This approach might be a good starting point if you wanted to do a regex replacement with slightly different logic.
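One small variation, in case only literal spaces should go: by default trim() also strips tabs, newlines, carriage returns, NUL bytes and vertical tabs, but it accepts a second argument listing the characters to strip. A minimal sketch:

```php
<?php
// trim() with an explicit character list: only spaces are stripped here,
// while the tab and newline at the edges are kept.
$text = " \tthis is the test for string.\n ";
var_dump(trim($text, ' '));
var_dump(trim($text)); // default list: all surrounding whitespace removed
```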

PHP regex split text to insert HTML

Very(!) new to regex but...
I have the following text strings outputted from a $title variable:
A. This is a title
B. This is another title
etc...
I'm after the following:
<span>A.</span> This is a title
<span>B.</span> This is another title
etc...
Currently I have the following code:
$title = $element['#title'];
if (preg_match("([A-Z][\.])", $title)) {
return '<li' . drupal_attributes($element['#attributes']) . ">Blarg</li>\n";
} else {
return '<li' . drupal_attributes($element['#attributes']) . '>' . $output . $sub_menu . "</li>\n";
}
This replaces any title starting with A. through Z. with Blarg, but I'm not sure how to progress from here.
In the Text Wrangler app I could wrap regex in brackets and output each argument like so:
argument 1 = \1
argument 2 = \2
etc...
I know I need to add an additional regex to grab the remainder of the text string.
Perhaps a regex guru could help a novice out!
Thanks,
Steve
Try
$title = 'A. This is a title';
$title = preg_replace('/^[A-Z]\./', '<span>$0</span>', $title);
echo $title;
// <span>A.</span> This is a title
If the string contains newlines and other titles following them, add the m modifier after the ending delimiter.
If the regex doesn't match then no replacements will be made, so there is no need for the if statement.
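A short sketch of the m modifier case mentioned above: with m, ^ matches at the start of every line, so each "X." prefix is wrapped, not just the first one.

```php
<?php
// With /m, ^ anchors at the start of each line in the string.
$titles = "A. This is a title\nB. This is another title";
echo preg_replace('/^[A-Z]\./m', '<span>$0</span>', $titles), "\n";
```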
Is it always just 2 chars ("A.", "B.", "C.", ...)?
Because then you could work with a substring instead of a regex.
Just pick off the first 2 chars of the link and wrap the span around that substring.
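A minimal sketch of the substring approach, assuming every title really does start with exactly one ASCII letter and a period:

```php
<?php
// substr() works on bytes, which is fine here because "A." is two ASCII bytes.
$title = 'A. This is a title';
$html = '<span>' . substr($title, 0, 2) . '</span>' . substr($title, 2);
echo $html, "\n"; // <span>A.</span> This is a title
```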
Try this (untested):
$title = $element['#title'];
if (preg_match("/([A-Z]\.)(.*)/", $title, $matches)) {
return '<li' . drupal_attributes($element['#attributes']) . "><span>{$matches[1]}</span>{$matches[2]}</li>\n";
} else {
return '<li' . drupal_attributes($element['#attributes']) . '>' . $output . $sub_menu . "</li>\n";
}
The changes here were: first, add / to the start and end of the pattern as delimiters (in the original, PHP treats the outer parentheses as the delimiters, so they do not create a capturing group); then remove the [ and ] around the escaped period, because \. on its own already matches a literal period; then add another group that matches the rest of the string. I also added a $matches argument to preg_match() to store the two captured groups ($matches[1] and $matches[2]; $matches[0] is the whole match) for use on the next line.
Note: You could also do this instead:
$title = preg_replace('/^([A-Z]\.)/', "<span>$1</span>", $title);
This will simply replace the A-Z followed by the period at the start of the string (denoted with the ^ character) with <span>, that character (grabbed with the brackets) and </span>.
Again, that's not tested, but should give you a headstart :)

Str_replace for multiple items

I remember doing this before, but can't find the code. I use str_replace to replace one character like this: str_replace(':', ' ', $string); but I want to replace all the following characters \/:*?"<>|, without doing a str_replace for each.
Like this:
str_replace(array(':', '\\', '/', '*'), ' ', $string);
Or, in modern PHP (anything from 5.4 onwards), the slightly less wordy:
str_replace([':', '\\', '/', '*'], ' ', $string);
str_replace() can take an array, so you could do:
$new_str = str_replace(str_split('\\/:*?"<>|'), ' ', $string);
Alternatively you could use preg_replace():
$new_str = preg_replace('~[\\\\/:*?"<>|]~', ' ', $string);
For example, if you want to replace search1 with replace1 and search2 with replace2 then following code will work:
print str_replace(
array("search1","search2"),
array("replace1", "replace2"),
"search1 search2"
);
// Output: replace1 replace2
str_replace(
array("search","items"),
array("replace", "items"),
$string
);
If you're only replacing single characters, you should use strtr()
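A minimal sketch of the strtr() form: with two equal-length strings, each byte of the first is mapped to the byte at the same position in the second. Because it works byte by byte, it suits single-byte characters like the ones in the question; multibyte characters would need the array form of strtr() instead.

```php
<?php
$from = '\\/:*?"<>|';                   // the nine characters from the question
$to   = str_repeat(' ', strlen($from)); // one space for each of them
$string = 'C:\\temp/report*?.txt';
echo strtr($string, $from, $to), "\n";
```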
You could use preg_replace(). The following example can be run using command line php:
<?php
$s1 = "the string \\/:*?\"<>|";
$s2 = preg_replace("~[\\\\/:*?\"<>|]~", " ", $s1);
echo "\n\$s2: \"" . $s2 . "\"\n";
?>
Output:
$s2: "the string "
I had a situation whereby I had to replace the HTML tags with two different replacement results.
$trades = "<li>Sprinkler and Fire Protection Installer</li>
<li>Steamfitter </li>
<li>Terrazzo, Tile and Marble Setter</li>";
$s1 = str_replace('<li>', '"', $trades);
$s2 = str_replace('</li>', '",', $s1);
echo $s2;
result
"Sprinkler and Fire Protection Installer", "Steamfitter ", "Terrazzo, Tile and Marble Setter",
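The two calls above can also be folded into one, since str_replace() accepts parallel search and replace arrays and applies the pairs left to right. A minimal sketch:

```php
<?php
$trades = "<li>Sprinkler and Fire Protection Installer</li>
<li>Steamfitter </li>
<li>Terrazzo, Tile and Marble Setter</li>";

// '<li>' becomes '"' and '</li>' becomes '",' in a single call.
echo str_replace(['<li>', '</li>'], ['"', '",'], $trades), "\n";
```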
I guess you are looking for this:
// example
private const TEMPLATE = __DIR__.'/Resources/{type}_{language}.json';
...
public function templateFor(string $type, string $language): string
{
return \str_replace(['{type}', '{language}'], [$type, $language], self::TEMPLATE);
}
In my use case, I parameterized some fields in an HTML document, and once I load these fields I match and replace them using the str_replace method.
<?php echo str_replace(array("{{client_name}}", "{{client_testing}}"), array('client_company_name', 'test'), 'html_document'); ?>
