PHP Replace, But Alternate Replace String - php

Okay, here's what I'm trying to do: I'm using PHP to develop what's essentially a tiny subset of a Markdown implementation; it's not worth using a full Markdown class.
I essentially need to do a str_replace, but alternate the replace string for every occurrence of the needle, so as to handle the opening and closing HTML tags.
For example, italics are a pair of asterisks like *this*, and code blocks are surrounded by backticks like `this`.
I need to replace the first occurrence of a pair of the characters with the corresponding opening HTML tag, and the second with the closing tag.
Any ideas on how to do this? I figured some sort of regular expression would be involved...

Personally, I'd loop through each occurrence of * or \ with a counter, and replace the character with the appropriate HTML tag based on the count (for example, if the count is even and you hit an asterisk, replace it with <em>; if it's odd, replace it with </em>, and so on).
But if you're sure that you only need to support a couple of simple kinds of markup, then a regular expression for each might be the easiest solution. Something like this for asterisks, for example (untested):
preg_replace('/\*([^*]+)\*/', '<em>\\1</em>', $text);
And something similar for backslashes.
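The counter-based loop described at the start of this answer could be sketched roughly as follows. The function name alternate_tags and its parameters are hypothetical, and the sketch assumes a non-empty marker that always appears in pairs; an unpaired marker would leave a tag unclosed.

```php
<?php
// Hypothetical helper: replaces each occurrence of $marker, alternating
// between $open (1st, 3rd, ...) and $close (2nd, 4th, ...).
function alternate_tags(string $text, string $marker, string $open, string $close): string
{
    // Everything between two markers becomes one array element.
    $parts = explode($marker, $text);
    $out = array_shift($parts); // text before the first marker

    // array_shift() reindexes the array, so $count starts at 0 again.
    foreach ($parts as $count => $part) {
        // Even count: opening tag; odd count: closing tag.
        $out .= ($count % 2 === 0 ? $open : $close) . $part;
    }
    return $out;
}

echo alternate_tags('some *italic* and *more*', '*', '<em>', '</em>');
```

Calling it once per marker type keeps each kind of markup independent of the others.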

What you're looking for is more commonly handled by a state machine or lexer/parser.
This is ugly but it works. Catch: only for one pattern type at a time.
$input = "Here's some \\italic\\ text and even \\some more\\ wheee";
$output = preg_replace_callback( "/\\\/", 'replacer', $input );
echo $output;
function replacer( $matches )
{
    static $toggle = 0;
    if ( $toggle )
    {
        $toggle = 0;
        return "</em>";
    }
    $toggle = 1;
    return "<em>";
}

I created an alternative to str_replace, because the PHP manual for str_replace says that:
If search and replace are arrays, then str_replace() takes a value
from each array and uses them to search and replace on subject.
If replace has fewer values than search, then an empty string is used
for the rest of replacement values.
If search is an array and replace is a string, then this replacement
string is used for every value of search.
The converse would not make sense, though.
But the converse DOES make sense if the same needle appears several times in your haystack, such as '?' in a prepared statement (e.g. PHP's MySQLi extension), and you need to write a log or diagnostic report of what's going on as it runs through the parameters, substituting the parameters in the query string to make a 'cut and paste' version of the query for testing elsewhere.
Occurrences of needle are replaced left-to-right with the values in the replace array. If there are more occurrences of needle than there are replacements, it resets the replace array pointer. This means that for the OP's use, the needle would be "*", and the replacement would be an array with two values, "<I>" and "</I>".
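function str_replace_seriatim(string $needle, array $replace, string $haystack) {
    $occurrences = substr_count($haystack, $needle);
    // Resume searching after each substitution, so that inserted text
    // containing the needle is never replaced again.
    $offset = 0;
    for ($i = 0; $i < $occurrences; $i++) {
        $substitute = current($replace);
        $pos = strpos($haystack, $needle, $offset);
        if ($pos === FALSE) break;
        $haystack = substr_replace($haystack, $substitute, $pos, strlen($needle));
        $offset = $pos + strlen($substitute);
        // Wrap around to the first replacement once the list is exhausted.
        if ((next($replace) === FALSE)) reset($replace);
    }
    return $haystack;
}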
To do the whole lot in one function call, I suppose that one could expand on this a little, taking an array ($pincushion) of needles and a multidimensional array as the replacement, but I'm not sure if that isn't more work than just multiple function calls.
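One possible shape for that single-call version is sketched below. It is an assumption rather than a tested extension of the function above: the name str_replace_seriatim_multi is hypothetical, the needles and their replacement lists are passed as one associative array instead of two separate arrays, and preg_replace_callback() stands in for the strpos() loop.

```php
<?php
// Hypothetical sketch: $pairs maps each needle to its list of replacements,
// e.g. ['*' => ['<em>', '</em>']]. Each needle cycles through its own list.
function str_replace_seriatim_multi(array $pairs, string $haystack): string
{
    foreach ($pairs as $needle => $replacements) {
        $needle = (string) $needle;              // numeric-looking keys arrive as ints
        $replacements = array_values($replacements);
        $n = count($replacements);
        if ($needle === '' || $n === 0) {
            continue;                            // nothing sensible to do
        }
        $i = 0;                                  // per-needle occurrence counter
        $haystack = preg_replace_callback(
            '/' . preg_quote($needle, '/') . '/',
            function (array $m) use ($replacements, $n, &$i) {
                return $replacements[$i++ % $n]; // wrap around when exhausted
            },
            $haystack
        );
    }
    return $haystack;
}

echo str_replace_seriatim_multi(
    ['*' => ['<em>', '</em>'], '`' => ['<code>', '</code>']],
    'a *b* and `c`'
);
```

Internally this is still one pass per needle, so a later needle can match text inserted by an earlier one; the order of the entries in $pairs matters.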

Related

Get the php regex in string

I have, for example, the following string:
#kirbypanganja[Kirby Panganja] elow #kyraminerva[Kyra] test #watever[watever ever evergreen]
I want to get the substrings that match the #username[Full Name] format. I'm really new to regex. I'm using the following code:
$mention_regex = '/#([A-Za-z0-9_]+)/i';
preg_match_all($mention_regex, $content, $matches);
var_dump($matches);
where $content is the string above.
What should the correct regex be so that I get an array of matches in the #username[Full Name] format?
You can use:
#[^]]+]
i.e.:
$string = "#kirbypanganja[Kirby Panganja] elow #kyraminerva[Kyra] test #watever[watever ever evergreen]";
preg_match_all('/#[^]]+]/', $string, $result);
print_r($result[0]);
Output:
Array
(
[0] => #kirbypanganja[Kirby Panganja]
[1] => #kyraminerva[Kyra]
[2] => #watever[watever ever evergreen]
)
Regex: /#[A-Za-z0-9_]+\[[a-zA-Z\s]+\]/
This will match, for example: #thanSomeCharacters[Some Name Can contain space]
Code snippet:
<?php
$content='#kirbypanganja[Kirby Panganja] elow #kyraminerva[Kyra] test #watever[watever ever evergreen]';
$mention_regex = '/#[A-Za-z0-9_]+\[[a-zA-Z\s]+\]/i';
preg_match_all($mention_regex, $content, $matches);
print_r($matches);
I'll start with a very direct, one-liner method that I believe is best and then discuss the other options...
Code (Demo):
$string = "#kirbypanganja[Kirby Panganja] elow #kyraminerva[Kyra] test #watever[watever ever evergreen]";
$result = preg_split('/]\K[^#]+/', $string, 0, PREG_SPLIT_NO_EMPTY);
var_export($result);
Output:
array (
0 => '#kirbypanganja[Kirby Panganja]',
1 => '#kyraminerva[Kyra]',
2 => '#watever[watever ever evergreen]',
)
Pattern (Demo):
] #match a literal closing square bracket
\K #forget the matched closing square bracket
[^#]+ #match 1 or more non-hash characters
My pattern takes 12 steps, which is the same step efficiency as Pedro's pattern.
There are two benefits to the coder by using preg_split():
it returns the substrings directly, whereas preg_match_all() returns a match count (or false) and needs a by-reference output variable; this means it can be used as a one-liner without a condition statement.
it returns a one-dimensional array, versus a two-dimensional array like preg_match_all(). This means the entire returned array is instantly ready to unpack without any subarray accessing.
In case you are wondering what the 3rd and 4th parameters are in preg_split(), the 0 value means return an unlimited amount of substrings. This is the default behavior, but it is used as a placeholder for parameter 4. PREG_SPLIT_NO_EMPTY effectively removes any empty substrings that would have been generated by trying to split at the start or end of the input string.
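To make the effect of PREG_SPLIT_NO_EMPTY concrete, here is a small sketch using the same pattern. The input string is a made-up variation with trailing text after the last closing bracket, which is the case where an empty element would otherwise appear.

```php
<?php
// Same pattern as above; this input has trailing text after the last "]".
$string = '#kyraminerva[Kyra] test #watever[watever] bye';

// Without the flag, the trailing " bye" is consumed as a delimiter and
// leaves an empty string as the final element.
$withEmpty = preg_split('/]\K[^#]+/', $string);

// With the flag, that empty element is dropped.
$noEmpty = preg_split('/]\K[^#]+/', $string, 0, PREG_SPLIT_NO_EMPTY);

var_export($withEmpty); // three elements, the last one an empty string
var_export($noEmpty);   // two elements
```

Note that text before the first # is not a delimiter for this pattern, so leading text would stay attached to the first element.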
That concludes my recommended method. Now I'll take a moment to compare the other answers currently posted on this page, and then present some non-regex methods which I do not recommend.
The most popular and intuitive method is to use a regex pattern with preg_match_all(). Both Sahil and Pedro have opted for this course of action. Let's compare the patterns that they've chosen...
Sahil's pattern /#[A-Za-z0-9_]+\[[a-zA-Z\s]+\]/i correctly matches the desired substrings in 18 steps, but uses unnecessary redundancies like using the i modifier/flag despite using A-Za-z in the character class. Here is a demo. Also, [A-Za-z0-9_] is more simply expressed as \w.
Pedro's pattern /#[^]]+]/ correctly matches the desired string in 12 steps. Here is a demo.
By all comparisons, Pedro's method is superior to Sahil's because it has equal accuracy, higher efficiency, and increased pattern brevity. If you want to use preg_match_all(), you will not find a more refined regex pattern than Pedro's.
That said, there are other ways to extract the desired substrings. First, the more tedious way that doesn't involve regex that I would never recommend...
Regex-free method: strpos() & substr()
$result = [];
while (($start = strpos($string, '#')) !== false) {
$result[] = substr($string, $start, ($stop = strpos($string, ']') + 1) - $start);
$string = substr($string, $stop);
}
var_export($result);
Coders should always entertain the idea of a non-regex method when dissecting strings, but as you can see from this code above, it just isn't sensible for this case. It requires four function calls on each iteration and it isn't the easiest thing to read. So let's dismiss this method.
Here is another way that provides the correct result...
$result = [];
foreach (explode('#', $string) as $v) {
if ($v) {
$result[] = '#' . substr($v, 0, strrpos($v, ']') + 1);
}
}
It makes fewer function calls compared to the previous regex-free method, but it is still too much handling for such a simple task.
At this point, it is clear that the most sensible methods use regex. And there is nothing wrong with choosing preg_match_all() -- if this were my project, I might elect to use it. However, it is important to consider the directness of preg_split(). This function is just like explode() but with the ability to use a regex pattern. This question is a perfect stage for preg_split() because the substrings that should be omitted can also be used as the delimiter between the desired substrings.

Match array values against text [duplicate]

I have an array full of patterns that I need matched. Is there any way to do that other than a for() loop? I'm trying to do it in the least CPU-intensive way, since I will be doing dozens of these every minute.
A real-world example: I'm building a link status checker, which will check links to various online video sites to ensure that the videos are still live. Each domain has several "dead keywords"; if these are found in the HTML of a page, that means the file was deleted. These are stored in the array. I need to match the contents of the array against the HTML output of the page.
First of all, if you literally are only doing dozens every minute, then I wouldn't worry terribly about the performance in this case. These matches are pretty quick, and I don't think you're going to have a performance problem by iterating through your patterns array and calling preg_match separately like this:
$matches = false;
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
$matches = true;
}
}
You can indeed combine all the patterns into one using the or operator like some people are suggesting, but don't just slap them together with a |. This will break badly if any of your patterns contain the or operator.
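I would recommend at least grouping your patterns using parentheses like:
$grouped_patterns = array();
foreach ($patterns as $pattern)
{
    // $patterns are assumed to be bare expressions, without delimiters or flags
    $grouped_patterns[] = "(" . $pattern . ")";
}
// implode() takes the separator first; the combined pattern needs its own delimiters
$master_pattern = "/" . implode("|", $grouped_patterns) . "/";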
But... I'm not really sure if this ends up being faster. Something has to loop through them, whether it's the preg_match or PHP. If I had to guess I'd guess that individual matches would be close to as fast and easier to read and maintain.
Lastly, if performance is what you're looking for here, I think the most important thing to do is pull out the non regex matches into a simple "string contains" check. I would imagine that some of your checks must be simple string checks like looking to see if "This Site is Closed" is on the page.
So doing this:
foreach ($strings_to_match as $string_to_match)
{
if (strpos($page, $string_to_match) !== false)
{
// etc.
break;
}
}
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
// etc.
break;
}
}
and avoiding as many preg_match() as possible is probably going to be your best gain. strpos() is a lot faster than preg_match().
// assuming you have something like this
$patterns = array('a','b','\w');
// converts the array into a regex friendly or list
$patterns_flattened = implode('|', $patterns);
if ( preg_match('/'. $patterns_flattened .'/', $string, $matches) )
{
}
// PS: that's off the top of my head, I didn't check it in a code editor
If your patterns don't contain much whitespace, another option would be to eschew the arrays and use the /x modifier. Now your list of regular expressions would look like this:
$regex = "/
pattern1| # search for occurrences of 'pattern1'
pa..ern2| # wildcard search for occurrences of 'pa..ern2'
pat[ ]tern| # search for 'pat tern', whitespace is escaped
mypat # Note that the last pattern does NOT have a pipe char
/x";
With the /x modifier, whitespace is completely ignored, except when in a character class or preceded by a backslash. Comments like above are also allowed.
This would avoid the looping through the array.
If you're merely searching for the presence of a string in another string, use strpos as it is faster.
Otherwise, you could just iterate over the array of patterns, calling preg_match each time.
If you have a bunch of patterns, what you can do is concatenate them in a single regular expression and match that. No need for a loop.
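A minimal sketch of that concatenation, under a few assumptions: the helper name build_master_pattern is hypothetical, literal keywords are kept separate from regex fragments so they can be escaped with preg_quote(), the fragments are bare expressions that do not contain the ~ delimiter, and case-insensitive matching is wanted.

```php
<?php
// Hypothetical helper: builds one alternation from literal keywords and
// bare regex fragments (no delimiters or flags).
function build_master_pattern(array $keywords, array $fragments = []): string
{
    $parts = [];
    foreach ($keywords as $keyword) {
        $parts[] = preg_quote($keyword, '~');   // literal text, escaped
    }
    foreach ($fragments as $fragment) {
        $parts[] = '(?:' . $fragment . ')';     // group so an inner | stays local
    }
    // Assumption: at least one keyword or fragment is supplied; an empty
    // alternation would match every page.
    return '~' . implode('|', $parts) . '~i';
}

$pattern = build_master_pattern(
    ['This video has been removed', 'file not found'],
    ['error\s+404']
);
$isDead = preg_match($pattern, '<html><body>Error  404</body></html>') === 1;
var_dump($isDead);
```

Building the pattern once per domain and reusing it for every page avoids repeating the string work on each check.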
What about doing a str_replace() on the HTML you get using your array and then checking whether the result is equal to the original? This would be very fast:
$sites = array(
'you_tube' => array('dead', 'moved'),
...
);
foreach ($sites as $site => $deadArray) {
// get $html
if ($html == str_replace($deadArray, '', $html)) {
// video is live
}
}
You can combine all the patterns from the list into a single regular expression using PHP's implode() function, then test your string in one go using preg_match().
$patterns = array(
'abc',
'\d+h',
'[abc]{6,8}\-\s*[xyz]{6,8}',
);
$master_pattern = '/(' . implode(')|(', $patterns) . ')/';
if(preg_match($master_pattern, $string_to_check))
{
//do something
}
Of course there could be even less code using implode() inline in "if()" condition instead of $master_pattern variable.

How to check if array elements exist in a string

I have a list of words in an array. What is the fastest way to check if any of these words exist in a string?
Currently, I am checking the existence of array elements one by one through a foreach loop with stripos. I am curious if there is a faster method, like what we do for str_replace using an array.
Regarding your additional comment: you could explode your string into single words using explode() or preg_split() and then check this array against the needles array using array_intersect(). So all the work is done only once.
<?php
$haystack = "Hello Houston, we have a problem";
$haystacks = preg_split("/\b/", $haystack);
$needles = array("Chicago", "New York", "Houston");
$intersect = array_intersect($haystacks, $needles);
$count = count($intersect);
var_dump($count, $intersect);
I could imagine that array_intersect() is pretty fast. But it depends what you really want (matching words, matching fragments, ..)
my personal function:
function wordsFound($haystack,$needles) {
return preg_match('/\b('.implode('|',$needles).')\b/i',$haystack);
}
//> Usage:
if (wordsFound('string string string',array('words')))
Notice: if you work with UTF-8 strings containing non-ASCII characters, you need to replace \b with the corresponding UTF-8-aware word boundary
Notice2: be sure to enter only a-z0-9 chars in $needles (thanks to MonkeyMonkey), otherwise you need to preg_quote them first
Notice3: this function is case insensitive thanks to the i modifier
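As a sketch of what Notice2 suggests, the variant below escapes each needle with preg_quote() before building the pattern. The name wordsFoundQuoted is hypothetical, and the early return for an empty list is an added assumption about the desired behaviour.

```php
<?php
// Variant of wordsFound() that escapes each needle first, so needles may
// contain regex metacharacters such as "." or "+".
function wordsFoundQuoted(string $haystack, array $needles): bool
{
    if (!$needles) {
        return false; // an empty alternation would match at any word boundary
    }
    $quoted = array_map(function ($needle) {
        return preg_quote($needle, '/'); // also escapes the "/" delimiter
    }, $needles);

    return preg_match('/\b(?:' . implode('|', $quoted) . ')\b/i', $haystack) === 1;
}

var_dump(wordsFoundQuoted('Hello Houston, we have a problem', ['Chicago', 'houston']));
var_dump(wordsFoundQuoted('price is 3x50 today', ['3.50']));
```

Because of the \b anchors, needles that begin or end with a non-word character (for example "C++") may not match where you expect.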
In general, regular expressions are slower compared to basic string functions like stripos(). But I think it really depends on the situation. If you really need the maximum performance, I suggest making some tests with real-world data.

Replacing Tags with Includes in PHP with RegExps

I need to read a string, detect a {VAR}, and then do a file_get_contents('VAR.php') in place of {VAR}. The "VAR" can be named anything, like TEST, or CONTACT-FORM, etc. I don't want to know what VAR is -- not to do a hard-coded condition, but to just see an uppercase alphanumeric tag surrounded by curly braces and just do a file_get_contents() to load it.
I know I need to use preg_match and preg_replace, but I'm stumbling through the RegExps on this.
How is this useful? It's useful in hooking WordPress.
Orion above has a right solution, but it's not really necessary to use a callback function in your simple case.
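Assuming that the filenames are A-Z + hyphens, you can do it in one line using the /e flag in the regex (note that /e was deprecated in PHP 5.5 and removed in PHP 7.0, so this only works on older versions):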
$str = preg_replace('/{([-A-Z]+)}/e', 'file_get_contents(\'$1.html\')', $str);
This'll replace any instance of {VAR} with the contents of VAR.html. You could prefix a path into the second term if you need to specify a particular directory.
There are the same vague security worries as outlined above, but I can't think of anything specific.
You'll need to do a number of things. I'm assuming you can do the legwork to get the page data you want to preprocess into a string.
First, you'll need the regular expression to match correctly. That should be fairly easy with something like /{\w+}/.
Next you'll need to pass the PREG_OFFSET_CAPTURE flag to preg_match to get the offset of the match in the page data. This offset will let you divide the string into the before, matching, and after parts of the match.
Once you have the 3 parts, you'll need to run your include, and stick them back together.
Lather, rinse, repeat.
Stop when you find no more variables.
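The steps above might look roughly like this. It is a sketch under assumptions: the function name expand_tags is hypothetical, and $load is a hypothetical callable that returns the replacement text for a tag name (for example by reading a file), which keeps the file access out of the loop itself.

```php
<?php
// Repeatedly find the next {TAG}, split the page around it, and splice in
// whatever $load returns for that tag name.
function expand_tags(string $page, callable $load): string
{
    $offset = 0;
    while (preg_match('/\{(\w+)\}/', $page, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        [$tag, $start] = $m[0];                 // whole match, e.g. "{FOO}", and its byte offset
        $before = substr($page, 0, $start);
        $after  = substr($page, $start + strlen($tag));
        $insert = $load($m[1][0]);              // $m[1][0] is the name inside the braces
        $page   = $before . $insert . $after;
        // Continue after the inserted text so it is never scanned again.
        $offset = strlen($before) + strlen($insert);
    }
    return $page;
}

echo expand_tags('a {X} b {Y}', function ($name) {
    return strtolower($name) . '!';
});
```

The pattern uses \w+ as in the answer, so a tag containing a hyphen, such as {CONTACT-FORM}, would need a wider character class.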
This isn't terribly efficient, and there are probably better ways. You may wish to consider doing a preg_split instead, splitting on /[{}]/. No matter how you slice it you're assuming that you can trust your incoming data, and this will simplify the whole process a lot. To do this, I'd lay out the code like so:
Take your content and split it like so: $parts = preg_split('/[{}]/', $page_string);
Write a recursive function over the parts with the following criteria:
Halt when length of arg is < 3
Else, return a new array composed of
$arg[0] . load_data($arg[1]) . $arg[2]
plus whatever is left in $arg[3...]
Run your function over $parts.
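A minimal sketch of the preg_split idea, written as a loop rather than the recursive function outlined above. It assumes braces only ever appear as well-formed {TAG} pairs; the name expand_by_split and the $load callable (a stand-in for load_data) are hypothetical.

```php
<?php
// Splitting on braces puts plain text at even indexes and tag names at
// odd indexes, as long as every "{" has a matching "}".
function expand_by_split(string $page, callable $load): string
{
    $parts = preg_split('/[{}]/', $page);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $parts[$i] = $load($part); // odd index: a tag name
        }
    }
    return implode('', $parts);
}

echo expand_by_split('a {X} b {Y} c', 'strtolower');
```

A stray brace anywhere in the page shifts the even/odd alignment, which is why this relies on trusting the incoming data.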
You can do it without regexes (god forbid), something like:
//return true if $str ends with $sub
function endsWith($str,$sub) {
return ( substr( $str, strlen( $str ) - strlen( $sub ) ) === $sub );
}
$theStringWithVars = "blah.php cool.php awesome.php";
$sub = '.php';
$splitStr = explode(" ", $theStringWithVars); // split() was removed in PHP 7
for($i=0;$i<count($splitStr);$i++) {
if(endsWith(trim($splitStr[$i]),$sub)) {
//file_get_contents($splitStr[$i]) etc...
}
}
Off the top of my head, you want this:
// load the "template" file
$input = file_get_contents($template_file_name);
// define a callback. Each time the regex matches something, it will call this function.
// whatever this function returns will be inserted as the replacement
function replaceCallback($matches){
// match zero will be the entire match - eg {FOO}.
// match 1 will be just the bits inside the curly braces because of the grouping parens in the regex - eg FOO
// convert it to lowercase and append ".html", so you're loading foo.html
// then return the contents of that file.
// BEWARE. GIANT MASSIVE SECURITY HOLES ABOUND. DO NOT DO THIS
return file_get_contents( strtolower($matches[1]) . ".html" );
};
// run the actual replace method giving it our pattern, the callback, and the input file contents
$output = preg_replace_callback('/\{([-A-Z]+)\}/', 'replaceCallback', $input);
// todo: print the output
Now I'll explain the regex
\{([-A-Z]+)\}
The \{ and \} just tell it to match the curly braces. You need the backslashes, as { and } are special characters, so they need escaping.
The ( and ) create a grouping. Basically this lets you extract particular parts of the match. I use it in the function above to just match the things inside the braces, without matching the braces themselves. If I didn't do this, then I'd need to strip the { and } out of the match, which would be annoying.
The [-A-Z] says "match any uppercase character, or a -".
The + after the [-A-Z] means we need to have at least 1 character, but we can have up to any number.
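One way to narrow the security hole flagged in the callback above is to replace only tags that appear in an explicit allow-list. This is a sketch, not a complete fix: the function name replace_allowed_tags and the $allowed map (tag name => file path) are hypothetical.

```php
<?php
// Only tags listed in $allowed are replaced; anything else is left untouched,
// so user-supplied text cannot name arbitrary files.
function replace_allowed_tags(string $input, array $allowed): string
{
    return preg_replace_callback(
        '/\{([-A-Z]+)\}/',
        function (array $matches) use ($allowed) {
            $name = $matches[1];
            if (!isset($allowed[$name]) || !is_readable($allowed[$name])) {
                return $matches[0]; // unknown or unreadable: keep the original text
            }
            return file_get_contents($allowed[$name]);
        },
        $input
    );
}
```

Because the path comes from the map rather than from the tag itself, the tag text never reaches the filesystem call.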
Comparatively speaking, regular expressions are expensive. While you may need them to figure out which files to load, you certainly don't need them for doing the replace, and probably shouldn't use regular expressions. After all, you know exactly what you are replacing, so why do you need fuzzy search?
Use an associative array and str_replace to do your replacements. str_replace supports arrays for doing multiple substitutions at once. One line substitution, no loops.
For example:
$substitutions = array('{VAR}'=>file_get_contents('VAR.php'),
'{TEST}'=>file_get_contents('TEST.php'),
...
);
$outputContents = str_replace( array_keys($substitutions), $substitutions, $outputContents);
