PHP string console parameters to array - php

I would like to know how I could transform the given string into the specified array:
String
all ("hi there \(option\)", (this, that), other) another
Result wanted (Array)
[0] => all,
[1] => Array(
[0] => "hi there \(option\)",
[1] => Array(
[0] => this,
[1] => that
),
[2] => other
),
[2] => another
This is used for a kind of console that I'm making on PHP.
I tried to use preg_match_all but, I don't know how I could find parentheses inside parentheses in order to "make arrays inside arrays".
EDIT
All other characters that are not specified on the example should be treated as String.
EDIT 2
I forgot to mention that all parameters outside the parentheses should be delimited by the space character.

The 10,000ft overview
You need to do this with a small custom parser: code takes input of this form and transforms it to the form you want.
In practice I find it useful to group parsing problems like this in one of three categories based on their complexity:
Trivial: Problems that can be solved with a few loops and humane regular expressions. This category is seductive: if you are even a little unsure if the problem can be solved this way, a good rule of thumb is to decide that it cannot.
Easy: Problems that require building a small parser yourself, but are still simple enough that it doesn't quite make sense to bring out the big guns. If you need to write more than ~100 lines of code then consider escalating to the next category.
Involved: Problems for which it makes sense to go formal and use an already existing, proven parser generator¹.
I classify this particular problem as belonging into the second category, which means that you can approach it like this:
Writing a small parser
Defining the grammar
To do this, you must first define -- at least informally, with a few quick notes -- the grammar that you want to parse. Keep in mind that most grammars are defined recursively at some point. So let's say our grammar is:
The input is a sequence
A sequence is a series of zero or more tokens
A token is either a word, a string or an array
Tokens are separated by one or more whitespace characters
A word is a sequence of alphabetic characters (a-z)
A string is an arbitrary sequence of characters enclosed within double quotes
An array is a series of one or more tokens separated by commas and enclosed in parentheses
You can see that we have recursion in one place: a sequence can contain arrays, and an array is also defined in terms of a sequence (so it can contain more arrays etc).
Treating the matter informally as above is easier as an introduction, but reasoning about grammars is easier if you do it formally.
Building a lexer
With the grammar in hand you now need to break the input down into tokens so that it can be processed. The component that takes user input and converts it to individual pieces defined by the grammar is called a lexer. Lexers are dumb; they are only concerned with the "outside appearance" of the input and do not attempt to check that it actually makes sense.
Here's a simple lexer I wrote to parse the above grammar (don't use this for anything important; may contain bugs):
$input = 'all ("hi there", (this, that) , other) another';
$tokens = array();
$input = trim($input);
// compare against '' so that an input such as "0" is not treated as empty
while ($input !== '') {
    switch (substr($input, 0, 1)) {
        case '"':
            if (!preg_match('/^"([^"]*)"(.*)$/', $input, $matches)) {
                die; // TODO: error: unterminated string
            }
            $tokens[] = array('string', $matches[1]);
            $input = $matches[2];
            break;
        case '(':
            $tokens[] = array('open', null);
            $input = substr($input, 1);
            break;
        case ')':
            $tokens[] = array('close', null);
            $input = substr($input, 1);
            break;
        case ',':
            $tokens[] = array('comma', null);
            $input = substr($input, 1);
            break;
        default:
            // split off the leading run of letters; pad with '' when nothing follows
            list($word, $input) = array_pad(
                preg_split('/(?=[^a-zA-Z])/', $input, 2),
                2,
                '');
            if ($word === '') {
                die; // TODO: error: unexpected character (would otherwise loop forever)
            }
            $tokens[] = array('word', $word);
            break;
    }
    $input = trim($input);
}
print_r($tokens);
Building a parser
Having done this, the next step is to build a parser: a component that inspects the lexed input and converts it to the desired format. A parser is smart; in the process of converting the input it also makes sure that the input is well-formed by the grammar's rules.
Parsers are commonly implemented as state machines (also known as finite state machines or finite automata) and work like this:
The parser has a state; this is usually a number in an appropriate range, but each state is also described with a more human-friendly name.
There is a loop that reads lexed tokens one at a time. Based on the current state and the value of the token, the parser may decide to do one or more of the following:
take some action that affects its output
change its state to some other value
decide that the input is badly formed and produce an error
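The loop described above can be sketched as a small stack-based parser. This is only a minimal sketch under a few assumptions, not the parser from the original answer: it expects tokens in the array(type, value) shape that the lexer produces, and the function name parseTokens and the two state names are made up for illustration.

```php
<?php
// Hypothetical sketch of the parsing loop; not the original author's code.
// Expects tokens shaped like the lexer output: array(type, value).
// States: 'value' (an element must come next) and
//         'separator' (a comma or ")" must come next).
function parseTokens(array $tokens): array
{
    $stack = array();     // enclosing arrays that are still open
    $current = array();   // the array currently being filled
    $state = 'value';     // only enforced inside parentheses

    foreach ($tokens as $token) {
        list($type, $value) = $token;
        $inside = count($stack) > 0;

        if ($type === 'word' || $type === 'string' || $type === 'open') {
            if ($inside && $state !== 'value') {
                throw new InvalidArgumentException('Expected "," or ")" before ' . $type);
            }
            if ($type === 'open') {
                $stack[] = $current;   // remember the enclosing array
                $current = array();
                $state = 'value';
            } else {
                $current[] = $value;
                $state = 'separator';
            }
        } elseif ($type === 'comma' || $type === 'close') {
            if (!$inside || $state !== 'separator') {
                throw new InvalidArgumentException('Unexpected ' . $type);
            }
            if ($type === 'close') {
                $finished = $current;
                $current = array_pop($stack);
                $current[] = $finished;   // attach the finished array to its parent
                $state = 'separator';
            } else {
                $state = 'value';
            }
        } else {
            throw new InvalidArgumentException('Unknown token type: ' . $type);
        }
    }
    if (count($stack) > 0) {
        throw new InvalidArgumentException('Missing ")"');
    }
    return $current;
}

$tokens = array(
    array('word', 'all'), array('open', null), array('string', 'hi there'),
    array('comma', null), array('open', null), array('word', 'this'),
    array('comma', null), array('word', 'that'), array('close', null),
    array('comma', null), array('word', 'other'), array('close', null),
    array('word', 'another'),
);
print_r(parseTokens($tokens));
```

The stack holds the arrays that are still open, so a ")" simply pops the parent and appends the finished child to it. Anything the grammar does not allow (a stray comma, a missing ")") raises an exception instead of being silently accepted.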
¹ Parser generators are programs whose input is a formal grammar and whose output is a lexer and a parser you can "just add water" to: just extend the code to perform "take some action" depending on the type of token; everything else is already taken care of. A quick search on this subject led to the question "PHP Lexer and Parser Generator?"

There's no question that you should write a parser if you are building a syntax tree. But if you just need to parse this sample input, regex might still be a usable tool:
<?php
$str = 'all, ("hi there", (these, that) , other), another';
$str = preg_replace('/\, /', ',', $str); //get rid of extra spaces
/*
* get rid of undefined constants by surrounding them with quotes
*/
$str = preg_replace('/(\w+),/', '\'$1\',', $str);
$str = preg_replace('/(\w+)\)/', '\'$1\')', $str);
$str = preg_replace('/,(\w+)/', ',\'$1\'', $str);
$str = str_replace('(', 'array(', $str);
$str = 'array('.$str.');';
echo '<pre>';
eval('$res = '.$str); //eval is evil.
print_r($res); //print the result
Note: If the input is malformed, the regex will definitely fail. I am writing this solution just in case you need a fast script. Writing a lexer and parser is time-consuming work that will need lots of research.

As far as I know, the parentheses problem is a Chomsky language class 2, while regular expressions are equivalent to Chomsky language class 3, so there should be no regular expression, which solves this problem.
But I read something not long ago:
This PCRE pattern solves the parentheses problem (assume the PCRE_EXTENDED option is set so that white space is ignored): \( ( (?>[^()]+) | (?R) )* \)
With delimiters and without spaces: /\(((?>[^()]+)|(?R))*\)/.
This is from Recursive Patterns (PCRE) - PHP manual.
There is an example on that manual, which solves nearly the same problem you specified!
You, or others might find it and proceed with this idea.
I think the best solution is to write a sick recursive pattern with preg_match_all. Sadly I'm not in the power to do such madness!
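For anyone who wants to proceed with this idea, here is a minimal sketch of the manual's recursive pattern applied to the string from the question. It only pulls out the balanced top-level groups; it knows nothing about quotes or escaped parentheses, and the function name extractGroups is made up for illustration.

```php
<?php
// Hypothetical helper: returns every balanced top-level "(...)" group.
// The pattern is the one quoted from the PHP manual, written with the x flag
// so that the white space inside it is ignored.
function extractGroups(string $input): array
{
    $pattern = '/\( ( (?>[^()]+) | (?R) )* \)/x';
    preg_match_all($pattern, $input, $matches);
    return $matches[0];   // index 0 holds the full matches
}

print_r(extractGroups('all ("hi there", (this, that), other) another'));
```

The inner group (this, that) is consumed by the (?R) recursion, so it is part of the outer match rather than a separate result. To build nested arrays you would have to run the pattern again on the inside of each match.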

First, I want to thank everyone that helped me on this.
Unfortunately, I can't accept multiple answers because, if I could, I would give to you all because all answers are correct for different types of this problem.
In my case, I just needed something simple and dirty and, following #palindrom and #PLB answers, I've got the following working for me:
$str=transformEnd(transformStart($string));
$str = preg_replace('/([^\\\])\(/', '$1array(', $str);
$str = 'array('.$str.');';
eval('$res = '.$str);
print_r($res); //print the result
function transformStart($str){
$match=preg_match('/(^\(|[^\\\]\()/', $str, $positions, PREG_OFFSET_CAPTURE);
if (count($positions[0]))
$first=($positions[0][1]+1);
if ($first>1){
$start=substr($str, 0,$first);
preg_match_all("/(?:(?:\"(?:\\\\\"|[^\"])+\")|(?:'(?:\\\'|[^'])+')|(?:(?:[^\s^\,^\"^\']+)))/is",$start,$results);
if (count($results[0])){
$start=implode(",", $results[0]).",";
} else {
$start="";
}
$temp=substr($str, $first);
$str=$start.$temp;
}
return $str;
}
function transformEnd($str){
$match=preg_match('/(^\)|[^\\\]\))/', $str, $positions, PREG_OFFSET_CAPTURE);
if (($total=count($positions)) && count($positions[$total-1]))
$last=($positions[$total-1][1]+1);
if ($last==null)
$last=-1;
if ($last<strlen($str)-1){
$end=substr($str,$last+1);
preg_match_all("/(?:(?:\"(?:\\\\\"|[^\"])+\")|(?:'(?:\\\'|[^'])+')|(?:(?:[^\s^\,^\"^\']+)))/is",$end,$results);
if (count($results[0])){
$end=",".implode(",", $results[0]);
} else {
$end="";
}
$temp=substr($str, 0,$last+1);
$str=$temp.$end;
}
if ($last==-1){
$str=substr($str, 1);
}
return $str;
}
The other answers are helpful too for anyone searching for a better way to do this.
Again, thank you all =D.

I will put the algorithm or pseudo code for implementing this. Hopefully you can work-out how to implement it in PHP:
function Parser([receives] input:string) returns Array
define Array returnValue;
for each integer i from 0 to length of input string do
character = ith character from input string.
if character is '('
returnValue.Add(Parser(substring of input after i)); // recursive call
else if character is '"'
returnValue.Add(substring of input from i to the next '"')
else if character is whitespace
continue
else
returnValue.Add(substring of input from i to the next space or end of input)
increment i to the index actually consumed
return returnValue
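One possible PHP rendering of this pseudo-code is sketched below. It departs from the pseudo-code in a few places, all of which are my assumptions: the position is passed by reference so that a recursive call can report how much input it consumed, a ")" ends the current level, commas are skipped like whitespace, and quoted strings keep their quotes (as in the wanted result). The names parseArgs and parseGroup are hypothetical.

```php
<?php
// Hypothetical PHP rendering of the pseudo-code above.
// $pos is passed by reference so the caller continues where the callee stopped.
function parseGroup(string $input, int &$pos, int $depth): array
{
    $result = array();
    $length = strlen($input);

    while ($pos < $length) {
        $char = $input[$pos];
        if ($char === '(') {
            $pos++;
            $result[] = parseGroup($input, $pos, $depth + 1);   // recursive call
        } elseif ($char === ')') {
            if ($depth === 0) {
                throw new InvalidArgumentException('Unexpected ")" at offset ' . $pos);
            }
            $pos++;
            return $result;   // this level is finished
        } elseif ($char === '"') {
            $end = strpos($input, '"', $pos + 1);
            if ($end === false) {
                throw new InvalidArgumentException('Unterminated string at offset ' . $pos);
            }
            $result[] = substr($input, $pos, $end - $pos + 1);   // quotes are kept
            $pos = $end + 1;
        } elseif ($char === ',' || ctype_space($char)) {
            $pos++;   // separators carry no value
        } else {
            // bare word: read up to the next separator, bracket or quote
            $wordLength = strcspn($input, " \t\n,()\"", $pos);
            $result[] = substr($input, $pos, $wordLength);
            $pos += $wordLength;
        }
    }
    if ($depth > 0) {
        throw new InvalidArgumentException('Missing ")"');
    }
    return $result;
}

function parseArgs(string $input): array
{
    $pos = 0;
    return parseGroup($input, $pos, 0);
}

print_r(parseArgs('all ("hi there \(option\)", (this, that), other) another'));
```

Because a quoted string is read up to its closing quote in one step, the parentheses inside "hi there \(option\)" never reach the bracket handling. Escaped quotes inside a string are not supported in this sketch.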

I want to know if this works:
replace ( with Array(
Use regex to put comma after words or parentheses without comma
preg_replace( '/[^,]\s+/', ',', $string )
eval( "\$result = Array( $string )" )

If the string values are fixed, it can be done somehow like this:
$ar = explode('("', $st);
$ar[1] = explode('",', $ar[1]);
$ar[1][1] = explode(',', $ar[1][1]);
$ar[1][2] = explode(')',$ar[1][1][2]);
unset($ar[1][1][2]);
$ar[2] =$ar[1][2][1];
unset($ar[1][2][1]);

Related

I have a problem with the strpos & substr function on PHP

I have a problem with the strpos & substr function, thank you for your help:
$temp = "U:hhp|E:123#gmail.com,P:h123";
$find_or = strpos($temp,"|");
$find_and = strpos($temp,",");
$find_user = substr($temp,2,$find_or-2);
$find_email = substr($temp,$find_or+3,$find_and);
$find_passeord = substr($temp,$find_and+3,strlen($temp));
echo("$find_user+$find_email+$find_passeord<br/>");
/************************************/
Why is the output like this ??
hhp+123#gmail.com,P:h123 +h123
but i want this:
hhp+123#gmail.com,h123
The problem is that $find_and is the index of ,, but the third argument to substr() needs to be the length of the substring, not the ending index. So
$find_email = substr($temp,$find_or+3,$find_and);
should be
$find_email = substr($temp,$find_or+3,$find_and-$find_or-3);
For $find_passeord you can omit the 3rd argument, since the default is the end of the string.
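Putting both corrections together, the question's code would look roughly like this (a sketch that keeps the variable names from the question):

```php
<?php
$temp = "U:hhp|E:123#gmail.com,P:h123";
$find_or  = strpos($temp, "|");   // index of the pipe
$find_and = strpos($temp, ",");   // index of the comma

// substr($string, $start, $length): the third argument is a length, not an end index
$find_user     = substr($temp, 2, $find_or - 2);
$find_email    = substr($temp, $find_or + 3, $find_and - $find_or - 3);
$find_passeord = substr($temp, $find_and + 3);   // no length: read to the end

echo "$find_user+$find_email+$find_passeord\n";   // hhp+123#gmail.com+h123
```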
However, this would be simpler with a regular expression:
if (preg_match('/^U:(.*?)\|E:(.*?),P:(.*)/', $temp, $match)) {
list($whole, $user, $email, $password) = $match;
}
if you have control over the input I would suggest
$temp = "U:hhp|E:123#gmail.com|P:h123";
list($user, $email, $password) = explode("|",$temp);
$user = explode(":",$user)[1];
$email = explode(":",$email)[1];
$password = explode(":",$password)[1];
if not then I still recommend exploding the string into parts and work your way down to what you need . https://3v4l.org/ is a great site for testing php code ... here is an example of this working https://3v4l.org/upEGG
Echoing what Barmar just said in a comment, regular expressions are definitely the best way to "break up a string." (It is quite-literally much of what they are for.) This is the preg_ family of PHP functions. (e.g. preg_match, preg_match_all, preg_replace.)
The million-dollar idea behind a "regular expression" is that it is a string-matching pattern. If the string "matches" that pattern, you can easily extract the exact substrings which matched portions of it.
In short, all of the strpos/substr logic that you are right now wrestling with ... "goes away!" Poof.
For example, this pattern: ^(.*)\|(.*),(.*)$ ...
It says: "Anchored at the beginning of the string ^, capture () a pattern consisting of "zero or more occurrences of any character (.*), until you encounter a literal | (written \| because a bare | means "or" in a regex). Now, for the second group, proceed until you find a ,. Then, for the third group, proceed to take any character until the end of the string $."
You can "match" that regular expression and simply be handed all three of these groups! (As well as "the total string that matched.") And you didn't have to "write" a thing!
There are thousands of web pages by now which discuss this "remarkable 'programming language' within a single cryptic string." But it just might be the most pragmatically-useful technology for any practitioner to know, and every programming language somehow implements it, more-or-less following the precedent first set by the (still active) programming language, Perl.

Get the php regex in string

i have for example the following string
#kirbypanganja[Kirby Panganja] elow #kyraminerva[Kyra] test #watever[watever ever evergreen]
I want to get the substring that match with #username[Full Name], Im really new on regex thing. Im using the ff code:
$mention_regex = '/#([A-Za-z0-9_]+)/i';
preg_match_all($mention_regex, $content, $matches);
var_dump($matches);
where the $content is the string above.
what should be the correct regex so that i can have the array #username[Full Name] format?
You can use:
#[^]]+]
i.e.:
$string = "#kirbypanganja[Kirby Panganja] elow #kyraminerva[Kyra] test #watever[watever ever evergreen]";
preg_match_all('/#[^]]+]/', $string, $result);
print_r($result[0]);
Output:
Array
(
[0] => #kirbypanganja[Kirby Panganja]
[1] => #kyraminerva[Kyra]
[2] => #watever[watever ever evergreen]
)
Regex: /#[A-Za-z0-9_]+\[[a-zA-Z\s]+\]/
/#[A-Za-z0-9_]+\[[a-zA-Z\s]+\]/ this will match
Example: #thanSomeCharacters[Some Name Can contain space]
Try this code snippet here
<?php
$content='#kirbypanganja[Kirby Panganja] elow #kyraminerva[Kyra] test #watever[watever ever evergreen]';
$mention_regex = '/#[A-Za-z0-9_]+\[[a-zA-Z\s]+\]/i';
preg_match_all($mention_regex, $content, $matches);
print_r($matches);
I'll start with a very direct, one-liner method that I believe is best and then discuss the other options...
Code (Demo):
$string = "#kirbypanganja[Kirby Panganja] elow #kyraminerva[Kyra] test #watever[watever ever evergreen]";
$result = preg_split('/]\K[^#]+/', $string, 0, PREG_SPLIT_NO_EMPTY);
var_export($result);
Output:
array (
0 => '#kirbypanganja[Kirby Panganja]',
1 => '#kyraminerva[Kyra]',
2 => '#watever[watever ever evergreen]',
)
Pattern (Demo):
] #match a literal closing square bracket
\K #forget the matched closing square bracket
[^#]+ #match 1 or more non-# characters
My pattern takes 12 steps, which is the same step efficiency as Pedro's pattern.
There are two benefits to the coder by using preg_split():
it does not return true/false nor require an output variable like preg_match_all() which means it can be used as a one-liner without a condition statement.
it returns a one-dimensional array, versus a two-dimensional array like preg_match_all(). This means the entire returned array is instantly ready to unpack without any subarray accessing.
In case you are wondering what the 3rd and 4th parameters are in preg_split(), the 0 value means return an unlimited amount of substrings. This is the default behavior, but it is used as a placeholder for parameter 4. PREG_SPLIT_NO_EMPTY effectively removes any empty substrings that would have been generated by trying to split at the start or end of the input string.
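To see what PREG_SPLIT_NO_EMPTY actually changes, here is a small sketch with a made-up input that has trailing text after the last closing bracket. The sample string and variable names are my own, not from the question.

```php
<?php
// Made-up sample: " tail" after the last "]" is matched as a delimiter,
// which leaves an empty substring at the very end of the input.
$sample = '#a[A] x #b[B] tail';

$withEmpty    = preg_split('/]\K[^#]+/', $sample);
$withoutEmpty = preg_split('/]\K[^#]+/', $sample, 0, PREG_SPLIT_NO_EMPTY);

var_export($withEmpty);      // three elements, the last one is ''
echo "\n";
var_export($withoutEmpty);   // two elements: '#a[A]' and '#b[B]'
```

With the string from the question there is nothing after the final ], so both calls would return the same three elements. The flag is there as a safeguard for inputs like the one above.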
That concludes my recommended method, now I'll take a moment to compare the other answers currently posted on this page, and then present some non-regex methods which I do not recommend.
The most popular and intuitive method is to use a regex pattern with preg_match_all(). Both Sahil and Pedro have opted for this course of action. Let's compare the patterns that they've chosen...
Sahil's pattern /#[A-Za-z0-9_]+\[[a-zA-Z\s]+\]/i correctly matches the desired substrings in 18 steps, but uses unnecessary redundancies like using the i modifier/flag despite using A-Za-z in the character class. Here is a demo. Also, [A-Za-z0-9_] is more simply expressed as \w.
Pedro's pattern /#[^]]+]/ correctly matches the desired string in 12 steps. Here is a demo.
By all comparisons, Pedro's method is superior to Sahil's because it has equal accuracy, higher efficiency, and increased pattern brevity. If you want to use preg_match_all(), you will not find a more refined regex pattern than Pedro's.
That said, there are other ways to extract the desired substrings. First, the more tedious way that doesn't involve regex that I would never recommend...
Regex-free method: strpos() & substr()
$result = [];
while (($start = strpos($string, '#')) !== false) {
$result[] = substr($string, $start, ($stop = strpos($string, ']') + 1) - $start);
$string = substr($string, $stop);
}
var_export($result);
Coders should always entertain the idea of a non-regex method when dissecting strings, but as you can see from this code above, it just isn't sensible for this case. It requires four function calls on each iteration and it isn't the easiest thing to read. So let's dismiss this method.
Here is another way that provides the correct result...
$result = [];
foreach (explode('#', $string) as $v) {
if ($v) {
$result[] = '#' . substr($v, 0, strrpos($v, ']') + 1);
}
}
It makes fewer function calls compared to the previous regex-free method, but it is still too much handling for such a simple task.
At this point, it is clear that the most sensible methods should be using regex. And there is nothing wrong with choosing preg_match_all() -- if this were my project, I might elect to use it. However, it is important to consider the direct-ness of preg_split(). This function is just like explode() but with the ability to use a regex pattern. This question is a perfect stage for preg_split() because the substrings that should be omitted can also be used as the delimiter between the desired substrings.

Extracting substrings between curly brackets inside a string into an array using PHP

I need help extracting all the substrings between curly brackets that are found inside a specific string.
I found some solutions in javascript but I need it for PHP.
$string = "www.example.com/?foo={foo}&test={test}";
$subStrings = HELPME($string);
print_r($subStrings);
The result should be:
array( [0] => foo, [1] => test )
I tried playing with preg_match but I got confused.
I'd appreciate if whoever manage to get it to work with preg_match, explain also what is the logic behind it.
You could use this regex to capture the strings between {}
\{([^}]*)\}
Explanation:
\{ Matches a literal {
([^}]*) Captures any character other than }, zero or more times. So it captures up to the next } symbol.
\} Matches a literal }
Your code would be,
<?php
$regex = '~\{([^}]*)\}~';
$string = "www.example.com/?foo={foo}&test={test}";
preg_match_all($regex, $string, $matches);
var_dump($matches[1]);
?>
Output:
array(2) {
[0]=>
string(3) "foo"
[1]=>
string(4) "test"
}
Regex Pattern: \{(\w+)\}
Get all the matches that are captured by the parentheses (). The pattern says that any run of word characters enclosed by {...} is captured.
Sample code:
$regex = '/\{(\w{1,})\}/';
$testString = ''; // Fill this in
preg_match_all($regex, $testString, $matches);
// the $matches variable contains the list of matches
If you want to capture any type of character inside the {...} then try below regex pattern.
Regex : \{(.*?)\}
Sample code:
$regex = '/\{(.{0,}?)\}/';
$testString = ''; // Fill this in
preg_match_all($regex, $testString, $matches);
// the $matches variable contains the list of matches
<?php
$string = "www.example.com/?foo={foo}&test={test}";
$found = preg_match('/\{([^}]*)\}/',$string, $subStrings);
if($found){
print_r($subStrings);
}else{
echo 'NOPE !!';
}
There is also the function parse_url, which parses a URL and returns its components, including the query string.
Try This:
preg_match_all("/\{.*?\}/", $string, $subStrings);
var_dump($subStrings[0]);
Good Luck!
You can use the expression (?<=\{).*?(?=\}) to match any string of text enclosed in {}.
$string = "www.example.com/?foo={foo}&test={test}";
preg_match_all("/(?<=\{).*?(?=\})/",$string,$matches);
print_r($matches[0]);
Regex explained:
(?<=\{) is a positive lookbehind, asserting that the match is preceded by a {.
Similarly (?=\}) is a positive lookahead asserting that it is followed by a }. .* matches 0 or more characters of any type. And the ? in .*? makes it match the least possible amount of characters. (Meaning that in {foo} and {bar} it matches foo and bar separately, as opposed to the single match foo} and {bar.)
$matches[0] contains an array of all the matched strings.
I see answers here using regular expressions with capture groups, lookarounds, and lazy quantifiers. All of these techniques will slow down the pattern -- granted, the performance is very unlikely to be noticeable in the majority of use cases. Because we are meant to offer solutions that are suitable to more scenarios than just the posted question, I'll offer a few solutions that deliver the expected result and explain the differences using the OP's www.example.com/?foo={foo}&test={test} string assigned to $url. I have prepared a php DEMO of the techniques to follow. For information about the function calls, please follow the links to the php manual. For an in depth breakdown of the regex patterns, I recommend using regex101.com -- a free online tool that allows you to test patterns against strings, see the results as both highlighted text and a grouped list, and provides a technique breakdown character-by-character of how the regex engine is interpreting your pattern.
#1 Because your input string is a url, a non-regex technique is appropriate because php has native functions to parse it: parse_url() with parse_str(). Unfortunately, your requirements go beyond extracting the query string's values, you also wish to re-index the array and remove the curly braces from the values.
parse_str(parse_url($url, PHP_URL_QUERY), $assocArray);
$values = array_map(function($v) {return trim($v, '{}');}, array_values($assocArray));
var_export($values);
While this approach is deliberate and makes fair use of native functions that were built for these jobs, it ends up making longer, more convoluted code which is somewhat unpleasant in terms of readability. Nonetheless, it provides the desired output array and should be considered as a viable process.
#2 preg_match_all() is a super brief and highly efficient technique to extract the values. One draw back with using regular expressions is that the regex engine is completely "unaware" of any special meanings that a formatted input string may have. In this case, I don't see any negative impacts, but when hiccups do arise, often the solution is to use a parser that is "format/data-type aware".
var_export(preg_match_all('~\{\K[^}]*~', $url, $matches) ? $matches[0] : []);
Notice that my pattern does not need capture groups or lookarounds; nor does my answer suffer from the use of a lazy quantifier. \K is used to "restart the fullstring match" (in other words, forget any matched characters up to that point). All of these features mean that the regex engine can traverse the string with peak efficiency. If there are downsides to using the function, they are:
that a multi-dimensional array is generated while you only want a one-dimensional array
that the function creates a reference variable instead of returning the results
#3 preg_split() most closely aligns with the plain-English intent of your task AND it provides the exact output as its return value.
var_export(preg_split('~(?:(?:^|})[^{]*{)|}[^{]*$~', $url, 0, PREG_SPLIT_NO_EMPTY));
My pattern, while admittedly unsavoury to the novice regex pattern designer AND slightly less efficient because it is making "branched" matches (|), basically says: "Split the string at the following delimiters:
from the start of the string or from a }, including all non-{ characters, then the first encountered { (this is the end of the delimiter).
from the last }, including all non-{ characters until the end of the string."

Match array values against text [duplicate]

I have an array full of patterns that I need matched. Any way to do that, other than a for() loop? Im trying to do it in the least CPU intensive way, since I will be doing dozens of these every minute.
A real-world example: I'm building a link status checker, which will check links to various online video sites, to ensure that the videos are still live. Each domain has several "dead keywords"; if these are found in the HTML of a page, that means the file was deleted. These are stored in the array. I need to match the contents of the array against the HTML output of the page.
First of all, if you literally are only doing dozens every minute, then I wouldn't worry terribly about the performance in this case. These matches are pretty quick, and I don't think you're going to have a performance problem by iterating through your patterns array and calling preg_match separately like this:
$matches = false;
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
$matches = true;
}
}
You can indeed combine all the patterns into one using the or operator like some people are suggesting, but don't just slap them together with a |. This will break badly if any of your patterns contain the or operator.
I would recommend at least grouping your patterns using parenthesis like:
foreach ($patterns as $pattern)
{
$grouped_patterns[] = "(" . $pattern . ")";
}
$master_pattern = implode("|", $grouped_patterns);
But... I'm not really sure if this ends up being faster. Something has to loop through them, whether it's the preg_match or PHP. If I had to guess I'd guess that individual matches would be close to as fast and easier to read and maintain.
Lastly, if performance is what you're looking for here, I think the most important thing to do is pull out the non regex matches into a simple "string contains" check. I would imagine that some of your checks must be simple string checks like looking to see if "This Site is Closed" is on the page.
So doing this:
foreach ($strings_to_match as $string_to_match)
{
if (strpos($page, $string_to_match) !== false)
{
// etc.
break;
}
}
foreach ($pattern_array as $pattern)
{
if (preg_match($pattern, $page))
{
// etc.
break;
}
}
and avoiding as many preg_match() as possible is probably going to be your best gain. strpos() is a lot faster than preg_match().
// assuming you have something like this
$patterns = array('a','b','\w');
// converts the array into a regex friendly or list
$patterns_flattened = implode('|', $patterns);
if ( preg_match('/'. $patterns_flattened .'/', $string, $matches) )
{
}
// PS: that's off the top of my head, I didn't check it in a code editor
If your patterns don't contain many whitespaces, another option would be to eschew the arrays and use the /x modifier. Now your list of regular expressions would look like this:
$regex = "/
pattern1| # search for occurences of 'pattern1'
pa..ern2| # wildcard search for occurences of 'pa..ern2'
pat[ ]tern| # search for 'pat tern', whitespace is escaped
mypat # Note that the last pattern does NOT have a pipe char
/x";
With the /x modifier, whitespace is completely ignored, except when in a character class or preceded by a backslash. Comments like above are also allowed.
This would avoid the looping through the array.
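A runnable sketch of that idea follows. The keywords and the function name looksDead are made up for illustration; only the /x technique comes from the answer above.

```php
<?php
// Hypothetical dead-link check; the keywords are made up for illustration.
function looksDead(string $html): bool
{
    $regex = '/
        removed                 | # plain keyword
        no[ ]longer[ ]available | # literal spaces must sit in a character class
        not\s+found               # last alternative, no trailing pipe
    /x';
    return preg_match($regex, $html) === 1;
}

var_dump(looksDead('This video is no longer available'));
var_dump(looksDead('nolongeravailable'));
```

The second call shows why the spaces are written as [ ]: under /x a bare space in the pattern would be ignored, and the pattern would then match the run-together text as well.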
If you're merely searching for the presence of a string in another string, use strpos as it is faster.
Otherwise, you could just iterate over the array of patterns, calling preg_match each time.
If you have a bunch of patterns, what you can do is concatenate them in a single regular expression and match that. No need for a loop.
What about doing a str_replace() on the HTML you get using your array and then checking if the result is equal to the original HTML? This would be very fast:
$sites = array(
'you_tube' => array('dead', 'moved'),
...
);
foreach ($sites as $site => $deadArray) {
// get $html
if ($html == str_replace($deadArray, '', $html)) {
// video is live
}
}
You can combine all the patterns from the list to single regular expression using implode() php function. Then test your string at once using preg_match() php function.
$patterns = array(
'abc',
'\d+h',
'[abc]{6,8}\-\s*[xyz]{6,8}',
);
$master_pattern = '/(' . implode(')|(', $patterns) . ')/';
if(preg_match($master_pattern, $string_to_check))
{
//do something
}
Of course there could be even less code using implode() inline in "if()" condition instead of $master_pattern variable.

Replacing Tags with Includes in PHP with RegExps

I need to read a string, detect a {VAR}, and then do a file_get_contents('VAR.php') in place of {VAR}. The "VAR" can be named anything, like TEST, or CONTACT-FORM, etc. I don't want to know what VAR is -- not to do a hard-coded condition, but to just see an uppercase alphanumeric tag surrounded by curly braces and just do a file_get_contents() to load it.
I know I need to use preg_match and preg_replace, but I'm stumbling through the RegExps on this.
How is this useful? It's useful in hooking WordPress.
Orion above has a right solution, but it's not really necessary to use a callback function in your simple case.
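Assuming that the filenames are A-Z + hyphens you can do it in 1 line using PHP's /e flag in the regex (the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so this only runs on older versions):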
$str = preg_replace('/{([-A-Z]+)}/e', 'file_get_contents(\'$1.html\')', $str);
This'll replace any instance of {VAR} with the contents of VAR.html. You could prefix a path into the second term if you need to specify a particular directory.
There are the same vague security worries as outlined above, but I can't think of anything specific.
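On PHP versions without /e, the same one-step replacement can be sketched with preg_replace_callback and a closure. The helper name expandTags and the idea of passing the loader in as a callable are my assumptions, chosen so the sketch can be tried without any files on disk.

```php
<?php
// Hypothetical helper: expands {TAG} placeholders through a loader callable.
function expandTags(string $text, callable $loader): string
{
    return preg_replace_callback(
        '/\{([-A-Z]+)\}/',
        function (array $matches) use ($loader) {
            // $matches[1] is the tag name without the braces
            return $loader($matches[1]);
        },
        $text
    );
}

// Stand-in loader for demonstration; a real one might call
// file_get_contents($name . '.html') as in the answer above.
echo expandTags('Header {TEST} footer {CONTACT-FORM}', function ($name) {
    return '[' . $name . ']';
});
```

Keeping the loader separate also gives one obvious place to check the tag name against a list of allowed files before anything is read from disk.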
You'll need to do a number of things. I'm assuming you can do the legwork to get the page data you want to preprocess into a string.
First, you'll need the regular expression to match correctly. That should be fairly easy with something like /{\w+}/.
Next you'll need to use all of the flags to preg_match to get the offset location in the page data. This offset will let you divide the string into the before, matching, and after parts of the match.
Once you have the 3 parts, you'll need to run your include, and stick them back together.
Lather, rinse, repeat.
Stop when you find no more variables.
This isn't terribly efficient, and there are probably better ways. You may wish to consider doing a preg_split instead, splitting on /[{}]/. No matter how you slice it you're assuming that you can trust your incoming data, and this will simplify the whole process a lot. To do this, I'd lay out the code like so:
Take your content and split it like so: $parts = preg_split('/[{}]/', $page_string);
Write a recursive function over the parts with the following criteria:
Halt when length of arg is < 3
Else, return a new array composed of
$arg[0] . load_data($arg[1]) . $arg[2]
plus whatever is left in $arg[3...]
Run your function over $parts.
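A loop-based sketch of the same split idea is shown below (iterative rather than recursive). It assumes the braces are balanced and never nested, so that after the split the plain text sits at the even indexes and the tag names at the odd ones. The names renderTemplate and $loadData are hypothetical.

```php
<?php
// Hypothetical helper: splits on braces and substitutes every odd-indexed part.
// Assumes balanced, non-nested braces in $page.
function renderTemplate(string $page, callable $loadData): string
{
    $parts = preg_split('/[{}]/', $page);
    $output = '';
    foreach ($parts as $index => $part) {
        // odd index: tag name between braces; even index: literal text
        $output .= ($index % 2 === 1) ? $loadData($part) : $part;
    }
    return $output;
}

// Stand-in loader for demonstration purposes only.
echo renderTemplate('Hello {NAME}, see {FOOTER}', 'strtolower');
```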
You can do it without regexes (god forbid), something like:
//return true if $str ends with $sub
function endsWith($str,$sub) {
return ( substr( $str, strlen( $str ) - strlen( $sub ) ) === $sub );
}
$theStringWithVars = "blah.php cool.php awesome.php";
$sub = '.php';
$splitStr = explode(" ", $theStringWithVars);
for($i=0;$i<count($splitStr);$i++) {
if(endsWith(trim($splitStr[$i]),$sub)) {
//file_get_contents($splitStr[$i]) etc...
}
}
Off the top of my head, you want this:
// load the "template" file
$input = file_get_contents($template_file_name);
// define a callback. Each time the regex matches something, it will call this function.
// whatever this function returns will be inserted as the replacement
function replaceCallback($matches){
// match zero will be the entire match - eg {FOO}.
// match 1 will be just the bits inside the curly braces because of the grouping parens in the regex - eg FOO
// convert it to lowercase and append ".html", so you're loading foo.html
// then return the contents of that file.
// BEWARE. GIANT MASSIVE SECURITY HOLES ABOUND. DO NOT DO THIS
return file_get_contents( strtolower($matches[1]) . ".html" );
}
// run the actual replace method giving it our pattern, the callback, and the input file contents
$output = preg_replace_callback('/\{([-A-Z]+)\}/', 'replaceCallback', $input);
// todo: print the output
Now I'll explain the regex
\{([-A-Z]+)\}
The \{ and \} just tell it to match the curly braces. You need the backslashes, as { and } are special characters, so they need escaping.
The ( and ) create a grouping. Basically this lets you extract particular parts of the match. I use it in the function above to just match the things inside the braces, without matching the braces themselves. If I didn't do this, then I'd need to strip the { and } out of the match, which would be annoying
The [-A-Z] says "match any uppercase character, or a -"
The + after the [-A-Z] means we need to have at least 1 character, but we can have up to any number.
Comparatively speaking, regular expressions are expensive. While you may need them to figure out which files to load, you certainly don't need them for doing the replace, and probably shouldn't use regular expressions. After all, you know exactly what you are replacing so why do you need fuzzy search?
Use an associative array and str_replace to do your replacements. str_replace supports arrays for doing multiple substitutions at once. One line substitution, no loops.
For example:
$substitutions = array('{VAR}'=>file_get_contents('VAR.php'),
'{TEST}'=>file_get_contents('TEST.php'),
...
);
$outputContents = str_replace( array_keys($substitutions), $substitutions, $outputContents);
