Get more backreferences from regexp than parenthesis

Get more backreferences from regexp than parenthesis - php

Ok this is really difficult to explain in English, so I'll just give an example.
I am going to have strings in the following format:
key-value;key1-value;key2-...
and I need to extract the data to be an array
array('key'=>'value','key1'=>'value1', ... )
I was planning to use regexp to achieve (most of) this functionality, and wrote this regular expression:
/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/
to work with preg_match and this code:
for ($l = count($matches),$i = 1;$i<$l;$i+=2) {
$parameters[$matches[$i]] = $matches[$i+1];
}
However the regexp obviously returns only 4 backreferences - first and last key-value pairs of the input string. Is there a way around this? I know I can use regex just to test the correctness of the string and use PHP's explode in loops with perfect results, but I'm really curious whether it's possible with regular expressions.
In short, I need to capture an arbitrary number of these key-value; pairs in a string by means of regular expressions.

You can use a lookahead to validate the input while you extract the matches:
/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/
(?=(?:\w++-[^;-]++;?)++$) is the validation part. If the input is invalid, matching will fail immediately, but the lookahead still gets evaluated every time the regex is applied. In order to keep it (along with the rest of the regex) in sync with the key-value pairs, I used \G to anchor each match to the spot where the previous match ended.
This way, if the lookahead succeeds the first time, it's guaranteed to succeed every subsequent time. Obviously it's not as efficient as it could be, but that probably won't be a problem--only your testing can tell for sure.
If the lookahead fails, preg_match_all() will return zero (false). If it succeeds, the matches will be returned in an array of arrays: one for the full key-value pairs, one for the keys, one for the values.

regex is powerful tool, but sometimes, its not the best approach.
$string = "key-value;key1-value";
$s = explode(";",$string);
foreach($s as $k){
$e = explode("-",$k);
$array[$e[0]]=$e[1];
}
print_r($array);

Use preg_match_all() instead. Maybe something like:
$matches = $parameters = array();
$input = 'key-value;key1-value1;key2-value2;key123-value123;';
preg_match_all("/(\w+)-([^-;]+)/", $input, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$parameters[$match[1]] = $match[2];
}
print_r($parameters);
EDIT:
to first validate if the input string conforms to the pattern, then just use:
if (preg_match("/^((\w+)-([^-;]+);)+$/", $input) > 0) {
/* do the preg_match_all stuff */
}
EDIT2: the final semicolon is optional
if (preg_match("/^(\w+-[^-;]+;)*\w+-[^-;]+$/", $input) > 0) {
/* do the preg_match_all stuff */
}

No. Newer matches overwrite older matches. Perhaps the limit argument of explode() would be helpful when exploding.

what about this solution:
$samples = array(
"good" => "key-value;key1-value;key2-value;key5-value;key-value;",
"bad1" => "key-value-value;key1-value;key2-value;key5-value;key-value;",
"bad2" => "key;key1-value;key2-value;key5-value;key-value;",
"bad3" => "k%ey;key1-value;key2-value;key5-value;key-value;"
);
foreach($samples as $name => $value) {
if (preg_match("/^(\w+-\w+;)+$/", $value)) {
printf("'%s' matches\n", $name);
} else {
printf("'%s' not matches\n", $name);
}
}

I don't think you can do both validation and extraction of data with one single regexp, as you need anchors (^ and $) for validation and preg_match_all() for the data, but if you use anchors with preg_match_all() it will only return the last set matched.

Related

PHP:preg_replace function

$text = "
<tag>
<html>
HTML
</html>
</tag>
";
I want to replace all the text present inside the tags with htmlspecialchars(). I tried this:
$regex = '/<tag>(.*?)<\/tag>/s';
$code = preg_replace($regex,htmlspecialchars($regex),$text);
But it doesn't work.
I am getting the output as htmlspecialchars of the regex pattern. I want to replace it with htmlspecialchars of the data matching with the regex pattern.
what should i do?

You're replacing the match with the pattern itself, you're not using the back-references and the e-flag, but in this case, preg_replace_callback would be the way to go:
$code = preg_replace_callback($regex,'htmlspecialchars',$text);
This will pass the mathces groups to htmlspecialchars, and use its return value as replacement. The groups might be an array, in which case, you can try either:
function replaceCallback($matches)
{
if (is_array($matches))
{
$matches = implode ('', array_slice($matches, 1));//first element is full string
}
return htmlspecialchars($matches);
}
Or, if your PHP version permits it:
preg_replace_callback($expr, function($matches)
{
$return = '';
for ($i=1, $j = count($matches); $i<$j;$i++)
{//loop like this, skips first index, and allows for any number of groups
$return .= htmlspecialchars($matches[$i]);
}
return $return;
}, $text);
Try any of the above, until you find simething that works... incidentally, if all you want to remove is <tag> and </tag>, why not go for the much faster:
echo htmlspecialchars(str_replace(array('<tag>','</tag>'), '', $text));
That's just keeping it simple, and it'll almost certainly be faster, too.
See the quickest, easiest way in action here

If you want to isolate the actual contents as defined by your pattern, you could use preg_match($regex,$text,$hits);. This will give you an array of hits those bits that were between the paratheses in the pattern, starting at $hits[1], $hits[0] contains the whole matched string). You can then start manipulating these found matches, possibly using htmlspecialchars ... and combine them again into $code.

PHP - preg_replace not matching multiple occurrences

Trying to replace a string, but it seems to only match the first occurrence, and if I have another occurrence it doesn't match anything, so I think I need to add some sort of end delimiter?
My code:
$mappings = array(
'fname' => $prospect->forename,
'lname' => $prospect->surname,
'cname' => $prospect->company,
);
foreach($mappings as $key => $mapping) if(empty($mapping)) $mappings[$key] = '$2';
$match = '~{(.*)}(.*?){/.*}$~ise';
$source = 'Hello {fname}Default{/fname} {lname}Last{/lname}';
// $source = 'Hello {fname}Default{/fname}';
$text = preg_replace($match, '$mappings["$1"]', $source);
So if I use the $source that's commented, it matches fine, but if I use the one currently in the code above where there's 2 matches, it doesn't match anything and I get an error:
Message: Undefined index: fname}Default{/fname} {lname
Filename: schedule.php(62) : regexp code
So am I right in saying I need to provide an end delimiter or something?
Thanks,
Christian

Apparently your regexp matches fname}Default{/fname} {lname instead of Default.
As I mentioned here use {(.*?)} instead of {(.*)}.
{ has special meaning in regexps so you should escape it \\{.
And I recommend using preg_replace_callback instead of e modifier (you have more flow control and syntax higlighting and it's impossible to force your program to execute malicious code).
Last mistake you're making is not checking whether the requested index exists. :)
My solution would be:
<?php
class A { // Of course with better class name :)
public $mappings = array(
'fname' => 'Tested'
);
public function callback( $match)
{
if( isset( $this->mappings[$match[1]])){
return $this->mappings[$match[1]];
}
return $match[2];
}
}
$a = new A();
$match = '~\\{([^}]+)\\}(.*?)\\{/\\1\\}~is';
$source = 'Hello {fname}Default{/fname} {lname}Last{/lname}';
echo preg_replace_callback( $match, array($a, 'callback'), $source);
This results into:
[vyktor#grepfruit tmp]$ php stack.php
Hello Tested Last

Your regular expression is anchored to the end of the string so you closing {/whatever} must be the last thing in your string. Also, since your open and closing tags are simply .*, there's nothing in there to make sure they match up. What you want is to make sure that your closing tag matches your opening one - using a backreference like {(.+)}(.*?){/\1} will make sure they're the same.
I'm sure there's other gotchas in there - if you have control over the format of strings you're working with (IE - you're rolling your own templating language), I'd seriously consider moving to a simpler, easier to match format. Since you're not 'saving' the default values, having enclosing tags provides you with no added value but makes the parsing more complicated. Just using $VARNAME would work just as well and be easier to match (\$[A-Z]+), without involving back-references or having to explicitly state you're using non-greedy matching.

using preg_split to fetch terms between two markers in string

ok I have two strings.
(I use this for a language library system to allow translators to provide translations with placeholders).
In the first string, there are two instances. note that it's not always a single instance, some cases it will be none, one, two, or more.
This is a {[John Doe]} and this is {[Jane Doe]}
and then I have a string that is stored like this:
C'est {[1]} et c'est {[2]}
(translation)
This is a {[1]} and this is a {[2]}
so what I need to do is take the first string, replace everything between {[]} of the starting string and match each instance, i.e. first of first string with {[1]} of second string etc. keep in mind that the reason I am using {[1]} and {[2]} is because in some languages, terms may appear in a different order for gramatical accuracy, but are still terms that don't need translation them selves (names).
so the question is. how do I do this? am thinking preg_split and then match index+1 of each with the second string. that part I can handle. the problem I am having is getting the right regex search going..
this is as close as I could get it..
preg_split('/[(\{\[).*(\]\})]/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
that returns an array of everything before and after each instance of {[ and ]} when I am just trying to get the contents of inbetween the two..
EDIT: solution derived from NikiC's answer.
function lang($str){
$nwStr = $str;
preg_match_all('(\{\[(.+?)\]\})', $str, $placeholders);
foreach ($placeholders[0] as $mk => $match) {
$pos = $mk+1;
$nwStr = str_replace("$match","{[$pos]}",$nwStr);
}
$result = preg_replace_callback('(\{\[(\d+)\]\})', function ($matches) use ($placeholders) {
$n = $matches[1]-1;
return $placeholders[1][$n];
}, $translation);
return $result;
}
basically what i am doing here is first looping through to replace the matches with the placeholders so that I can match the proper placeholder text in my language files. (i.e. create the right label string out of the input string)

First grab the placeholders from the string:
preg_match_all('(\{\[(.+?)\]\})', $string, $matches);
$placeholders = $matches[1];
Now replace with a callback:
$result = preg_replace_callback('(\{\[(\d+)\]\})', function ($matches) use ($placeholders) {
$n = $matches[1] + 1;
return $placeholders[$n];
}, $translation);

You're almost there. PREG_SPLIT_DELIM_CAPTURE captures the groups between ( and ), so this:
preg_split('/(\{\[.*\]\})/U', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
should work better. I also added the U modifier so that * is ungreedy.
edit also, you have a pair of [ and ] which definitely don't belong there!
Another thing, you probably want to have the parts between the {[...]} construct, so this is better:
preg_split('/\{\[(.*)\]\}/U', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
By removing the PREG_SPLIT_NO_EMPTY, you now know for certain that you will find the tagged parts at odd indexes.

php: "sscanf" to 'consume' a string but allows a missing parameter

This is for an osCommerce contribution called
("Automatically add multiple products with attribute to cart from external source")
This existing code uses sscanf to 'explode' a string that represents a
- product ID,
- a productOption,
- and quantity:
sscanf('28{8}17[1]', '%d{%d}%d[%f]',
$productID, // 28
$productOptionID, $optionValueID, //{8}17 <--- Product Options!!!
$productQuantity //[1]
);
This works great if there is only 1 'set' of Product Options (e.g. {8}17).
But this procedure needs to be adapted so that it can handle multiple Product Options, and put them into an array, e.g.:
'28{8}17{7}15{9}19[1]' //array(8=>17, 7=>15, 9=>19)
OR
'28{8}17{7}15[1]' //array(8=>17, 7=>15)
OR
'28{8}17[1]' //array(8=>17)
Thanks in advance. (I'm a pascal programmer)

You should not try to do complex recursive parses with one sscanf. Stick it in a loop. Something like:
<?php
$str = "28{8}17{7}15{9}19[1]";
#$str = "28{8}17{7}15[1]";
#$str = "28{8}17[1]";
sscanf($str,"%d%s",$prod,$rest);
printf("Got prod %d\n", $prod);
while (sscanf($rest,"{%d}%d%s",$opt,$id,$rest))
{
printf("opt=%d id=%d\n",$opt,$id);
}
sscanf($rest,"[%d]",$quantity);
printf("Got qty %d\n",$quantity);
?>

Maybe regular expressions may be interesting
$a = '28{8}17{7}15{9}19[1]';
$matches = null;
preg_match_all('~\\{[0-9]{1,3}\\}[0-9]{1,3}~', $a, $matches);
To get the other things
$id = (int) $a; // ;)
$quantity = substr($a, strrpos($a, '[')+ 1, -1);
According the comment a little update
$a = '28{8}17{7}15{9}19[1]';
$matches = null;
preg_match_all('~\\{([0-9]{1,3})\\}([0-9]{1,3})~', $a, $matches, PREG_SET_ORDER);
$result = array();
foreach ($matches as $entry) {
$result[$entry[1]] = $entry[2];
}

sscanf() is not the ideal tool for this task because it doesn't handle recurring patterns and I don't see any real benefit in type casting or formatting the matched subexpressions.
If this was purely a text extraction task (in other words your incoming data was guaranteed to be perfectly formatted and valid), then I could have recommended a cute solution that used strtr() and parse_str() to quickly generate a completely associative multi-dimensional output array.
However, when you commented "with sscanf I had an infinite loop if there is a missing bracket in the string (because it looks for open and closing {}s). Or if I leave out a value. But with your regex solution, if I drop a bracket or leave out a value", then this means that validation is an integral component of this process.
For that reason, I'll recommend a regex pattern that both validates the string and breaks the string into its meaningful parts. There are several logical aspects to the pattern but the hero here is the \G metacharacter that allows the pattern to "continue" matching where the pattern last finished matching in the string. This way we have an array of continuous fullstring matches to pull data from when creating your desired multidimensional output.
The pattern ^\d+(?=.+\[\d+]$)|\G(?!^)(?:{\K\d+}\d+|\[\K\d(?=]$)) in preg_match_all() generates the following type of output in the fullstring element ([0]):
[id], [option0, option1, ...](optional), [quantity]
The first branch in the pattern (^\d+(?=.+\[\d+]$)) validates the string to start with the id number and ends with a square brace wrapped number representing the quantity.
The second branch begins with the "continue" character and contains two logical branches itself. The first matches an option expression (and forgets the leading { thanks to \K) and the second matches the number in the quantity expression.
To create the associative array of options, target the "middle" elements (if there are any), then split the strings on the lingering } and assign these values as key-value pairs.
This is a direct solution because it only uses one preg_ call and it does an excellent job of validating and parsing the variable length data.
Code: (Demo with a battery of test cases)
if (!preg_match_all('~^\d+(?=.+\[\d+]$)|\G(?!^)(?:{\K\d+}\d+|\[\K\d(?=]$))~', $test, $m)) {
echo "invalid input";
} else {
var_export(
[
'id' => array_shift($m[0]),
'quantity' => array_pop($m[0]),
'options' => array_reduce(
$m[0],
function($result, $string) {
[$key, $result[$key]] = explode('}', $string, 2);
return $result;
},
[]
)
]
);
}

filter specific string in php

$var="UseCountry=1
UseCountryDefault=1
UseState=1
UseStateDefault=1
UseLocality=1
UseLocalityDefault=1
cantidad_productos=5
expireDays=5
apikey=ABQIAAAAFHktBEXrHnX108wOdzd3aBTupK1kJuoJNBHuh0laPBvYXhjzZxR0qkeXcGC_0Dxf4UMhkR7ZNb04dQ
distancia=15
AutoCoord=1
user_add_locality=0
SaveContactForm=0
ShowVoteRating=0
Listlayout=0
WidthThumbs=100
HeightThumbs=75
WidthImage=640
HeightImage=480
ShowImagesSystem=1
ShowOrderBy=0
ShowOrderByDefault=0
ShowOrderDefault=DESC
SimbolPrice=$
PositionPrice=0
FormatPrice=0
ShowLogoAgent=1
ShowReferenceInList=1
ShowCategoryInList=1
ShowTypeInList=1
ShowAddressInList=1
ShowContactLink=1
ShowMapLink=1
ShowAddShortListLink=1
ShowViewPropertiesAgentLink=1
ThumbsInAccordion=5
WidthThumbsAccordion=100
HeightThumbsAccordion=75
ShowFeaturesInList=1
ShowAllParentCategory=0
AmountPanel=
AmountForRegistered=5
RegisteredAutoPublish=1
AmountForAuthor=5
AmountForEditor=5
AmountForPublisher=5
AmountForManager=5
AmountForAdministrator=5
AutoPublish=1
MailAdminPublish=1
DetailLayout=0
ActivarTabs=0
ActivarDescripcion=1
ActivarDetails=1
ActivarVideo=1
ActivarPanoramica=1
ActivarContactar=1
ContactMailFormat=1
ActivarReservas=1
ActivarMapa=1
ShowImagesSystemDetail=1
WidthThumbsDetail=120
HeightThumbsDetail=90
idCountryDefault=1
idStateDefault=1
ms_country=1
ms_state=1
ms_locality=1
ms_category=1
ms_Subcategory=1
ms_type=1
ms_price=1
ms_bedrooms=1
ms_bathrooms=1
ms_parking=1
ShowTextSearch=1
minprice=
maxprice=
ms_catradius=1
idcatradius1=
idcatradius2=
ShowTotalResult=1
md_country=1
md_state=1
md_locality=1
md_category=1
md_type=1
showComments=0
useComment2=0
useComment3=0
useComment4=0
useComment5=0
AmountMonthsCalendar=3
StartYearCalendar=2009
StartMonthCalendar=1
PeriodOnlyWeeks=0
PeriodAmount=3
PeriodStartDay=1
apikey=ABQIAAAAJ879Hg7OSEKVrRKc2YHjixSmyv5A3ewe40XW2YiIN-ybtu7KLRQiVUIEW3WsL8vOtIeTFIVUXDOAcQ
";
in that string only i want "api==ABQIAAAAJ879Hg7OSEKVrRKc2YHjixSmyv5A3ewe40XW2YiIN-ybtu7KLRQiVUIEW3WsL8vOtIeTFIVUXDOAcQ";
plz guide me correctly;

EDIT
As shamittomar pointed out, the parse_str will not work for this situation, posted the proper regex below.
Given this seems to be a QUERY STRING, use the parse_str() function PHP provides.
UPDATE
If you want to do it with regex using preg_match() as powertieke pointed out:
preg_match('/apikey=(.*)/', $var, $matches);
echo $matches[1];
Should do the trick.

preg_match(); should be right up your alley

people are so fast to jump to preg match when this can be done with regular string functions thats faster.
$string = '
expireDays=5
apikey=ABQIAAAAFHktBEXrHnX108wOdzd3aBTupK1kJuoJNBHuh0laPBvYXhjzZxR0qkeXcGC_0Dxf4UMhkR7ZNb04dQ
distancia=15
AutoCoord=1';
//test to see what type of line break it is and explode by that.
$parts = (strstr($string,"\r\n") ? explode("\r\n",$string) : explode("\n",$string));
$data = array();
foreach($parts as $part)
{
$sub = explode("=",trim($part));
if(!empty($sub[0]) || !empty($sub[1]))
{
$data[$sub[0]] = $sub[1];
}
}
and use $data['apikey'] for your api key, i would also advise you to wrpa in function.
I can bet this is a better way to parse the string and much faster.
function ParsemyString($string)
{
$parts = (strstr($string,"\r\n") ? explode("\r\n",$string) : explode("\n",$string));
$data = array();
foreach($parts as $part)
{
$sub = explode("=",trim($part));
if(!empty($sub[0]) || !empty($sub[1]))
{
$data[$sub[0]] = $sub[1];
}
}
return $data;
}
$data = ParsemyString($string);

First of all, you are not looking for
api==ABQIAAAAJ879Hg7OSEKVrRKc2YHjixSmyv5A3ewe40XW2YiIN-ybtu7KLRQiVUIEW3WsL8vOtIeTFIVUXDOAcQ
but you are looking for
apikey=ABQIAAAAJ879Hg7OSEKVrRKc2YHjixSmyv5A3ewe40XW2YiIN-ybtu7KLRQiVUIEW3WsL8vOtIeTFIVUXDOAcQ
It is important to know if the api-key property always occurs at the end and if the length of the api-key value is always the same. I this is the case you could use the PHP substr() function which would be easiest.
If not you would most probably need a regular expression which you can feed to PHPs preg_match() function. Something along the lines of apikey==[a-zA-Z0-9\-] Which matches an api-key containing a-z in both lowercase and uppercase and also allows for dashes in the key. If you are using the preg_match() function you can retrieve the matches (and thus your api-key value).

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Get more backreferences from regexp than parenthesis - php

regex is powerful tool, but sometimes, its not the best approach. $string = "key-value;key1-value"; $s = explode(";",$string); foreach($s as $k){ $e = explode("-",$k); $array[$e[0]]=$e[1]; } print_r($array);

No. Newer matches overwrite older matches. Perhaps the limit argument of explode() would be helpful when exploding.

I don't think you can do both validation and extraction of data with one single regexp, as you need anchors (^ and $) for validation and preg_match_all() for the data, but if you use anchors with preg_match_all() it will only return the last set matched.

Related

PHP:preg_replace function

PHP - preg_replace not matching multiple occurrences

using preg_split to fetch terms between two markers in string

php: "sscanf" to 'consume' a string but allows a missing parameter

filter specific string in php

Categories

Resources