extract a part from an expression - php

I have some expressions of the form aa/bbbb/c/dd/ee. I want to extract only the part dd using PHP. It could be done with substr, but the length of bbbb can be 3 (i.e., bbb) or 4, the length of c can be 1 or 2, and the length of dd can be 2, 3 or 4. How can I extract the part dd (i.e., the part between the last pair of slashes)?

Use explode to split the string into an array and then grab the 4th item in the array (index 3), which will be dd regardless of the length of the other elements. Just make sure the number of '/' stays the same.

If the structure of the expression always has the "/" separators even if the values in between are varying in length (or sometimes absent) you can use explode().
$parts_array = explode("/", $expression);
$dd = $parts_array[3];
If the number of slashes varies, you'll have to do more work, like determining how many slashes there are and what parts of the expression are missing. That's a fair bit more complex.
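As a rough sketch of the simplest form of that extra work, you can at least check the number of parts before indexing into the array. The helper name extract_fourth_part and the default of 5 expected parts are assumptions made for illustration, based on the aa/bbbb/c/dd/ee shape in the question:

```php
<?php
// Hypothetical helper: returns the 4th segment only when the expression
// has exactly the expected number of "/"-separated parts, otherwise null.
function extract_fourth_part(string $expression, int $expectedParts = 5): ?string
{
    $parts = explode('/', $expression);
    if (count($parts) !== $expectedParts) {
        return null; // unexpected shape; the caller decides what to do
    }
    return $parts[3];
}

var_dump(extract_fourth_part('aa/bbbb/c/dd/ee'));   // string(2) "dd"
var_dump(extract_fourth_part('aa/bbb/cc/dddd/ee')); // string(4) "dddd"
var_dump(extract_fourth_part('aa/bbbb/dd/ee'));     // NULL
```

This only detects a wrong slash count; working out which part is missing would still need rules specific to your data.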

Here you go, the PHP script for selecting the text between the last pair of "/../":
<?php
$expression = "aaa/vvv/bbbb/cccc/ddd/ee";
$mystuff = explode("/", $expression);
echo $mystuff[sizeof($mystuff)-2];
?>
I hope it helps. Good luck!

Related

Best way to parse this string and create an array from it

I have the follow string:
{item1:test},{item2:hi},{another:please work}
What I want to do is turn it into an array that looks like this:
[item1] => test
[item2] => hi
[another] => please work
Here is the code I am currently using for that (which works):
$vf = '{item1:test},{item2:hi},{another:please work}';
$vf = ltrim($vf, '{');
$vf = rtrim($vf, '}');
$vf = explode('},{', $vf);
foreach ($vf as $vk => $vv)
{
$ve = explode(':', $vv);
$vx[$ve[0]] = $ve[1];
}
My concern is: what if the value has a colon in it? For example, let's say that the value for item1 is you:break. That colon is going to make me lose break entirely. What is a better way of coding this in case the value has a colon in it?
Why not set a limit on the explode function? Like this:
$ve = explode(':', $vv, 2);
This way the string will split only at the first occurrence of a colon.
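For illustration, here is the loop from the question with the limit applied, run against a value that contains a colon. It is only a sketch and assumes every item contains at least one colon:

```php
<?php
$vf = '{item1:you:break},{item2:hi},{another:please work}';
$vf = ltrim($vf, '{');
$vf = rtrim($vf, '}');

$vx = [];
foreach (explode('},{', $vf) as $vv) {
    // Limit of 2: split at the first colon only, keep the rest intact.
    list($key, $value) = explode(':', $vv, 2);
    $vx[$key] = $value;
}

print_r($vx);
```

The item1 entry keeps its full value, you:break, because the second colon is never treated as a separator.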
To address the possibility of the values having embedded colons, and for the sake of discussion (not necessarily performance):
$ve = explode(':', $vv);
$key = array_shift($ve);
$vx[$key] = implode(':', $ve);
...grabs the first element of the array, assuming the index will NOT have a colon in it. Then re-joins the rest of the array with colons.
Don't use effing explode for everything.
You can more reliably extract such simple formats with a trivial key:value regex. In particular since you have neat delimiters around them.
And it's far less code:
preg_match_all('/{(\w+):([^}]+)}/', $vf, $match);
$array = array_combine($match[1], $match[2]);
The \w+ just matches a run of word characters (letters, digits and underscore), and [^}]+ matches anything up to a closing }. And array_combine more easily turns it into a key=>value array.
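A quick sketch of that regex on input where a value contains a colon. The braces are escaped here purely for readability, and the sample string is made up from the question's example:

```php
<?php
$vf = '{item1:you:break},{item2:hi},{another:please work}';

// Group 1 is the key, group 2 is everything up to the closing brace.
preg_match_all('/\{(\w+):([^}]+)\}/', $vf, $match);
$array = array_combine($match[1], $match[2]);

print_r($array);
```

Because the key is matched with \w+, only the first colon acts as the separator. A value containing } would still break this format.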
Answering your second question:
If your format breaks on specific content, that's bad. I think there are two ways to work around it.
Escape delimiters: every colon and curly bracket in the data would have to be escaped, which is awkward, so instead the data is delimited with e.g. " and only those quotation marks are escaped (then you have JSON in this case).
Save data lengths: this is roughly how PHP serializes arrays. In that data structure you state that the next n chars form one token.
The first type is easy to read and manipulate, although one has to read the whole file to access it randomly.
The second type would be great for random access if the structure stored the number of bytes to skip rather than the number of characters (since in UTF-8 you cannot just skip n chars without reading them). PHP's serialize function produces n == strlen($token), thus I don't know what the advantage over JSON is.
Where possible I try to use JSON for communication between different systems.
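As a small sketch of the JSON route, assuming you control both ends of the exchange and can change the format, the delimiters stop being a concern (the sample values are made up):

```php
<?php
// Values containing colons and braces survive the round trip untouched.
$data = ['item1' => 'you:break', 'item2' => 'hi', 'another' => 'please {work}'];

$json    = json_encode($data);
$decoded = json_decode($json, true); // true: decode objects as associative arrays

echo $json, "\n";
print_r($decoded);
```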

preg_match within braces with optional existence additional content within braces sometimes

i have data like so
$data = '<span class="theclass">data (not important)</span> <span class="anotherclass">extra data (October 1, 2010)</span>';
i want to get the date within the braces so ive done the following preg_match
preg_match("/\((([a-zA-Z]{5,10} .*?)|(\d{4}))\)/i",$data,$res);
Please note that sometimes 'October 1' is not present BUT THE YEAR IS ALWAYS PRESENT, hence the OR condition. The thing is, it gives me an array of 3 elements in this case. I know it's because of the 3 sets of parentheses I have for the conditions. Is there a better and cleaner way to achieve this?
Second case (year only):
$data = '<span class="theclass">data</span> <span class="theother">data data (2009)</span>';
Thanks guys
Use lookarounds
Here we're making sure there is a preceding ( character, then we look for text we would expect in a date formatted like your example. The fragment ([A-Za-z ,\d]+)? allows letters, a literal space character, a comma, and digits. The + character means at least one. Unlike .* or .+, the character class cannot run past the closing ). I'm surrounding it with parentheses and then adding a ? character to make it optional. Logically it works like your | alternation because it will still find the year, but we're not making PHP do more work by trying a second alternative. Then we find the year (always 4 digits, {4}). Then we check that it's followed by a literal ) character. The lookbehind (?<=\() and the lookahead (?=\)) have to match, but they are not included in the match results, leaving your answer clean.
preg_match() fills the array passed as its third argument, so we read the first element of that array. If you're looking for multiple matches in the same string you can use preg_match_all.
$data = '<a href="not important">
<span class="theclass">data (not important)</span></a>
<span class="anotherclass">extra data (October 1, 2010)</span>
<span class="anotherclass">extra data (2011)</span>';
$pattern = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';
$res = preg_match_all($pattern,$data,$myDate);
print_r($myDate[0]);
output
Array
(
[0] => October 1, 2010
[1] => 2011
)
If you're only looking for one match you would change the code to this:
$res = preg_match($pattern,$data,$myDate);
echo($myDate[0]);
Output
October 1, 2010
Another way to write the pattern would be like this: we've removed the capturing parentheses and the + quantifier followed by the optional ?, but kept the character class. Then we're using a * to make it optional. The difference is that with preg_match and preg_match_all, any capture groups are also stored in the array. Since this pattern has no capture group, it will not store extra array elements.
$pattern = '!(?<=\()[A-Za-z ,\d]*[\d]{4}(?=\))!';
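To see the difference between the two patterns, here is a small comparison sketch. The variable names and the shortened sample data are chosen here for illustration:

```php
<?php
$data = '<span class="theclass">data (not important)</span> '
      . '<span class="anotherclass">extra data (October 1, 2010)</span> '
      . '<span class="anotherclass">extra data (2011)</span>';

$grouped   = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';
$ungrouped = '!(?<=\()[A-Za-z ,\d]*[\d]{4}(?=\))!';

preg_match_all($grouped, $data, $withGroup);
preg_match_all($ungrouped, $data, $withoutGroup);

echo count($withGroup), "\n";    // 2: full matches plus capture group 1
echo count($withoutGroup), "\n"; // 1: full matches only
print_r($withoutGroup[0]);       // October 1, 2010 and 2011
```

Both patterns find the same full matches; only the extra sub-array for the capture group goes away.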

PHP Regex to identify keys in array representation

I have this string authors[0][system:id] and I need a regex that returns:
array('authors', '0', 'system:id')
Any ideas?
Thanks.
Just use PHP's preg_split(), which returns an array of elements similarly to explode() but with RegEx.
Split the string on [ or ] and then remove the last element (which is an empty string) of the resulting array, $tokens.
EDIT: Also, remove the 3rd element with array_splice($array, $offset, $length), since this item is also an empty string.
The regex /[\[\]]/ just means match any [ or ] character
$string = "authors[0][system:id]";
$tokens = preg_split("/[\]\[]/", $string);
array_pop($tokens);
array_splice($tokens, 2, 1);
//rest of your code using $tokens
Here is the format of $tokens after this has run:
Array ( [0] => authors [1] => 0 [2] => system:id )
Taking the most simplistic approach, we would just match the three individual parts. So first of all we'd look for the token that is not enclosed in brackets:
[a-z]+
Then we'd look for the brackets and the value in between:
\[[^\]]+\]
And then we'd repeat the second step.
You'd also need to add capture groups () to extract the actual values that you want.
So when you put it all together you get something like:
([a-z]+)\[([^\]]+)\]\[([^\]]+)\]
That expression could then be used with preg_match() and the values you want would be extracted into the referenced array passed to the third argument. But you'll notice the above expression is quite a difficult-to-read collection of punctuation, and also that the resulting array has an extra element on it that we don't want - preg_match() places the whole matched string into the first index of the output array. We're close, but it's not ideal.
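A minimal sketch of that preg_match() approach. The ^ and $ anchors and the array_shift() call are additions made here, so that the whole string must match and the extra first element is dropped:

```php
<?php
$str  = 'authors[0][system:id]';
$expr = '/^([a-z]+)\[([^\]]+)\]\[([^\]]+)\]$/';

if (preg_match($expr, $str, $matches)) {
    array_shift($matches); // drop index 0, the whole matched string
    print_r($matches);     // authors, 0, system:id
}
```

This only handles exactly two bracketed keys; a string with more or fewer levels will not match.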
However, as @AlienHoboken correctly points out and almost correctly implements, a simpler solution would be to split the string up based on the position of the brackets. First let's take a look at the expression we'd need (or at least, the one that I would use):
(?:\[|\])+
This looks for at least one occurrence of either [ or ] and uses that block as the delimiter for the split. This seems like exactly what we need, except when we run it we'll find we have a small issue:
array('authors', '0', 'system:id', '')
Where did that extra empty string come from? Well, the last character of the input string matches your delimiter expression, so it's treated as a split position - with the result that an empty string gets appended to the results.
This is quite a common issue when splitting based on a regular expression, and luckily PCRE knows this and provides a simple way to avoid it: the PREG_SPLIT_NO_EMPTY flag.
So when we do this:
$str = 'authors[0][system:id]';
$expr = '/(?:\[|\])+/';
$result = preg_split($expr, $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
...you will see the result you want.

Filter array of numeric PIN code strings which may be in the format "######" or "### ###"

I have a PHP array of strings. The strings are supposed to represent PIN codes which are of 6 digits like:
560095
Having a space after the first 3 digits is also considered valid e.g. 560 095.
Not all array elements are valid. I want to filter out all invalid PIN codes.
Yes you can make use of regex for this.
PHP has a function called preg_grep to which you pass your regular expression and it returns a new array with entries from the input array that match the pattern.
$new_array = preg_grep('/^\d{3} ?\d{3}$/',$array);
Explanation of the regex:
^ - Start anchor
\d{3} - 3 digits. Same as [0-9][0-9][0-9]
? - optional space (there is a space before ?)
If you want to allow any number of any whitespace between the groups
you can use \s* instead
\d{3} - 3 digits
$ - End anchor
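Here is a short sketch with made-up sample data. Note that preg_grep keeps the original keys, so array_values is used afterwards if you want a reindexed list:

```php
<?php
$array = ['560095', '56009', '560 095', 'abc123', '560  095'];

$new_array = preg_grep('/^\d{3} ?\d{3}$/', $array);

print_r($new_array);               // keys 0 and 2 survive
print_r(array_values($new_array)); // reindexed from 0
```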
Yes, you can use a regular expression to make sure there are 6 digits with or without a space.
A neat tool for playing with regular expressions is RegExr... here's what RegEx I came up with:
^[0-9]{3}\s?[0-9]{3}$
It matches the beginning of the string ^, then any three numbers [0-9]{3} followed by an optional space \s? followed by another three numbers [0-9]{3}, followed by the end of the string $.
Passing the array to the PHP function preg_grep along with the regex will return a new array containing only the matching entries (the original keys are preserved).
If you just want to iterate over the valid responses (loop over them), you could always use a RegexIterator:
$regex = '/^\d{3}\s?\d{3}$/';
$it = new RegexIterator(new ArrayIterator($array), $regex);
foreach ($it as $valid) {
//Only matching items will be looped over, non-matching will be skipped
}
It has the benefit of not copying the entire array (it computes the next one when you want it). So it's much more memory efficient than doing something with preg_grep for large arrays. But it also will be slower if you iterate multiple times (but for a single iteration it should be faster due to the memory usage).
If you want to get an array of the valid PIN codes, use codaddict's answer.
You could also, at the same time as filtering only valid PINs, remove the optional space character so that all PINs become 6 digits by using preg_filter:
$new_array = preg_filter('/^(\d{3}) ?(\d{3})$/D', '$1$2', $array);
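A sketch of what that returns for some made-up input. The D modifier makes $ match only at the very end of the string, so a value with a trailing newline is rejected as well:

```php
<?php
$array = ['560095', '56009', '560 095', "560095\n"];

// Only matching entries are returned, already rewritten to 6 digits.
$new_array = preg_filter('/^(\d{3}) ?(\d{3})$/D', '$1$2', $array);

print_r($new_array); // keys 0 and 2, both "560095"
```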
The best answer might depend on your situation, but if you wanted to do a simple and low cost check first...
$item = str_replace( " ", "", $var );
if ( strlen( $item ) !== 6 ){
echo 'fail early';
}
Following that, you could equally go on and do some type checking - as long as valid numbers do not start with a 0, in which case it might be more difficult.
If you don't fail early, then go on with the regex solutions already posted.
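A rough sketch of such a cheap pre-check, wrapped in a hypothetical helper named looks_like_pin. It uses ctype_digit for the type check, which works on the string itself, so a leading 0 is not a problem:

```php
<?php
// Hypothetical helper: cheap checks only, to be followed by a regex if needed.
function looks_like_pin(string $var): bool
{
    $item = str_replace(' ', '', $var);
    if (strlen($item) !== 6) {
        return false; // fail early
    }
    return ctype_digit($item); // true only if every character is a digit
}

var_dump(looks_like_pin('560 095')); // bool(true)
var_dump(looks_like_pin('056009'));  // bool(true)
var_dump(looks_like_pin('56009'));   // bool(false)
```

Since it strips spaces anywhere in the string, this is only a pre-filter: it does not check that the space comes after the first 3 digits.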

How to split a string and find the occurrence of one string in another?

I need to figure out how to port some C# code to PHP, and I'm not sure exactly how.
First off, I need the Split function. I'm going to have a string like
"identifier 82asdjka271akshjd18ajjd"
and I need to split the identifier word from the rest. In C#, I used string.Split(new char{' '}); or something like that (working off the top of my head) and got two strings: the first word, and then the second part. I understand that the PHP split function has been deprecated as of PHP 5.3.0, so that's not an option. What are the alternatives?
I'm also looking for an IndexOf function. Taking the string above as an example again, I would need the location of 271 in the string so I can generate a substring.
You can use explode for splitting and strpos for finding the index of one string inside another.
$a = "identifier 82asdjka271akshjd18ajjd";
$arr = explode(' ',$a); // split on space..to get an array of size 2.
$pos = strpos($arr[1],'271'); // search for '271' in the 2nd ele of array.
echo $pos; // prints 8
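To go one step further and generate the substring, here is a small sketch. The limit of 2 on explode and the check against false are additions made here; strpos returns false when the needle is not found, and a valid position 0 would also be falsy, so compare with ===:

```php
<?php
$a = 'identifier 82asdjka271akshjd18ajjd';

// Limit of 2: the first word, then everything after the first space.
list($word, $rest) = explode(' ', $a, 2);

$pos = strpos($rest, '271');
if ($pos === false) {
    echo "not found\n";
} else {
    echo substr($rest, $pos), "\n"; // 271akshjd18ajjd
}
```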
