I have this string authors[0][system:id] and I need a regex that returns:
array('authors', '0', 'system:id')
Any ideas?
Thanks.
Just use PHP's preg_split(), which returns an array of elements much like explode(), but splits on a regular expression instead of a fixed string.
Split the string on [ or ] and then remove the last element of the resulting array, $tokens, which is an empty string.
EDIT: Also, remove the 3rd element with array_splice($array, int $offset, int $length), since this item is also an empty string (it comes from the empty gap between ] and [).
The regex /[\[\]]/ just means "match any [ or ] character"; the code below writes the same character class as /[\]\[]/.
$string = "authors[0][system:id]";
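$tokens = preg_split("/[\]\[]/", $string); // "authors", "0", "", "system:id", ""
array_pop($tokens);          // drop the trailing "" after the final ]
array_splice($tokens, 2, 1); // drop the "" between ] and [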
//rest of your code using $tokens
Here is the format of $tokens after this has run:
Array ( [0] => authors [1] => 0 [2] => system:id )
Taking the simplest approach, we would just match the three individual parts. First, we look for the token that is not enclosed in brackets:
[a-z]+
Then we'd look for the brackets and the value in between:
\[[^\]]+\]
And then we'd repeat the second step.
You'd also need to add capture groups () to extract the actual values that you want.
So when you put it all together you get something like:
([a-z]+)\[([^\]]+)\]\[([^\]]+)\]
That expression could then be used with preg_match(), and the values you want would be extracted into the array passed by reference as the third argument. But you'll notice that the expression is a hard-to-read collection of punctuation, and that the resulting array has an extra element we don't want: preg_match() places the whole matched string at index 0 of the output array. We're close, but it's not ideal.
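A minimal sketch of that preg_match() route, assuming the input always has exactly one bare lowercase name followed by two bracketed parts; array_slice() drops the whole-match entry at index 0 so only the captured parts remain:

```php
<?php
$str = 'authors[0][system:id]';

// One bare name, then two bracketed values; each part is a capture group.
$pattern = '/([a-z]+)\[([^\]]+)\]\[([^\]]+)\]/';

$parts = [];
if (preg_match($pattern, $str, $matches) === 1) {
    // $matches[0] is the whole match; the captured parts start at index 1.
    $parts = array_slice($matches, 1);
}

print_r($parts);
```

Note that this pattern is tied to exactly two bracketed parts; a different nesting depth needs a different pattern.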
However, as @AlienHoboken correctly points out and almost correctly implements, a simpler solution is to split the string on the brackets. First, let's look at the expression we'd need (or at least, the one I would use):
(?:\[|\])+
This looks for at least one occurrence of either [ or ] and uses that whole run as the delimiter for the split, so the "][" between the two bracketed parts counts as a single delimiter. This seems like exactly what we need, except that when we run it we find a small issue:
array('authors', '0', 'system:id', '')
Where did that extra empty string come from? The last character of the input string matches your delimiter expression, so it's treated as a split position, and the (empty) text after it is appended to the results.
This is a common issue when splitting on a regular expression, and luckily PHP provides a simple way to avoid it: the PREG_SPLIT_NO_EMPTY flag.
So when we do this:
$str = 'authors[0][system:id]';
$expr = '/(?:\[|\])+/';
$result = preg_split($expr, $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
...you will see the result you want.
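The same split works for any nesting depth. Here is a small sketch wrapping it in a hypothetical helper, splitKeyPath() (an illustrative name, not a PHP built-in), and contrasting it with the same call made without the flag:

```php
<?php
// Hypothetical helper: splits "name[a][b]..." into its parts.
function splitKeyPath(string $key): array
{
    // One or more consecutive [ or ] characters act as a single delimiter;
    // PREG_SPLIT_NO_EMPTY drops the empty string produced by the final "]".
    return preg_split('/(?:\[|\])+/', $key, -1, PREG_SPLIT_NO_EMPTY);
}

$withFlag    = splitKeyPath('authors[0][system:id]');
$withoutFlag = preg_split('/(?:\[|\])+/', 'authors[0][system:id]');
$deeper      = splitKeyPath('a[b][c][d]');

var_export($withFlag);
var_export($withoutFlag);
var_export($deeper);
```

One side effect to be aware of: because empty pieces are dropped, an empty key such as the [] in items[] disappears from the result entirely.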
Related
I have a string
$descr = "Hello this is a test string";
What I am trying to do is split the string on spaces and store each word in a separate array index in PHP. Should I use
$myarray = preg_split("[\s]",$descr);
Expected outcome:
$myarray(1) : hello
$myarray(2) : this
$myarray(3) : is
$myarray(4) : a
$myarray(5) : test
$myarray(6) : string
Each number denotes the array index.
$descr = "Hello this is a test string";
$myarray = explode(' ', $descr);
Will produce:
Array
(
[0] => Hello
[1] => this
[2] => is
[3] => a
[4] => test
[5] => string
)
Use the explode() function, which takes the delimiter as the first parameter and the string you want to "explode" as the second. Each piece of the string between occurrences of the delimiter becomes an element in the array.
You need to use explode() like below:
$myarray = explode(' ', $descr);
print_r($myarray);
Output: https://eval.in/847916
To re-index the array starting from 1 and lowercase each word, do this:
<?php
$descr = "Hello this is a test string";
$myarray = explode(' ', $descr);
$myarray = array_map('strtolower',array_combine(range(1, count($myarray)), array_values($myarray)));
print_r($myarray);
Output: https://eval.in/847960
To get the number of elements in the array:
echo count($myarray);
One of the best ways is to use str_word_count():
print_r(str_word_count($descr , 1));
This question is seeking support for a task composed of 3 separate procedures.
How to split a string on spaces to generate an array of words? (The OP has a suboptimal, yet working solution for this part.)
Because the pattern is only seeking out "spaces" between words, the pattern could be changed to / /. This eliminates the check for additional white-space characters beyond just the space.
Better/faster than a regex-based solution would be to split the string using string functions.
explode(' ',$descr) would be the most popular and intuitive function call.
str_word_count($descr,1), as Ravi Hirani pointed out, will also work, but is less intuitive. A major benefit of this function is that it seamlessly omits punctuation: for instance, if the OP's sample string had a period at the end, this function would omit it from the array! Furthermore, it is important to note what is considered a "word":
For the purpose of this function, 'word' is defined as a locale dependent string containing alphabetic characters, which also may contain, but not start with "'" and "-" characters.
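A small sketch comparing the three splitting options on a variant of the OP's string with a trailing period added for illustration: explode() and a single-space regex split keep the period attached to the last word, while str_word_count() drops it.

```php
<?php
$descr = "Hello this is a test string.";

// explode() splits on the literal space only, so the period stays attached.
$byExplode = explode(' ', $descr);

// A regex split on a single space behaves the same way here.
$bySplit = preg_split('/ /', $descr);

// str_word_count() with format 1 returns only the "words", dropping punctuation.
$byWordCount = str_word_count($descr, 1);

var_export($byExplode);
var_export($byWordCount);
```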
How to generate an indexed array with keys starting from 1?
Bind a generated "keys" array (from 1) to a "values" array:
$words=explode(' ',$descr); array_combine(range(1,count($words)),$words)
Add a temporary value to the front of the indexed array ([0]), then remove the element with a function that preserves the array keys.
$words=explode(' ',$descr); array_unshift($words,''); unset($words[0]);
$words=explode(' ',$descr); array_unshift($words,''); $words=array_slice($words,1,NULL,true);
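A runnable sketch of the three re-indexing options listed above, each applied to the exploded array (not to the original string), to show that they produce the same keys 1 through 6:

```php
<?php
$descr = "Hello this is a test string";

// 1) Bind generated keys 1..n to the exploded values.
$words = explode(' ', $descr);
$viaCombine = array_combine(range(1, count($words)), $words);

// 2) Prepend a placeholder at [0], then unset it; unset() keeps the other keys.
$viaUnset = explode(' ', $descr);
array_unshift($viaUnset, '');
unset($viaUnset[0]);

// 3) Prepend a placeholder, then slice it off with preserve_keys = true.
$viaSlice = explode(' ', $descr);
array_unshift($viaSlice, '');
$viaSlice = array_slice($viaSlice, 1, null, true);

var_export($viaCombine);
```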
How to convert a string to all lowercase? (It was hard to find a duplicate; this is an RTM question.)
lcfirst($descr) will work in the OP's test case because only the first letter of the first word is capitalized.
strtolower($descr) is a more reliable choice as it changes whole strings to lowercase.
mb_strtolower($descr) if character encoding is relevant.
Note: ucwords() exists, but lcwords() does not.
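A quick sketch of why lcfirst() happens to work for the OP's input but strtolower() is the more reliable choice; the second string is a made-up example with several capitalized words:

```php
<?php
$descr = "Hello this is a test string";
$mixed = "Hello This Is A Test";

// lcfirst() only touches the first character of the whole string.
$first = lcfirst($mixed);

// strtolower() lowercases every ASCII letter in the string.
$all = strtolower($mixed);

// For the OP's input the two agree, because only the first letter is uppercase.
var_dump(lcfirst($descr) === strtolower($descr));
```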
There are so many paths to a correct result for this question. How do you determine which is the "best" one? Top priority should be Accuracy. Next should be Efficiency/Directness. Followed by some consideration for Readability. Code Brevity is a matter of personal choice and can clash with Readability.
With these considerations in mind, I would recommend these two methods:
Method #1: (one-liner, 3-functions, no new variables)
$descr="Hello this is a test string";
var_export(array_slice(explode(' ',' '.strtolower($descr)),1,null,true));
Method #2: (two-liner, 3-functions, one new variable)
$descr="Hello this is a test string";
$array=explode(' ',' '.strtolower($descr));
unset($array[0]);
var_export($array);
Method #2 should perform faster than #1 because unset() is a "lighter" function than array_slice().
Explanation for #1: Convert the full input string to lowercase and prepend a blank space to it. The leading space causes explode() to generate an extra empty element at the start of the output array. array_slice() then returns the array from offset 1 onward (omitting that unwanted empty element), and its preserve_keys argument (true) keeps the remaining keys numbered from 1.
Explanation for #2: The same as #1, except it purges the first element from the generated array using unset(). While this is faster, it must be written on its own line.
Output from either of my methods:
array (
1 => 'hello',
2 => 'this',
3 => 'is',
4 => 'a',
5 => 'test',
6 => 'string',
)
Related / Near-duplicate:
php explode and force array keys to start from 1 and not 0
I have data like so:
$data = '<span class="theclass">data (not important)</span> <span class="anotherclass">extra data (October 1, 2010)</span>';
I want to get the date within the parentheses, so I've done the following preg_match:
preg_match("/\((([a-zA-Z]{5,10} .*?)|(\d{4}))\)/i",$data,$res);
Please note that sometimes 'October 1' is not present, BUT THE YEAR IS ALWAYS PRESENT, hence the OR condition. The thing is, it gives me an array of 3 elements in this case; I know it's because of the sets of parentheses I have for each condition. Is there a better and cleaner way to achieve this?
Example of the second condition (year only):
$data = '<span class="theclass">data</span> <span class="theother">data data (2009)</span>';
Thanks guys
Use lookarounds
Here we're making sure there is a preceding ( character, then we look for text we would see in a date formatted like your example. The character class in ([A-Za-z ,\d]+)? allows letters, a literal space, a comma and digits. The + means "at least one"; it is still greedy, but because the class excludes ( and ), it can't run past the closing parenthesis the way .* or .+ could. Wrapping it in parentheses and adding ? makes that whole part optional. Logically this does the job of your | (OR) alternation, since the year is still found when the month and day are missing, but without a second alternative to try. Then we match the year (always 4 digits, [\d]{4}). Finally, we check that it is followed by a literal ) character. The lookbehind (?<=\() and the lookahead (?=\)) must match, but they are not included in the match result, which leaves your answer clean.
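preg_match() returns 1 or 0 and fills the array passed as its third argument, with the full match at index 0, so we take the first element of that array. If you're looking for multiple matches in the same string, you can use preg_match_all(), which puts all full matches in $myDate[0].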
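A short sketch of that distinction, using the lookaround pattern from this answer on the two inputs from the question (trimmed to the relevant span): the return value is only a match count, and the date itself comes from the by-reference array.

```php
<?php
$pattern = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';

$full     = '<span class="anotherclass">extra data (October 1, 2010)</span>';
$yearOnly = '<span class="theother">data data (2009)</span>';

// preg_match() returns 1 (match), 0 (no match) or false (error);
// the matches themselves land in the array passed by reference.
$found = preg_match($pattern, $full, $m);
echo $found, ' => ', $m[0], "\n";

preg_match($pattern, $yearOnly, $m2);
echo $m2[0], "\n";
```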
$data = '<a href="not important">
<span class="theclass">data (not important)</span></a>
<span class="anotherclass">extra data (October 1, 2010)</span>
<span class="anotherclass">extra data (2011)</span>';
$pattern = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';
$res = preg_match_all($pattern,$data,$myDate);
print_r($myDate[0]);
output
Array
(
[0] => October 1, 2010
[1] => 2011
)
If you're only looking for one match you would change the code to this:
$res = preg_match($pattern,$data,$myDate);
echo($myDate[0]);
Output
October 1, 2010
Another way to write the pattern: we've removed the capturing parentheses and the + followed by the optional ?, but kept the character class, and used * instead, which already allows zero occurrences. This matters because preg_match() and preg_match_all() store every capturing group in the output array; since this version has no group, no extra elements are stored.
$pattern = '!(?<=\()[A-Za-z ,\d]*[\d]{4}(?=\))!';
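A sketch comparing the two patterns on the same input, to show the extra array element that the capturing group produces (the input string is a trimmed version of the question's data):

```php
<?php
$data = 'extra data (October 1, 2010)';

$withGroup    = '!(?<=\()([A-Za-z ,\d]+)?[\d]{4}(?=\))!';
$withoutGroup = '!(?<=\()[A-Za-z ,\d]*[\d]{4}(?=\))!';

preg_match($withGroup, $data, $a);
preg_match($withoutGroup, $data, $b);

// The capturing group adds an extra element ($a[1] holds "October 1, ");
// the group-free pattern stores only the full match.
echo count($a), ' vs ', count($b), "\n";
```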
I would like to split a string in PHP containing quoted and unquoted substrings.
Let's say I have the following string:
"this is a string" cat dog "cow"
The split array should look like this:
array (
[0] => "this is a string"
[1] => "cat"
[2] => "dog"
[3] => "cow"
)
I'm struggling a bit with regex, and I'm wondering if it is even possible to achieve this with just one regex / preg_split call...
The first thing I tried was:
[[:blank:]]*(?=(?:[^"]*"[^"]*")*[^"]*$)[[:blank:]]*
But this splits only array[0] and array[3] correctly; the rest is split on a per-character basis.
Then I found this link:
PHP preg_split with two delimiters unless a delimiter is within quotes
(?=(?:[^"]*"[^"]*")*[^"]*$)
This seems to me like a good starting point. However, the result in my example is the same as with the first regex.
I tried combining both: first the one for quoted strings, and then a second sub-regex that should omit quoted strings (therefore the [^"]):
(?=(?:[^"]*"[^"]*")*[^"]*$)|[[:blank:]]*([^"].*[^"])[[:blank:]]*
So, two questions:
Is it even possible to achieve what I want with just one regex / preg_split call?
If yes, I would appreciate a hint on how to assemble the regex correctly.
Since matches cannot overlap, you could use preg_match_all like this:
preg_match_all('/"[^"]*"|\S+/', $input, $matches);
Now $matches[0] should contain what you are looking for. The regex will first try to match a quoted string, and then stop. If that doesn't do it it will just collect as many non-whitespace characters as possible. Since alternations are tried from left to right, the quoted version takes precedence.
EDIT: This will not get rid of the quotes though. To do this, you could use capturing groups:
preg_match_all('/(?|"([^"]*)"|(\S+))/', $input, $matches);
Now $matches[1] will contain exactly what you are looking for. The (?| is there so that both capturing groups end up at the same index.
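A runnable sketch of both preg_match_all() variants on the question's input, showing where the quotes are kept and where the branch reset removes them:

```php
<?php
$input = '"this is a string" cat dog "cow"';

// Quoted string first, otherwise a run of non-whitespace; [0] keeps the quotes.
preg_match_all('/"[^"]*"|\S+/', $input, $withQuotes);

// Branch reset (?|...) puts both alternatives' groups at index 1.
preg_match_all('/(?|"([^"]*)"|(\S+))/', $input, $withoutQuotes);

var_export($withQuotes[0]);
var_export($withoutQuotes[1]);
```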
EDIT 2: Since you were asking for a preg_split solution, that is also possible. We can use a lookahead, that asserts that the space is followed by an even number of quotes (up until the end of the string):
$result = preg_split('/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/', $input);
Of course, this will not get rid of the quotes, but that can easily be done in a separate step.
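A sketch of the split-then-clean approach. It uses the lookahead form (?:[^"]*"[^"]*")*[^"]*$, which also tolerates an unquoted word after the last quoted one, and strips the quotes in a separate array_map() step; the trailing word "bird" is added to the input only to exercise that case.

```php
<?php
$input = '"this is a string" cat dog "cow" bird';

// Split on whitespace only when an even number of quotes follows it.
// The [^"]* before $ lets the text after the last quote contain no quotes.
$parts = preg_split('/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/', $input);

// Separate step: strip surrounding quotes from each piece.
$clean = array_map(function (string $s): string {
    return trim($s, '"');
}, $parts);

var_export($clean);
```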
I am trying to learn regex in PHP and messing around with the preg_split function.
It doesn't appear to be correct, though, or my understanding is completely wrong.
The test code I am using is:
$string = "test ing ";
var_dump(preg_split('/t/', $string));
I would expect to get an array like the following:
[0] => "es" [1] => " ing "
but the following is being returned:
[0] => "" [1] => "es" [2] => " ing "
Why is there an empty string at the start?
I understand that I can use the PREG_SPLIT_NO_EMPTY flag to filter this, but it shouldn't be there to begin with. Should it?
Why shouldn't it? This is exactly how it works. The semantics of a split operation are that you have a string of this format:
value-delimiter-value-delimiter-value-...-delimiter-value
(Note that it is starting and ending with a value, not a delimiter.)
So if your string starts with a delimiter, it is absolutely valid to assume that there is an empty value before that delimiter (since the delimiter is supposed to split something into two). You wouldn't generally want to reject the empty string between two consecutive t characters either, would you?
And this is exactly what PREG_SPLIT_NO_EMPTY is for. You use it whenever you do want to get rid of those empty strings.
As a simple example of why you would want the default behavior, think of CSV files. You want to split a line at (for example) ;. You usually also want to allow for empty values. Now, if the value in your first column was empty (meaning the line starts with ;) and you chopped that first empty string away completely, then suddenly all indices in the resulting array would correspond to different columns. This is why you want to keep those empty strings as well. In many cases you know how many delimiters there are, and hence how many values, and you want to be able to identify which value belongs at which position, even if some of them are empty.
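A minimal sketch of that CSV argument, using a made-up line ';Smith;42' whose first column is empty: with the default behavior the name stays at index 1, while the flag shifts every value one position to the left.

```php
<?php
$line = ';Smith;42'; // hypothetical record: first column empty

// Default: the empty first column is kept, so index 1 is still the name.
$keep = preg_split('/;/', $line);

// With PREG_SPLIT_NO_EMPTY the empty value disappears and indices shift.
$drop = preg_split('/;/', $line, -1, PREG_SPLIT_NO_EMPTY);

echo $keep[1], ' / ', $drop[1], "\n"; // Smith / 42
```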
It's working 100% correctly. The first character is a 't', so it splits on that 't' first. There is nothing before the first 't', so the resulting array starts with an empty-string entry.
It's happening because of the t at the beginning of your string. If you don't use the PREG_SPLIT_NO_EMPTY option, preg_split will treat an empty string as a valid split.
Think of it this way: Everywhere preg_split sees a t, it chops the string into two chunks: the chunk before the t, and the chunk after it. Even if one of the chunks doesn't have anything in it, it still counts. That piece is just an empty string.
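A short sketch of that chopping rule with the question's string plus a made-up string containing two adjacent t characters:

```php
<?php
// Every "t" cuts the string in two, even when one side is empty.
$leading     = preg_split('/t/', 'test ing '); // "" before the first t
$consecutive = preg_split('/t/', 'attb');      // "" between the two t's

// PREG_SPLIT_NO_EMPTY filters those empty chunks out.
$filtered = preg_split('/t/', 'test ing ', -1, PREG_SPLIT_NO_EMPTY);

var_export($leading);
var_export($filtered);
```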
For some applications, this would be perfectly useful -- for example, say you wanted to replace each t with something, but the replacement was too complicated to just use preg_replace. The language wants you to be able to choose, so it keeps the empty split unless you explicitly tell it not to with PREG_SPLIT_NO_EMPTY.
I am trying to learn regex. I have the string:
$x = "5ft2inches";
How can I read [5,2] into an array using a regex?
If you are assuming that the string will be of the form "{number}ft{number}inches" then you can use preg_match():
preg_match('/(\d+)ft(\d+)inches/', $x, $matches);
(\d+) matches a run of one or more digits. The parentheses tell preg_match() to place the matched numbers into the $matches variable (the third argument to the function). The function returns 1 if it made a match, 0 if it didn't (and false on error).
Here is what $matches looks like after a successful match:
Array
(
[0] => 5ft2inches
[1] => 5
[2] => 2
)
The entire matched string is the first element, then the parenthesized matches follow. So to make your desired array:
$array = array($matches[1], $matches[2]);
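Putting the pieces together as a sketch that only builds the array on a successful match; the integer casts are an optional addition, since captured groups are always strings:

```php
<?php
$x = "5ft2inches";

$array = [];
if (preg_match('/(\d+)ft(\d+)inches/', $x, $matches) === 1) {
    // Captured groups are strings; cast them if you need integers.
    $array = [(int) $matches[1], (int) $matches[2]];
}

var_export($array);
```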
Assuming PHP, any reason no one has suggested split?
$numbers = preg_split('/[^0-9]+/', $x, -1, PREG_SPLIT_NO_EMPTY);
In Perl:
#!/usr/bin/perl
use strict;
use warnings;
my $x = "5ft2inches";
my %height;
@height{qw(feet inches)} = ($x =~ /^([0-9]+)ft([0-9]+)inches$/);
use Data::Dumper;
print Dumper \%height;
Output:
$VAR1 = {
'feet' => '5',
'inches' => '2'
};
Or, using split:
@height{qw(feet inches)} = split /ft|inches/, $x;
The regular expression is simply /[0-9]+/, but how you get the matches into an array depends entirely on what programming language you're using.
With Regular Expressions, you can either extract your data in a contextless way, or a contextful way.
I.e., if you match any digits: (\d+) (NB: assumes that your language honors \d as the shortcut for 'any digit')
You can then extract each group, but you might not know that your string was actually "5 2inches" instead of "6ft2inches" OR "29Cabbages1Fish4Cows".
If you add context: (\d+)ft(\d+)inches
You know for sure what you've extracted (Because otherwise you'd not get a match) and can refer to each group in turn to get the feet and inches.
If you're not always going to have a pair of numbers to extract, you'll need to make the various components optional. Check out this regular expression cheat sheet (his other cheat sheets are nifty too) for more info.
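A sketch of making both components optional while keeping the context, written as a hypothetical helper parseHeight() (the name and the null-for-missing convention are assumptions for illustration). The pattern is anchored so that strings like "29Cabbages1Fish4Cows" are rejected rather than mined for digits:

```php
<?php
// Hypothetical helper: feet and inches are each optional, but anchored so
// that nothing else may appear in the string.
function parseHeight(string $s): ?array
{
    if (preg_match('/^(?:(\d+)ft)?(?:(\d+)inches)?$/', $s, $m) !== 1) {
        return null; // not a height at all
    }
    // An unmatched group is either "" (mid-array) or missing (at the end).
    $feet   = isset($m[1]) && $m[1] !== '' ? (int) $m[1] : null;
    $inches = isset($m[2]) && $m[2] !== '' ? (int) $m[2] : null;
    return [$feet, $inches];
}

var_export(parseHeight('5ft2inches'));
var_export(parseHeight('5ft'));
var_export(parseHeight('2inches'));
var_export(parseHeight('29Cabbages1Fish4Cows'));
```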
You don't mention the language you are using, so here is the general solution: You don't "extract" the numbers, you replace everything except numbers with an empty string.
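In C#, this would look like:
string numbers = Regex.Replace(dirtyString, "[^0-9]", "");
Note that for "5ft2inches" this yields the single string "52"; to keep the two numbers apart, replace with a separator instead of an empty string and split on it.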
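A PHP sketch of the same replace idea that keeps the two numbers apart: each run of non-digits becomes a single space, and the result is then trimmed and exploded.

```php
<?php
$x = "5ft2inches";

// Replacing with "" would merge the digits into one string.
$joined = preg_replace('/[^0-9]+/', '', $x);

// Replacing each run of non-digits with a space keeps the numbers separated;
// trim() removes the trailing space left by "inches", then explode() splits.
$numbers = explode(' ', trim(preg_replace('/[^0-9]+/', ' ', $x)));

var_export($numbers);
```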
Have to watch out for double digit numbers.
/\d{1,2}/
might work a little better for feet and inches. The max value of '2' should be upped to whatever is appropriate.
use `/[\d]/`