I have a string
$descr = "Hello this is a test string";
What I am trying to do is to split the string and store each word which is separated using space into separate array index in PHP. Should I use
$myarray = preg_split("[\s]",$descr);
Expected outcome :
$myarray(1) : hello
$myarray(2) : this
$myarray(3) : is
$myarray(4) : a
$myarray(5) : test
$myarray(6) : string
Each number denotes array index
$descr = "Hello this is a test string";
$myarray = explode(' ', $descr);
Will produce:
Array
(
[0] => Hello
[1] => this
[2] => is
[3] => a
[4] => test
[5] => string
)
Use the explode function which takes the delimiter as the first parameter and the string variable you want to "explode" as the second parameter. Each word separated by the sent delimiter will be an element in the array.
You need to use explode() like below:-
$myarray = explode(' ', $descr);
print_r($myarray);
Output:-https://eval.in/847916
To re-index and lowercase each word in your array do like this:-
<?php
$descr = "Hello this is a test string";
$myarray = explode(' ', $descr);
$myarray = array_map('strtolower',array_combine(range(1, count($myarray)), array_values($myarray)));
print_r($myarray);
Output:-https://eval.in/847960
To get how many elements are there in the array:-
echo count($myarray);
One of the best ways is to use str_word_count:
print_r(str_word_count($descr , 1));
This question asks for help with a task made up of three separate procedures.
How to split a string on spaces to generate an array of words? (The OP has a suboptimal, yet working solution for this part.)
Because the pattern only needs to find the spaces between words, it could be changed to / /. This drops the check for other whitespace characters beyond the plain space.
Better and faster than a regex-based solution would be to split the string using string functions.
explode(' ',$descr) would be the most popular and intuitive function call.
str_word_count($descr,1), as Ravi Hirani pointed out, will also work, but is less intuitive. A major benefit of this function is that it seamlessly omits punctuation: for instance, if the OP's sample string had a period at the end, this function would omit it from the array! Furthermore, it is important to note what is considered a "word":
For the purpose of this function, 'word' is defined as a locale dependent string containing alphabetic characters, which also may contain, but not start with "'" and "-" characters.
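To make the punctuation point concrete, here is a small sketch comparing the two calls on the OP's sentence. The trailing period is my addition for illustration; it is not part of the original sample.

```php
<?php
// The OP's sample sentence, with a period appended for illustration.
$descr = "Hello this is a test string.";

// explode() splits on the space only, so the period stays attached.
$byExplode = explode(' ', $descr);

// str_word_count() with format 1 returns the words it finds as an array;
// the period is not part of any "word", so it is dropped.
$byWordCount = str_word_count($descr, 1);

echo end($byExplode), "\n";   // string.
echo end($byWordCount), "\n"; // string
```

The same definition of "word" means digits are not treated as word characters either, so this function is probably a poor fit for input that mixes numbers with text.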
How to generate an indexed array with keys starting from 1?
Bind a generated "keys" array (from 1) to a "values" array:
$words=explode(' ',$descr); $indexed=array_combine(range(1,count($words)),$words);
Add a temporary value to the front of the indexed array ([0]), then remove the element with a function that preserves the array keys.
array_unshift($words,''); unset($words[0]); // $words is the array returned by explode()
array_unshift($words,''); $words=array_slice($words,1,NULL,true);
How to convert a string to all lowercase? (it was hard to find a duplicate; this is an RTM question)
lcfirst($descr) will work in the OP's test case because only the first letter of the first word is capitalized.
strtolower($descr) is a more reliable choice as it changes whole strings to lowercase.
mb_strtolower($descr) if character encoding is relevant.
Note: ucwords() exists, but lcwords() does not.
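A quick sketch of how the three functions differ. The mixed-case sentence and the French word are my own test inputs, and the last call assumes the mbstring extension is loaded and the source file is saved as UTF-8.

```php
<?php
// Mixed-case variant of the OP's sentence (an assumption for illustration).
$descr = "Hello This Is A Test String";

// lcfirst() lowercases the first character only.
$firstOnly = lcfirst($descr);

// strtolower() lowercases the whole string.
$all = strtolower($descr);

echo $firstOnly, "\n"; // hello This Is A Test String
echo $all, "\n";       // hello this is a test string

// mb_strtolower() is encoding-aware; guarded because mbstring is optional.
if (function_exists('mb_strtolower')) {
    echo mb_strtolower("ÉCOLE", 'UTF-8'), "\n"; // école
}
```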
There are so many paths to a correct result for this question. How do you determine which is the "best" one? Top priority should be Accuracy. Next should be Efficiency/Directness. Followed by some consideration for Readability. Code Brevity is a matter of personal choice and can clash with Readability.
With these considerations in mind, I would recommend these two methods:
Method #1: (one-liner, 3-functions, no new variables)
$descr="Hello this is a test string";
var_export(array_slice(explode(' ',' '.strtolower($descr)),1,null,true));
Method #2: (two-liner, 3-functions, one new variable)
$descr="Hello this is a test string";
$array=explode(' ',' '.strtolower($descr));
unset($array[0]);
var_export($array);
Method #2 should perform faster than #1 because unset() is a "lighter" function than array_slice().
Explanation for #1 : Convert the full input string to lowercase and prepend a blank space to it. The blank space will cause explode() to generate an extra empty element at the start of the output array. array_slice() then returns the array from offset 1 onward with the keys preserved (the fourth argument is true), which omits the unwanted first element.
Explanation for #2 : The same as #1 except it purges the first element from generated array using unset(). While this is faster, it must be written on its own line.
Output from either of my methods:
array (
1 => 'hello',
2 => 'this',
3 => 'is',
4 => 'a',
5 => 'test',
6 => 'string',
)
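Both methods assume the words are separated by exactly one space. If the input might contain runs of spaces or tabs, something along these lines may be safer. This is only a sketch: wordsFromOne() is a hypothetical helper name, it swaps explode() for preg_split(), and it assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: lowercase the text, split on any run of whitespace,
// and return the words keyed from 1 instead of 0.
function wordsFromOne(string $text): array
{
    // PREG_SPLIT_NO_EMPTY drops empty pieces caused by leading/trailing whitespace.
    $words = preg_split('/\s+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Guard the empty case: range(1, 0) would not produce an empty key list.
    if ($words === []) {
        return [];
    }

    return array_combine(range(1, count($words)), $words);
}

var_export(wordsFromOne("Hello  this is\ta test string"));
```

For the OP's exact sample, the explode()-based methods above remain the more direct choice.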
Related / Near-duplicate:
php explode and force array keys to start from 1 and not 0
Related
I have the following string:
{item1:test},{item2:hi},{another:please work}
What I want to do is turn it into an array that looks like this:
[item1] => test
[item2] => hi
[another] => please work
Here is the code I am currently using for that (which works):
$vf = '{item1:test},{item2:hi},{another:please work}';
$vf = ltrim($vf, '{');
$vf = rtrim($vf, '}');
$vf = explode('},{', $vf);
foreach ($vf as $vk => $vv)
{
$ve = explode(':', $vv);
$vx[$ve[0]] = $ve[1];
}
My concern is: what if the value has a colon in it? For example, let's say that the value for item1 is you:break. That colon is going to make me lose break entirely. What is a better way of coding this in case the value has a colon in it?
Why not set a limit on the explode function? Like this:
$ve = explode(':', $vv, 2);
This way the string will split only at the first occurrence of a colon.
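Applied to the whole loop, it might look like the sketch below. The input uses the you:break value from the question; trim() and the short array destructuring (PHP 7.1+) are my substitutions for the original ltrim/rtrim and index access, and the sketch assumes every pair contains at least one colon.

```php
<?php
$vf = '{item1:you:break},{item2:hi},{another:please work}';

$vx = [];
// Strip the outer braces, then split into "key:value" pairs.
foreach (explode('},{', trim($vf, '{}')) as $pair) {
    // Limit of 2: split at the first colon only, keep the rest intact.
    [$key, $value] = explode(':', $pair, 2);
    $vx[$key] = $value;
}

print_r($vx);
```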
To address the possibility of the values having embedded colons, and for the sake of discussion (not necessarily performance):
$ve = explode(':', $vv);
$key = array_shift($ve);
$vx[$key] = implode(':', $ve);
...grabs the first element of the array, assuming the index will NOT have a colon in it. Then re-joins the rest of the array with colons.
Don't use effing explode for everything.
You can more reliably extract such simple formats with a trivial key:value regex. In particular since you have neat delimiters around them.
And it's far less code:
preg_match_all('/{(\w+):([^}]+)}/', $vf, $match);
$array = array_combine($match[1], $match[2]);
The \w+ just matches an alphanumeric string (underscores included), and [^}]+ matches anything up to the closing }. And array_combine more easily turns it into a key=>value array.
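As a quick check that this also copes with a colon inside a value, here is the same idea run against the you:break case from the question. The braces are escaped here purely for readability; that is my choice, not a requirement of the pattern above.

```php
<?php
$vf = '{item1:you:break},{item2:hi},{another:please work}';

// Group 1 is the key (\w+), group 2 is everything up to the closing brace.
preg_match_all('/\{(\w+):([^}]+)\}/', $vf, $match);

// Pair the captured keys with the captured values.
$array = array_combine($match[1], $match[2]);

print_r($array);
```

Note that [^}]+ requires at least one character, so a pair with an empty value such as {key:} would simply not be matched.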
Answering your second question:
If your format breaks on specific content, that's bad. I think there are two ways to work around it.
Escape delimiters: every colon and curly bracket in the data would have to be escaped, which is awkward, so instead the data is delimited with e.g. " and only those quotation marks are escaped (then you have JSON in this case).
Save data lengths: this is roughly how PHP serializes arrays. In that data structure you state that the next n chars form one token.
The first type is easy to read and manipulate, although one has to read the whole file to access it randomly.
The second type would be great for random access if the structure saves not the number of characters (since in UTF-8 you cannot just skip n chars without reading them) but the number of bytes to skip. PHP's serialize function produces n == strlen($token), thus I don't know what its advantage over JSON is.
Where possible I try to use JSON for communication between different systems.
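For comparison, here is a sketch of how the two types look for the same data; the $data array is my own example built from the question's keys.

```php
<?php
$data = ['item1' => 'you:break', 'item2' => 'hi'];

// Type 1: delimiters plus escaping (JSON).
$json = json_encode($data);
echo $json, "\n"; // {"item1":"you:break","item2":"hi"}

// Type 2: length-prefixed tokens (PHP's own serialization format).
$serialized = serialize($data);
echo $serialized, "\n"; // a:2:{s:5:"item1";s:9:"you:break";s:5:"item2";s:2:"hi";}

// Either form survives a colon in the value; JSON round trip shown here.
$roundTrip = json_decode($json, true);
```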
I have this string authors[0][system:id] and I need a regex that returns:
array('authors', '0', 'system:id')
Any ideas?
Thanks.
Just use PHP's preg_split(), which returns an array of elements similarly to explode() but splits on a regular expression.
Split the string on [ or ] and then remove the last element (which is an empty string) of the resulting array, $tokens.
EDIT: Also, remove the 3rd element with array_splice($array, int $offset, int $length), since this item is also an empty string.
The regex /[\[\]]/ just means match any [ or ] character
$string = "authors[0][system:id]";
$tokens = preg_split("/[\]\[]/", $string);
array_pop($tokens);
array_splice($tokens, 2, 1);
//rest of your code using $tokens
Here is the format of $tokens after this has run:
Array ( [0] => authors [1] => 0 [2] => system:id )
Taking the most simplistic approach, we would just match the three individual parts. So first of all we'd look for the token that is not enclosed in brackets:
[a-z]+
Then we'd look for the brackets and the value in between:
\[[^\]]+\]
And then we'd repeat the second step.
You'd also need to add capture groups () to extract the actual values that you want.
So when you put it all together you get something like:
([a-z]+)\[([^\]]+)\]\[([^\]]+)\]
That expression could then be used with preg_match() and the values you want would be extracted into the referenced array passed to the third argument (like this). But you'll notice the above expression is quite a difficult-to-read collection of punctuation, and also that the resulting array has an extra element on it that we don't want - preg_match() places the whole matched string into the first index of the output array. We're close, but it's not ideal.
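For reference, a minimal sketch of that preg_match() approach, with array_shift() added to drop the whole-match element. The variable names are mine.

```php
<?php
$str = 'authors[0][system:id]';
$expr = '/([a-z]+)\[([^\]]+)\]\[([^\]]+)\]/';

$parts = [];
if (preg_match($expr, $str, $parts)) {
    // Index 0 holds the entire matched string; discard it.
    array_shift($parts);
}

print_r($parts);
```

This only works for exactly two bracketed segments, which is one more reason to prefer splitting.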
However, as @AlienHoboken correctly points out and almost correctly implements, a simpler solution would be to split the string up based on the position of the brackets. First let's take a look at the expression we'd need (or at least, the one that I would use):
(?:\[|\])+
This looks for at least one occurrence of either [ or ] and uses that block as the delimiter for the split. This seems like exactly what we need, except when we run it we'll find we have a small issue:
array('authors', '0', 'system:id', '')
Where did that extra empty string come from? Well, the last character of the input string matches your delimiter expression, so it's treated as a split position, with the result that an empty string gets appended to the results.
This is quite a common issue when splitting based on a regular expression, and luckily PCRE knows this and provides a simple way to avoid it: the PREG_SPLIT_NO_EMPTY flag.
So when we do this:
$str = 'authors[0][system:id]';
$expr = '/(?:\[|\])+/';
$result = preg_split($expr, $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
...you will see the result you want.
See it working
I have this string: ++++++1DESIGNRESULTSM25Fe415(Main)
and I have about 2000 similar lines, which I want to split like this:
[++++++] [1] [DESIGNRESULTS] [M25] [Fe415] [(Main)]
In this pattern only the 2nd, 4th and 5th values change,
e.g. ++++++2DESIGNRESULTSM30Fe418(Main) etc.
What I actually want is:
Split the first value [++++++]
Split the value after 4 Character of [DESIGNRESULTS] so I'll get this [M25]
Split the value before 4 Character of [(Main)] so I'll get this [Fe415]
After all this is done, store the final pieces in an array.
the similar output what i want is
Array ( [0] => 1 [1] => M25 [2] => Fe415 )
Please help me with this...
Thanks in advance :)
Your data split needs are a bit unclear. A regular expression that will get separate matches on each of the chunks you first specify:
(\++)(\d)(DESIGNRESULTS)(M\d\d)(Fe\d\d\d)(\(Main\))
If you only need the two you are asking for at the end, you can use
(\d)DESIGNRESULTS(M\d\d)(Fe\d\d\d)
You could also replace \d\d with \d+ if the number of digits is unknown.
However, based on your examples it looks like each string chunk is a consistent length. It would be even faster to use
array(
substr($string, 6, 1)
//...
)
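A fuller sketch of the fixed-width idea follows. The offsets are my own, counted from the single sample string in the question, so they are an assumption: they only hold if every line has six plus signs, a one-digit number, and chunks of exactly these widths.

```php
<?php
$string = '++++++1DESIGNRESULTSM25Fe415(Main)';

// Offsets counted from the sample: 6 plus signs, 1 digit,
// 13 characters of "DESIGNRESULTS", then 3- and 5-character chunks.
$result = array(
    substr($string, 6, 1),   // the single digit
    substr($string, 20, 3),  // e.g. M25
    substr($string, 23, 5),  // e.g. Fe415
);

print_r($result);
```

If any of those widths can vary across the 2000 lines, the regular expression version is the safer option.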
How about this
$str = "++++++1DESIGNRESULTSM25Fe415(Main)";
$match = array();
preg_match("/^\+{0,}(\d)DESIGNRESULTS(\w{3})(\w{5})/",$str,$match);
array_shift($match);
print_r($match);
I am trying to learn regex. I have the string:
$x = "5ft2inches";
How can I read [5,2] into an array using a regex?
If you are assuming that the string will be of the form "{number}ft{number}inches" then you can use preg_match():
preg_match('/(\d+)ft(\d+)inches/', $string, $matches);
(\d+) will match a string of one or more digits. The parentheses will tell preg_match() to place the matched numbers into the $matches variable (the third argument to the function). The function will return 1 if it made a match, or 0 if it didn't.
Here is what $matches looks like after a successful match:
Array
(
[0] => 5ft2inches
[1] => 5
[2] => 2
)
The entire matched string is the first element, then the parenthesized matches follow. So to make your desired array:
$array = array($matches[1], $matches[2]);
Assuming PHP, any reason no one has suggested split?
$numbers = preg_split('/[^0-9]+/', $x, -1, PREG_SPLIT_NO_EMPTY);
In Perl:
#!/usr/bin/perl
use strict;
use warnings;
my $x = "5ft2inches";
my %height;
@height{qw(feet inches)} = ($x =~ /^([0-9]+)ft([0-9]+)inches$/);
use Data::Dumper;
print Dumper \%height;
Output:
$VAR1 = {
'feet' => '5',
'inches' => '2'
};
Or, using split:
@height{qw(feet inches)} = split /ft|inches/, $x;
The regular expression is simply /[0-9]+/ but how to get it into an array depends entirely on what programming language you're using.
With Regular Expressions, you can either extract your data in a contextless way, or a contextful way.
IE, if you match for any digits: (\d+) (NB: Assumes that your language honors \d as the shortcut for 'any digits')
You can then extract each group, but you might not know that your string was actually "5 2inches" instead of "6ft2inches" OR "29Cabbages1Fish4Cows".
If you add context: (\d+)ft(\d+)inches
You know for sure what you've extracted (Because otherwise you'd not get a match) and can refer to each group in turn to get the feet and inches.
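Here is a rough PHP sketch of that difference, using two of the strings mentioned above. The $results array and the anchors (^ and $) on the contextful pattern are my additions.

```php
<?php
$inputs = ['5ft2inches', '29Cabbages1Fish4Cows'];
$results = [];

foreach ($inputs as $input) {
    // Contextless: grab every run of digits, whatever surrounds it.
    preg_match_all('/\d+/', $input, $loose);

    // Contextful: only succeeds when the string really is feet and inches.
    $strict = preg_match('/^(\d+)ft(\d+)inches$/', $input, $m)
        ? [$m[1], $m[2]]
        : null;

    $results[$input] = ['loose' => $loose[0], 'strict' => $strict];
}

var_export($results);
```

The contextless pattern happily returns three numbers for the cabbages string, while the contextful one reports no match.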
If you're not always going to have a pair of numbers to extract, you'll need to make the various components optional. Check out This Regular Expression Cheat Sheet (His other cheat sheets are nifty too) for more info.
You don't mention the language you are using, so here is the general solution: You don't "extract" the numbers, you replace everything except numbers with an empty string.
In C#, this would look like
string numbers = Regex.Replace(dirtyString, "[^0-9]", "");
Have to watch out for double digit numbers.
/\d{1,2}/
might work a little better for feet and inches. The max value of '2' should be upped to whatever is appropriate.
use `/[\d]/`
I want to split a string into two parts, the string is almost free text,
for example:
$string = 'hi how are you';
and I want the split to look like this:
array(
[0] => hi
[1] => how are you
)
I tried using this regex: /(\S*)\s*(\.*)/ but even though the returned array is the correct size, the values come back empty.
What pattern do I need to make this work?
What are the requirements? Your example seems pretty arbitrary. If all you want is to split on the first space and leave the rest of the string alone, this would do it, using explode:
$pieces = explode(' ', 'hi how are you', 2);
Which basically says "split on spaces and limit the resulting array to 2 elements"
You should not be escaping the "." in the last group. You're trying to match any character, not a literal period.
Corrected: /(\S*)\s*(.*)/
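To see the difference, here is a short sketch running both patterns against the string from the question. The variable names $broken and $fixed are mine.

```php
<?php
$string = 'hi how are you';

// Original pattern: \.* can only match literal periods, so group 2 is empty.
preg_match('/(\S*)\s*(\.*)/', $string, $broken);

// Corrected pattern: .* matches the rest of the line.
preg_match('/(\S*)\s*(.*)/', $string, $fixed);

// Drop index 0 (the whole match) to get the two-element array.
$pieces = array_slice($fixed, 1);

print_r($broken);
print_r($pieces);
```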