split string by any amount of whitespace in PHP - php

I know how to split a string into an array of the words between delimiters by calling explode() with " " as the delimiter.
But that only splits the string on a single whitespace character. How can I split on any amount of whitespace?
That is, an element in the array should end where whitespace is found, and the next element should start at the next non-whitespace character.
So something like "The quick brown fox" turns into an array with The, quick, brown, and fox as its elements.
And "jumped over the lazy dog" is split the same way, so each word is an individual element in the returned array.

Like this:
preg_split('#\s+#', $string, -1, PREG_SPLIT_NO_EMPTY);
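As a minimal sketch of that call in action (the sample string is an assumption, picked to mix spaces, a tab, and a newline):

```php
<?php
// Hypothetical input: leading/trailing spaces, a tab, and a newline between words.
$string = "  The   quick\tbrown \n fox ";

// \s+ matches any run of whitespace; PREG_SPLIT_NO_EMPTY drops the empty
// pieces that would otherwise come from the leading and trailing whitespace.
$words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($words); // The, quick, brown, fox
```

Without PREG_SPLIT_NO_EMPTY the result would also contain an empty string at each end, because the input starts and ends with whitespace.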

$yourSplitArray=preg_split('/[\ \n\,]+/', $your_string);

Try this:
preg_split('/ +/', "hypertext language programming"); // splits on one or more spaces

See the PHP explode() function:
<?php
$str = "Hello world. It's a beautiful day.";
print_r (explode(" ",$str));
?>
This will return:
Array ( [0] => Hello [1] => world. [2] => It's [3] => a [4] => beautiful [5] => day. )
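Note that explode() treats every single space as a delimiter, so a run of spaces produces empty elements. A small sketch of that behaviour and one possible clean-up (the input and variable names are illustrative, not from the answer above):

```php
<?php
$str = "Hello   world"; // three spaces between the words (assumed input)

$parts = explode(" ", $str);
// $parts is ["Hello", "", "", "world"]: each extra space adds an empty string.

// One way to clean up: keep only the non-empty pieces and reindex.
$clean = array_values(array_filter($parts, function ($p) {
    return $p !== '';
}));

print_r($clean); // Hello, world
```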

Related

Explode a string where the explode condition is bunch of specific characters

I'm looking for a way to explode a string. For example, I have the following string: (we don't count the beginning - 0x)
0xa9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d00000000000000000000000000000000000000000000000000000000000054368
which is actually an ETH transaction input. I need to explode this string into 3 parts. Imagine 1 bunch of zeros is actually a single space and these spaces define the gates where the string should be exploded.
How can I do that?
preg_split()
This function uses a regular expression to split a string.
So in this example at two or more 0 in a row:
$arr = preg_split('/[0]{2,}/', $string);
print_r($arr);
echo PHP_EOL;
This will output the following:
Array
(
[0] => a9059xbb
[1] => fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d
[2] => 54368
)
Be aware that this breaks if a value itself contains 00: the value will be split at that point too. If 00 only ever appears as a null byte marking the "end of string", this will not happen, though.
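To illustrate that caveat, here is a small sketch with a made-up payload (not a real transaction input) whose middle value contains 00:

```php
<?php
// Hypothetical payload: the middle value "ab00cd" contains 00 itself.
$string = 'a9059cbb' . '0000' . 'ab00cd' . '0000' . '12';

$arr = preg_split('/0{2,}/', $string);
print_r($arr);
// The middle value comes back as two elements, "ab" and "cd",
// so the result has four parts instead of three.
```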
preg_match()
This is an example using regular expressions. You can split at arbitrary points.
$string = 'a9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d00000000000000000000000000000000000000000000000000000000000054368';
print_r($string);
echo PHP_EOL;
$res = preg_match('/(.{4})(.{32})(.{32})/', $string, $matches);
print_r($matches);
echo PHP_EOL;
This outputs:
a9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d00000000000000000000000000000000000000000000000000000000000054368
Array
(
[0] => a9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199a
[1] => a905
[2] => 9xbb000000000000000000000000fc7a
[3] => 5f48a1a1b3f48e7dcb1f23a1ea24199a
)
As you can see, /(.{4})(.{32})(.{32})/ matches 4 characters, then 32, and then 32 again. Capturing groups are made by putting () around what you want to find. They appear in the $matches array (index 0 is always the whole matched string).
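If the numeric indexes get hard to follow, PCRE also supports named groups. A minimal sketch on a short made-up string (the group names head and body are purely illustrative):

```php
<?php
// Short made-up input so the groups are easy to follow.
$string = 'abcdefgh';

// (?<name>...) is a named capturing group; the names are illustrative.
preg_match('/(?<head>.{2})(?<body>.{3})/', $string, $matches);

echo $matches[0], PHP_EOL;      // abcde (the whole match)
echo $matches['head'], PHP_EOL; // ab    (also available as $matches[1])
echo $matches['body'], PHP_EOL; // cde   (also available as $matches[2])
```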
In case you want to ignore certain parts you can express that as well:
/(.{4})9x(.{32}).{4}(.{32})/
This changes the found string:
Array
(
[0] => a9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d000
[1] => a905
[2] => bb000000000000000000000000fc7a5f
[3] => a1b3f48e7dcb1f23a1ea24199af4d000
)
Links
PHP documentation for the mentioned functions:
https://www.php.net/manual/en/function.preg-split.php
https://www.php.net/manual/en/book.pcre.php
Play around with the second regular expression using this demo:
https://regex101.com/r/pfZtH8/1
If you will always explode them at the same points (4 bytes (8 hexadecimal digits), 32 bytes (64 hexadecimal digits), 32 bytes (64 hexadecimal digits)), you could use substr().
$input = "0xa9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d00000000000000000000000000000000000000000000000000000000000054368";
$first = substr($input,2,8);
$second = substr($input,10,64);
$third = substr($input,74,64);
print_r($first);
print "<br>";
print_r($second);
print "<br>";
print_r($third);
print "<br>";
This outputs:
a9059xbb
000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d0
0000000000000000000000000000000000000000000000000000000000054368
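The same fixed-offset idea can be generalised to any number of 64-digit parts. This is only a sketch: splitCallData() is a hypothetical helper name, and the sample input is built with str_pad() rather than taken from a real transaction.

```php
<?php
// Hypothetical helper: strips an optional "0x" prefix, then returns the
// first 8 hex digits followed by the remaining data in 64-digit chunks.
function splitCallData(string $input): array
{
    $hex = (strncmp($input, '0x', 2) === 0) ? (string) substr($input, 2) : $input;

    $selector = (string) substr($hex, 0, 8);
    $rest     = (string) substr($hex, 8);

    // str_split() on an empty string differs between PHP versions, so guard it.
    $words = ($rest === '') ? [] : str_split($rest, 64);

    return array_merge([$selector], $words);
}

// Made-up input built with str_pad() so the lengths are exact.
$input = '0x' . 'a9059cbb'
    . str_pad('fc7a', 64, '0', STR_PAD_LEFT)
    . str_pad('54368', 64, '0', STR_PAD_LEFT);

$parts = splitCallData($input);
echo $parts[0], PHP_EOL;             // a9059cbb
echo ltrim($parts[1], '0'), PHP_EOL; // fc7a
echo ltrim($parts[2], '0'), PHP_EOL; // 54368
```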

split paragraphs in array php

I have an array of this form:
$steps = array (0=> "the sentence one. the sentence two. the sentence three.",
1=> "the sentence for. the sentence 5");
and I want to have an array $steps like this:
$steps = array (0 => "the sentence one.",
1 => "the sentence two.",
.
.
4 =>"the sentence for."
);
I tried to use explode and implode but I did not succeed.
You can split each string in your existing array with the regex (?<=\.\s)(?=\w), then loop over the pieces with foreach and add each one to a new array. Check this PHP code:
$steps = array (0=> "the sentence one. the sentence two. the sentence three.",
1=> "the sentence for. the sentence 5");
$arr = array();
foreach ($steps as $s) {
$mat = preg_split('/(?<=\.\s)(?=\w)/', $s);
foreach($mat as $m) {
array_push($arr,$m);
}
}
print_r($arr);
Prints,
Array
(
[0] => the sentence one.
[1] => the sentence two.
[2] => the sentence three.
[3] => the sentence for.
[4] => the sentence 5
)
This assumes that a new sentence starts wherever a dot is followed by a space, which holds for your current sample data. If your real data contains dots in other positions, please post some samples and I can update the solution to handle them.
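One detail worth knowing: the pattern above is zero-width and splits after the whitespace, so each piece except the last keeps a trailing space. A hedged variant that consumes the whitespace instead (the variable names are illustrative) could look like this:

```php
<?php
$steps = array(
    0 => "the sentence one. the sentence two. the sentence three.",
    1 => "the sentence for. the sentence 5",
);

$sentences = array();
foreach ($steps as $paragraph) {
    // Split on whitespace that directly follows a dot; the dot stays with
    // its sentence and the whitespace itself is consumed.
    $parts = preg_split('/(?<=\.)\s+/', $paragraph, -1, PREG_SPLIT_NO_EMPTY);
    $sentences = array_merge($sentences, $parts);
}

print_r($sentences);
```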
Let me know if this works for you: preg_split("/\. (?=[A-Z])/", join(" ", $steps));
Your target array:
$steps = array (
0 => "The sentence one. The sentence two. The sentence three.",
1 => "The sentence for. The sentence 5"
);
$steps_unified = preg_split("/\. (?=[A-Z])/", join(" ", $steps));
print_r ($steps_unified);
You will get:
Array (
[0] => The sentence one
[1] => The sentence two
[2] => The sentence three
[3] => The sentence for
[4] => The sentence 5
)
If we use proper grammar, a sentence should end with a '.' and the next one should begin with a space and a word starting with a capital letter.

Php splitting a sentence

I'm trying to split a string of sentences by "." to get each sentence in an array. Like below:
$Text = "Hello, Mr. James. How are you today.";
$split= explode(".", $Text);
As you can see, $Text contains 2 sentences, therefore I should only have 2 elements in the array. The issue I'm having is that sometimes $Text can contain words like "Mr." or any other word that contains a "." in the middle of a sentence. This results in the sentences being split in the middle and placed separately in the array, like below:
Array ( [0] => Hello, Mr [1] => James [2] => How are you today [3] => )
You can avoid a lot of exception handling and general misery, if you can ensure that all English sentences are properly spaced at the end of each sentence -- 2 consecutive spaces. This can be difficult when dealing with some digitized strings because sometimes multi-spacing gets condensed to a single space.
This is what I mean:
$Text = "Hello, Mr. James.  How are you today."; // two spaces after "James."
$split = explode("  ", $Text); // the delimiter is two spaces
var_export($split);
// array ( 0 => 'Hello, Mr. James.', 1 => 'How are you today.', )
Exploding on each space-space will give you a reliable result.
If you want good output, you'll need to use good input.
If you want to blacklist a few predictable substrings that should not be used to split the string, you can use (*SKIP)(*FAIL) for that.
Code: (Demo)
$text = "Hello, Mr. James. How are you today.";
var_export(
preg_split('~(?:Mrs?|Miss|Ms|Prof|Rev|Col|Dr)[.?!:](*SKIP)(*F)|[.?!:]+\K\s+~', $text, 0, PREG_SPLIT_NO_EMPTY)
);
Output:
array (
0 => 'Hello, Mr. James.',
1 => 'How are you today.',
)
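The same technique can be wrapped in a function that builds the pattern from a list of abbreviations. This is a sketch only: splitSentences() and its default list are assumptions, and the \b word boundary is an addition that is not in the pattern above.

```php
<?php
// Hypothetical helper; the function name and default list are assumptions.
function splitSentences(string $text, array $abbreviations = ['Mr', 'Mrs', 'Ms', 'Dr', 'Prof']): array
{
    $quoted = array_map(function ($a) {
        return preg_quote($a, '~');
    }, $abbreviations);

    // Left branch: an abbreviation plus its punctuation is matched, then
    // (*SKIP)(*F) discards it so no split can happen there.
    // Right branch: after sentence punctuation, \K forgets what was matched
    // so far, so only the whitespace is used as the delimiter.
    $pattern = '~\b(?:' . implode('|', $quoted) . ')[.?!:](*SKIP)(*F)|[.?!:]+\K\s+~';

    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
}

var_export(splitSentences("Dr. Smith met Mrs. Jones! Did it go well? Yes."));
```

Any abbreviation missing from the list will still be treated as the end of a sentence, so the list has to match the text you expect.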

How to extract substrings with delimiters from a string in php

I would like to remove substrings from a string that have delimiters.
Example:
$string = "Hi, I want to buy an [apple] and a [banana].";
How do I get "apple" and "banana" out of this string and in an array? And the other parts of the string "Hi, I want to buy an" and "and a" in another array.
I apologize if this question has already been answered. I searched this site and couldn't find anything that would help me. Every situation was just a little different.
You could use preg_split() thus:
<?php
$pattern = '/[\[\]]/'; // Split on either [ or ]
$string = "Hi, I want to buy an [apple] and a [banana].";
echo print_r(preg_split($pattern, $string), true);
which outputs:
Array
(
[0] => Hi, I want to buy an
[1] => apple
[2] => and a
[3] => banana
[4] => .
)
You can trim the whitespace if you like and/or ignore the final fullstop.
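If you want the two separate arrays the question asks for, with the whitespace already trimmed, one possible sketch combines preg_match_all() and preg_split() (the variable names $inside and $outside are illustrative):

```php
<?php
$string = "Hi, I want to buy an [apple] and a [banana].";

// Inside: capture whatever sits between [ and ].
preg_match_all('/\[([^\]]+)\]/', $string, $matches);
$inside = $matches[1];

// Outside: split on each bracketed part together with surrounding whitespace.
$outside = preg_split('/\s*\[[^\]]+\]\s*/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($inside);  // apple, banana
print_r($outside); // "Hi, I want to buy an", "and a", "."
```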
preg_match_all('/(?<=\[)[a-z]+(?=\])/', $string, $matches);
Should do what you want. $matches[0] will be an array with each match.
I assume you want words as values in the array:
$words = explode(' ', $string);
$result = preg_grep('/\[[^\]]+\]/', $words);
$others = array_diff($words, $result);
Create an array of words using explode() on a space
Use a regex to find [somethings] using preg_grep()
Find the difference of all words and [somethings] using array_diff(), which will be the "other" parts of the string

Regular Expressions: get what is outside of the brackets

I'm using PHP and I have text like:
first [abc] middle [xyz] last
I need to get what's inside and outside of the brackets. Searching in StackOverflow I found a pattern to get what's inside:
preg_match_all('/\[.*?\]/', $m, $s)
Now I'd like to know the pattern to get what's outside.
Regards!
You can use preg_split for this as:
$input ='first [abc] middle [xyz] last';
$arr = preg_split('/\[.*?\]/',$input);
print_r($arr);
Output:
Array
(
[0] => first
[1] => middle
[2] => last
)
This allows some surrounding spaces in the output. If you don't want them you can use:
$arr = preg_split('/\s*\[.*?\]\s*/',$input);
preg_split splits the string based on a pattern. The pattern here is [ followed by anything followed by ]. The regex to match anything is .*. Because [ and ] are regex metacharacters used for character classes, we need to escape them to match them literally, which gives \[.*\]. By default .* is greedy and will try to match as much as possible; in this case it would match abc] middle [xyz. To avoid this, we make it non-greedy by appending a ?, giving \[.*?\]. Since "anything" here actually means anything other than ], we can also use \[[^]]*?\]
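The difference between the greedy and non-greedy forms is easy to check with preg_match(). A minimal sketch on the same input (the variable names are illustrative):

```php
<?php
$input = 'first [abc] middle [xyz] last';

preg_match('/\[.*\]/', $input, $greedy);
preg_match('/\[.*?\]/', $input, $lazy);

echo $greedy[0], PHP_EOL; // [abc] middle [xyz]
echo $lazy[0], PHP_EOL;   // [abc]
```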
EDIT:
If you want to extract words that are both inside and outside the [], you can use:
$arr = preg_split('/\[|\]/',$input);
which splits the string on a [ or a ].
$inside = '\[.+?\]';
$outside = '[^\[\]]+';
$or = '|';
preg_match_all(
"~ $inside $or $outside~x",
"first [abc] middle [xyz] last",
$m);
print_r($m);
or less verbose
preg_match_all("~\[.+?\]|[^\[\]]+~", $str, $matches)
Use preg_split instead of preg_match.
preg_split('/\[.*?\]/', 'first [abc] middle [xyz] last');
Result:
array(3) {
[0]=>
string(6) "first "
[1]=>
string(8) " middle "
[2]=>
string(5) " last"
}
Everyone says you should use preg_split, but only one person replied with an expression that meets your needs, and I thought it was a little too verbose, although he has since updated his answer to address that.
This expression is what most of the replies have stated.
/\[.*?\]/
But that only prints out
Array
(
[0] => first
[1] => middle
[2] => last
)
You stated you wanted what's inside and outside the brackets, so an update would be to split on the bracket characters themselves:
/[\[\]]/
This gives you:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
But as you can see, it's capturing whitespace as well, so let's go a step further and get rid of that:
/\s*[\[\]]\s*/
This will give you a desired result:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
This, I think, is the expression you're looking for.
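An alternative sketch that treats each bracketed part as a unit, rather than splitting on the bracket characters: preg_split() with the PREG_SPLIT_DELIM_CAPTURE flag returns the captured group alongside the outside pieces, in order.

```php
<?php
$input = 'first [abc] middle [xyz] last';

// The capturing group (.*?) is returned as its own element because of
// PREG_SPLIT_DELIM_CAPTURE; the brackets and surrounding whitespace are dropped.
$parts = preg_split(
    '/\s*\[(.*?)\]\s*/',
    $input,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($parts); // first, abc, middle, xyz, last
```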
