I have a string like this:
2234323,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433;4534453,23,23,44,433;
3,23,44,433;23,23,44,433;23,23,44,433
7545455,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433
As you can see, the values are separated by semicolons. I want to split this string only at the semicolons that come before a 7-digit value, so I should get this:
>2234323,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433
>4534453,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433;
>7545455,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433
The only thing I can think of is explode(';', $string), but this returns:
>2234323,23,23,44,433;
>3,23,44,433;
>23,23,44,433;
>23,23,44,433
>4534453,23,23,44,433;
>3,23,44,433;
>23,23,44,433;23,23,44,433;
>7545455,23,23,44,433;
>3,23,44,433;23,23,44,433;
>23,23,44,433
Is there a fast way to split a string in this format on the ";" before 7-digit values?
You can use preg_split for that:
$s = '2234323,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433;4534453,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433;7545455,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433';
var_dump(preg_split('/(;\d{7},)/', $s, -1, PREG_SPLIT_DELIM_CAPTURE));
Your output will be
array(5) {
[0] =>
string(58) "2234323,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433"
[1] =>
string(9) ";4534453,"
[2] =>
string(50) "23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433"
[3] =>
string(9) ";7545455,"
[4] =>
string(50) "23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433"
}
I think the next step (combining elements [1] and [2], then [3] and [4]) is not a big deal :)
Let me know if you still have problems here.
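The combining step mentioned above could be sketched as follows. This is only one possible way to do it; the variable names $parts and $records are made up for illustration.

```php
<?php
$s = '2234323,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433;'
   . '4534453,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433;'
   . '7545455,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433';

$parts = preg_split('/(;\d{7},)/', $s, -1, PREG_SPLIT_DELIM_CAPTURE);

// $parts alternates: text, delimiter, text, delimiter, text ...
$records = [$parts[0]];
for ($i = 1; $i + 1 < count($parts); $i += 2) {
    // Drop the leading ";" captured with the delimiter, then re-attach the rest.
    $records[] = ltrim($parts[$i], ';') . $parts[$i + 1];
}

print_r($records);
```

Each entry of $records then starts with its 7-digit value again.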
You could do a find and replace on numbers that are seven digits long, to insert a token that you can use to split. The output may need a little extra filtering to get to your desired format.
<?php
$in =<<<IN
2234323,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433;4534453,23,23,44,433;
3,23,44,433;23,23,44,433;23,23,44,433
7545455,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433
IN;
$out = preg_replace('/([0-9]{7})/', "#$1", $in);
$out = explode('#', $out);
$out = array_filter($out);
var_export($out);
Output:
array (
1 => '2234323,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433;',
2 => '4534453,23,23,44,433;
3,23,44,433;23,23,44,433;23,23,44,433
',
3 => '7545455,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433',
)
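The "extra filtering" could look something like the sketch below. It assumes the line breaks in the input are incidental and that a trailing ";" is not wanted; $clean is a hypothetical name.

```php
<?php
$in = "2234323,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433;4534453,23,23,44,433;\n"
    . "3,23,44,433;23,23,44,433;23,23,44,433\n"
    . "7545455,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433";

$out = preg_replace('/([0-9]{7})/', '#$1', $in);
$out = array_filter(explode('#', $out));

$clean = array_values(array_map(function ($chunk) {
    // Turn each line break (with an optional ";" before it) into ";",
    // then strip any ";" left at the end of the chunk.
    return rtrim(preg_replace('/;?\R/', ';', $chunk), ';');
}, $out));

var_export($clean);
```

array_values() is only there to reindex the result from 0, since array_filter() keeps the original keys.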
Your input structure seems a little unstable, but once it is stabilized, just use preg_split() to match (and consume) semicolons that are immediately followed by exactly 7 digits. \b is a word boundary to ensure that there is no 8th digit.
Code: (Demo)
$string = <<<STR
2234323,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433;4534453,23,23,44,433;
3,23,44,433;23,23,44,433;23,23,44,433
7545455,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433
STR;
$string = preg_replace('/;?\R/', ';', $string); // I don't know if this is actually necessary for your real project
var_export(
preg_split('/;(?=\d{7}\b)/', $string)
);
Output:
array (
0 => '2234323,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433',
1 => '4534453,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433',
2 => '7545455,23,23,44,433;3,23,44,433;23,23,44,433;23,23,44,433',
)
I'm looking for a way to explode a string. For example, I have the following string: (we don't count the beginning - 0x)
0xa9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d00000000000000000000000000000000000000000000000000000000000054368
which is actually an ETH transaction input. I need to explode this string into 3 parts. Imagine 1 bunch of zeros is actually a single space and these spaces define the gates where the string should be exploded.
How can I do that?
preg_split()
This function uses a regular expression to split a string.
So in this example at two or more 0 in a row:
$arr = preg_split('/[0]{2,}/', $string);
print_r($arr);
echo PHP_EOL;
This will output the following:
Array
(
[0] => a9059xbb
[1] => fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d
[2] => 54368
)
Be aware that you will have problems if a message itself has a 00 in it. Assuming it is used as a null-byte for "end of string", this will not happen, though.
preg_match()
This is an example using regular expressions. You can split at arbitrary points.
$string = 'a9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d00000000000000000000000000000000000000000000000000000000000054368';
print_r($string);
echo PHP_EOL;
$res = preg_match('/(.{4})(.{32})(.{32})/', $string, $matches);
print_r($matches);
echo PHP_EOL;
This outputs:
a9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d00000000000000000000000000000000000000000000000000000000000054368
Array
(
[0] => a9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199a
[1] => a905
[2] => 9xbb000000000000000000000000fc7a
[3] => 5f48a1a1b3f48e7dcb1f23a1ea24199a
)
As you can see, /(.{4})(.{32})(.{32})/ will find 4 characters, then 32, and after that 32 again. Capturing groups are made with () around what you want to find. They appear in the $matches array (index 0 is always the whole matched string).
In case you want to ignore certain parts you can express that as well:
/(.{4})9x(.{32}).{4}(.{32})/
This changes the found string:
Array
(
[0] => a9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d000
[1] => a905
[2] => bb000000000000000000000000fc7a5f
[3] => a1b3f48e7dcb1f23a1ea24199af4d000
)
Links
PHP documentation for the mentioned functions:
https://www.php.net/manual/en/function.preg-split.php
https://www.php.net/manual/en/book.pcre.php
Play around with the second regular expression using this demo:
https://regex101.com/r/pfZtH8/1
If you will always explode them at the same points (4 bytes(8 hexadecimal digits), 32 bytes(64 hexadecimal digits), 32 bytes(64 hexadecimal digits)), you could use substr().
$input = "0xa9059xbb000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d00000000000000000000000000000000000000000000000000000000000054368";
$first = substr($input,2,8);
$second = substr($input,10,64);
$third = substr($input,74,64);
print_r($first);
print "<br>";
print_r($second);
print "<br>";
print_r($third);
print "<br>";
this outputs:
a9059xbb
000000000000000000000000fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d0
0000000000000000000000000000000000000000000000000000000000054368
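If there may be more than two 64-character words, the same fixed-width idea can be sketched with str_split(). This assumes the layout described above (a "0x" prefix, 8 characters, then 64-character words); the names $selector, $words, $address and $amount are only illustrative.

```php
<?php
// Rebuild the example input from its parts so the field widths are explicit.
$input = '0x'
    . 'a9059xbb'                                                         // 8-character first part
    . str_repeat('0', 24) . 'fc7a5f48a1a1b3f48e7dcb1f23a1ea24199af4d0'   // 64-character word
    . str_repeat('0', 59) . '54368';                                     // 64-character word

$hex      = substr($input, 2);               // drop the "0x" prefix
$selector = substr($hex, 0, 8);
$words    = str_split(substr($hex, 8), 64);  // fixed-width 64-character chunks

$address = substr($words[0], -40);           // last 40 characters of the first word
$amount  = ltrim($words[1], '0');            // strip the left zero padding

print_r([$selector, $address, $amount]);
```

Unlike splitting on runs of zeros, this keeps the trailing 0 of the second part (…af4d0). Note that ltrim() would return an empty string for a word consisting only of zeros.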
I'm writing a PHP function to extract numeric ids from a string like:
$test = '123_123_Foo';
At first I took two different approaches, one with preg_match_all():
$test2 = '123_1256_Foo';
preg_match_all('/[0-9]{1,}/', $test2, $matches);
print_r($matches[0]); // Result: 'Array ( [0] => 123 [1] => 1256 )'
and other with preg_replace() and explode():
$test = preg_replace('/[^0-9_]/', '', $test);
$output = array_filter(explode('_', $test));
print_r($output); // Results: 'Array ( [0] => 123 [1] => 1256 )'
Either of them works well as long as the string does not contain mixed letters and numbers, like:
$test2 = '123_123_234_Foo2';
The evident result is Array ( [0] => 123 [1] => 123 [2] => 234 [3] => 2 )
So I wrote another regex to get rid of mixed strings:
$test2 = preg_replace('/([a-zA-Z]{1,}[0-9]{1,}[a-zA-Z]{1,})|([0-9]{1,}[a-zA-Z]{1,}[0-9]{1,})|([a-zA-Z]{1,}[0-9]{1,})|([0-9]{1,}[a-zA-Z]{1,})|[^0-9_]/', '', $test2);
$output = array_filter(explode('_', $test2));
print_r($output); // Results: 'Array ( [0] => 123 [1] => 1256 )'
The problem is evident too: more complicated patterns like Foo2foo12foo1 would pass the filter. And here's where I got a bit stuck.
Recap:
Extract a variable amount of chunks of numbers from a string.
The string contains at least 1 number, and may contain other numbers
and letters separated by underscores.
Only numbers not preceded or followed by letters must be extracted.
Only the numbers in the first half of the string matter.
Since only the first half is needed I decided to split in the first occurrence of letter or mixed number-letter with preg_split():
$test2 = '123_123_234_1Foo2';
$output = preg_split('/([0-9]{1,}[a-zA-Z]{1,})|[^0-9_]/', $test2, 2);
preg_match_all('/[0-9]{1,}/', $output[0], $matches);
print_r($matches[0]); // Results: 'Array ( [0] => 123 [1] => 123 [2] => 234 )'
The point of my question is whether there is a simpler, safer or more efficient way to achieve this result.
If I understand your question correctly, you want to split an underscore-delimited string, and filter out any substrings that are not numeric. If so, this can be achieved without regex, with explode(), array_filter() and ctype_digit(); e.g:
<?php
$str = '123_123_234_1Foo2';
$digits = array_filter(explode('_', $str), function ($substr) {
return ctype_digit($substr);
});
print_r($digits);
This yields:
Array
(
[0] => 123
[1] => 123
[2] => 234
)
Note that ctype_digit():
Checks if all of the characters in the provided string are numerical.
So $digits is still an array of strings, albeit numeric.
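If actual integers are wanted rather than numeric strings, a cast can be added on top. A minimal sketch, where $ids is a hypothetical name:

```php
<?php
$str = '123_123_234_1Foo2';

// Keep only all-digit chunks, reindex from 0, then cast each one to int.
$ids = array_map('intval', array_values(array_filter(explode('_', $str), 'ctype_digit')));

var_dump($ids);
```

array_values() reindexes the array, because array_filter() preserves the original keys and would leave gaps if a non-numeric chunk sat in the middle.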
Hope this helps :)
Getting just the numeric part of the string after the explode
$test2 = "123_123_234_1Foo2";
$digits = array_filter(explode('_', $test2 ), 'is_numeric');
var_dump($digits);
Result
array(3) { [0]=> string(3) "123" [1]=> string(3) "123" [2]=> string(3) "234" }
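One thing to keep in mind: is_numeric() accepts more than plain digit strings (decimals, signs, exponent notation). The following sketch uses a made-up input to show the difference from ctype_digit():

```php
<?php
// Hypothetical input mixing plain ids with other numeric-looking chunks.
$tokens = explode('_', '123_1e5_4.5_-7_42');

$numeric = array_values(array_filter($tokens, 'is_numeric'));  // keeps every chunk here
$digits  = array_values(array_filter($tokens, 'ctype_digit')); // keeps only all-digit chunks

var_dump($numeric, $digits);
```

For ids made only of digits, both filters give the same result; they diverge on inputs like the one above.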
Use strtok
Regex isn't a magic bullet, and there are FAR simpler fixes for your problem, especially considering you're trying to split on a delimiter.
Any of the following approaches would be cleaner, and more maintainable, and the strtok() approach would probably perform better:
Use explode to create and loop through an array, checking each value.
Use preg_split to do the same, but with a more adaptable approach.
Use strtok, as it is designed exactly for this use-case.
Basic example for your case:
function strGetInts(string $str, string $delim) {
    $word = strtok($str, $delim);
    while (false !== $word) {
        // strtok() returns strings, so test for all-digit tokens with ctype_digit()
        if (ctype_digit($word)) {
            yield (int) $word;
        }
        $word = strtok($delim);
    }
}
$test2 = '123_1256_Foo';
foreach (strGetInts($test2, '_-') as $key) {
    print_r($key);
}
Note: the second argument to strtok is string containing ANY delimiter to split the string on. Thus, my example will group results into strings separated by underscores or dashes.
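To illustrate the note about delimiters, here is a small sketch with a made-up input that mixes dashes and underscores:

```php
<?php
$tokens = [];

// Every character in the second argument acts as a delimiter on its own.
$token = strtok('12-34_Foo_56', '_-');
while ($token !== false) {
    $tokens[] = $token;
    $token = strtok('_-'); // later calls only take the delimiter list
}

print_r($tokens);
```

The string is split at each "-" and each "_", not at the two-character sequence "_-".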
Additional Note: If and only if the string only needs to be split on a single delimiter (underscore only), a method using explode will likely result in better performance. For such a solution, see the other answer in this thread: https://stackoverflow.com/a/46937452/1589379 .
I've been wrapping my head around this for days now, but nothing seems to give the desired result.
Example:
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
Desired result:
array(
[0] => Some Words
[1] => Other Words
[2] => More Words
[3] => Dash-Binded-Word
)
I was able to get this all working using preg_match_all, but then the "Dash-Binded-Word" was broken up as well. Trying to match it with surrounding spaces didn't work, as it would break all the words except the dash-bound ones.
The preg_match_all statement I used (which broke up the dash bound words too) is this:
preg_match_all('#\(.*?\)|\[.*?\]|[^?!\-|\(|\[]+#', $var, $array);
I'm certainly no expert on preg_match, preg_split so any help here would be greatly appreciated.
You can use a simple preg_match_all:
\w+(?:[- ]\w+)*
\w+ - 1 or more alphanumeric or underscore
(?:[- ]\w+)* - 0 or more sequences of...
[- ] - a hyphen or space (you may change space to \s to match any whitespace)
\w+ - 1 or more alphanumeric or underscore
IDEONE demo:
$re = '/\w+(?:[- ]\w+)*/';
$str = "Some Words - Other Words (More Words) Dash-Binded-Word";
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Result:
Array
(
[0] => Some Words
[1] => Other Words
[2] => More Words
[3] => Dash-Binded-Word
)
You can split by:
/\s*(?<!\w(?=.\w))[\-[\]()]\s*/
Explanation:
The match is attempted against the character class [\-[\]()] (matches any of those characters). You could also add any char you want to that character class.
It's using a negative lookbehind (?<!\w) for the condition: "not preceded by a word character".
It also has a nested lookahead (?=.\w) that checks whether the splitting character is followed by a word character. Together they mean: don't split when the character sits between two word characters, as the hyphens in Dash-Binded-Word do.
\s* at the beginning and the end is there to trim whitespace.
Code:
$input_line = "Some Words - Other Words (More Words) Dash-Binded-Word";
$result = preg_split("/\s*(?<!\w(?=.\w))[\-[\]()]\s*/", $input_line);
var_dump($result);
Output:
array(4) {
[0]=>
string(10) "Some Words"
[1]=>
string(11) "Other Words"
[2]=>
string(10) "More Words"
[3]=>
string(16) "Dash-Binded-Word"
}
Capturing parens
As stated in another comment, if you want to also capture parentheses:
$result = preg_split("/\s*(?:(?<!\w)-(?!\w)|(\(.*?\)|\[.*?]))\s*/", $input_line, -1, PREG_SPLIT_DELIM_CAPTURE);
Modifying the input string to suit any particular exploding technique would be indirect and indicate that a suboptimal exploding technique is being used.
The truth is, your required logic can be boiled down to: "explode on each sequence of non-word characters that have a length of 2 or more". This is what that pattern looks like with preg_split().
Code: (Demo)
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
var_export(preg_split('~\W{2,}~', $var));
Output:
array (
0 => 'Some Words',
1 => 'Other Words',
2 => 'More Words',
3 => 'Dash-Binded-Word',
)
It doesn't get any simpler than that.
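One case to watch for is a single non-word character at the very end of the string, such as a closing parenthesis, which \W{2,} alone does not remove. A possible variant is sketched below, assuming such an input can occur; the extra \W$ alternative is an addition for illustration.

```php
<?php
// Hypothetical input that ends with a closing parenthesis.
$var = "Some Words - Other Words (More Words)";

// \W{2,} alone leaves the single trailing ")" attached to the last item.
$plain = preg_split('~\W{2,}~', $var);

// Also split on one non-word character at the very end, and drop empty pieces.
$trimmed = preg_split('~\W{2,}|\W$~', $var, -1, PREG_SPLIT_NO_EMPTY);

var_export($plain);
var_export($trimmed);
```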
Try this (combination of str_replace and explode). It is not optimum but may work for this case:
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
$arr = Array(" - ", " (", ") ");
$var2 = str_replace($arr, "|", $var);
$final = explode('|', $var2);
var_dump($final);
Output:
array(4) { [0]=> string(10) "Some Words" [1]=> string(11) "Other
Words" [2]=> string(10) "More Words" [3]=> string(16)
"Dash-Binded-Word" }
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
$var=preg_replace('/[^A-Za-z\-]/', ' ', $var);
$var=str_replace('-', ' ', $var); // Replaces all hyphens with spaces.
print_r (explode(" ",preg_replace('!\s+!', ' ', $var))); //replaces all multiple spaces with one and explode creates array split where there is space
OUTPUT :-
Array ( [0] => Some [1] => Words [2] => Other [3] => Words [4] => More [5] => Words [6] => Dash [7] => Binded [8] => Word )
I have a little problem I need to sort out. I want to either remove a part of a string or split it.
So basically I have this: One-1, Two-2, Three-3
What I want to end up doing is splitting it into 2 variables, where I have "One, Two, Three" and "1, 2, 3". I'm not sure if I can split it into two, or if I have to remove the part after "-" first and then do it again to remove the bit before "-" to end up with two variables. Anyway, I have had a look and it seems that preg_split or preg_match may work, but I have no idea about preg patterns.
This is what I have so far:
$string = 'One-1, Two-2, Three-3';
$pattern = '????????????';
preg_match_all($pattern, $string, $matches);
print_r($matches);
EDIT: Sorry, my question was worded wrong:
Basically, could someone help me with the preg pattern to split the values so that I have an array of One, Two, Three and an array of 1, 2, 3?
Any guidance appreciated
Ian
----------------EDIT--------------
I have another question, if I may: how would the preg_match change if I had this:
"One Object-1, Two Object-2", so that now I have more than one word before the "-", which should be stored together, with the "1" on its own?
Try this :
$string = 'One-1, Two-2, Three-3';
$pattern = '/(?P<first>\w+)-(?P<second>\w+)/';
preg_match_all($pattern, $string, $matches);
print_r($matches['first']);
print_r($matches['second']);
Output:
Array ( [0] => One [1] => Two [2] => Three )
Array ( [0] => 1 [1] => 2 [2] => 3 )
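For the follow-up case in the question, where several words can appear before the "-", one possible adjustment is to match everything that is not a comma or hyphen instead of \w+. This is a sketch, not the only way to do it:

```php
<?php
$string = 'One Object-1, Two Object-2';

// [^,-]+ matches any run of characters that contains neither "," nor "-".
$pattern = '/(?P<first>[^,-]+)-(?P<second>\w+)/';
preg_match_all($pattern, $string, $matches);

// The space after each comma ends up in the match, so trim it off.
$first  = array_map('trim', $matches['first']);
$second = $matches['second'];

print_r($first);
print_r($second);
```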
If you always have a "-" use this instead:
$string = "One-1";
$args = explode("-", $string);
// $args[0] would have One
// $args[1] would have 1
You can read more about this function here: http://php.net/manual/en/function.explode.php
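Applied to the whole list from the question, the explode() idea might look like this. It assumes the pairs are always separated by ", "; the variable names are illustrative.

```php
<?php
$string = 'One-1, Two-2, Three-3';

$names = [];
$numbers = [];
foreach (explode(', ', $string) as $pair) {
    // A limit of 2 keeps any further "-" inside the second part.
    [$name, $number] = explode('-', $pair, 2);
    $names[] = $name;
    $numbers[] = $number;
}

$namesList = implode(', ', $names);
$numbersList = implode(', ', $numbers);

echo $namesList, PHP_EOL, $numbersList, PHP_EOL;
```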
This should work:
[^-]+
But why use regexp at all? You can simply explode by "-"
I'm using PHP and I have text like:
first [abc] middle [xyz] last
I need to get what's inside and outside of the brackets. Searching in StackOverflow I found a pattern to get what's inside:
preg_match_all('/\[.*?\]/', $m, $s)
Now I'd like to know the pattern to get what's outside.
Regards!
You can use preg_split for this as:
$input ='first [abc] middle [xyz] last';
$arr = preg_split('/\[.*?\]/',$input);
print_r($arr);
Output:
Array
(
[0] => first
[1] => middle
[2] => last
)
This allows some surrounding spaces in the output. If you don't want them you can use:
$arr = preg_split('/\s*\[.*?\]\s*/',$input);
preg_split splits the string based on a pattern. The pattern here is [ followed by anything followed by ]. The regex to match anything is .*. Also, [ and ] are regex metacharacters used for character classes. Since we want to match them literally, we need to escape them to get \[.*\]. By default .* is greedy and will try to match as much as possible. In this case it will match abc] middle [xyz. To avoid this we make it non-greedy by appending a ? to give \[.*?\]. Since our definition of "anything" here actually means anything other than ], we can also use \[[^]]*?\]
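The difference between the greedy, non-greedy and negated-class versions can be checked with a quick sketch like this one:

```php
<?php
$input = 'first [abc] middle [xyz] last';

preg_match('/\[.*\]/', $input, $greedy);      // .* runs on to the last "]"
preg_match('/\[.*?\]/', $input, $lazy);       // .*? stops at the first "]"
preg_match('/\[[^]]*\]/', $input, $negated);  // [^]]* cannot cross a "]" at all

print_r([$greedy[0], $lazy[0], $negated[0]]);
```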
EDIT:
If you want to extract words that are both inside and outside the [], you can use:
$arr = preg_split('/\[|\]/',$input);
which splits the string on a [ or a ]
$inside = '\[.+?\]';
$outside = '[^\[\]]+';
$or = '|';
preg_match_all(
"~ $inside $or $outside~x",
"first [abc] middle [xyz] last",
$m);
print_r($m);
or less verbose
preg_match_all("~\[.+?\]|[^\[\]]+~", $str, $matches)
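If the inside and outside parts should end up in separate arrays, a capture group around the bracket content can be used to tell them apart. A sketch, with $inside and $outside as hypothetical names:

```php
<?php
$str = 'first [abc] middle [xyz] last';

// Group 1 is only filled when the bracket alternative matched.
preg_match_all('~\[(.+?)\]|[^\[\]]+~', $str, $matches, PREG_SET_ORDER);

$inside = [];
$outside = [];
foreach ($matches as $match) {
    if (($match[1] ?? '') !== '') {
        $inside[] = $match[1];
    } else {
        $text = trim($match[0]);
        if ($text !== '') {
            $outside[] = $text;
        }
    }
}

print_r($inside);
print_r($outside);
```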
Use preg_split instead of preg_match.
preg_split('/\[.*?\]/', 'first [abc] middle [xyz] last');
Result:
array(3) {
[0]=>
string(6) "first "
[1]=>
string(8) " middle "
[2]=>
string(5) " last"
}
Everyone says you should use preg_split, but only one person replied with an expression that meets your needs. I thought it was a little too verbose, but he has since updated his answer to address that.
This expression is what most of the replies have stated.
/\[.*?\]/
But that only prints out
Array
(
[0] => first
[1] => middle
[2] => last
)
You stated that you wanted what's inside and outside the brackets, so an update would be:
/[\[.*?\]]/
This gives you:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
But as you can see, it's capturing whitespace as well, so let's go a step further and get rid of that:
/[\s]*[\[.*?\]][\s]*/
This will give you a desired result:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
This, I think, is the expression you're looking for.
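One caveat: [\[.*?\]] is a character class, so besides [ and ] it also splits on a literal ., * or ?. The sketch below uses a made-up input to show the effect, next to a stricter class that contains only the two brackets:

```php
<?php
// Hypothetical input containing "." and "?" outside the brackets.
$input = 'v1.2 [abc] ok?';

// Inside a character class, ".", "*" and "?" are literal characters.
$loose = preg_split('/[\s]*[\[.*?\]][\s]*/', $input);

// Only "[" and "]" act as delimiters; empty pieces are dropped.
$strict = preg_split('/\s*[\[\]]\s*/', $input, -1, PREG_SPLIT_NO_EMPTY);

print_r($loose);
print_r($strict);
```

For the example string in the question both patterns give the same five items.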