Split a string just before each occurrence of 3 specific delimiters - php

I'm a bit lost with preg_split() in parsing a string with multiple delimiters and keeping the delimiter in the 'after' part of the split.
My delimiters are $, #, and ?.
For instance:
$str = 'participant-$id#-group';
$ar = preg_split('/([^$#?]+[$#?]+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
echo "<pre>"; print_r( $ar); echo "</pre>";
will show:
Array
(
[0] => participant-$
[1] => id#
[2] => -group
)
However I need:
Array
(
[0] => participant-
[1] => $id
[2] => #-group
)
Regex makes my brain hurt, so could someone advise how to use PREG_SPLIT_DELIM_CAPTURE and keep the delimiter at the beginning of each segment?

Try this:
$ar = preg_split('/(\$[^#]+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

How about this? I capture the delimiters and then put them back together with the text that follows each one.
<?php
$str = 'participant-$id#-group';
$ar = preg_split('/([$#?])/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
echo "<pre>"; print_r( $ar); echo "</pre>";
/*
Array
(
[0] => participant-
[1] => $
[2] => id
[3] => #
[4] => -group
) */
$result = array();
$result[] = $ar[0];
for ($i = 1; $i < count($ar); $i += 2) {
    // Glue each delimiter to the text after it; ?? '' covers a trailing delimiter.
    $result[] = $ar[$i] . ($ar[$i + 1] ?? '');
}
echo "<pre>"; print_r( $result); echo "</pre>";
/*
Array
(
[0] => participant-
[1] => $id
[2] => #-group
)
*/
?>

You don't need to capture the delimiters; just use a lookahead for $, #, or ?. This splits the string at the zero-width position before each delimiter, so no characters are lost or consumed while splitting.
Code:
$str = 'participant-$id#-group';
var_export(
preg_split('/(?=[$#?])/', $str)
);
Output:
array (
0 => 'participant-',
1 => '$id',
2 => '#-group',
)
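A minimal sketch of the same lookahead idea wrapped in a reusable function. The name splitBeforeDelimiters and its $delimiters parameter are hypothetical, and PREG_SPLIT_NO_EMPTY is an added assumption so that a string starting with a delimiter does not produce an empty first element.

```php
<?php
// Hypothetical helper (not part of the answer above): split $str just before
// each character listed in $delimiters, keeping the delimiter at the start
// of the segment that follows it.
function splitBeforeDelimiters(string $str, string $delimiters = '$#?'): array
{
    // preg_quote() escapes regex-special characters so they are safe
    // inside the character class.
    $class = preg_quote($delimiters, '/');

    // (?=...) is a zero-width lookahead: it matches a position, not text,
    // so the split consumes nothing.
    return preg_split('/(?=[' . $class . '])/', $str, -1, PREG_SPLIT_NO_EMPTY);
}

var_export(splitBeforeDelimiters('participant-$id#-group'));
```

For the sample string this should give the same three segments as the answer above.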

Related

PHP Check string contain #(any) [duplicate]

I have a string with hash tags in it and I'm trying to pull the tags out. I think I'm pretty close, but I'm getting a multi-dimensional array with the same results in both sub-arrays.
$string = "this is #a string with #some sweet #hash tags";
preg_match_all('/(?!\b)(#\w+\b)/',$string,$matches);
print_r($matches);
which yields
Array (
[0] => Array (
[0] => "#a"
[1] => "#some"
[2] => "#hash"
)
[1] => Array (
[0] => "#a"
[1] => "#some"
[2] => "#hash"
)
)
I just want one array with each word beginning with a hash tag.
This can be done with the regex /(?<!\w)#\w+/.
That's what preg_match_all does: you always get a multidimensional array. [0] holds the complete matches and [1] holds the results of the first capture group.
Just access $matches[1] for the desired strings. (Your dump with the extra Array ( [0] => Array ( [0] level was incorrect. You get one sub-array level.)
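To make the shape of $matches concrete, here is a small sketch using the question's string. The variable names $fullMatches, $tagNames and $flat are only illustrative.

```php
<?php
$string = "this is #a string with #some sweet #hash tags";

// With one capture group: $matches[0] holds the full matches and
// $matches[1] holds what group 1 captured (here, the tag without '#').
preg_match_all('/#(\w+)/', $string, $matches);
$fullMatches = $matches[0];
$tagNames = $matches[1];

// Without any capture group, only index 0 is populated.
preg_match_all('/#\w+/', $string, $flat);

print_r($fullMatches);
print_r($tagNames);
print_r($flat);
```

So if the pattern has no parentheses, $flat[0] is already the single list of tags.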
I think this function will help you:
function get_hashtags($string, $str = 1) {
    // $matches[1] holds each tag without its leading '#'.
    preg_match_all('/#(\w+)/', $string, $matches);
    // Return a comma-separated string by default, or the array when $str is falsy.
    return $str ? implode(', ', $matches[1]) : $matches[1];
}
echo get_hashtags($string);
Try:
$string = "this is #a string with #some sweet #hash tags";
preg_match_all('/(?<!\w)#\S+/', $string, $matches);
print_r($matches[0]);
echo("<br><br>");
// Output: Array ( [0] => #a [1] => #some [2] => #hash )
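The \w+ and \S+ variants suggested above differ once punctuation follows a tag. A short sketch with a made-up input string (not from the question) to show the difference:

```php
<?php
// Hypothetical input, chosen to include punctuation and a mid-word '#'.
$string = 'see #php, #regex! and foo#bar';

// (?<!\w) rejects a '#' that directly follows a word character (foo#bar).
preg_match_all('/(?<!\w)#\w+/', $string, $word);
preg_match_all('/(?<!\w)#\S+/', $string, $nonSpace);

print_r($word[0]);     // tags made of word characters only
print_r($nonSpace[0]); // tags that also swallow trailing punctuation
```

With \w+ the tags stop at the first non-word character, while \S+ keeps everything up to the next whitespace.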

What is the best way to split letters and numbers?

I have this variable:
$str = "w15";
What is the best way to split it to w and 15 separately?
explode() removes a letter, and str_split() has no option to split the string into parts of unequal length.
$str = "w15xx837ee";
$letters = preg_replace('/\d/', '', $str);
$numbers = preg_replace('/[^\d]/', '', $str);
echo $letters; // outputs wxxee
echo $numbers; // outputs 15837
You could do something like this to separate strings and numbers
<?php
$str = "w15";
$strarr = str_split($str);
$intarr = $stringarr = array(); // initialise so print_r() works even if one group stays empty
foreach($strarr as $val)
{
if(is_numeric($val))
{
$intarr[]=$val;
}
else
{
$stringarr[]=$val;
}
}
print_r($intarr);
print_r($stringarr);
Output:
Array
(
[0] => 1
[1] => 5
)
Array
(
[0] => w
)
If you want it as 15, you can just implode() $intarr.
Use preg_split() to achieve this:
$arr = preg_split('~(?<=\d)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=\d)~', $str);
Output:
Array
(
[0] => w
[1] => 15
)
For example, the string w15g12z would give the following array:
Array
(
[0] => w
[1] => 15
[2] => g
[3] => 12
[4] => z
)
Much cleaner:
$result = preg_split("/(?<=\d)(?=\D)|(?<=\D)(?=\d)/",$str);
Essentially, it's kind of like manually implementing \b with a custom set (rather than \w)
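A small sketch of that boundary idea as a function. The name splitLettersAndDigits is hypothetical; the pattern is the one from the answer above.

```php
<?php
// Hypothetical wrapper around the lookaround pattern shown above.
function splitLettersAndDigits(string $str): array
{
    // Split at every zero-width position where a digit meets a non-digit:
    // (?<=\d)(?=\D) is "digit behind, non-digit ahead", and the second
    // alternative is the mirror image.
    return preg_split('/(?<=\d)(?=\D)|(?<=\D)(?=\d)/', $str);
}

print_r(splitLettersAndDigits('w15'));
print_r(splitLettersAndDigits('w15g12z'));
```

Note that \D means "anything that is not a digit", so punctuation and spaces are grouped with the letters here.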

string to array, split by single and double quotes

I'm trying to use PHP to split a string into array components using either " or ' as the delimiter. I just want to split by the outermost quotes. Here are four examples and the desired result for each:
$pattern = "?????";
$str = "the cat 'sat on' the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the cat
[1] => 'sat on'
[2] => the mat
)*/
$str = "the cat \"sat on\" the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the cat
[1] => "sat on"
[2] => the mat
)*/
$str = "the \"cat 'sat' on\" the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the
[1] => "cat 'sat' on"
[2] => the mat
)*/
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
)*/
As you can see, I only want to split by the outermost quotation and ignore any quotations within quotations.
The closest I have come up with for $pattern is
$pattern = "/((?P<quot>['\"])[^(?P=quot)]*?(?P=quot))/";
but obviously this is not working.
You can use preg_split with the PREG_SPLIT_DELIM_CAPTURE option. The regular expression is not quite as elegant as @Jan Turoň's back reference approach because the required capture group messes up the results.
$str = "the 'cat \"sat\" on' the mat the \"cat 'sat' on\" the mat";
$match = preg_split("/('[^']*'|\"[^\"]*\")/U", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($match);
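A sketch of the same alternation idea with the pieces trimmed, so the result lines up with the arrays asked for in the question. The function name splitOnOuterQuotes is hypothetical, and the sketch assumes quotes are never escaped and that an inner quote is always of the other kind.

```php
<?php
// Hypothetical helper: split on outermost quoted runs, keeping the quotes.
function splitOnOuterQuotes(string $str): array
{
    // Either a single-quoted run with no single quote inside, or a
    // double-quoted run with no double quote inside. The capture group
    // plus PREG_SPLIT_DELIM_CAPTURE keeps the quoted runs in the result.
    $parts = preg_split(
        '/(\'[^\']*\'|"[^"]*")/',
        $str,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    // Trim surrounding whitespace and drop pieces that end up empty.
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, function ($p) {
        return $p !== '';
    }));
}

print_r(splitOnOuterQuotes("the \"cat 'sat' on\" the mat"));
print_r(splitOnOuterQuotes("the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen"));
```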
You can use just preg_match for this:
$str = "the \"cat 'sat' on\" the mat";
$pattern = '/^([^\'"]*)(([\'"]).*\3)(.*)$/';
if (preg_match($pattern, $str, $matches)) {
printf("[initial] => %s\n[quoted] => %s\n[end] => %s\n",
$matches[1],
$matches[2],
$matches[4]
);
}
This prints:
[initial] => the
[quoted] => "cat 'sat' on"
[end] => the mat
Here is an explanation of the regex:
/^([^\'"]*) => put the initial bit until the first quote (either single or double) in the first captured group
(([\'"]).*\3) => capture in \2 the text corresponding from the initial quote (either single or double) (that is captured in \3) until the closing quote (that must be the same type as the opening quote, hence the \3). The fact that the regexp is greedy by nature helps to get from the first quote to the last one, regardless of how many quotes are inside.
(.*)$/ => Capture until the end in \4
Yet another solution using preg_replace_callback
$result1 = array();
function parser($p) {
global $result1;
$result1[] = $p[0];
return "|"; // temporary delimiter
}
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
$str = preg_replace_callback("/(['\"]).*\\1/U", "parser", $str);
$result2 = explode("|",$str); // using temporary delimiter
Now you can zip those arrays using array_map
$result = array();
function zipper($a,$b) {
global $result;
if($a) $result[] = $a;
if($b) $result[] = $b;
}
array_map("zipper",$result2,$result1);
print_r($result);
And the result is
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
Note: It would probably be better to create a class for this, so the global variables can be avoided.
You can use back references and ungreedy modifier in preg_match_all
$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
preg_match_all("/(['\"])(.*)\\1/U", $str, $match);
print_r($match[0]);
Now you have your outermost quotation parts
[0] => 'cat "sat" on'
[1] => 'when "it" was'
And you can find the rest of the string with substr and strpos (kind of blackbox solution)
$a = $b = 0; $result = array();
foreach($match[0] as $part) {
$b = strpos($str, $part, $a); // search from $a so a repeated part is found in order
$result[] = substr($str,$a,$b-$a);
$result[] = $part;
$a = $b+strlen($part);
}
$result[] = substr($str,$a);
print_r($result);
Here is the result
[0] => the
[1] => 'cat "sat" on'
[2] => the mat
[3] => 'when "it" was'
[4] => seventeen
Just strip any empty leading/trailing element if a quotation is at the very beginning or end of the string.
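As an alternative to locating each part again with strpos, preg_match_all can report where every match starts via PREG_OFFSET_CAPTURE. A sketch under the same assumptions as the answer (no escaped quotes); the function name splitQuotedWithOffsets is hypothetical.

```php
<?php
// Hypothetical helper: interleave quoted and unquoted parts using match offsets.
function splitQuotedWithOffsets(string $str): array
{
    // (['"]) captures the opening quote; .*? lazily runs to the next
    // identical quote (\1), so inner quotes of the other kind are skipped.
    preg_match_all('/([\'"]).*?\1/', $str, $m, PREG_OFFSET_CAPTURE);

    $result = [];
    $pos = 0;
    foreach ($m[0] as [$text, $offset]) {
        $result[] = substr($str, $pos, $offset - $pos); // text before the quote
        $result[] = $text;                              // the quoted part itself
        $pos = $offset + strlen($text);
    }
    $result[] = substr($str, $pos);                     // text after the last quote

    // Trim and drop empty pieces (e.g. when a quote starts or ends the string).
    $result = array_map('trim', $result);
    return array_values(array_filter($result, function ($p) {
        return $p !== '';
    }));
}

print_r(splitQuotedWithOffsets("the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen"));
```

Because the offsets come from the regex engine, identical quoted parts appearing twice are handled without extra bookkeeping.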

Split string between less and greater than

I need to split this kind of string to separate out the email address between the less-than and greater-than signs (< >). I'm trying the regex below with preg_split, but it does not work.
"email1@domain.com" <email1@domain.com>
News <news@e.domain.com>
Some Stuff <email-noreply@somestuff.com>
The expected result will be:
Array
(
[0] => "email1@domain.com"
[1] => email1@domain.com
)
Array
(
[0] => News
[1] => news@e.domain.com
)
Array
(
[0] => Some Stuff
[1] => email-noreply@somestuff.com
)
Code that I am using now:
foreach ($emails as $email)
{
$pattern = '/<(.*?)>/';
$result = preg_split($pattern, $email);
print_r($result);
}
You may use some of the flags available for preg_split: PREG_SPLIT_DELIM_CAPTURE and PREG_SPLIT_NO_EMPTY.
$emails = array('"email1@domain.com" <email1@domain.com>', 'News <news@e.domain.com>', 'Some Stuff <email-noreply@somestuff.com>');
foreach ($emails as $email)
{
$pattern = '/<(.*?)>/';
$result = preg_split($pattern, $email, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($result);
}
This outputs what you expect:
Array
(
[0] => "email1@domain.com"
[1] => email1@domain.com
)
Array
(
[0] => News
[1] => news@e.domain.com
)
Array
(
[0] => Some Stuff
[1] => email-noreply@somestuff.com
)
Splitting on something removes the delimiter (i.e. everything the regex matches). You probably want to split on
\s*<|>
instead. Or you can use preg_match with the regex
^(.*?)\s*<([^>]+)>
and use the first and second capturing groups.
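A sketch of the preg_match route with that regex. The function name parseAddressLine is hypothetical, and returning null for a line without angle brackets is an assumption about how you would want to handle bad input.

```php
<?php
// Hypothetical helper: returns [display name, address], or null when the
// line does not have the "Name <address>" shape.
function parseAddressLine(string $line): ?array
{
    // Group 1: everything before the optional whitespace and '<' (lazy).
    // Group 2: everything between '<' and the first '>'.
    if (preg_match('/^(.*?)\s*<([^>]+)>/', $line, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

$lines = [
    '"email1@domain.com" <email1@domain.com>',
    'News <news@e.domain.com>',
    'Some Stuff <email-noreply@somestuff.com>',
];
foreach ($lines as $line) {
    print_r(parseAddressLine($line));
}
```

Unlike the preg_split variants, this leaves no trailing space on the name and no empty element to filter out.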
This will do the job:
$header = '"email1@domain.com" <email1@domain.com>
News <news@e.domain.com>
Some Stuff <email-noreply@somestuff.com>';
$result = array();
preg_match_all('!(.*?)\s+<\s*(.*?)\s*>!', $header, $result);
$formatted = array();
for ($i=0; $i<count($result[0]); $i++) {
$formatted[] = array(
'name' => $result[1][$i],
'email' => $result[2][$i],
);
}
print_r($formatted);
preg_match_all("/<(.*?)>/", $string, $result_array);
print_r($result_array);
$email='"email1@domain.com" <email1@domain.com>
News <news@e.domain.com>
Some Stuff <email-noreply@somestuff.com>';
$pattern = '![^\>\<]+!';
preg_match_all($pattern, $email,$match);
print_r($match);
Output:
Array ( [0] => Array (
[0] => "email1@domain.com"
[1] => email1@domain.com
[2] => News
[3] => news@e.domain.com
[4] => Some Stuff
[5] => email-noreply@somestuff.com ) )
You can also split by <, and get rid of ">" in $result
$pattern = '/</';
$result = preg_split($pattern, $email);
$result = preg_replace("/>/", "", $result);

Why is there an extra empty row when splitting by multibyte punctuation?

Try this:
$pattern = '/[\x{ff0c},]/u';
//$string = "something here ; and there, oh,that's all!";
$string = 'hei,nihao,a ';
echo '<pre>', print_r( preg_split( $pattern, $string ), 1 ), '</pre>';
exit();
output:
<pre>Array
(
[0] => hei,nihao,a
)
</pre>
The character you have is a fullwidth comma ( hex ff0c ), as well as a regular comma. Have you tried updating it to my version which accounts for it?
<?php
$pattern = '/[\x{ff0c},]/u';
//$string = "something here ; and there, oh,that's all!";
$string = 'hei,nihao,a ';
echo '<pre>', print_r( preg_split( $pattern, $string ), 1 ), '</pre>';
Output:
Array
(
[0] => hei
[1] => nihao
[2] => a
)
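A sketch that writes the fullwidth comma as an explicit escape so it is visible in the source. It assumes the input mixes a fullwidth comma (U+FF0C) with an ASCII one, as described above; the trim() call, the \s* padding and PREG_SPLIT_NO_EMPTY are additions of mine to avoid stray whitespace and empty elements.

```php
<?php
// "\u{FF0C}" is PHP 7+ string syntax for the fullwidth comma (U+FF0C).
$string = "hei\u{FF0C}nihao,a ";

// The /u modifier makes PCRE treat pattern and subject as UTF-8, so
// \x{ff0c} matches the whole multibyte character rather than single bytes.
$parts = preg_split('/\s*[\x{ff0c},]\s*/u', trim($string), -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

Without the /u modifier, \x{ff0c} is not a valid escape for a byte-oriented pattern, so the modifier is required here.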
