Splitting at string and keeping periods - php

I'm trying to split a string at ., ! and ? while keeping those characters, but for some reason it's not working correctly. What am I doing wrong?
$input = "hi i am1. hi i am2.";
$inputX = preg_split("~[.!?]+\K\b~", $input);
print_r($inputX);
Result:
Array ( [0] => hi i am1. hi i am2. )
Expected Result:
Array ( [0] => hi i am1. [1] => hi i am2. )

I am not sure if you need to do a preg_split() but try preg_match_all() if that is an option:
$input = "hi i am1. hi i am2.";
preg_match_all("/[^\.\?\!]+[\.\!\?]/", $input,$matched);
print_r($matched);
Gives you:
Array
(
[0] => Array
(
[0] => hi i am1.
[1] => hi i am2.
)
)

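Try it without \b. In your input each period is followed by a space or by the end of the string, and there is no word boundary between two non-word characters, so \b stops the pattern from ever matching.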
$input = "hi i am1. hi i am2.?! hi i am2.?";
$inputX = preg_split("~(?>[.!?]+)\K(?!$)~", $input);
print_r($inputX);
The (?!$) prevents a split when the matched punctuation is at the end of the string, so there is no additional empty result. The atomic group (?>...) prevents a split inside a run of punctuation at the end of the string, such as ?!. (without the atomic group it would split after !, and the last result would be the single character .). Output:
Array
(
[0] => hi i am1.
[1] => hi i am2.?!
[2] => hi i am2.?
)
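The effect of the atomic group can be checked with a small comparison. This is only a sketch: the input string and the variable names ($sample, $withAtomic, $withoutAtomic) are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical sample input that ends in a run of punctuation.
$sample = "a. b.?";

// With the atomic group, the run ".?" is matched as a whole, so (?!$)
// rejects the split at the end of the string.
$withAtomic = preg_split('~(?>[.!?]+)\K(?!$)~', $sample);

// Without it, the engine backtracks to "." and splits inside the run.
$withoutAtomic = preg_split('~[.!?]+\K(?!$)~', $sample);

print_r($withAtomic);    // two pieces: "a." and " b.?"
print_r($withoutAtomic); // three pieces: "a.", " b." and "?"
```

Each piece after the first keeps the space that followed the punctuation, so array_map('trim', ...) may be useful afterwards.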

I hope this is what you are expecting:
$input = "hi i am1. hi i !am?2."; // other ! and ? symbols added as well
$inputX = preg_split("/(\.|\!|\?)/", $input,-1,PREG_SPLIT_DELIM_CAPTURE);
print_r($inputX);
output:
Array ( [0] => hi i am1 [1] => . [2] => hi i [3] => ! [4] => am [5] => ? [6] => 2 [7] => . [8] => )

Related

How to extract certain words from a php string?

I have a long string like this: I1:1;I2:2;I8:2;NA1:5;IA1:[1,2,3,4,5];S1:asadada;SA1:[1,2,3,4,5];SA1:[1,2,3,4,5];. I just want to get certain words like 'I1','I2','I8','NA1' and so on, i.e. only the words between ';' and ':', and store them in an array. How can I do that efficiently?
I have already tried preg_split(); it runs, but it gives me the wrong output, as shown below.
// $a is the string I want to extract words from
$str = preg_split("/[;:]/", $a);
print_r($str);
The output I am getting is this
Array
(
[0] => I8
[1] => 2
[2] => I1
[3] => 1
[4] => I2
[5] => 2
[6] => I3
[7] => 2
[8] => I4
[9] => 4
[10] =>
)
Array
(
[0] => NA1
[1] => 5
[2] =>
)
Array
(
[0] => IA1
[1] => [1,2,3,4,5]
[2] =>
)
Array
(
[0] => S1
[1] => asadada
[2] =>
)
Array
(
[0] => SA1
[1] => [1,2,3,4,5]
[2] =>
)
But I am expecting 'I8','I1','I2','I3','I4' in separate arrays as well, each at position [0]. Any help on how to do this?
You could try something like this:
<?php
$str = 'I1:1;I2:2;I8:2;NA1:5;IA1:[1,2,3,4,5];S1:asadada;SA1:[1,2,3,4,5];SA1:[1,2,3,4,5];';
// Capture a run of word characters at the start or after ';', followed by ':'
preg_match_all('/(?:^|;)(\w+):/', $str, $result);
print_r($result[1]); // Matches are here in $result[1]
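You can use a lazy (non-greedy) match to capture the items between ; and : with preg_match_all(). Note that this pattern requires a leading ;, so the first item (I1) is only matched if the string itself starts with ;.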
<?php
$str = 'I1:1;I2:2;I8:2;NA1:5;IA1:[1,2,3,4,5];S1:asadada;SA1:[1,2,3,4,5];SA1:[1,2,3,4,5];';
preg_match_all('/;(.+?)\:/',$str,$matches);
print_r($matches[1]);
Live Demo: https://3v4l.org/eBsod
One possible approach is using a combination of explode() and implode(). The result is returned as a string, but you can easily put it into an array for example.
<?php
$input = "I1:1;I2:2;I8:2;NA1:5;IA1:[1,2,3,4,5];S1:asadada;SA1:[1,2,3,4,5];SA1:[1,2,3,4,5];.";
$output = array();
$array = explode(";", $input);
foreach($array as $item) {
$output[] = explode(":", $item)[0];
}
echo implode(",", $output);
?>
Output:
I1,I2,I8,NA1,IA1,S1,SA1,SA1,.

Break a string with optional space and number and a dot

I am trying to split a string at an optional space followed by a number and a dot.
$string = "1.1Kumar/Sandeep MR*T0148.4801 12.23Pal/Sandeep MR*T643.948";
$regex1 = "/(\s*[0-9]+\.)/";
$regex2 = "/(?<=\s)[0-9]+\./";
I need to split at 1. and 12.
The first regex gives:
Array
(
[0] =>
[1] => 1Kumar/Sandeep MR*T
[2] => 4801
[3] => 23Pal/Sandeep MR*T
[4] => 948
)
The second regex gives:
Array
(
[0] => 1.1Kumar/Sandeep MR*T0148.4801
[1] => 23Pal/Sandeep MR*T643.948
)
I am trying to get:
Array
(
[0] => 1Kumar/Sandeep MR*T0148.4801
[1] => 23Pal/Sandeep MR*T643.948
)
For your example string this will work:
\b\d+\.
It makes sure there is a word boundary before the numeric part (the start of the string or a preceding space provides one).
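As a rough sketch, the pattern can be plugged into preg_split() like this. The PREG_SPLIT_NO_EMPTY flag and the trim step are assumptions added here to get the two-element result the question asks for; they are not part of the original answer.

```php
<?php
$string = "1.1Kumar/Sandeep MR*T0148.4801 12.23Pal/Sandeep MR*T643.948";

// Split on a run of digits followed by a dot, but only where the digits
// start at a word boundary. PREG_SPLIT_NO_EMPTY drops the empty piece
// that would otherwise precede the leading "1.".
$parts = preg_split('/\b\d+\./', $string, -1, PREG_SPLIT_NO_EMPTY);

// The first piece keeps the space that preceded "12.", so trim each one.
$parts = array_map('trim', $parts);

print_r($parts);
```

"0148." and "643." are not split because the digits there follow the letter T, so there is no word boundary in front of them.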

Need to match ALL similar words/phrases using preg_match_all

I'm trying to create a pattern that matches all similar words/phrases within a string.
For example, I need to match: "this", "this is", "this is it", "that", "that was", "that was not".
It only matches the first occurrence of "this", but it should match all occurrences.
I even tried anchors and word boundaries, but nothing seems to work.
I tried (simplified):
$content = "this is it! that was not!";
preg_match_all('/(this|this is|this is it|that|that was|that was not)/i', $content, $results);
Which should output:
this
this is
this is it
that
that was
that was not
Given that you're only capturing the terms you're searching for, it might be better to simply use a foreach loop as well as substr_count to see how many times each string occurs.
For example:
$haystack = "this is it! that was not! this is not a test!";
$needles = array(
"this",
"this is",
"this is it",
"that",
"that was",
"that was not");
foreach ($needles as $needle) {
// substr_count is case sensitive, so make subject and search lowercase
$hits = substr_count(strtolower($haystack), strtolower($needle));
echo "Search '$needle' occurs $hits time(s)" . PHP_EOL;
}
The above will output:
Search 'this' occurs 2 time(s)
Search 'this is' occurs 2 time(s)
Search 'this is it' occurs 1 time(s)
Search 'that' occurs 1 time(s)
Search 'that was' occurs 1 time(s)
Search 'that was not' occurs 1 time(s)
If substr_count doesn't provide the flexibility that you need then you can always replace it with a preg_match_all and use your individual $needle values as search terms.
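A minimal sketch of that substitution, assuming whole-word, case-insensitive matching is the extra flexibility wanted. The $counts array and the \b anchors are illustrative choices, not something the original answer specifies.

```php
<?php
$haystack = "this is it! that was not! this is not a test!";
$needles = array("this", "this is", "this is it", "that", "that was", "that was not");

$counts = array();
foreach ($needles as $needle) {
    // preg_quote() escapes any regex metacharacters in the needle;
    // \b restricts matches to whole words and the i flag ignores case.
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/i';
    // preg_match_all() returns the number of full pattern matches.
    $counts[$needle] = preg_match_all($pattern, $haystack);
}
print_r($counts);
```

With this haystack the counts come out the same as with substr_count; they would differ for input such as "thistle", which substr_count counts as a hit for "this".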
The problem is that the shortest option appears first in your alternation group:
/(this|this is|this is it)/i
The regex engine tries the alternatives in (this|this is|this is it) from left to right. As soon as one of them matches at the current position, it leaves the group without trying the rest.
This will match the longest phrase, because the longest option is tried first:
/(this is it|this is|this)/i
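The difference between the two orderings can be seen in a short sketch using the string from the question ($shortFirst and $longFirst are names chosen here for illustration):

```php
<?php
$content = "this is it! that was not!";

// Shortest alternative first: "this" wins before the longer ones are tried.
preg_match_all('/(this|this is|this is it)/i', $content, $shortFirst);

// Longest alternative first: the whole phrase is matched instead.
preg_match_all('/(this is it|this is|this)/i', $content, $longFirst);

print_r($shortFirst[0]); // one match: "this"
print_r($longFirst[0]);  // one match: "this is it"
```

Either ordering still returns only one match per starting position, so getting all overlapping phrases at once needs a different technique, such as lookaheads.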
How about:
$content = "this is it";
preg_match_all('/(?=(this))(?=(this is))(?=(this is it))/i', $content, $results);
print_r($results);
Edit according to comments:
$content = "this is it! that was not!";
preg_match_all('/(?=(this))(?=(this is))(?=(this is it))|(?=(that))(?=(that was))(?=(that was not))/i', $content, $results);
print_r($results);
Output:
Array
(
[0] => Array
(
[0] =>
[1] =>
)
[1] => Array
(
[0] => this
[1] =>
)
[2] => Array
(
[0] => this is
[1] =>
)
[3] => Array
(
[0] => this is it
[1] =>
)
[4] => Array
(
[0] =>
[1] => that
)
[5] => Array
(
[0] =>
[1] => that was
)
[6] => Array
(
[0] =>
[1] => that was not
)
)
More universal:
$content = "this is it! that was not!";
preg_match_all('/\b(?=(\w+))(?=(\w+ \w+))(?=(\w+ \w+ \w+))\b/i', $content, $results);
print_r($results);
output:
Array
(
[0] => Array
(
[0] =>
[1] =>
)
[1] => Array
(
[0] => this
[1] => that
)
[2] => Array
(
[0] => this is
[1] => that was
)
[3] => Array
(
[0] => this is it
[1] => that was not
)
)
You can also use the following regex instead.
/(this(?:\sis(?:\sit)?)?)/i

preg_match_all behaving weirdly

I am new to PHP. With the code below I want to find all keywords enclosed between
'<#' and '#>'
Sample code:
<?php
$subject = "askdbvbaldjbvasdblasdbvl<#2134#>cbkdbskbkabdvb<#213aca4#>";
$pattern = "/(?<=\<\#)(.*?)(?=\#\>)/";
preg_match_all($pattern, $subject, $matches);
echo '<pre>',print_r($matches,true),'</pre>';
?>
Now I am expecting a value array like:
Array
(
[0] => Array
(
[0] => 2134
[1] => 213aca4
)
)
But I am getting an output like:
Array
(
[0] => Array
(
[0] => 2134
[1] => 213aca4
)
[1] => Array
(
[0] => 2134
[1] => 213aca4
)
)
Can anyone tell me why I am getting the second array and how I can get rid of it?
The second array contains the sub-match, or matched group, because you're using a capture group.
Simply remove the parens in your regex:
$pattern = "/(?<=\<\#).*?(?=\#\>)/";
Also, you should be able to use this regex without some escapes:
$pattern = "/(?<=<#).*?(?=#>)/";
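A short sketch comparing the two options, using the subject string from the question. The second pattern, which consumes the delimiters and reads the capture group, is an alternative added here for illustration and is not part of the answer above.

```php
<?php
$subject = "askdbvbaldjbvasdblasdbvl<#2134#>cbkdbskbkabdvb<#213aca4#>";

// Lookarounds without a capture group: only $matches[0] is filled.
preg_match_all('/(?<=<#).*?(?=#>)/', $subject, $matches);

// Alternative: consume the delimiters and read the capture group instead.
preg_match_all('/<#(.*?)#>/', $subject, $captured);

print_r($matches);     // a single sub-array holding the two keywords
print_r($captured[1]); // the same two keywords
```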

Explode array three times

I have a string that I would like to explode with three different patterns. The string looks like this:
country:00/00/00->link:00/00/00->link2
country2:00/00/00->link3:00/00/00->link4
I would like to get the different parts of these two lines. The two lines are separated by \n, the dates are separated by :, and the link associated with each date is separated by ->.
First I explode by the line break:
$var = explode("\n", $var);
but when I try to split the result again, I get an error: *preg_split() expects parameter 2 to be string, array given*
How can I get the different parts?
Thanks in advance.
Instead of using preg_split, consider using preg_match. You can write it as one big regex.
<?php
// Implicit newline. Adding \n would make an empty spot in the array
$str = "country:00/00/00->link:00/00/00->link2
country2:00/00/00->link3:00/00/00->link4";
// explode() is used here because split() was removed in PHP 7
$arr = explode("\n", $str);
for ($i = 0; $i < count($arr); $i++) {
preg_match("/^(\w+)\:(\d\d\/\d\d\/\d\d)->(\w+)\:(\d\d\/\d\d\/\d\d)->(\w+)/", $arr[$i], $matches);
print_r($matches);
}
?>
Output:
Array
(
[0] => country:00/00/00->link:00/00/00->link2
[1] => country
[2] => 00/00/00
[3] => link
[4] => 00/00/00
[5] => link2
)
Array
(
[0] => country2:00/00/00->link3:00/00/00->link4
[1] => country2
[2] => 00/00/00
[3] => link3
[4] => 00/00/00
[5] => link4
)
EDIT
In your comment, you're posting dates with 4 digits, whereas in your question, they only had 2 digits.
Therefore you need to change the regex to:
/^(\w+)\:(\d\d\/\d\d\/\d\d\d\d)->(\w+)\:(\d\d\/\d\d\/\d\d\d\d)->(\w+)/
How about using preg_match_all:
<?php
$data =<<<ENDDATA
country:00/00/00->link:00/00/00->link2
country2:00/00/00->link3:00/00/00->link4
ENDDATA;
preg_match_all('#(\d{2}/\d{2}/\d{2})->(.[^:\n]+)#', $data, $matches);
print_r($matches);
Gives the following result:
Array
(
[0] => Array
(
[0] => 00/00/00->link
[1] => 00/00/00->link2
[2] => 00/00/00->link3
[3] => 00/00/00->link4
)
[1] => Array
(
[0] => 00/00/00
[1] => 00/00/00
[2] => 00/00/00
[3] => 00/00/00
)
[2] => Array
(
[0] => link
[1] => link2
[2] => link3
[3] => link4
)
)
Your problem is that after the first explode() the value is an array, and explode() cannot explode an array. You need a loop (probably a for loop) over the array elements, calling explode() on each element.
See the example below:
<?php
$val = "abc~~~def~~~ghi####jkl~~~mno~~~pqr####stu~~~vwx~~~yz1";
$val = explode("####", $val);
// $val is now an array of three strings:
// [0] => 'abc~~~def~~~ghi'
// [1] => 'jkl~~~mno~~~pqr'
// [2] => 'stu~~~vwx~~~yz1'
// To explode again, loop over the elements and explode each one.
for ($r = 0; $r < count($val); $r++) {
$val[$r] = explode("~~~", $val[$r]);
}
// Each element of $val is now itself an array of three strings.
?>
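Applied to the data from the question, the same loop idea might look like the sketch below. The result layout (an array keyed by country, holding date/link pairs) and the names $rows, $segments and $pairs are assumptions made for illustration; the question does not say how the parts should be stored.

```php
<?php
$var = "country:00/00/00->link:00/00/00->link2\ncountry2:00/00/00->link3:00/00/00->link4";

$rows = array();
foreach (explode("\n", $var) as $line) {
    // First level: ":" separates the country from each "date->link" segment.
    $segments = explode(":", $line);
    $country = array_shift($segments);

    $pairs = array();
    foreach ($segments as $segment) {
        // Second level: "->" separates the date from its link.
        list($date, $link) = explode("->", $segment, 2);
        $pairs[] = array('date' => $date, 'link' => $link);
    }
    $rows[$country] = $pairs;
}
print_r($rows);
```

This only works as long as the dates and links themselves never contain ":" or "->".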
