How to break string through regex in PHP

String:
"hello, how are you? you are fine. i am good too."
I want to split this string on the characters , ? and . in PHP using a regex.
Desired result in an array:
[0]hello
[1]how are you
[2]you are fine
[3]i am good too
Please provide a regex that splits the string this way. The delimiter characters should be . , and ?

You can use preg_split with this regex:
/[^A-Za-z\s]\s*/
It looks for a character which is not a letter or whitespace, followed by zero or more whitespace characters. This allows for the situation where there is no space after the punctuation mark. Note we use the PREG_SPLIT_NO_EMPTY flag to preg_split so that if the string ends in a punctuation mark we don't get empty strings in the output.
$string = "hello, how are you?you are fine. i am good too.";
$output = preg_split('/[^A-Za-z\s]\s*/', $string, -1, PREG_SPLIT_NO_EMPTY);
print_r($output);
Output:
Array (
[0] => hello
[1] => how are you
[2] => you are fine
[3] => i am good too
)

You can use preg_split with a regex that matches any non-word character followed by either a whitespace character or the end of the text.
The if pops the trailing empty string that appears in the array when your input text ends with a punctuation character.
$input = "hello, how are you? you are fine. i am good too.";
$output = preg_split( "/\W(?:\s|$)/", $input );
if(strlen(end($output))==0)
{
array_pop($output);
}
foreach ($output as $item) {
echo $item;
echo "\n";
}
output:
hello
how are you
you are fine
i am good too
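If only the three characters named in the question should act as delimiters, a narrower character class may be closer to what was asked. This is a minimal sketch under that assumption, not a replacement for the answers above; the variable names are illustrative.

```php
<?php
$string = "hello, how are you? you are fine. i am good too.";

// Split on a comma, question mark or full stop, plus any whitespace after it.
// PREG_SPLIT_NO_EMPTY drops the empty piece produced by the final full stop.
$parts = preg_split('/[,?.]\s*/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

Unlike /[^A-Za-z\s]\s*/, this leaves digits and other punctuation such as ! or ; inside the pieces.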

Related

explode string at "newline,space,newline,space" in PHP

This is the string I'm trying to explode. This string is part of a paragraph which i need to split at every "newline,space,newline,space" :
s
1
A result from textmagic.com shows that it contains a \n, then a space, then a \n, and then a space.
This is what I tried:
$values = explode("\n\s\n\s",$string); // 1
$values = explode("\n \n ",$string); // 2
$values = explode("\n\r\n\r",$string); // 3
Desired output:
Array (
[0] => s
[1] => 1
)
but none of them worked. What's wrong here?
How do I do it?
Just use explode() with PHP_EOL." ".PHP_EOL." ", which is of the format "newline, space, newline, space". PHP_EOL gives you the newline format of the system PHP is running on, so this works as long as the string uses that same line ending.
$split = explode(PHP_EOL." ".PHP_EOL." ", $string);
print_r($split);
Live demo at https://3v4l.org/WpYrJ
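If the text may come from a system with a different line ending than the one PHP runs on, a regex split can be made independent of the newline format. The following is a sketch assuming the input uses \r\n; the PCRE escape \R matches \n, \r and \r\n alike.

```php
<?php
// Sample input with Windows-style line endings (an assumption for illustration).
$string = "s\r\n \r\n 1";

// \R matches any newline sequence; the literal spaces match the single spaces.
$parts = preg_split('/\R \R /', $string);

print_r($parts);
```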
Using preg_split() to explode() by multiple delimiters in PHP
Just a quick note here. To explode() a string using multiple delimiters in PHP you have to use a regular expression. Use the pipe character to separate your delimiters.
$string = "\n\ranystring";
$chunks = preg_split('/(del1|del2|del3)/',$string,-1, PREG_SPLIT_NO_EMPTY);
// Print_r to check response output.
echo '<pre>';
print_r($chunks);
echo '</pre>';
PREG_SPLIT_NO_EMPTY – To return only non-empty pieces.
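As a concrete illustration of the pipe-separated pattern, here is a sketch with made-up delimiters and input. Each delimiter goes through preg_quote() so that characters such as | are treated literally rather than as regex syntax.

```php
<?php
// Hypothetical delimiters; note that ' | ' contains a regex metacharacter.
$delimiters = [', ', ' | ', ';'];

// Escape each delimiter, then join them with the alternation operator.
$quoted  = array_map(function ($d) { return preg_quote($d, '/'); }, $delimiters);
$pattern = '/' . implode('|', $quoted) . '/';

$chunks = preg_split($pattern, 'red, green | blue;yellow', -1, PREG_SPLIT_NO_EMPTY);

print_r($chunks);
```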

preg_split in php to split string at the point which is preceded by digit and is followed by letters or blank space

In order to split my string at the point which is preceded by digit and is followed by letters as:
$str = '12jan';
I have used
$arr = preg_split('/(?<=[0-9])(?=[a-z]+)/i',$str);
It works fine and gives the desired output. I want to update it so that it gives the same output for strings like these:
$str='12 jan';
$str='12 jan';
$str='12/jan';
$str='12//jan';
$str='12/jan';
$str='12*/jan';
$str='12*//jan';
The code should work for any of the strings given above, so that in the end I have an array like
Array
(
[0] => 12
[1] => jan
)
Any help will be appreciated.
This may be optimized if you answer my question in the comment.
Pattern: ~(?<=[0-9])[*/ ]*(?=[a-z]+)~i
The above will match zero or more *, / and/or space characters.
On your input strings, this will be just as accurate and faster:
Pattern: ~\d+\K[^a-z]*~i
or: ~\d+\K[*/ ]*~ (no case-insensitive pattern modifier is necessary)
The above will match zero or more non-alphabetical characters immediately following the leading digit(s).
And of course preg_split's cousins can also do nicely:
Here is a battery of PHP Demos.
$strings=['12jan','12 jan','12 jan','12/jan','12//jan','12/jan','12*/jan','12*//jan'];
foreach($strings as $string){
var_export(preg_split('~(?<=[0-9])[*/ ]*(?=[a-z]+)~i',$string));
echo "\n";
var_export(preg_split('~\d+\K[*/ ]*~',$string));
echo "\n";
var_export(preg_match('~(\d+)[/* ]*([a-z]+)~i',$string,$out)?array_slice($out,1):'fail');
echo "\n";
var_export(preg_match('~(\d+)[/* ]*(.+)~',$string,$out)?array_slice($out,1):'fail');
echo "\n";
var_export(preg_match_all('~\d+|[a-z]+~i',$string,$out)?$out[0]:'fail');
echo "\n---\n";
}
All methods provide the same output.
A simple preg_match regexp does it:
foreach (['12 jan', '12 jan', '12/jan', '12//jan', '12/jan',
'12*/jan', '12*//jan'] as $test)
{
unset ($matches);
if (preg_match("#^([0-9]+)[ /*]*(.*)#", $test, $matches)) {
var_export( [$matches[1], $matches[2]] );
}
else {
print "Failed for '$test'.\n";
}
}
The regexp is:
start with numbers -> group #1
have 0 or more of space, slash or stars
take all the rest -> group #2
I have updated your code to use preg_match.
It gives exactly the output you need:
$str='12/jan';
preg_match('/^(\d*)[\*\s\/]*([a-z]{3})$/i', $str, $match);
print_r($match);
but the output is slightly different, because index 0 holds the full match. It will look like this:
array(
0 => '12/jan',
1 => '12',
2 => 'jan'
)
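To get exactly the two-element array from the question, the full match at index 0 can be dropped with array_slice(). This is a sketch only: splitDayMonth is a hypothetical helper name, and the pattern uses \d+ and [a-z]+ (at least one digit, any number of letters) rather than the \d* and {3} of the answer above, which is an assumption about the intended input.

```php
<?php
// Hypothetical helper: returns [digits, letters] or null when the input does not fit.
function splitDayMonth(string $str): ?array
{
    // Digits, then any run of *, / or whitespace, then letters to the end.
    if (preg_match('/^(\d+)[*\s\/]*([a-z]+)$/i', $str, $match)) {
        return array_slice($match, 1); // drop the full match at index 0
    }
    return null;
}

print_r(splitDayMonth('12*//jan'));
```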

How can I extract or preg_replace chinese characters in a string?

I currently have a list of strings like this
蘋果,香蕉,橙。
榴蓮, 啤梨
鳳爪,排骨,雞排
24個男,2個女,30個老人
What I want to do is extract all Chinese and alphanumeric characters from these strings.
How can I replace all special characters like , , 。 / " and spaces with - or _
and then extract the Chinese text with explode(), like $str = explode("-",$str); or $str = explode("_",$str);?
I currently have a regex like this
if(/^\S[\u0391-\uFFE5 \w]+\S$/.test(value)).....
And I modified it into
$str = preg_replace("/^\S[\x{0391}-\x{FFE5} \w]+\s+\S$/u", "-", $str);
but it doesn't seem to work.
An online example: https://www.regex101.com/r/qR8aA6/1
EDIT: my expected output (for the first string):
First it should be replaced with
蘋果-香蕉-橙- or 蘋果_香蕉_橙_
then I can use $str = explode("-",$str); to make them finally become:
Array
(
[0] => 蘋果
[1] => 香蕉
[2] => 橙
)
Seems like you want something like this,
$txt = <<<EOT
蘋果,香蕉,橙。
榴蓮, 啤梨
鳳爪,排骨,雞排
24個男,2個女,30個老人
EOT;
echo preg_replace('~[^\p{L}\p{N}\n]+~u', '-', $txt);
Output:
蘋果-香蕉-橙-
榴蓮-啤梨
鳳爪-排骨-雞排
24個男-2個女-30個老人
Explanation:
\p{L} Matches any kind of letter from any language.
\p{N} matches any kind of numeric character in any script.
\n Matches a newline character.
Putting all of these inside a negated character class does the opposite: it matches any run of characters that is not a letter, a number, or a newline.
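The replace-then-explode step from the question can also be collapsed into a single preg_split() call with the same negated class. A sketch, assuming the source file and the input are UTF-8 encoded and that one line is processed at a time:

```php
<?php
$line = '24個男,2個女,30個老人';

// Split on any run of characters that are neither letters nor numbers.
// The u modifier makes PCRE treat the pattern and subject as UTF-8.
$parts = preg_split('~[^\p{L}\p{N}]+~u', $line, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

PREG_SPLIT_NO_EMPTY also takes care of a trailing 。, which would otherwise leave an empty last element.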

Php regexp for escaping characters

I have a string that the user may split manually using commas.
For example, the string value1,value2,value3 should result in the array:
["value1", "value2", "value3"]
Now what if the user wishes to allow a comma as part of a value? I would like to solve that problem by letting the user escape a comma using two commas or a backslash. For example, the string
"Hi, Stackoverflow" would be written as "Hi,, Stackoverflow" or "Hi\, Stackoverflow".
I find it difficult to evaluate such a string, however. I have attempted splitting with preg_split, but there is no way to tell whether the run of characters seen by a lookbehind or lookahead has an even or odd length. Furthermore, the backslashes and doubled commas used for escaping must be removed as well, which probably requires an additional replace function.
$text = 'Hello, World \,asdas, 123';
$data = preg_split('/(?<=[^\\\]),/',$text);
print_r($data);
Result
Array ( [0] => Hello [1] => World \,asdas [2] => 123 )
For this I would run preg_replace_callback, which lets you count the escape characters used and decide what to do with them. If it turns out that a comma is not escaped, replace it with some non-printable character that the user should never type in the input, and then explode on that character:
<?php
$str = "One,Two\\, Two\\\\,Three";
$delimiter = chr(0x0B); // vertical tab, hope you do not expect it in the input?
$escaped = preg_replace_callback('/(\\\\)*,?/', function($m) use($delimiter){
if(!isset($m[1]) || strlen($m[0])%2) {
return str_replace(',',$delimiter,preg_replace('/\\\\{2}/','\\',$m[0]));
} else {
return str_replace('\\,',',', preg_replace('/\\\\{2}/','\\',$m[0]));
}
}, $str);
$array = explode($delimiter, $escaped);
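An alternative that avoids the placeholder character is to scan the string once and handle escapes as they appear. The sketch below covers only the backslash form of escaping, not the doubled comma; splitEscaped is a hypothetical helper name, not something from the answers above.

```php
<?php
// Hypothetical helper: split on $delimiter, treating $escape + any character as a literal.
function splitEscaped(string $input, string $delimiter = ',', string $escape = '\\'): array
{
    $parts = [];
    $current = '';
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($char === $escape && $i + 1 < $length) {
            // Escape character: take the next byte literally and skip over it.
            $current .= $input[++$i];
        } elseif ($char === $delimiter) {
            // Unescaped delimiter: close the current piece.
            $parts[] = $current;
            $current = '';
        } else {
            $current .= $char;
        }
    }
    $parts[] = $current;
    return $parts;
}

print_r(splitEscaped('value1,Hi\, Stackoverflow,back\\\\slash'));
```

Because the delimiter and the escape character are single ASCII bytes, byte-wise scanning is safe for UTF-8 input as well.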

Explode a paragraph into sentences in PHP

I have been using
explode(".",$mystring)
to split a paragraph into sentences. However, this doesn't cover sentences that end with different punctuation such as ! ? : ;
Is there a way of using an array as a delimiter instead of a single character? Alternatively, is there another neat way of splitting on various punctuation marks?
I tried
explode(("." || "?" || "!"),$mystring)
hopefully but it didn't work...
You can use preg_split() combined with a PCRE lookbehind assertion to split the string after each occurrence of ., ;, :, ? or ! while keeping the actual punctuation intact:
Code:
$subject = 'abc sdfs. def ghi; this is an.email#addre.ss! asdasdasd? abc xyz';
// split on whitespace between sentences preceded by a punctuation mark
$result = preg_split('/(?<=[.?!;:])\s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
Result:
Array
(
[0] => abc sdfs.
[1] => def ghi;
[2] => this is an.email#addre.ss!
[3] => asdasdasd?
[4] => abc xyz
)
You can also add a blacklist of abbreviations (Mr., Mrs., Dr., ..) that should not start a new sentence by inserting a negative lookbehind assertion. The dots are escaped so that they match a literal full stop rather than any character:
$subject = 'abc sdfs. Dr. Foo said he is not a sentence; asdasdasd? abc xyz';
// split on whitespace preceded by a punctuation mark, unless that mark ends a listed abbreviation
$result = preg_split('/(?<!Mr\.|Mrs\.|Dr\.)(?<=[.?!;:])\s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
Result:
Array
(
[0] => abc sdfs.
[1] => Dr. Foo said he is not a sentence;
[2] => asdasdasd?
[3] => abc xyz
)
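If the abbreviation list needs to grow, the lookbehind can be built from an array. This is a sketch only: splitSentences is a hypothetical helper, the default list is just the three abbreviations mentioned above, and it relies on PCRE accepting lookbehind alternatives of different fixed lengths.

```php
<?php
// Hypothetical helper: split into sentences, never splitting right after a listed abbreviation.
function splitSentences(string $text, array $abbreviations = ['Mr.', 'Mrs.', 'Dr.']): array
{
    $guard = '';
    if ($abbreviations !== []) {
        // preg_quote() escapes the dots so they only match a literal full stop.
        $quoted = array_map(function ($a) { return preg_quote($a, '/'); }, $abbreviations);
        $guard  = '(?<!' . implode('|', $quoted) . ')';
    }
    return preg_split('/' . $guard . '(?<=[.?!;:])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(splitSentences('abc sdfs. Dr. Foo said he is not a sentence; asdasdasd? abc xyz'));
```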
You can do:
preg_split('/\.|\?|!/',$mystring);
or (simpler):
preg_split('/[.?!]/',$mystring);
Assuming that you want to keep the punctuation marks in the end result, have you tried:
$mystring = str_replace("?","?---",str_replace(".",".---",str_replace("!","!---",$mystring)));
$tmp = explode("---",$mystring);
This would leave your punctuation marks intact.
preg_split('/\s+|[.?!]/',$string);
A possible problem might be if there is an email address as it could split it onto a new line half way through.
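Use preg_split and give it a regex like /[.?!]/ to split on. Inside a character class the dot and question mark need no escaping, and a | there would be matched literally.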
You can't have multiple delimiters for explode. That's what preg_split(); is for. But even then, it explodes at the delimiter, so you will get sentences returned without the punctuation marks.
You can take preg_split a step further and flag it to return the delimiters in their own elements with PREG_SPLIT_DELIM_CAPTURE, then run a loop to join each sentence with the punctuation mark that follows it in the returned array, or just use preg_match_all():
preg_match_all('~.*?[?.!]~s', $string, $sentences);
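The PREG_SPLIT_DELIM_CAPTURE route mentioned above could look roughly like this. It is a sketch with a made-up sample string, and it assumes the text does not start with punctuation, so that the returned pieces alternate cleanly between text and delimiter.

```php
<?php
$string = 'Is it done? Yes! Ship it.';

// The capturing group makes preg_split return the punctuation as separate elements.
$pieces = preg_split('/([.?!]+)\s*/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// Rejoin each text piece with the punctuation element that follows it.
$sentences = [];
foreach (array_chunk($pieces, 2) as $pair) {
    $sentences[] = implode('', $pair);
}

print_r($sentences);
```

A trailing fragment without punctuation simply ends up as a one-element chunk and is kept as is.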
$mylist = preg_split("/[.?!:;]/", $mystring);
You can try preg_split
$sentences = preg_split("/[.?!:;]+/", $mystring);
Please note this will remove the punctuation. If you would like to strip out the whitespace that follows each punctuation mark as well:
$sentences = preg_split("/[.?!:;]+\s*/", $mystring);
