Keeping the punctuation marks after using preg_split - php

I'm trying to split a string at question marks, exclamation marks, or periods, but at the same time I'm trying to keep the punctuation marks after splitting them. How would I do that? Thanks.
$input = "Sentence1?Sentence2.Sentence3!";
$input = preg_split("/(\?|\.|!)/", $input);
echo $input[0]."<br>";
echo $input[1]."<br>";
echo $input[2]."<br>";
Desired outputs:
Sentence1?
Sentence2.
Sentence3!
Actual outputs:
Sentence1
Sentence2
Sentence3

You can do this by changing the capture group in your regex into a lookbehind like so:
$input = preg_split("/(?<=\?|\.|!)/", $input);
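One caveat with the lookbehind version: the input ends with a punctuation mark, so the split also happens at the very end of the string and can leave an empty last element. A minimal sketch, assuming that empty piece is unwanted, adds the PREG_SPLIT_NO_EMPTY flag. The variable name $parts and the character class [?.!] (equivalent to the alternation above) are only illustrative choices.

```php
<?php
$input = "Sentence1?Sentence2.Sentence3!";
// Split at the zero-width position right after a ?, . or !
// PREG_SPLIT_NO_EMPTY drops empty pieces, such as the one after the final "!".
$parts = preg_split('/(?<=[?.!])/', $input, -1, PREG_SPLIT_NO_EMPTY);
foreach ($parts as $part) {
    echo $part . "<br>";
}
```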

The manual knows all:
PREG_SPLIT_DELIM_CAPTURE
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
So in your case:
$input = preg_split("/(\?|\.|!)/", $input, -1, PREG_SPLIT_DELIM_CAPTURE);
Note that each captured punctuation mark is returned as its own array element, right after the piece it terminated.
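A minimal sketch of gluing each captured delimiter back onto its sentence. It assumes every sentence is non-empty and is followed by exactly one punctuation mark, so that pieces and delimiters strictly alternate; $pieces and $sentences are illustrative names.

```php
<?php
$input = "Sentence1?Sentence2.Sentence3!";
// The result alternates: text, delimiter, text, delimiter, ...
// PREG_SPLIT_NO_EMPTY removes the empty piece after the final "!".
$pieces = preg_split('/([?.!])/', $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
// Group the elements in pairs and join each pair back into one string.
$sentences = array_map(function (array $pair) {
    return implode('', $pair);
}, array_chunk($pieces, 2));
print_r($sentences);
```

If the input can contain runs such as "?!", the pairing assumption no longer holds and the lookbehind approach is probably the simpler choice.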

Related

php Regex before a character and before another one

I have a list of entries like:
firstname.lastname (location)
I'd like to extract the firstname, the lastname, and the location. The location may contain periods, but it is always between parentheses.
Can anyone help me please? (And explain the regex if possible; I don't know why I can never write my own regex...)
I found:
#\((.*?)\)# for the location
^[^\.]+ for the firstname
But I can't find one for the lastname, and I don't know how to match all three together.
You can do it without regex:
$string = 'firstname.lastname (location)';
// Split the part before the first space into name and surname
$exploded = explode('.', substr($string, 0, strpos($string, ' ')));
$name = $exploded[0];
$surname = $exploded[1];
// Take the part after ' (' and strip the closing parenthesis
$location = rtrim(explode(' (', $string)[1], ')');
You don't need regex for that: explode() on . with a limit of 2, then use strpos() to find the first parenthesis ( and let substr() do the rest.
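The explode()/strpos()/substr() recipe could look roughly like this. It is only a sketch that assumes every entry has the exact shape firstname.lastname (location); the variable names are illustrative.

```php
<?php
$string = 'firstname.lastname (location)';
// Split on the first period only, so periods inside the location survive.
list($firstname, $rest) = explode('.', $string, 2);
// $rest is now "lastname (location)"; locate the opening parenthesis.
$open = strpos($rest, '(');
// Everything before "(" is the lastname, minus the separating space.
$lastname = rtrim(substr($rest, 0, $open));
// Everything after "(" is the location, minus the final ")".
$location = substr($rest, $open + 1, -1);
echo "$firstname, $lastname, $location";
```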
That's not too difficult with a regex. However, your confusion might stem from the fact that several of the characters in that example string have special meanings in regex.
<?php
$string = "firstname.lastname (location)";
if(preg_match('/^(\w+)\.(\w+)\s*\(([^)]*)\)$/', $string, $aCapture)){
/*Let's break down that regex
^ Start of string
(\w+) Capture a run of one or more word characters
\. A period
(\w+) Capture a run of one or more word characters
\s* Zero or more whitespace characters
\( An opening parenthesis
([^)]*) Capture everything up to the closing parenthesis (periods included)
\) A closing parenthesis
$ The end of the string
*/
// $aCapture contains your captures, starting at position 1, because 0 will contain the entire match
$sFirstName = $aCapture[1];
$sLastName = $aCapture[2];
$sLocation = $aCapture[3];
print "$sFirstName, $sLastName, $sLocation";
}
?>
Using a formatted string:
$str = 'jacques.cheminade (espace)';
$result = sscanf($str, '%[^.].%[^ ] (%[^)])');
Note that although the syntax looks similar to the one used in regex, the [^...] tokens take no quantifiers: each one describes a part of the string, not a single character.

Twitter handle regular expression PHP [duplicate]

I'm not very confident with regular expressions, so I have to ask:
How can I find out with PHP whether a string contains a word starting with #?
E.g. I have a string like "This is for #codeworxx".
I'm sorry, but I have no starting point for this :(
Hope you can help.
Thanks,
Sascha
Okay, thanks for the answers, but I made a mistake: how do I implement this with eregi_replace?
$text = eregi_replace('/\B#[^\B]+/','\\1', $text);
This does not work. Why? Don't I have to use the same expression as the pattern?
Match a # that is not preceded by a word character and is followed by something other than whitespace:
$ cat 1812901.php
<?php
echo preg_match("/\B#\S+/", "This should #match it");
echo preg_match("/\B#\S+/", "This should not# match");
echo preg_match("/\B#\S+/", "This should match nothing and return 0");
echo "\n";
?>
$ php 1812901.php
100
Break your string up like this:
$string = 'simple sentence with five words';
$words = explode(' ', $string );
Then you can loop through the array and check whether the first character of each word equals "#":
if ($stringInTheArray[0] == "#")
Assuming you define a word as a sequence of characters with no whitespace between them, then this should be a good starting point for you:
$subject = "This is for #codeworxx";
$pattern = '/\s*#(.+?)(?:\s|$)/';
preg_match($pattern, $subject, $matches);
print_r($matches);
Explanation:
\s*#(.+?)(?:\s|$) - look for anything starting with #, group all the following letters, numbers, and anything which is not a whitespace (space, tab, newline), till the closest whitespace or the end of the string. Without the |$ alternative, a tag at the very end of the string, as in this example, would not match.
See the output of the $matches array for accessing the inner groups and the regex results.
@OP, no need for regex. Just use PHP string functions:
$mystr='This is for #codeworxx';
$str = explode(" ",$mystr);
foreach($str as $k=>$word){
if(substr($word,0,1)=="#"){
print $word;
}
}
Just in case this is helpful to someone in the future:
/((?<!\S)#\w+(?!\S))/
This will match any word made of alphanumeric characters (or underscores) that starts with "#". It will not match words with "#" anywhere but the start of the word.
Matching cases:
#username
foo #username bar
foo #username1 bar #username2
Failing cases:
foo#username
#username$
##username
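A quick way to check the pattern against the cases listed above is to count the matches for each string. This is only a sketch; $cases and $counts are illustrative names.

```php
<?php
$pattern = '/((?<!\S)#\w+(?!\S))/';
$cases = [
    '#username',
    'foo #username bar',
    'foo #username1 bar #username2',
    'foo#username',
    '#username$',
    '##username',
];
$counts = [];
foreach ($cases as $case) {
    // preg_match_all returns the number of full pattern matches found.
    $counts[] = preg_match_all($pattern, $case, $found);
    echo $case . ' => ' . end($counts) . "\n";
}
```

The first three strings should report 1, 1 and 2 matches, and the three failing cases should report 0.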

Add + before word, see all between quotes as one word

I have a question. I need to add a + before every word and treat everything between quotes as one word.
I have this code:
preg_replace("/\w+/", '+\0', $string);
which results in this
+test +demo "+bla +bla2"
But I need
+test +demo +"bla bla2"
Can someone help me :)
And is it possible to not add a + if there is already one? So you don't get ++test
Thanks!
Maybe you can use this regex:
$string = '+test demo between "double quotes" and between \'single quotes\' test';
$result = preg_replace('/\b(?<!\+)\w+|["\'].+?["\']/', '+$0', $string);
var_dump($result);
// which will result in:
string '+test +demo +between +"double quotes" +and +between +'single quotes' +test' (length=74)
I've used a 'negative lookbehind' to check for the '+'.
Regex lookahead, lookbehind and atomic groups
I can't test this, but could you try it and let me know how it goes?
First the regex: choose either a quotation (a double quote, any number of non-quote characters, and a closing double quote) which may be preceded by a '+', or a series of word characters which may be preceded by a '+'.
I would hope this matches all your examples.
We then get all the matches of the regex in your string and store them in the variable "$matches", which is an array. We then loop through the matches, testing whether there is a '+' as the first character. If there is, do nothing; otherwise add one.
We then implode the array into a string, separating the elements by a space.
Note: $matches is created when given as a parameter to preg_match_all; the full matches are in $matches[0].
$regex = '/\+?"[^"]*"|\+?\w+/';
preg_match_all($regex, $string, $matches);
$words = $matches[0];
foreach($words as $i => $word)
{
if(substr($word, 0, 1) != "+") $words[$i] = "+" . $word;
}
$result = implode(" ", $words);

Is there a way to use a NOT operator in regex when ^ inside brackets is not an option?

I went through dozens of already answered questions without finding one that could help me.
I have a string like this:
aaa.{foo}-{bar} dftgyh {foo-bar}{bar} .? {.!} -! a}aaa{
and I want to obtain a string like this:
aaa{foo}-{bar}dftgyh{foo-bar}{bar}-aaaa
Essentially I want to keep:
valid word chars and hyphens wrapped in an open and a closed curly bracket, something that will match the regex \{[\w\-]+\}
all the valid word chars and hyphens outside curly brackets
Using this:
$result = preg_replace( array( "#\{[\w\-]+\}#", '#[\w\-]#' ), "", $string );
I obtain the exact opposite of what I want: I remove the part that I want to keep.
Sure, I can use ^ inside the square brackets in the second pattern, but that will not work for the first.
I.e. this will not work (the second pattern in the array is valid, the first is not):
$result = preg_replace( array( "#[^\{[\w\-]+\}]#", '#[^\w\-]#' ), "", $string );
So, what is the regex that allows me to obtain the wanted result?
You may consider matching what you want instead of replacing the characters you do not want. The following will match word characters and hyphen both inside and outside of curly braces.
$str = 'aaa.{foo}-{bar} dftgyh {foo-bar}{bar} .? {.!} -! a}aaa{';
preg_match_all('/{[\w-]+}|[\w-]+/', $str, $matches);
echo implode('', $matches[0]);
Output as expected:
aaa{foo}-{bar}dftgyh{foo-bar}{bar}-aaaa
Another option is to (*SKIP)(*F) the good stuff and do a preg_replace() on whatever remains:
$str = preg_replace('~(?:{[-\w]+}|[-\w]+)(*SKIP)(*F)|.~' , "", $str);
test at regex101; eval.in

remove double square brackets and keep the string

I need to remove all square brackets from a string and keep the string. I've been looking around, but in every topic I found the OP wants to replace the string with something.
So: [[link_to_page]]
should become: link_to_page
I think I should use php regex, can someone assist me?
Thanks in advance
You can simply use a str_replace.
$string = str_replace(array('[[',']]'),'',$string);
But this would also remove a '[[' that has no closing ']]', and a ']]' that has no opening '[['.
It's not entirely clear what you want - but...
If you simply want to "remove all square brackets" without worrying about pairing/etc then a simple str_replace will do it:
str_replace( array('[',']') , '' , $string )
That is not (and doesn't need to be) a regex.
If you want to unwrap paired double brackets, with unknown contents, then a regex replace is what you want, which uses preg_replace instead.
Since [ and ] are metacharacters in regex, they need to be escaped with a backslash.
To match all instances of double-bracketed text, you can use the pattern \[\[\w+\]\] and to replace those brackets you can put the contents into a capture group (by surrounding with parentheses) and replace all instances like so:
$output = preg_replace( '/\[\[(\w+)\]\]/' , '$1' , $string );
The \w matches any alphanumeric or underscore - if you want to allow more/less characters it can be updated, e.g. \[\[([a-z\-_]+)\]\] or whatever makes sense.
If you want to act on the contents of the square brackets, see the answer by fluminis.
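A minimal sketch of the paired unwrap on a longer string, to show that only complete [[...]] pairs are touched. The sample text is made up for illustration.

```php
<?php
$string = 'See [[link_to_page]] and [[other_page]], but not [single] or [[broken.';
// Replace each [[word]] with just the captured word; anything unpaired is left alone.
$output = preg_replace('/\[\[(\w+)\]\]/', '$1', $string);
echo $output;
```

A plain str_replace on '[[' and ']]' would also strip the unmatched '[[' before "broken", which is the difference between the two approaches.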
You can use preg_replace:
$repl = preg_replace('/(\[|\]){2}/', '', '[[link_to_page]]');
OR using str_replace:
$repl = str_replace(array('[[', ']]'), '', '[[link_to_page]]');
If you want only one match :
preg_match('/\[\[([^\]]+)\]\]/', $yourText, $matches);
echo $matches[1]; // will echo link_to_page
Or if you want to extract all the links from a text:
preg_match_all('/\[\[([^\]]+)\]\]/', $yourText, $matches);
foreach($matches[1] as $link) {
echo $link;
}
How to read '/\[\[([^\]]+)\]\]/':
/ start the regex
\[\[ two [ characters, escaped because [ is a metacharacter
([^\]]+) capture all chars that are not a ]
\]\] two ] characters, escaped because ] is a metacharacter
/ end the regex
Try:
preg_replace('/(\[\[)|(\]\])/', '', $string);
