get substring between 2 characters in php

I'm using a mentioning system like the ones on Twitter and Instagram, where you simply put #johndoe.
What I'm trying to do is strip each mention down to the name between the "#" and the first of these characters: ?, comma, ], : or a space.
As an example, here's my string:
hey #johnDoe check out this event, be sure to bring #janeDoe:,#johnnyappleSeed?, #johnCitizen] , and #fredNerk
How can I get an array of janeDoe, johnnyappleSeed, johnCitizen, fredNerk without the characters ?, comma, ] and : attached to them?
I know I have to use a variation of preg_match, but I don't have a strong understanding of it.

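This is what you've asked for: /\#(.*?)\s/ (it stops only at whitespace, so trailing characters such as : or ? stay attached, and a name at the very end of the string is not matched at all)
This is what you really want: /#(.+?)\b/ (a lazy match up to the first word boundary, i.e. the first non-word character or the end of the string)
Put either one into preg_match_all() and read the names from group 1 of the results array.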
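A minimal runnable sketch of the word-boundary version, applied to the string from the question. It assumes a name ends at the first non-word character (anything other than a letter, digit or underscore), so a name containing, say, a hyphen or a dot would be cut short.

```php
<?php
$string = 'hey #johnDoe check out this event, be sure to bring #janeDoe:,#johnnyappleSeed?, #johnCitizen] , and #fredNerk';

// Lazily take characters after each # up to the next word boundary,
// i.e. the first non-word character or the end of the string.
preg_match_all('/#(.+?)\b/', $string, $result_array);

// Group 1 holds the names without the # and without trailing punctuation.
$names = $result_array[1];
print_r($names);
```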

preg_match_all("/\#(.*?)\s/", $string, $result_array);

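$check_hash = preg_match_all("/#([a-zA-Z0-9]+)/", $string_to_match_against, $matches);
You could then do something like
foreach ($matches[1] as $name) {
echo $name . "<br />";
}
UPDATE: Just realized you were looking to remove the invalid characters. The updated script should do it: the capture group keeps only the letters and digits after the #, and $matches[1] holds those captures. (PHP's preg functions have no g modifier; preg_match_all already finds every match.)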

How about:
$str = 'hey #johnDoe check out this event, be sure to bring #janeDoe:,#johnnyappleSeed?, #johnCitizen] , and #fredNerk';
preg_match_all('/#(.*?)(?:[?, \]: ]|$)/', $str, $m);
print_r($m);
output:
Array
(
[0] => Array
(
[0] => #johnDoe
[1] => #janeDoe:
[2] => #johnnyappleSeed?
[3] => #johnCitizen]
[4] => #fredNerk
)
[1] => Array
(
[0] => johnDoe
[1] => janeDoe
[2] => johnnyappleSeed
[3] => johnCitizen
[4] => fredNerk
)
)
explanation:
The regular expression:
(?-imsx:#(.*?)(?:[?, \]: ]|$))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
# '#'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[?, \]: ] any character of: '?', ',', ' ', '\]',
':', ' '
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
$ before an optional \n, and the end of
the string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

Related

Finding sentences between characters

I am trying to find sentences between a pipe | and a dot ., e.g.
| This is one. This is two.
The regex pattern I use:
preg_match_all('/(:\s|\|+)(.*?)(\.|!|\?)/s', $file0, $matches);
So far I have not managed to capture both sentences; my regex captures only the first one.
How can I solve this problem?
EDIT: As can be seen from the regex, I am trying to find the sentences BETWEEN (: or |) AND (. or ! or ?).
A colon or pipe indicates the starting point of the sentences.
The sentences might be:
: Sentence one. Sentence two. Sentence three.
| Sentence one. Sentence two?
| Sentence one. Sentence two! Sentence three?

I would keep it simple and just match on:
\s*[^.|]+\s*
This matches any run of characters that contains no pipes or full stops. The \s* parts pull the surrounding whitespace into each match rather than removing it, so trim the results:
$input = "| This is one. This is two.";
preg_match_all('/\s*[^.|]+\s*/s', $input, $matches);
print_r(array_map('trim', $matches[0]));
This prints:
Array
(
[0] => This is one
[1] => This is two
)

This does the job:
$str = '| This is one. This is two.';
preg_match_all('/(?:\s|\|)+(.*?)(?=[.!?])/', $str, $m);
print_r($m);
Output:
Array
(
[0] => Array
(
[0] => | This is one
[1] => This is two
)
[1] => Array
(
[0] => This is one
[1] => This is two
)
)

Another option is to use \G, which asserts that the position is at the end of the previous match, to get iterative matches. Each sentence is captured in group 1, and the match also consumes the dot and any horizontal whitespace that follows it.
(?:\|\h*|\G(?!^))([^.\r\n]+)\.\h*
In parts
(?: Non capturing group
\|\h* Match | and 0+ horizontal whitespace chars
| Or
\G(?!^) Assert position at the end of previous match
) Close group
( Capture group 1
[^.\r\n]+ Match 1+ times any char other than . or a newline
) Close group
\.\h* Match 1 . and 0+ horizontal whitespace chars
For example
$re = '/(?:\|\h*|\G(?!^))([^.\r\n]+)\.\h*/';
$str = '| This is one. This is two.
John loves Mary.| This is one. This is two.';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches);
Output
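Array
(
[0] => Array
(
[0] => | This is one.
[1] => This is one
)
[1] => Array
(
[0] => This is two.
[1] => This is two
)
[2] => Array
(
[0] => | This is one.
[1] => This is one
)
[3] => Array
(
[0] => This is two.
[1] => This is two
)
)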
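The question also allows a colon as the start marker and ! or ? as sentence terminators. A sketch that extends the \G pattern above to those characters, run on the three sample lines from the question one line at a time (the character sets [:|] and [.!?] are assumptions based on the question's EDIT):

```php
<?php
// Sample lines from the question; each starts with ':' or '|'.
$lines = [
    ': Sentence one. Sentence two. Sentence three.',
    '| Sentence one. Sentence two?',
    '| Sentence one. Sentence two! Sentence three?',
];

// Start at ':' or '|' (plus horizontal whitespace), or continue exactly
// where the previous match ended (\G, but not at the start of the string).
// Group 1 is the sentence text; the terminator is matched but not captured.
$re = '/(?:[:|]\h*|\G(?!^))([^.!?\r\n]+)[.!?]\h*/';

$sentences = [];
foreach ($lines as $line) {
    preg_match_all($re, $line, $m);
    $sentences[] = $m[1];
}
print_r($sentences);
```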

To keep it simple, find everything between | and . and then split:
$input = "John loves Mary. | This is one. This is two. | Sentence 1. Sentence 2.";
if (preg_match_all('/\|\s*([^|]+)\./', $input, $matches)) {
foreach ($matches[1] as $match) {
print_r(preg_split('/\.\s*/', $match));
}
}
Prints:
Array
(
[0] => This is one
[1] => This is two
)
Array
(
[0] => Sentence 1
[1] => Sentence 2
)

split string by spaces and colon but not if inside quotes

having a string like this:
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf"
the desired result is:
[0] => Array (
[0] => dateto:'2015-10-07 15:05'
[1] => xxxx
[2] => datefrom:'2015-10-09 15:05'
[3] => yyyy
[4] => asdf
)
what I get with:
preg_match_all("/\'(?:[^()]|(?R))+\'|'[^']*'|[^(),\s]+/", $str, $m);
is:
[0] => Array (
[0] => dateto:'2015-10-07
[1] => 15:05'
[2] => xxxx
[3] => datefrom:'2015-10-09
[4] => 15:05'
[5] => yyyy
[6] => asdf
)
I also tried preg_split("/[\s]+/", $str), but I have no clue how to avoid splitting when the value is between quotes. Can anyone show me how, and also please explain the regex? Thank you!

I would use the PCRE verbs (*SKIP)(*F):
preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);
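A runnable version of the (*SKIP)(*F) split on the question's string. When the left alternative matches a quoted section, (*SKIP) moves the next attempt to the end of that section and (*F) makes the current attempt fail, so only whitespace outside quotes is used as a split point. This sketch handles single quotes only, as in the question.

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// Left alternative: a single-quoted section. (*SKIP) makes the next attempt
// start after it and (*F) fails the current attempt, so quoted text is never
// used as a delimiter. Right alternative: the whitespace we split on.
$parts = preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);
print_r($parts);
```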

Often, when you are looking to split a string, preg_split isn't the best approach (that seems a little counter-intuitive, but it's true most of the time). A more efficient way is to find all items (with preg_match_all) using a pattern that describes everything that is not the delimiter (white-space here):
$pattern = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;
if (preg_match_all($pattern, $str, $m))
$result = $m[0];
pattern details:
~ # pattern delimiter
(?=\S) # the lookahead assertion only succeeds if there is a non-
# white-space character at the current position.
# (This lookahead is useful for two reasons:
# - it allows the regex engine to quickly find the start of
# the next item without having to test each branch of the
# following alternation at each position in the string
# until one succeeds.
# - it ensures that there's at least one non-white-space.
# Without it, the pattern may match an empty string.
# )
[^'"\s]* # all that is not a quote or a white-space
(?: # optional quoted parts
'[^']*' [^'"\s]* # single quotes
|
"[^"]*" [^'"\s]* # double quotes
)*
~
Note that with this somewhat long pattern, the five items of your example string are found in only 60 steps. You can also use this shorter, simpler pattern:
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
but it's a little less efficient.
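Both patterns from this answer in a self-contained script; they should return the same five items for the question's string (the variable names $long and $short are just for illustration).

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// Longer pattern: (?=\S) jumps to the next non-space, then unquoted
// characters, then any number of quoted parts each followed by more
// unquoted characters. A nowdoc avoids escaping the quotes.
$long = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;

// Shorter alternative: one or more unquoted runs or quoted sections.
$short = <<<'EOD'
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
EOD;

$result = preg_match_all($long, $str, $m) ? $m[0] : [];
$resultShort = preg_match_all($short, $str, $m2) ? $m2[0] : [];
print_r($result);
```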

For your example, you can use preg_split with negative lookbehind (?<!\d), i.e.:
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
$matches = preg_split('/(?<!\d)(\s)/', $str);
print_r($matches);
Output:
Array
(
[0] => dateto:'2015-10-07 15:05'
[1] => xxxx
[2] => datefrom:'2015-10-09 15:05'
[3] => yyyy
[4] => asdf
)
Demo:
http://ideone.com/EP06Nt
Regex Explanation:
(?<!\d)(\s)
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\d)»
Match a single character that is a “digit” «\d»
Match the regex below and capture its match into backreference number 1 «(\s)»
Match a single character that is a “whitespace character” «\s»
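The lookbehind only checks that the whitespace does not follow a digit; it knows nothing about quotes, so it relies on every quoted value having its space right after a digit, as the dates in the example do. A small comparison on a hypothetical input, using the (*SKIP)(*F) split from the earlier answer:

```php
<?php
// Hypothetical input: the quoted value has a space after a letter.
$str = "title:'hello world' xxxx";

// Splits on whitespace not preceded by a digit, which breaks inside the quotes.
$byLookbehind = preg_split('/(?<!\d)\s/', $str);

// Skips quoted sections entirely, so the quoted value stays whole.
$bySkip = preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);

print_r($byLookbehind); // three pieces: the quoted value is cut in two
print_r($bySkip);       // two pieces
```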

PHP - Regex match curly brackets within other regex expression

I am trying to figure out how to match other parts of the stuff I need but can't seem to get it to work.
This is what I have so far:
preg_match_all("/^(.*?)(?:.\(([\d]+?)[\/I^\(]*?\))(?:.\((.*?)\))?/m",$data,$r, PREG_SET_ORDER);
Example text:
INPUT - Each line represents a line inside a text file.
-------------------------------------------------------------------------------------
"!?Text" (1234) 1234-4321
"#1 Text" (1234) 1234-????
#2 Text (1234) {Some text (#1.1)} 1234
Text (1234) 1234
Some Other Text: More Text here 1234-4321 (1234) (V) 1234
What I want to do:
I also want to match things in curly brackets, and the content of the parentheses inside the curly brackets.
I can't get it to work, because the curly brackets (and the parentheses inside them) are not present on every line.
Essentially, the first (1234) is a year and I only want to match it once; however, in the last example line my regex also matches (V), which I don't want.
Desirable output:
Array
(
[0] => "!?Text" (1234)
[1] => "!?Text"
[2] => 1234
)
Array
(
[0] => "#1 Text" (1234)
[1] => "#1 Text"
[2] => 1234
)
Array
(
[0] => "#2 Text" (1234)
[1] => "#2 Text"
[2] => 1234
[3] => Some text (#1.1) // Matches things within curly brackets if there are any.
[4] => Some text // Extracts text before brackets
[5] => #1.1 // Extracts text within brackets (if any because brackets may not be within curly brackets.)
)
Array
(
[0] => Text (1234)
[1] => Text
[2] => 1234
)
Array // (My current regular expression gives me a 4th match with value 'V', which it shouldn't do)
(
[0] => Some Other Text: More Text here 1234-4321 (1234) (V)
[1] => Some Other Text: More Text here 1234-4321
[2] => 1234
)

What about using:
^((.*?) *\((\d+)\))(?: *\{((.*?) *\((.+?)\)) *\})?
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
* ' ' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
\( '('
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
\) ')'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
* ' ' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
\{ '{'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
( group and capture to \5:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
) end of \5
--------------------------------------------------------------------------------
* ' ' (0 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
\( '('
--------------------------------------------------------------------------------
( group and capture to \6:
--------------------------------------------------------------------------------
.+? any character except \n (1 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \6
--------------------------------------------------------------------------------
\) ')'
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
* ' ' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
\} '}'
--------------------------------------------------------------------------------
)? end of grouping
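A self-contained run of the pattern above over the sample lines from the question, assuming the lines arrive in one string. The m flag makes ^ match at the start of each line, and PREG_SET_ORDER groups the captures per line. For lines without curly brackets, the unmatched groups 4 to 6 are simply left out of that line's array, so (V) never shows up.

```php
<?php
// Sample lines from the question, one record per line.
$data = <<<'EOD'
"!?Text" (1234) 1234-4321
"#1 Text" (1234) 1234-????
#2 Text (1234) {Some text (#1.1)} 1234
Text (1234) 1234
Some Other Text: More Text here 1234-4321 (1234) (V) 1234
EOD;

// Groups: 1 = text plus year, 2 = text, 3 = year,
// 4 = curly-bracket content, 5 = text before "(", 6 = text inside "()".
$re = '/^((.*?) *\((\d+)\))(?: *\{((.*?) *\((.+?)\)) *\})?/m';
preg_match_all($re, $data, $r, PREG_SET_ORDER);
print_r($r);
```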

Regular expression match, extracting only wanted segments of string

I am trying to extract three segments from a string. As I am not particularly good with regular expressions, I think what I have done could probably be done better.
I would like to extract the three ANYTHING_HERE parts of the following string:
SOMETEXT: ANYTHING_HERE (Old=ANYTHING_HERE, New=ANYTHING_HERE)
Some examples could be:
ABC: Some_Field (Old=,New=123)
ABC: Some_Field (Old=ABCde,New=1234)
ABC: Some_Field (Old=Hello World,New=Bye Bye World)
So the first example above should return the following matches:
$matches[0] = 'Some_Field';
$matches[1] = '';
$matches[2] = '123';
So far I have the following code:
preg_match_all('/^([a-z]*\:(\s?)+)(.+)(\s?)+\(old=(.+)\,(\s?)+new=(.+)\)/i',$string,$matches);
The issue with the above is that it returns a match for every group in the pattern, not just the parts I want. I do not know how to make the regular expression check that the string has the correct format without also capturing and storing those extra parts, if that makes sense.
So my question, if it isn't already clear: how can I retrieve just the segments I want from the above string?

You don't need preg_match_all. You can use this preg_match call:
$s = 'SOMETEXT: ANYTHING_HERE (Old=ANYTHING_HERE1, New=ANYTHING_HERE2)';
if (preg_match('/[^:]*:\s*(\w*)\s*\(Old=(\w*),\s*New=(\w*)/i', $s, $arr))
print_r($arr);
OUTPUT:
Array
(
[0] => SOMETEXT: ANYTHING_HERE (Old=ANYTHING_HERE1, New=ANYTHING_HERE2
[1] => ANYTHING_HERE
[2] => ANYTHING_HERE1
[3] => ANYTHING_HERE2
)

if(preg_match_all('/([a-z]*)\:\s*.+\(Old=(.+),\s*New=(.+)\)/i',$string,$matches)) {
print_r($matches);
}
Example:
$string = 'ABC: Some_Field (Old=Hello World,New=Bye Bye World)';
Will match:
Array
(
[0] => Array
(
[0] => ABC: Some_Field (Old=Hello World,New=Bye Bye World)
)
[1] => Array
(
[0] => ABC
)
[2] => Array
(
[0] => Hello World
)
[3] => Array
(
[0] => Bye Bye World
)
)

The problem is that you're using more parentheses than you need, and thus capturing more segments of the input than you wish.
E.g., each (\s?)+ segment should just be \s*.
The regex that you're looking for is:
[^:]+:\s*(.+)\s*\(old=(.*)\s*,\s*new=(.*)\)
In PHP:
preg_match_all('/[^:]+:\s*(.+)\s*\(old=(.*)\s*,\s*new=(.*)\)/i',$string,$matches);
A useful tool can be found here: http://www.myregextester.com/index.php
This tool offers an "Explain" checkbox (along with a "PHP" checkbox and an "i" flag checkbox, which you'll want to select) that provides a full explanation of the regex. For posterity, I've included that explanation below:
NODE EXPLANATION
----------------------------------------------------------------------
(?i-msx: group, but do not capture (case-insensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
[^:]+ any character except: ':' (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
\( '('
----------------------------------------------------------------------
old= 'old='
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
, ','
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
new= 'new='
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
\) ')'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
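On the second half of the question (checking the format without storing every piece): only the parts inside capturing parentheses end up in the matches array, and anchoring the pattern with ^ and $ makes preg_match fail on strings that do not have the expected shape. A minimal sketch, assuming the field name contains no whitespace and the Old value contains no comma:

```php
<?php
$examples = [
    'ABC: Some_Field (Old=,New=123)',
    'ABC: Some_Field (Old=ABCde,New=1234)',
    'ABC: Some_Field (Old=Hello World,New=Bye Bye World)',
];

// ^...$ forces the whole string to have the expected shape. Only the three
// wanted parts sit in capturing parentheses; the prefix, the spaces and the
// literal "(Old=" and ",New=" are matched but not stored.
$re = '/^[A-Z]+:\s*(\S+)\s*\(Old=([^,]*),\s*New=([^)]*)\)$/i';

$rows = [];
foreach ($examples as $s) {
    if (preg_match($re, $s, $m)) {
        $rows[] = array_slice($m, 1); // drop $m[0], the whole match
    }
}
print_r($rows);
```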

What about something simpler like ^_^
[:=]\s*([\w\s]*)

:\s*([^(\s]+)\s*\(Old=([^,]*),New=([^)]*)
Let me know if you want an explanation.
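For reference, the pattern from this answer written out with the x flag, which lets each part carry a comment; since the pattern contains no literal spaces, it behaves the same as the one-line version. The sketch runs it on the three examples from the question.

```php
<?php
// The pattern spread over several lines with the x flag, so whitespace
// and everything after # on a line are ignored by the regex engine.
$re = '/
    :\s*          # the colon after SOMETEXT, then optional spaces
    ([^(\s]+)     # group 1: the field name (no spaces, no opening paren)
    \s*\(Old=     # optional spaces, then the literal text (Old=
    ([^,]*)       # group 2: old value up to the comma, may be empty
    ,New=         # the literal text ,New=
    ([^)]*)       # group 3: new value up to the closing paren
/x';

$examples = [
    'ABC: Some_Field (Old=,New=123)',
    'ABC: Some_Field (Old=ABCde,New=1234)',
    'ABC: Some_Field (Old=Hello World,New=Bye Bye World)',
];

$rows = [];
foreach ($examples as $s) {
    if (preg_match($re, $s, $m)) {
        $rows[] = array_slice($m, 1); // field, old, new
    }
}
print_r($rows);
```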

regular expression end tag = start tag

Take a look at this regular expression:
(?:\(?")(.+)(?:"\)?)
This regex would match e.g
"a"
("a")
but also
"a)
How can I say that the ending characters must correspond to the starting ones (in this case, " closes with " and (" closes with ")? There must be a simpler solution than this, right?
"(.+)"|(?:\(")(.+)(?:"\))

I don't think there's a good way to do this specifically with regex, so you are stuck doing something like this:
/(?:
"(.+)"
|
\( " (.+) " \)
)/x

how about:
(\(?)(")(.+)\2\1
explanation:
(?-imsx:(\(?)(")(.+)\2\1)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\(? '(' (optional (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
\2 what was matched by capture \2
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
) end of grouping

You can use backreferences in PHP (\1 matches whatever the first capturing group matched):
preg_match('/<([^>]+)>(.+)<\/\1>/', $subject) (where $subject is your input string)
This uses the first match as the condition for the closing match: it matches <a>something</a> but not <h2>something</a>. Keep the pattern in single quotes; in a double-quoted PHP string, "\1" is turned into a control character before the regex engine sees it.
However, in your case you would need to turn the "(" matched by the first group into a ")", which won't work.
Update: replace ( and ) with <BRACE> and <END_BRACE>. Then you can match using /<([^>]+)>(.+)<END_\1>/. Do this for every pair of delimiters you use: ()[]{}<> and so on.
For example, "(a) is as nice as [f]" becomes "<BRACE>a<END_BRACE> is as nice as <BRACKET>f<END_BRACKET>", and the regex will capture both if you use preg_match_all:
$returnValue = preg_match_all('/<([^>]+)>(.+)<END_\\1>/', '<BRACE>a<END_BRACE> is as nice as <BRACKET>f<END_BRACKET>', $matches);
leads to
array (
0 =>
array (
0 => '<BRACE>a<END_BRACE>',
1 => '<BRACKET>f<END_BRACKET>',
),
1 =>
array (
0 => 'BRACE',
1 => 'BRACKET',
),
2 =>
array (
0 => 'a',
1 => 'f',
),
)
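PCRE, the engine behind PHP's preg_* functions, also supports conditional subpatterns, which express "close with ) only if we opened with (" directly. (A plain backreference such as \1 can only repeat the same text, so it would demand a second ( rather than a ).) A sketch, assuming the quoted token is the whole subject string, hence the ^ and $ anchors:

```php
<?php
// (\()?       group 1: an optional opening parenthesis
// "(.+)"      group 2: the quoted text
// (?(1)\))    conditional: if group 1 matched, a ")" must follow;
//             if it did not, nothing more is required
$re = '/^(\()?"(.+)"(?(1)\))$/';

$results = [];
foreach (['"a"', '("a")', '"a)', '("a"'] as $subject) {
    $results[$subject] = preg_match($re, $subject) === 1;
}
var_dump($results);
```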

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
