I need to extract a string like "THE NEED OF FOLLOWING A RELIGION" from a larger string.
I can extract the individual words like THE, NEED, OF... but I need the complete run of capital letters, like "THE NEED OF FOLLOWING A RELIGION", and I'm not able to do so. Please help.
preg_match_all("/[A-Z]*/", $html, $out);
Thanks
A very basic modification of the original code that finds runs of two or more capitals (or whitespace characters) at a time:
$str='This is a string WITH MIXED CASE words and WE ONLY WANT capitals';
preg_match_all("/[A-Z\s]{2,}/", $str, $out);
echo '<pre>',print_r($out,true),'</pre>';
outputs:
Array
(
[0] => Array
(
[0] => WITH MIXED CASE
[1] => WE ONLY WANT
)
)
Your regex just misses a delimiter condition: words have to be either followed or preceded by a space.
Converting that sentence into a regex gives:
[A-Z]*(?=\s)|(?<=\s)[A-Z]*
The regex above matches a word in any of these positions:
\sWORD
WORD\s
\sWORD\s
You can add the space to your character class, like [A-Z ]. Now you get all the runs of capitalized words, but also a bunch of single spaces. To avoid the single spaces, use ([A-Z]+[A-Z ]*[A-Z]), which has to start and end with a capital letter. I added the () to capture the matched results.
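A minimal PHP sketch of the ([A-Z]+[A-Z ]*[A-Z]) pattern, run against a made-up sample string (the question's real $html is not shown):

```php
<?php
// Sketch of the ([A-Z]+[A-Z ]*[A-Z]) approach. Because the pattern must
// start and end with a capital letter, runs made only of spaces are never
// returned. The sample text is invented for illustration.
$html = 'Essay: THE NEED OF FOLLOWING A RELIGION is discussed here';
preg_match_all('/([A-Z]+[A-Z ]*[A-Z])/', $html, $out);
// $out[1] holds the captured runs of capitalized words.
print_r($out[1]);
```

Note that a single capital letter standing alone (like "I" in ordinary text) is not matched, since the pattern needs at least two capitals.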
Related
I need to extract two parts from a string and place them in an array.
$test = "add_image_1";
I need to make sure that this string starts with "add_image" and ends with "_1" but only store the number part at the very end. I would like to use preg_split as a learning experience as I will need to use it in the future.
I don't know how to use this function to find an exact word (I tried using "\b" and failed) so I've used "\w+" instead:
$result = preg_split("/(?=\w+)_(?=\d)/", $test);
print_r($result);
This works fine, except that it also accepts a bunch of other invalid formats, such as:
"add_image_1_2323". I need to make sure it only accepts this format. The last number can be more than one digit long.
Result should be:
Array (
[0] => add_image
[1] => 1
)
How can I make this more secure?
The following regex matches the _ only when it is preceded by add_image and followed by digits that run to the end of the string.
Regex: (?<=add_image)_(?=\d+$)
Explanation:
(?<=add_image) looks behind for add_image
(?=\d+$) looks ahead for a number that runs to the end of the string. The only character actually matched (and removed by the split) is the _.
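A minimal preg_split sketch using that regex. splitAddImage() is a hypothetical helper name, and the count() and prefix checks are additions: a string that doesn't match comes back unsplit as a single element, and the lookbehind alone does not anchor the start of the string (so "xadd_image_1" would still be split).

```php
<?php
// Sketch: split "add_image_<digits>" into its prefix and number.
// splitAddImage() is a hypothetical helper; the pattern is the one above.
function splitAddImage(string $s): ?array
{
    $parts = preg_split('/(?<=add_image)_(?=\d+$)/', $s);
    // A valid string splits into exactly two pieces, and the first piece
    // must be exactly "add_image", since the lookbehind does not anchor ^.
    if (count($parts) !== 2 || $parts[0] !== 'add_image') {
        return null;
    }
    return $parts;
}

print_r(splitAddImage('add_image_1'));       // returns ['add_image', '1']
var_dump(splitAddImage('add_image_1_2323')); // returns null (no split happens)
```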
I can't seem to get regular expressions right whenever I need to use them...
Given a string like this one:
$string = 'text here [download="PDC" type="A"] and the text continues [download="PDS" type="B"] and more text yet again, more shortcodes might exist ...';
I need to print the "text here" part, then execute a MySQL query based on the values "PDC" and "A", then print the rest of the string (repeating all of this if more [download] tags exist in the string).
So far I have the following regex
$regex = '/(.*?)[download="(.*?)" type="(.*?)"](.*?)/';
preg_match($regex,$string,$res);
print_r($res);
But this is only capturing the following:
Array ( [0] => 111111 [1] => 111111 [2] => )
I'm using preg_match(); should I use preg_match_all() instead? Anyway, the regex is surely wrong. Any help?
[ opens a character class and ] closes it. Characters with a special meaning need to be either escaped or put into a \Q...\E block in a PCRE regex.
/(.*?)\Q[download="\E(.*?)" type="(.*?)"](.*?)/
The [ right after \Q is the character that needs to be taken literally, hence the \Q...\E around it. Also note the word type: you were looking for "tipo".
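As a hedged alternative (preg_replace_callback() is swapped in here; it is not what the answer above uses), a pattern with escaped brackets can drive the whole "print text, run query, print the rest" loop in one call. renderDownload() is a hypothetical stand-in for the MySQL query, and the sample string is shortened:

```php
<?php
// Hypothetical stand-in for the MySQL lookup; it just formats its inputs.
function renderDownload(string $download, string $type): string
{
    return sprintf('<%s:%s>', $download, $type);
}

$string = 'text here [download="PDC" type="A"] and the text continues [download="PDS" type="B"] and more text';

// \[ and \] are escaped so they match literal brackets instead of opening
// a character class. [^"]* stops each capture at the closing quote.
$output = preg_replace_callback(
    '/\[download="([^"]*)" type="([^"]*)"\]/',
    function (array $m): string {
        return renderDownload($m[1], $m[2]);
    },
    $string
);
echo $output;
```

The surrounding text passes through unchanged, and each tag is replaced by whatever the callback returns, however many tags the string contains.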
Or try it with this "little" one:
/(?P<before>(?:(?!\[download="[^"]*" type="[^"]*"\]).)*)\[download="(?P<download>[^"]*)" type="(?P<type>[^"]*)"\](?P<after>(?:(?!\[download="[^"]*" type="[^"]*"\]).)*)/
It will give you the named keys before, download, type and after in the match results.
Test it here: http://www.regex101.com/r/mF2vN5
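A minimal sketch of using that named-group pattern with preg_match_all() and PREG_SET_ORDER, so that each match is one tag plus its surrounding text. The sample string is shortened, and the echoed {download/type} placeholder stands in for the MySQL query, which is not shown:

```php
<?php
$string = 'text here [download="PDC" type="A"] and the text continues [download="PDS" type="B"] and more text';

// The answer's pattern, split across lines for readability. The tempered
// (?:(?!\[download=...\]).)* parts consume text up to the next tag.
$pattern = '/(?P<before>(?:(?!\[download="[^"]*" type="[^"]*"\]).)*)'
    . '\[download="(?P<download>[^"]*)" type="(?P<type>[^"]*)"\]'
    . '(?P<after>(?:(?!\[download="[^"]*" type="[^"]*"\]).)*)/';

// PREG_SET_ORDER groups the results per match instead of per group.
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m['before'];
    // Placeholder for the MySQL query based on download and type.
    echo '{' . $m['download'] . '/' . $m['type'] . '}';
    echo $m['after'];
}
```

Since the after part of one match consumes the text up to the next tag, the before part of every match except the first is empty.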
I would like to split a string in PHP containing quoted and unquoted substrings.
Let's say I have the following string:
"this is a string" cat dog "cow"
The resulting array should look like this:
array (
[0] => "this is a string"
[1] => "cat"
[2] => "dog"
[3] => "cow"
)
I'm struggling a bit with regex and I'm wondering whether this is even possible with just one regex/preg_split call...
The first thing I tried was:
[[:blank:]]*(?=(?:[^"]*"[^"]*")*[^"]*$)[[:blank:]]*
But this only splits array[0] and array[3] correctly; the rest is split character by character.
Then I found this link:
PHP preg_split with two delimiters unless a delimiter is within quotes
(?=(?:[^"]*"[^"]*")*[^"]*$)
This seems like a good starting point to me. However, the result for my example is the same as with the first regex.
I tried combining both: first the one for quoted strings, and then a second sub-regex that should omit quoted strings (hence the [^"]):
(?=(?:[^"]*"[^"]*")*[^"]*$)|[[:blank:]]*([^"].*[^"])[[:blank:]]*
Therefore 2 questions:
Is it even possible to achieve what I want with just one regex/preg_split-Call?
If yes, I would appreciate a hint on how to assemble the regex correctly
Since matches cannot overlap, you could use preg_match_all like this:
preg_match_all('/"[^"]*"|\S+/', $input, $matches);
Now $matches[0] should contain what you are looking for. The regex will first try to match a quoted string, and then stop. If that doesn't do it it will just collect as many non-whitespace characters as possible. Since alternations are tried from left to right, the quoted version takes precedence.
EDIT: This will not get rid of the quotes though. To do this, you could use capturing groups:
preg_match_all('/(?|"([^"]*)"|(\S+))/', $input, $matches);
Now $matches[1] will contain exactly what you are looking for. The (?| is there so that both capturing groups end up at the same index.
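A small runnable sketch of both preg_match_all() variants on the question's input:

```php
<?php
$input = '"this is a string" cat dog "cow"';

// Variant 1: a quoted string (quotes included) or a run of non-whitespace.
// The quoted alternative is tried first, so it takes precedence.
preg_match_all('/"[^"]*"|\S+/', $input, $withQuotes);

// Variant 2: the (?| ... ) branch reset group numbers both alternatives as
// group 1, so index 1 holds every token without its surrounding quotes.
preg_match_all('/(?|"([^"]*)"|(\S+))/', $input, $withoutQuotes);

print_r($withQuotes[0]);
print_r($withoutQuotes[1]);
```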
EDIT 2: Since you were asking for a preg_split solution, that is also possible. We can use a lookahead, that asserts that the space is followed by an even number of quotes (up until the end of the string):
$result = preg_split('/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/', $input);
Of course, this will not get rid of the quotes, but that can easily be done in a separate step.
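A sketch of the preg_split() approach plus the separate quote-stripping step. The lookahead here ends in [^"]*$, so unquoted text may also follow the last quote (without it, a string like 'a "b c" d' would not be split after the quoted part). Using trim() with '"' is just one way to do the stripping:

```php
<?php
$input = '"this is a string" cat dog "cow"';

// Split on whitespace only when an even number of double quotes follows it
// up to the end of the string, i.e. when it is not inside a quoted part.
$parts = preg_split('/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/', $input);

// Separate step: strip the surrounding quotes from each token.
$tokens = array_map(function (string $p): string {
    return trim($p, '"');
}, $parts);

print_r($parts);
print_r($tokens);
```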
This is my custom tag:
[extract=A:B(
<div>
<p>Some content...</p>
</div>
)]
The word extract stays as it is.
Value A is a string (one word: no spaces, no line breaks).
Value B will contain HTML enclosed in (), and it will contain line breaks.
I am not good with regular expressions but this is basically what I want.
\/[extract=(.*?):(.*?)/]\
I need the appropriate pattern and a foreach loop over preg_match_all() results to return A and B.
Try this:
preg_match_all('/\[extract=(?<class>\w+):(?<method>\w+)\((?<html>.*?)\)\]/s', $content, $matches);
print_r($matches['class']);
print_r($matches['method']);
print_r($matches['html']);
Should output:
Array
(
[0] => A
)
Array
(
[0] => B
)
Array
(
[0] =>
<div>
<p>Some content...</p>
</div>
)
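Since the question asked for a foreach loop, here is a minimal sketch that runs the answer's pattern through preg_match_all() with PREG_SET_ORDER. The $content string is a reconstruction of the tag from the question:

```php
<?php
$content = "[extract=A:B(\n<div>\n<p>Some content...</p>\n</div>\n)]";

// /s lets .*? cross the line breaks inside the HTML; the lazy .*? stops
// at the first ")]" it finds.
preg_match_all(
    '/\[extract=(?<class>\w+):(?<method>\w+)\((?<html>.*?)\)\]/s',
    $content,
    $matches,
    PREG_SET_ORDER
);

// With PREG_SET_ORDER, each $m holds the named captures of one tag.
foreach ($matches as $m) {
    echo $m['class'], ' / ', $m['method'], "\n";
    echo trim($m['html']), "\n";
}
```

Because the match stops at the first ")]", HTML that itself contains ")]" would be cut short.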
This may not be perfect, but seems to work in very quick and limited testing. If nothing else, it might help you get to a better solution.
/^(?:\[extract=)(\w+):(\w+\(.*\))\]$/s
Note that the trailing s flag makes the dot match all characters, including newlines.
I don't fully understand your question. Where should the string input for "A" be? In the place of letter "A"?
If this is what you want, then the solution is:
preg_match_all('/\[extract=([^\s]+?):(.+)\]/s', 'your custom tag', $result);
So what you are probably looking for is the s modifier, which changes the meaning of the dot (.) so that it also matches line breaks.
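To see what the s modifier changes, a small sketch that runs the same pattern with and without it on a made-up multi-line tag:

```php
<?php
$tag = "[extract=A:B(\n<div>\n</div>\n)]";

// Without /s the dot stops at "\n", so no "]" can be reached: no match.
$withoutS = preg_match('/\[extract=([^\s]+?):(.+)\]/', $tag, $m1);

// With /s the dot also matches line breaks, so (.+) spans the HTML.
$withS = preg_match('/\[extract=([^\s]+?):(.+)\]/s', $tag, $m2);

var_dump($withoutS, $withS);
print_r($m2);
```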
I also recommend you http://www.regular-expressions.info/ if you want to get more familiar with regular expressions.
I want to split a string into two parts; the string is mostly free text,
for example:
$string = 'hi how are you';
and I want the split to look like this:
array(
[0] => hi
[1] => how are you
)
I tried using this regex: /(\S*)\s*(\.*)/ but even when the returned array is the correct size, the values come back empty.
What pattern do I need to make this work?
What are the requirements? Your example seems pretty arbitrary. If all you want is to split on the first space and leave the rest of the string alone, this would do it, using explode:
$pieces = explode(' ', 'hi how are you', 2);
This basically says "split on spaces and limit the resulting array to 2 elements", so everything after the first space stays in the second element.
You should not be escaping the "." in the last group. You're trying to match any character, not a literal period.
Corrected: /(\S*)\s*(.*)/
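A minimal sketch with the corrected pattern. preg_match() puts the whole match in $out[0], so the two parts are in $out[1] and $out[2]. The preg_split() call at the end is a swapped-in alternative, not taken from the answers:

```php
<?php
$string = 'hi how are you';

// Group 1: the first run of non-whitespace; group 2: everything after the
// whitespace that follows it.
preg_match('/(\S*)\s*(.*)/', $string, $out);
$parts = [$out[1], $out[2]];
print_r($parts);

// Alternative: preg_split() with a limit of 2 splits on the first run of
// whitespace only and leaves the rest of the string intact.
$alt = preg_split('/\s+/', $string, 2);
print_r($alt);
```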