I am curling a page and getting the output.
However, the HTML encoding is being removed, so new lines are being skipped,
and it looks like this:
This is Bob. He lives in an boatBut he only has one oar to row with.
To detect new lines, I figured it would be easier to just check for strings that have only one uppercase letter with spaces in between. So far I have this:
(\s\w+\s\w+.\s\D+[a-z][A-Z])
However, this does not seem to work,
as it only matches this:
is Bob. He lives in an boatB
See here: http://regex101.com/r/gH0lW1
How can I match strings that contain spaces, each one running up to the next uppercase letter?
Update: this will split on the condition without losing any characters
<?php
$string = "This is Bob. He lives in an boatBut he only has one oar to row with.He also does stuff, it is cool.";
$array = preg_split('/(?<=[a-z.])(?=[A-Z])/', $string);
print_r($array);
?>
Use a positive lookbehind to ensure you capture a capital after a lowercase:
(?<=[a-z])[A-Z]
http://regex101.com/r/cB7bD8
You could use PHP's preg_split, if you want, to explode the result on this regex.
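One thing to watch when splitting on this pattern: preg_split discards whatever the pattern consumes, so the capital letter itself would be lost. A minimal sketch, assuming a shortened version of the sample sentence from the question (the variable names are illustrative, not from the original post), comparing the consuming pattern with a zero-width variant:

```php
<?php
// Shortened sample text from the question; variable names are illustrative.
$text = "He lives in an boatBut he only has one oar to row with.";

// The capital letter is part of the match, so preg_split drops it.
$consuming = preg_split('/(?<=[a-z])[A-Z]/', $text);

// Moving the capital into a lookahead makes the match zero-width,
// so no character is removed at the split point.
$zeroWidth = preg_split('/(?<=[a-z])(?=[A-Z])/', $text);

print_r($consuming); // second piece starts with "ut he"
print_r($zeroWidth); // second piece starts with "But he"
```

If the goal is just to restore the line breaks, the zero-width form should also work with preg_replace and "\n" as the replacement.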
(.*?(?:\w+(?=[A-Z]))|\1)
This regex has a recursive part that will match more than one sentence in a whole text. You can check the Live demo to see the matched groups.
But if you also want a new line wherever a sentence begins right after a period (.), modify the regex above to this:
(.*?(?:(?:\w+|\. *)(?=[A-Z]))|\1)
Now you can compare the results with the first regex HERE
I have a string that contains something like "LAB_FF, LAB_FF12", and I'm trying to use preg_replace to look for both patterns and replace them with different strings, using this pattern:
/LAB_[0-9A-F]{2}|LAB_[0-9A-F]{4}/
So input would be
LAB_FF, LAB_FF12
and the output would need to be
DAB_FF, HAD_FF12
The problem is that the second string is interpreted as "LAB_FF" instead of "LAB_FF12", so the output is
DAB_FF, DAB_FF
I've tried splitting the input line out using 2 different preg_match statements, the first looking for the {2} pattern and the second looking for the {4} pattern. This sort of works in that I can get the correct output into 2 separate strings but then can't combine the two strings to give the single amended output.
\b is a word boundary. It requires the match to start and end at the edge of a word, so LAB_[0-9A-F]{2} can no longer stop after the first six characters of LAB_FF12.
https://regex101.com/r/upY0gn/1
$pattern = "/\bLAB_[0-9A-F]{2}\b|\bLAB_[0-9A-F]{4}\b/";
In response to the comment on the other answer about how to replace the string, this is one way.
The pattern creates an empty entry in the output array for each capture group that fails to match.
In this case that is one entry (the first).
Then it's just a matter of substr.
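$re = '/(\bLAB_[0-9A-F]{2}\b)|(\bLAB_[0-9A-F]{4}\b)/';
$str = 'LAB_FF12';
preg_match($re, $str, $matches);
var_dump($matches);
// Index 0 is unused; index N holds the prefix for capture group N.
$substitutes = ["", "DAB", "HAD"];
$result = $str; // fall back to the input if nothing matched
for ($i = 1; $i < count($matches); $i++) {
    // A group that did not take part in the match is an empty string.
    if ($matches[$i] != "") {
        // Drop the leading "LAB" and prepend the new prefix.
        $result = $substitutes[$i] . substr($matches[$i], 3);
        break;
    }
}
echo $result;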
https://3v4l.org/gRvHv
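For the full input line from the question, a callback can pick the prefix per match so the result stays in one string. This is only a sketch: the $prefixes lookup table and the single-group pattern are my own assumptions, not something from the answers above.

```php
<?php
// Hypothetical mapping from the length of the hex part to the new prefix.
$prefixes = [2 => 'DAB_', 4 => 'HAD_'];

$input = 'LAB_FF, LAB_FF12';

// The 4-character alternative is tried first, and \b keeps a
// 2-character match from stopping in the middle of a longer code.
$output = preg_replace_callback(
    '/\bLAB_([0-9A-F]{4}|[0-9A-F]{2})\b/',
    function (array $m) use ($prefixes): string {
        return $prefixes[strlen($m[1])] . $m[1];
    },
    $input
);

echo $output, "\n"; // DAB_FF, HAD_FF12
```

preg_replace_callback replaces every match in a single pass, so there is no need to recombine two separate strings afterwards.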
You can specify exact amounts in one set of curly braces, e.g. {2,4}.
Just tested this and seems to work:
/LAB_[0-9A-F]{2,4}/
LAB_FF, LAB_FFF, LAB_FFFF
EDIT: My mistake, that actually matches anything between 2 and 4 characters. If you change the order of the alternatives, the regex takes the first one that matches, so put the longer one first, e.g.
/LAB_([0-9A-F]{4}|[0-9A-F]{2})/
LAB_FF, LAB_FFFF
EDIT2: The following will match LAB_even_amount_of_characters:
/LAB_([0-9A-F]{2})+/
LAB_FF, LAB_FFFF, LAB_FFFFFF...
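To see the effect of alternation order on the input from the question, here is a small sketch (the variable names are illustrative). It runs the original pattern and the reordered one against the same string:

```php
<?php
$input = 'LAB_FF, LAB_FF12';

// Shorter alternative first: it succeeds on the first six characters
// of LAB_FF12, so the longer alternative is never tried there.
preg_match_all('/LAB_[0-9A-F]{2}|LAB_[0-9A-F]{4}/', $input, $shortFirst);

// Longer alternative first: the 4-character form wins where it fits.
preg_match_all('/LAB_([0-9A-F]{4}|[0-9A-F]{2})/', $input, $longFirst);

print_r($shortFirst[0]); // two entries: LAB_FF and LAB_FF
print_r($longFirst[0]);  // two entries: LAB_FF and LAB_FF12
```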
I want to find all strings looking like [!plugin=tesplugin arg=dfd arg=2!] and put them in an array.
Important feature: the string may or may not contain arg= arguments, and there can be any number of args. So the string could look like:
[!plugin=myname!] or [!plugin=whatever1 arg=22!] or even [!plugin=gal-one arg=1 arg=text arg=tx99!]. I need to put them all in $strarray items
Here is what I did:
$inp = "[!plugin=tesplugin arg=dfd!] sometxt [!plugin=second arg=1 arg=2!] 1sd";
preg_match_all('/\[!plugin=[a-z0-9 -_=]*!]/i', $inp, $str);
but $str[0][0] contains:
[!plugin=tesplugin arg=dfd!] sometxt [!plugin=second arg=1 arg=2!]
instead of putting each expression in a new array item.
I think my problem is in the regex, but I can't find it. Please help.
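The - in the character class needs to be at the start, at the end, or escaped. As written, " -_" is a range of ASCII characters from space to underscore, which includes !, [ and ]. Escaping the last ] is not strictly required outside a character class, but it makes the intent clearer.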
\[!plugin=[a-z0-9 \-_=]*!\]
Regex101 Demo: https://regex101.com/r/zV4bO2/1
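A quick way to check the difference is to run both patterns on the input from the question. This is a minimal sketch; $broken and $fixed are names I made up for the two result arrays.

```php
<?php
$inp = "[!plugin=tesplugin arg=dfd!] sometxt [!plugin=second arg=1 arg=2!] 1sd";

// Original class: " -_" is a range from space to underscore, which
// also covers "!", "[" and "]", so one greedy match spans both tags.
preg_match_all('/\[!plugin=[a-z0-9 -_=]*!]/i', $inp, $broken);

// Escaped hyphen: the class no longer admits "!" or "]".
preg_match_all('/\[!plugin=[a-z0-9 \-_=]*!\]/i', $inp, $fixed);

echo count($broken[0]), "\n"; // 1
echo count($fixed[0]), "\n";  // 2
print_r($fixed[0]);
```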
I would like to get a string made of one word with a delimiter word before and after it.
I tried this, but it doesn't work:
$stringData2 = file_get_contents('testtext3.txt');
$regular2=('/(?<=first del)*MAIN WORD(?=last del)*\s');
preg_match_all($regular2,
$stringData2,
$out, PREG_PATTERN_ORDER);
thank you very much for any help
No quantifier is needed. Add the closing delimiter at the end and put \s inside the lookahead.
'/(?<=first del)MAIN WORD(?=last del\s)/'
This regex
(?<=xx)[^\s]*(?=yy)
matches hello in:
xxhelloyy
but fails to match in:
xxhello worldyy
This is probably what you're looking for.
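The two cases above can be checked with a short sketch like this one (the xx and yy delimiters come from the example, the variable names are mine):

```php
<?php
$pattern = '/(?<=xx)[^\s]*(?=yy)/';

// No whitespace between the delimiters: the inner text is matched.
$hit = preg_match($pattern, 'xxhelloyy', $m);

// A space between the delimiters: [^\s]* cannot cross it, so no match.
$miss = preg_match($pattern, 'xxhello worldyy');

var_dump($hit, $m[0]); // int(1) and the string "hello"
var_dump($miss);       // int(0)
```

The delimiters themselves are not part of $m[0], because lookbehind and lookahead only assert what surrounds the match.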
If you want the delimiter string included in the match, then you should not be using lookahead or lookbehind. It should be something rather basic, like this.
/\s?first del MAIN WORD last del\s?/
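If you do want to return JUST the MAIN WORD part of the match, then this will work. The optional \s? is left out here: it adds nothing at the outer edge of a lookaround, and older PCRE versions reject a lookbehind whose length is not fixed.
/(?<=first del)MAIN WORD(?=last del)/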
Put an 'i' at the very end of that to make it case-insensitive, if you want. I only mention this because the example you gave above has different case between the example text and the desired response.
I am trying to match a string using two different patterns to work together.
My source string is something like this:
Text, white-spaces, new lines and more text then ^^^^<customtag>
I need to get a group (the second one) that would capture one caret or none then a formatted HTML-like tag. So the first group would capture anything else.
It means that the string above should output this:
(Group 1)Text, white-spaces, new lines and more text then ^^^
(Group 2)^<customtag>
In the source string there may be no carets, one, or up to two thousand.
I need a good pattern that matches all those carets except the last one.
The code below is what I tried.
preg_match_all('/([\s\S]*\^*)(\^?<\w+>)$/', $string, $matches);
Please note: I used [\s\S] instead of the dot to match any character as well as white-spaces and new lines too.
You could use the regex below:
(?s)(.*)((\^|(?<!\^))<[^>]+>)
Live demo
PHP code:
preg_match_all('/(?s)(.*)((\^|(?<!\^))<[^>]+>)/', $string, $matches);
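As a rough check of how the groups come out, here is a sketch using preg_match on an input modeled on the question. The exact sample text and the tag name are placeholders of mine.

```php
<?php
// Sample input modeled on the question; the tag name is a placeholder.
$string = "Text, white-spaces,\nnew lines and more text then ^^^^<customtag>";

// (?s) lets the dot match newlines. Group 2 takes one caret if there
// is one directly before the tag, otherwise just the tag itself.
preg_match('/(?s)(.*)((\^|(?<!\^))<[^>]+>)/', $string, $m);

echo $m[1], "\n"; // everything up to and including the third caret
echo $m[2], "\n"; // ^<customtag>
```

With no carets at all in front of the tag, the lookbehind branch applies and group 2 should hold only the tag.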
You can use this:
preg_match_all('/(.*)((\^<[^>]*>)|([^\^]<[^>]*>))$/', $string, $matches);
See it working here: http://regexr.com?383g9
In this other link it is working fine: http://regex101.com/r/eQ3vV7
I'm new to regular expressions and would like to match the first and last occurrences of a term in PHP. For instance, in this line:
"charlie, mary,bob,bob,mary, charlie, charlie, mary,bob,bob,mary,charlie"
I would like to access just the first and last "charlie", but not the two in the middle. How would I match only the first and last occurrence of a term?
Thanks
If you know what substring you're looking for (i.e. it's not a regex pattern), and you're just looking for the positions of your substrings, you could simply use these:
strpos: Find the position of the first occurrence of a substring in a string
strrpos: Find the position of the last occurrence of a substring in a string
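A minimal sketch with the string from the question (the $term, $first and $last names are mine):

```php
<?php
$names = "charlie, mary,bob,bob,mary, charlie, charlie, mary,bob,bob,mary,charlie";
$term  = "charlie";

$first = strpos($names, $term);  // offset of the first occurrence
$last  = strrpos($names, $term); // offset of the last occurrence

// Both functions return false when the term is absent, and 0 is a
// valid offset, so compare with !== false rather than a loose check.
if ($first !== false) {
    echo "first at $first, last at $last\n";
}
```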
Try this regular expression:
^(\w+),.*\1
The greedy * quantifier ensures that the text between the first word (\w+) and another occurrence of that word (\1, a backreference to the first group) is as long as possible.
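Applied to the string from the question, a sketch like the following shows what ends up in the match array (variable names are illustrative):

```php
<?php
$names = "charlie, mary,bob,bob,mary, charlie, charlie, mary,bob,bob,mary,charlie";

// Group 1 captures the first word; the greedy .* then stretches as far
// as it can while still leaving room for \1 to match that word again.
preg_match('/^(\w+),.*\1/', $names, $m);

echo $m[1], "\n";           // charlie
var_dump($m[0] === $names); // bool(true): the match runs to the last "charlie"
```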
You need to add ^ and $ symbols to your regular expression.
^ - matches start of the string
$ - matches end of the string
In your case, ^charlie matches the first occurrence and charlie$ matches the last. If you want to match both, use ^charlie|charlie$.
See also Start of String and End of String Anchors for more details about these symbols.
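Here is a small sketch of the combined pattern on the string from the question. It assumes, as in that string, that the term sits at the very start and the very end; PREG_OFFSET_CAPTURE is used only to show where each match was found.

```php
<?php
$names = "charlie, mary,bob,bob,mary, charlie, charlie, mary,bob,bob,mary,charlie";

// Each entry of $m[0] is a [text, offset] pair because of the flag.
preg_match_all('/^charlie|charlie$/', $names, $m, PREG_OFFSET_CAPTURE);

foreach ($m[0] as [$text, $offset]) {
    echo "$text at offset $offset\n";
}
```

The two occurrences in the middle are skipped because neither anchor can match there.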
Try exploding the string.
$names = "charlie, mary,bob,bob,mary, charlie, charlie, mary,bob,bob,mary,charlie";
$names_array = explode(",", $names);
After doing this, you've got an array with the names. You want the first, so it will be at position 0.
$first = $names_array[0];
It gets a little trickier with the last. You have to know how many names you have [count()] and then, since the array starts counting from 0, you'll have to subtract one.
$last = $names_array[count($names_array)-1];
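Put together, the steps above might look like the sketch below. The array_map('trim', ...) call is my addition: some entries in the sample string carry a leading space after the comma, which explode leaves in place.

```php
<?php
$names = "charlie, mary,bob,bob,mary, charlie, charlie, mary,bob,bob,mary,charlie";

// Split on commas, then trim the stray spaces some entries carry.
$names_array = array_map('trim', explode(",", $names));

$first = $names_array[0];                        // arrays start at 0
$last  = $names_array[count($names_array) - 1];  // last index is count - 1

echo "$first / $last\n"; // charlie / charlie
```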
I know it may not be the best answer possible, nor the most efficient, but I think solving smaller problems is how you really get started with programming.
Good luck.