I am looking for a pattern that matches everything until the first occurrence of a specific character, say a ";" - a semicolon.
I wrote this:
/^(.*);/
But it actually matches everything (including the semicolon) until the last occurrence of a semicolon.
You need
/^[^;]*/
The [^;] is a character class, it matches everything but a semicolon.
^ (start of line anchor) is added to the beginning of the regex so only the first match on each line is captured. This may or may not be required, depending on whether possible subsequent matches are desired.
To cite the perlre manpage:
You can specify a character class, by enclosing a list of characters in [] , which will match any character from the list. If the first character after the "[" is "^", the class matches any character not in the list.
This should work in most regex dialects.
Would;
/^(.*?);/
work?
The ? is a lazy operator, so the regex grabs as little as possible before matching the ;.
/^[^;]*/
The [^;] says match anything except a semicolon. The square brackets are a set matching operator, it's essentially, match any character in this set of characters, the ^ at the start makes it an inverse match, so match anything not in this set.
None of the proposed answers did work for me. (e.g. in notepad++)
But
^.*?(?=\;)
did.
Try /[^;]*/
Google regex character classes for details.
sample text:
"this is a test sentence; to prove this regex; that is g;iven below"
If for example we have the sample text above, the regex /(.*?\;)/ will give you everything until the first occurence of semicolon (;), including the semicolon: "this is a test sentence;"
Try /[^;]*/
That's a negating character class.
This was very helpful for me as I was trying to figure out how to match all the characters in an xml tag including attributes. I was running into the "matches everything to the end" problem with:
/<simpleChoice.*>/
but was able to resolve the issue with:
/<simpleChoice[^>]*>/
after reading this post. Thanks all.
this is not a regex solution, but something simple enough for your problem description. Just split your string and get the first item from your array.
$str = "match everything until first ; blah ; blah end ";
$s = explode(";",$str,2);
print $s[0];
output
$ php test.php
match everything until first
This will match up to the first occurrence only in each string and will ignore subsequent occurrences.
/^([^;]*);*/
"/^([^\/]*)\/$/" worked for me, to get only top "folders" from an array like:
a/ <- this
a/b/
c/ <- this
c/d/
/d/e/
f/ <- this
Really kinda sad that no one has given you the correct answer....
In regex, ? makes it non greedy. By default regex will match as much as it can (greedy)
Simply add a ? and it will be non-greedy and match as little as possible!
Good luck, hope that helps.
This works for getting the content from the beginning of a line till the first word,
/^.*?([^\s]+)/gm
I faced a similar problem including all the characters until the first comma after the word entity_id. The solution that worked was this in Bigquery:
SELECT regexp_extract(line_items,r'entity_id*[^,]*')
Related
I would like to get a string made of one word with a delimiter word before and after it
i tried but doen t work
$stringData2 = file_get_contents('testtext3.txt');
$regular2=('/(?<=first del)*MAIN WORD(?=last del)*\s');
preg_match_all($regular2,
$stringData2,
$out, PREG_PATTERN_ORDER);
thank you very much for any help
No quantifier needed, add delimeter at end, put \s inside lookahead.
'/(?<=first del)MAIN WORD(?=last del\s)/'
This regex
(?<=xx)[^\s]*(?=yy)
matches hello in:
xxhelloyy
but fails to match in:
xxhello worldyy
This is probably what you're looking for.
If you want the delimiter string included in the match, then you should not be using lookahead or look or look behind. It should be something rather basic, like this.
/\s?first del MAIN WORD last del\s?/
If you do want to return JUST the MAIN WORD part of the match, then this will work.
/(?<=\s?first del)MAIN WORD(?=last del\s?)/
Put a 'i' at the very end of that to make it case insensitive, if you want. I only mention this, because in the example you gave me above has different case between the example text and the desired response.
This is the text sample:
$text = "asd dasjfd fdsfsd http://11111.com/asdasd/?s=423%423%2F gfsdf http://22222.com/asdasd/?s=423%423%2F
asdfggasd http://3333333.com/asdasd/?s=423%423%2F";
This is my regex pattern:
preg_match_all( "#http:\/\/(.*?)[\s|\n]#is", $text, $m );
That match the first two urls, but how do I match the last one? I tried adding [\s|\n|$] but that will also only match the first two urls.
Don't try to match \n (there's no line break after all!) and instead use $ (which will match to the end of the string).
Edit:
I'd love to hear why my initial idea doesn't work, so in case you know it, let me know. I'd guess because [] tries to match one character, while end of line isn't one? :)
This one will work:
preg_match_all('#http://(\S+)#is', $text, $m);
Note that you don't have to escape the / due to them not being the delimiting character, but you'd have to escape the \ as you're using double quotes (so the string is parsed). Instead I used single quotes for this.
I'm not familar with PHP, so I don't have the exact syntax, but maybe this will give you something to try. the [] means a character class so |$ will literally look for a $. I think what you'll need is another look ahead so something like this:
#http:\/\/(.*)(?=(\s|$))
I apologize if this is way off, but maybe it will give you another angle to try.
See What is the best regular expression to check if a string is a valid URL?
It has some very long regular expressions that will match all urls.
I'm parsing an external feed which contains location and date inside post title which I want to get rid of, so:
This happened on Date in Location
I need to find on (space on space) and remove everything till the end of the line, same for in(space in space).
I googled a bit, but regex is really unfathomable for me so I'd appreciate any help.
Thanks!
Well, a literal "on" does match exactly. Then tell the regex engine to match everything after: ".*". (Note, that the . doesn't match newlines, so it works as needed.)
In the case of "in" you need an alternative, which is marked by parentheses () and the vertical bar |: "(on|in)". You could also make that a bit tighter with character classes []: "[oi]n".
With that we arrive at this regex:
/ [oi]n .*/
To the end of the line? Then I suppose:
preg_replace("/(?:on|in).*?(\n|$)/", "", 'This happened on Date in Location');
Would do it.
Use a negative lookbehind if you want to remove everything after the on and in but not the on and in themselves.
(?<=\son\s).*
and
(?<=\sin\s).*
http://regexr.com?30ops
I'm building this regex with a positive look ahead in it. Basically it must select all text in the line up to last period that precedes a ":" and add a "|" to the end to delimit it. Some sample text below. I am testing this in gskinner and editpadpro which has full grep regex support apparently so if I could get the answers in that for I'd appreciate it.
The regex below works to a degree but I am unsure if it is correct. Also it falls down if the text contains brackets.
Finally I would like to add another ignore rule like the one that ignores but includes "Co." in the selection. This second ignore rule would ignore but include periods that have a single Capital letter before them. Sample text below too. Thanks for all the help.
^(?:[^|]+\|){3}(.*?)[^(?:Co)]\.(?=[^:]*?\:)
121| Ryan, T.N. |2001. |I like regex. But does it like me (2) 2: 615-631.
122| O' Toole, H.Y. |2004. |(Note on the regex). Pages 90-91 In: Ryan, A. & Toole, B.L. (Editors) Guide to the regex functionality in php. Timmy, Tommy& Stewie, Quohog. * Produced for Family Guy in Quohog.
I don't think I understand what you want to do. But this part [^(?:Co)] is definitely not correct.
With the square brackets you are creating a character class, because of the ^ it is a negated class. That means at this place you don't want to match one of those characters (?:Co), in other words it will match any other character than "?)(:Co".
Update:
I don't think its possible. How should I distinguish between L. Co. or something similar and the end of the sentence?
But I found another error in your regex. The last part (?=[^:]*?\:) should be (?=[^.]*?\:) if you want to match the last dot before the : with your expression it will match on the first dot.
See it here on Regexr
This seems to do what you want.
(.*\.)(?=[^:]*?:)
It quite simply matches all text up to the last full stop that occurs before the colon.
Ok, here again.
I'll promise to study deeply the regular expression soon :P
Language: PhP
Problem:
Match if some badword exist inside a string and do something.
The word must be not included inside a "greater word". I mean if i'll search for "rob" (sorry Rob, i'm not thinking you're a badword), the word "problem have to pass without check.
I'd googled around but found nothing good for me. So, I thought something like this:
If i match the word with after and before any character of the following:
.
,
;
:
!
?
(
)
+
-
[whitespace]
I can simulate a check against single word inside a string.
Finally the Questions:
There's a better way to do it?
If not, which will be the correct regexp to consider [all_that_char]word[all_that_char]?
Thanks in advance to anyone would help!
Maybe this is a very stupid question but today is one of that day when move our neurons causes an incredible headache :|
Look up \b (word boundary):
Matches at the position between a word
character (anything matched by \w) and
a non-word character (anything matched
by [^\w] or \W) as well as at the
start and/or end of the string if the
first and/or last characters in the
string are word characters.
(http://www.regular-expressions.info/reference.html)
So: \brob\b matches rob, but not problem.
You can use \b, see Whole word bounderies.