Hi Everybody,
I'm currently using preg_match and I'm trying to extract some information enclosed in square brackets.
So far, I have used this:
/\[(.*)\]/
But I want it to be only the content of the last occurrence - or the first one, if starting from the end!
In the following:
string = "Some text here [value_a] some more text [value_b]"
I need to get:
"value_b"
Can anybody suggest something that will do the trick?
Thanks!
Match against:
/.*\[([^]]+)\]/
using preg_match (no need for the _all version here, since you only want the last group) and capture the group inside.
Your current regex, with your input, would capture value_a] some more text [value_b. Here, the first .* swallows everything, but must backtrack for a [ to be matched -- the last one in the input.
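The difference between the two patterns can be seen in a minimal sketch. It uses Python's re module rather than PHP's preg_match (an assumption made only for illustration; the patterns behave the same way here), and the closing bracket inside the class is written as \] for portability.

```python
import re

text = "Some text here [value_a] some more text [value_b]"

# Greedy .* inside the group runs from the first [ to the last ].
greedy = re.search(r"\[(.*)\]", text).group(1)

# A leading greedy .* consumes as much as it can, then backtracks just far
# enough for the final [...] pair to match, so group 1 is the last value.
last = re.search(r".*\[([^\]]+)\]", text).group(1)

print(greedy)
print(last)
```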
If you are only expecting letters and numbers (no symbols), you could use \[([\w\d]+)\] (the \d is redundant, since \w already matches digits) with preg_match_all() and take the last element of the array as the result. You can add any custom symbols by escaping them in the character class definition.
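A rough equivalent of the "match all, take the last" approach, sketched with Python's re.findall standing in for preg_match_all (an assumption for illustration only):

```python
import re

text = "Some text here [value_a] some more text [value_b]"

# Collect every bracketed word, then keep the final one (None if there are none).
matches = re.findall(r"\[(\w+)\]", text)
last = matches[-1] if matches else None

print(matches)
print(last)
```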
\[([^\]]*)\][^\[]*$
See it here on regexr
var someText="Some text here [value_a] some more text [value_b]";
alert(someText.match(/\[([^\]]*)\][^\[]*$/)[1]);
The part inside the brackets is stored in capture group 1, therefore you need to use match(...)[1] to access the result.
For simple brackets, see the source used for this answer: Regex for getting text between the last brackets ()
Hi I'm currently working with a project in which the following occurs;
$example = $array[key]
instead of
$example = $array['key'] or $example=$array["key"]
I'm trying to use regex to update these old array key strings. I currently have the following;
(\$([a-z0-9_]*)\[(?!('|"))([a-z0-9_]*)(?!('|"))\])
This matches $array[key], but also matches things like this;
$array[]
$array inside a javascript tag.
The code is also very old and has script tags inside php files, without a framework.
I'm using regex inside Notepad++, does anyone think they could write me a regex query to capture non string array keys and avoid $array[], $array[$variable] and $array inside script tags, and replace them with quotes?
Thank you
You can use
Find What: (?s)<script\b[^>]*>.*?</script>(*SKIP)(*F)|\$(\w+)\[(?!\d+])(\w+)]
Replace With: $$1["$2"]
See the regex demo.
Details:
(?s) - makes . match newlines too
<script\b[^>]*> - an open script tag
.*? - any zero or more chars as few as possible
</script> - closing </script> tag
(*SKIP)(*F) - fail the match and go on to search for the next one from the failure position
| - or
\$ - a $ char
(\w+) - Group 1: any one or more word chars
\[ - a [ char
(?!\d+]) - a negative lookahead that fails the match if there are one or more digits and ] immediately to the right of the current location
(\w+) - Group 2: one or more word chars
] - a ] char.
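Outside Notepad++, the (*SKIP)(*F) verbs are not always available; Python's re module, for example, does not support them. A similar effect can be sketched by capturing the script block in its own group and returning it unchanged from a replacement callback. This is a swapped-in technique, not the pattern above, and the names KEY_PATTERN and quote_keys are hypothetical.

```python
import re

# Group 1: a whole <script>...</script> block (left untouched).
# Groups 2 and 3: variable name and unquoted, non-numeric array key.
KEY_PATTERN = re.compile(
    r"(<script\b[^>]*>.*?</script>)|\$(\w+)\[(?!\d+\])(\w+)\]",
    re.DOTALL,
)

def quote_keys(source):
    """Quote bare array keys, skipping anything inside script tags."""
    def repl(match):
        if match.group(1) is not None:
            return match.group(1)  # script block: return as-is
        return '$%s["%s"]' % (match.group(2), match.group(3))
    return KEY_PATTERN.sub(repl, source)

sample = (
    "$a = $array[key]; $b = $array[0]; $c = $array[]; $d = $array[$v];\n"
    "<script>var x = $array[key];</script>"
)
converted = quote_keys(sample)
print(converted)
```

Numeric keys, empty brackets, variable keys and already-quoted keys fall through unchanged because the second alternative cannot match them.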
This seems to work fine for me:
\$[a-zA-Z0-9_]+\[[a-zA-Z0-9_]+\]
I've read the Best RegEx Trick Ever and tried to wrap my head around the other answers here on Stack Exchange and just can't seem to get it right. Take these three strings:
http://www.test.com/newyork/class-schedule
http://www.test.com/location/newyork/class-schedule
http://www.test.com/location/newyork/training
I need a regex that will extract the newyork from the first string and save it for a replace later, but will NOT match any part of the other strings. Also, for obscure reasons, I can not include http://www.test.com as a condition for matching (so I can't use anything before the slash that precedes newyork). Note that in this scenario, newyork could easily be chicago, atlanta, or any other city name with no spaces or punctuation.
The only thing I've been able to figure out that isolates only newyork in the first string is the following:
/.*\.com\/(.[^\/]*)\/class-schedule/g
However, this relies on using the URL first which I can't use.
Any ideas on how to achieve this WITHOUT using the URL?
[EDIT]
To clarify what I'm looking for, I'm trying to take the results from the first string and add "location" to it, still using regex. So:
http://www.test.com/newyork/class-schedule
would become
http://www.test.com/location/newyork/class-schedule
using something like
http://www.test.com/location/$1/class-schedule
Try this: ~/(\w+)/[-a-z]+?/?(?:\?.*?)*(:?\s|$)~gm
See it working here: https://regex101.com/r/4VMazZ/3.
So it uses the end of the URL instead of the beginning and matches only the word between the second and third slashes from the end. If there is a query string, it will still work.
[EDIT 1]
I had transposed two characters at the end by mistake, so it was capturing one extra group. The corrected pattern is: /(\w+)/[-a-z]+?/?(?:\?.*?)*(?:\s|$). See it here: https://regex101.com/r/4VMazZ/4
If you use preg_match($pattern, $string, $matches); the result you want (newyork) will be in $matches[1]; $matches[0] contains the whole match.
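To make the whole-match versus group-1 distinction concrete, here is a small sketch using Python's re.search in place of preg_match (an assumption for illustration), with the corrected pattern from EDIT 1:

```python
import re

pattern = re.compile(r"/(\w+)/[-a-z]+?/?(?:\?.*?)*(?:\s|$)")

url = "http://www.test.com/newyork/class-schedule"
match = pattern.search(url)

whole = match.group(0)  # like $matches[0]: everything the pattern consumed
city = match.group(1)   # like $matches[1]: only the captured segment

# The same pattern applied to one of the /location/ URLs from the question.
other = pattern.search("http://www.test.com/location/newyork/class-schedule")
other_city = other.group(1) if other else None

print(whole, city, other_city)
```

On these sample inputs the pattern also finds newyork in the /location/ URL, so a separate check for /location/ may still be needed if those URLs must be left alone.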
You can see the captures in 'MATCH INFORMATION' panel on regex101 in my example!
[EDIT 2] after your comment.
If you want to replace the whole url you have to match the whole URL, something like this: .*?/(\w+)/[-a-z]+?/?(?:\?.*?)*(?:\s|$) will do in this example. See it working here: https://regex101.com/r/4VMazZ/5
[EDIT 3] Add capturing of last part for replacement.
So as you want to reuse last part you need to add capturing parenthesis: .*?/(\w+)/([-a-z]+?)/?(?:\?.*?)*(?:\s|$).
See it working here: https://regex101.com/r/4VMazZ/6
Could this work? See it here.
(?<=location\/|\.\w{3}\/|\.\w{2}\/)(?!location).*?(?=\/|$)
It matches everything following .xxx/ or .xx/ or location/. I don't know if one-letter domains exist; if they do, you can add |\.\w\/ to the lookbehind at the start of the regex.
(?<=location\/|\.\w{3}\/|\.\w{2}\/) is a lookbehind, so it matches the following pattern only if preceded by location/ or .xxx/ or .xx/
.*? matches every character (lazy)
(?=\/|$) end match if next character is / or on line end
Note: If location is counted as part of the url, I don't think what you are asking is possible in regex, as the city name could be anywhere in string. If so, then you could have a list of cities and check what part of the url matches one of them.
EDIT: You need the multiline m flag so $ also matches end of line
I'm using the following to look for instances of an ID such as X.123:
$regex_id = "/\b[Xx][\.][0-9]{1,4}\b/";
preg_match_all($regex_id, $html, $matches_id, PREG_SET_ORDER);
The matched IDs are converted to some stored text. This part works well; however, I need to add some functionality. Now some IDs will be enclosed in double brackets, such as [[X.123]], and I need to match either the standalone ID or the bracketed ID.
The standalone IDs will be replaced with some text (ex: X.123 >> MyText).
The bracketed IDs will be replaced with an image (ex: [[X.123]] >> <img src='mypic.png'>).
I need to be careful how this is done so I don't replace [[X.123]] with [[MyText]]. As Jason McCreary indicated below, I can simply order the two expressions though that's probably not the best way.
Is this the correct expression to match the bracketed ID?
\[\[[Xx][\.][\s][0-9]{1,4}\]\]
A naive way would be to do two passes.
Replace [[X.123]]
Replace X.123
I would do so with a single call to preg_replace() using arrays for the search/replace parameters.
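The ordered two-pass idea can be sketched as follows. It is written in Python rather than PHP (an assumption for illustration), so a list of (pattern, replacement) pairs stands in for the array arguments of preg_replace(); REPLACEMENTS and replace_ids are hypothetical names.

```python
import re

# Order matters: the bracketed form must be handled before the bare ID,
# otherwise [[X.123]] would turn into [[MyText]].
REPLACEMENTS = [
    (re.compile(r"\[\[[Xx]\.[0-9]{1,4}\]\]"), "<img src='mypic.png'>"),
    (re.compile(r"\b[Xx]\.[0-9]{1,4}\b"), "MyText"),
]

def replace_ids(html):
    """Apply each replacement in sequence, bracketed IDs first."""
    for pattern, replacement in REPLACEMENTS:
        html = pattern.sub(replacement, html)
    return html

result = replace_ids("See X.123 and [[x.45]] here.")
print(result)
```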
UPDATE
A regular expression for [[X.###]] would be:
\[\[[Xx]\.\d{1,4}\]\]
(\[\[)?[Xx]\.[0-9]{1,4}(\]\])?
Is this the correct expression to match the bracketed ID?
\[\[[Xx][\.][\s][0-9]{1,4}\]\]
Unnecessary characters in there.
\[\[[Xx]\.[0-9]{1,4}\]\]
EDIT
...that will match the bracketed-only version. If you need match both:
(?:\[\[)?[Xx]\.[0-9]{1,4}(?:\]\])?
...which won't create back-references to the brackets if/when they do match. The one possible issue here is that it can also match when brackets appear on only one side of the ID. LMK if you need it to be more stringent than that.
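One way to be stricter about the brackets is an alternation that tries the fully bracketed form first and falls back to the bare ID, with a callback choosing the replacement. This is only a sketch in Python (the PHP counterpart would be preg_replace_callback), and ID_PATTERN and replace_id are hypothetical names.

```python
import re

# Either a complete [[X.123]] or a standalone X.123; never half a bracket pair.
ID_PATTERN = re.compile(r"\[\[[Xx]\.[0-9]{1,4}\]\]|\b[Xx]\.[0-9]{1,4}\b")

def replace_id(match):
    """Pick the replacement based on which alternative matched."""
    if match.group(0).startswith("[["):
        return "<img src='mypic.png'>"
    return "MyText"

result = ID_PATTERN.sub(replace_id, "[[X.123]] X.7 [[X.9")
print(result)
```

In the last token the brackets are unbalanced, so only the bare ID is replaced and the stray brackets are left in place.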
Cheers
I have created a Regular Expression (using php) below; which must match ALL terms within the given string that contains only a-z0-9, ., _ and -.
My expression is: '~(?:\(|\s{0,},\s{0,})([a-z0-9._-]+)(?:\s{0,},\s{0,}|\))$~i'.
My target string is: ('word', word.2, a_word, another-word).
Expected terms in the results are: word.2, a_word, another-word.
I am currently getting: another-word.
My Goal
I am detecting a MySQL function from my target string, this works fine. I then want all of the fields from within that target string. It's for my own ORM.
I suppose there could be a situation whereby further parentheses are included inside this expression.
From what I can tell, you have a list of comma-separated terms and wish to find only the ones which satisfy [a-z0-9._\-]+. If so, this should be correct (it returns the correct results for your example at least):
'~(?<=[,(])\\s*([a-z0-9._-]+)\\s*(?=[,)])~i'
The main issues were:
$ at the end, which was anchoring the query to the end of the string
When matching all, you continue from the end of the previous match - this means that if you consume a comma/close parenthesis at the end of one match, it is no longer available to match at the beginning of the next one. I've solved this with a lookbehind, (?<=...), and a lookahead, (?=...).
Your backslashes should be doubled, since PHP may strip a single backslash when parsing the string literal.
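The effect of the lookbehind and lookahead can be checked with a short sketch. It uses Python's re.findall instead of preg_match_all (an assumption for illustration); a raw string takes care of the backslashes, so they are not doubled here.

```python
import re

target = "('word', word.2, a_word, another-word)"

# The delimiters are only looked at, never consumed, so the comma that ends
# one term is still available to introduce the next one.
pattern = re.compile(r"(?<=[,(])\s*([a-z0-9._-]+)\s*(?=[,)])", re.IGNORECASE)

terms = pattern.findall(target)
print(terms)
```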
EDIT: Since you said in a comment that some of the terms may be strings that contain commas you will first want to run your input through this:
$input = preg_replace('~(\'([^\']+|(?<=\\\\)\')+\'|"([^"]+|(?<=\\\\)")+")~', '"STRING"', $input);
which should replace all strings with '"STRING"', which will work fine for matching the other regex.
Maybe using a regex is overkill. For this kind of text you can just remove the parentheses and explode the string by comma.
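A minimal sketch of the split-based approach, in Python (str.split plays the role of PHP's explode). It assumes a flat list with no nested parentheses and no commas inside quoted strings.

```python
import re

target = "('word', word.2, a_word, another-word)"

# Drop the surrounding parentheses, split on commas, trim whitespace.
parts = [part.strip() for part in target.strip("()").split(",")]

# Keep only the terms made of a-z, 0-9, '.', '_' and '-'.
terms = [p for p in parts if re.fullmatch(r"[a-z0-9._-]+", p, re.IGNORECASE)]
print(terms)
```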
I'm building this regex with a positive lookahead in it. Basically it must select all text in the line up to the last period that precedes a ":" and add a "|" to the end to delimit it. Some sample text is below. I am testing this in gskinner and EditPad Pro, which apparently has full grep regex support, so if I could get the answers in that form I'd appreciate it.
The regex below works to a degree but I am unsure if it is correct. Also it falls down if the text contains brackets.
Finally I would like to add another ignore rule like the one that ignores but includes "Co." in the selection. This second ignore rule would ignore but include periods that have a single Capital letter before them. Sample text below too. Thanks for all the help.
^(?:[^|]+\|){3}(.*?)[^(?:Co)]\.(?=[^:]*?\:)
121| Ryan, T.N. |2001. |I like regex. But does it like me (2) 2: 615-631.
122| O' Toole, H.Y. |2004. |(Note on the regex). Pages 90-91 In: Ryan, A. & Toole, B.L. (Editors) Guide to the regex functionality in php. Timmy, Tommy& Stewie, Quohog. * Produced for Family Guy in Quohog.
I don't think I understand what you want to do. But this part [^(?:Co)] is definitely not correct.
With the square brackets you are creating a character class, because of the ^ it is a negated class. That means at this place you don't want to match one of those characters (?:Co), in other words it will match any other character than "?)(:Co".
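The behaviour of that character class is easy to confirm with a small Python sketch. The last line shows a negative lookbehind, which is one possible way to express "a period not preceded by Co"; that is an assumption about the intent, not something taken from the question.

```python
import re

negated_class = re.compile(r"[^(?:Co)]")

# The class consumes exactly one character that is none of ( ? : C o )
matches_x = negated_class.fullmatch("x") is not None
matches_C = negated_class.fullmatch("C") is not None
matches_paren = negated_class.fullmatch("(") is not None
matches_Co = negated_class.fullmatch("Co") is not None  # two characters, both excluded

# Possible alternative: a period that is not preceded by the text "Co".
period = re.search(r"(?<!Co)\.", "Acme Co. Ltd.")
period_index = period.start()

print(matches_x, matches_C, matches_paren, matches_Co, period_index)
```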
Update:
I don't think it's possible. How should I distinguish between L. Co. or something similar and the end of the sentence?
But I found another error in your regex. The last part (?=[^:]*?\:) should be (?=[^.]*?\:) if you want to match the last dot before the :. With your expression it will match on the first dot.
See it here on Regexr
This seems to do what you want.
(.*\.)(?=[^:]*?:)
It quite simply matches all text up to the last full stop that occurs before the colon.
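Applied to the first sample line, the pattern behaves as sketched below. Python's re module is used here only for illustration; the replacement string \1| is an assumption about how the "|" delimiter would be appended.

```python
import re

pattern = re.compile(r"(.*\.)(?=[^:]*?:)")

line = "121| Ryan, T.N. |2001. |I like regex. But does it like me (2) 2: 615-631."

# Greedy .* backtracks to the last period that still has a colon after it.
head = pattern.search(line).group(1)

# Append the delimiter right after that period.
delimited = pattern.sub(r"\1|", line, count=1)

print(head)
print(delimited)
```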