Hi, I'm currently working on a project in which the following occurs:
$example = $array[key]
instead of
$example = $array['key'] or $example=$array["key"]
I'm trying to use regex to update these old array key strings. I currently have the following;
(\$([a-z0-9_]*)\[(?!('|"))([a-z0-9_]*)(?!('|"))\])
This matches $array[key], but also matches things like this;
$array[]
$array inside a JavaScript tag.
The code is also very old: it has script tags inside PHP files and uses no framework.
I'm using regex inside Notepad++. Could anyone write a regex that captures unquoted array keys, skips $array[], $array[$variable] and $array inside script tags, and replaces the matches with quoted keys?
Thank you
You can use
Find What: (?s)<script\b[^>]*>.*?</script>(*SKIP)(*F)|\$(\w+)\[(?!\d+])(\w+)]
Replace With: $$1["$2"]
After the replacement, only the example on the first line changes; the examples on the fourth line already had quotes.
Details:
(?s) - now, . matches newlines
<script\b[^>]*> - an open script tag
.*? - any zero or more chars as few as possible
</script> - closing </script> tag
(*SKIP)(*F) - fail the match and go on to search for the next one from the failure position
| - or
\$ - a $ char
(\w+) - Group 1: any one or more word chars
\[ - a [ char
(?!\d+]) - a negative lookahead that fails the match if there are one or more digits and ] immediately to the right of the current location
(\w+) - Group 2: one or more word chars
] - a ] char.
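For anyone who would rather run the same replacement from PHP than from Notepad++, here is a minimal sketch. It assumes the pattern above is wrapped in ~ delimiters, and the sample input is made up for illustration. The replacement is built in a callback, so the Notepad++ replacement syntax is not needed.

```php
<?php
// Same pattern as above, wrapped in ~ delimiters for PCRE in PHP.
$pattern = '~(?s)<script\b[^>]*>.*?</script>(*SKIP)(*F)|\$(\w+)\[(?!\d+])(\w+)]~';

// Hypothetical sample input; a nowdoc keeps PHP from interpolating the $ signs.
$source = <<<'TXT'
$a = $array[key]; $b = $array['key']; $c = $array[]; $d = $array[$v]; $e = $array[0];
<script>var x = "$array[key]";</script>
TXT;

// Rebuild each match as $name["key"]; the skipped <script> blocks never reach the callback.
$result = preg_replace_callback(
    $pattern,
    function (array $m): string {
        return '$' . $m[1] . '["' . $m[2] . '"]';
    },
    $source
);

echo $result, "\n";
```

Only the first $array[key] should change here; the quoted key, the empty brackets, the variable key, the numeric key and the script block are left alone.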
This seems to work fine for me:
\$[a-zA-Z0-9_]+\[[a-zA-Z0-9_]+\]
I have never worked with regular expressions before and I need them now and I am having some issues getting the expected outcome.
Consider this for example:
[x:3xerpz1z]Some Text[/x:3xerpz1z] Some More Text
Using the php preg_replace() function, I want to replace [x:3xerpz1z] with <start> and [/x:3xerpz1z] with </end> but I can't figure this out. I have read some regular expression tutorials but I am still confused.
I have tried this for the starting tag:
preg_replace('/(.*)\[x:/','<start>', $source_string);
The above would return:<start>3xerpz1z
As you can see, the "3xerpz1z" isn't getting removed and it needs to be stripped out. I can't hard code and search and replace "3xerpz1z" because the "3xerpz1z" chars are randomly generated and the characters are always different but the length of the tag is the same.
This is the desired output I want:
<start>Some Text</end> Some More Text
I haven't even tried processing [/x:3xerpz1z] because I can't even get the first tag going.
You must use capturing groups (....):
$data = '[x:3xerpz1z]Some Text[/x:3xerpz1z] Some More Text';
$result = preg_replace('~\[x:([^]]+)](.*?)\[/x:\1]~s', '<start>$2</end>', $data);
pattern details:
~ # pattern delimiter: better than / here (no need to escape slashes)
\[x:
([^]]+) # capture group 1: all that is not a ]
]
(.*?) # capture group 2: content
\[/x:\1] # \1 is a backreference to the first capturing group
~s # s allows the dot to match newlines
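To see what the \1 backreference buys you, here is a small sketch with made-up inputs: one string with two well-formed pairs, and one where the closing id differs from the opening id. The variable names are only for illustration.

```php
<?php
// Pattern from above: \1 forces the closing tag to carry the same id as the opening tag.
$pattern = '~\[x:([^]]+)](.*?)\[/x:\1]~s';

// Hypothetical inputs: two well-formed pairs, and one pair whose ids differ.
$paired     = '[x:3xerpz1z]Some Text[/x:3xerpz1z] and [x:9abcdefgh]More[/x:9abcdefgh]';
$mismatched = '[x:3xerpz1z]Some Text[/x:zzzzzzzz]';

$pairedResult     = preg_replace($pattern, '<start>$2</end>', $paired);
$mismatchedResult = preg_replace($pattern, '<start>$2</end>', $mismatched);

echo $pairedResult, "\n";     // <start>Some Text</end> and <start>More</end>
echo $mismatchedResult, "\n"; // left unchanged, because the ids differ
```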
I'm not very good at regular expressions at all.
I've been using a lot of framework code to date, but I'm unable to find a pattern that matches a URL like http://www.example.com/etcetc and also catches something like www.example.com/etcetc and example.com/etcetc.
For matching all kinds of URLs, the following code should work:
<?php
// \$ keeps PHP from interpolating $_ inside the double-quoted strings
$regex = "((https?|ftp)://)?"; // SCHEME
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
$regex .= "([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))"; // Host or IP address
$regex .= "(:[0-9]{2,5})?"; // Port
$regex .= "(/([a-z0-9+\$_%-]\.?)+)*/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$%_.-]*)?"; // Anchor
?>
Then, the correct way to check against the regex is as follows:
<?php
if(preg_match("~^$regex$~i", 'www.example.com/etcetc', $m))
var_dump($m);
if(preg_match("~^$regex$~i", 'http://www.example.com/etcetc', $m))
var_dump($m);
?>
Courtesy: Comments made by splattermania in the PHP manual: preg_match
RegEx Demo in regex101
This worked for me in all cases I had tested:
$url_pattern = '/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:@\-_=#])*/';
Tests:
http://test.test-75.1474.stackoverflow.com/
https://www.stackoverflow.com
https://www.stackoverflow.com/
http://wwww.stackoverflow.com/
http://wwww.stackoverflow.com
http://test.test-75.1474.stackoverflow.com/
http://www.stackoverflow.com
http://www.stackoverflow.com/
stackoverflow.com/
stackoverflow.com
http://www.example.com/etcetc
www.example.com/etcetc
example.com/etcetc
user:pass@example.com/etcetc
example.com/etcetc?query=aasd
example.com/etcetc?query=aasd&dest=asds
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www/
Every valid Internet URL has at least one dot, so the above pattern simply looks for at least two strings joined by a dot and made up of characters that are valid in a URL.
Try this:
/^(https?:\/\/)?(www\.)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/
It behaves as requested.
It accepts URLs with or without http://, https://, and www.
You can put a question mark after a group to make it optional, so you would want to use:
http:\/\/(www\.)?
That will match anything that has either http://www. or http:// (with no www.)
You could just use a replace method to remove the above, thus getting you the domain. It depends on what you need the domain for.
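A rough sketch of that replace approach in PHP. Note the assumptions: the function name domainOf is made up, and the http:// part is made optional here (unlike the pattern above) so that inputs starting with www. or with nothing at all also work.

```php
<?php
// Hypothetical helper: strip an optional "http://" and an optional "www."
// from the start, then keep everything up to the first slash.
function domainOf(string $url): string
{
    // ~ as the delimiter avoids having to escape the slashes.
    $stripped = (string) preg_replace('~^(http://)?(www\.)?~', '', $url);

    // Whatever precedes the first "/" is treated as the domain.
    return explode('/', $stripped, 2)[0];
}

echo domainOf('http://www.example.com/etcetc'), "\n"; // example.com
echo domainOf('www.example.com/etcetc'), "\n";        // example.com
echo domainOf('example.com/etcetc'), "\n";            // example.com
```

If you also need https://, changing http to https? in the pattern should be enough.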
Try something like this:
.*([\w-]+\.)+[a-z]{2,5}(/[\w-]+)*
Use:
/(https?:\/\/)?((?:(\w+-)*\w+)\.)+(?:[a-z]{2})(\/?\w?-?=?_?\??&?)+[\.]?([a-z0-9\?=&_\-%#])?/g
It matches something.com, http(s):// or www. It does not match other [something]:// URLs though, but for my purpose that's not necessary.
The regex matches e.g.:
http://foo.co.uk/
www.regex.com/foo.html?q=bar$some=thi-ng,regex
regex.foo.com/blog
You can try this:
r"(https?:\/\/)?([\w-]+\.)+([a-z]{2,5})(\/+\w+)? "
What it matches:
may start with http:// or https:// (optional)
one or more word segments, each ending with a dot (.)
followed by 2 to 5 characters from [a-z]
followed by "/[anything]" (optional)
followed by a space
Try this
$url_reg = '/(ftp|https?):\/\/(\w+:?\w*@)?(\S+)(:[0-9]+)?(\/([\w#!:.?+=&%@!\/-])?)?/';
I have been using the following, which works for all my test cases, as well as fixes any issues where it would trigger at the end of a sentence preceded by a full-stop (end.), or where there were single character initials, such as 'C.C. Plumbing'.
The following regex contains multiple {2,}s, which means two or more matches of the previous pattern.
((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]{2,}\.([a-zA-Z0-9\&\.\/\?\:@\-_=#]){2,}
Matches URLs such as, but not limited to:
https://example.com
http://example.com
example.com
example.com/test
example.com?value=test
Does not match non-URLs such as, but not limited to:
C.C Plumber
A full-stop at the end of a sentence.
Single characters such as a.b or x.y
Please note: Due to the above, this will not match any single character URLs, such as: a.co, but it will match if it is preceded by a URL scheme, such as: http://a.co.
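Here is a quick way to check those claims yourself in PHP. This is only a sketch: it assumes the character classes contain @, wraps the pattern in ~ delimiters, and uses a hand-picked sample list.

```php
<?php
// Pattern from above, with @ in both character classes (an assumption about the intended characters).
$pattern = '~((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]{2,}\.([a-zA-Z0-9\&\.\/\?\:@\-_=#]){2,}~';

// Hypothetical sample list: some URLs, some ordinary text.
$samples = [
    'https://example.com',
    'example.com',
    'example.com?value=test',
    'http://a.co',
    'a.co',
    'C.C. Plumbing',
    'A full-stop at the end of a sentence.',
    'x.y',
];

$results = [];
foreach ($samples as $sample) {
    // preg_match returns 1 when the pattern is found anywhere in the string.
    $results[$sample] = preg_match($pattern, $sample) === 1;
    printf("%-40s %s\n", $sample, $results[$sample] ? 'match' : 'no match');
}
```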
I had a lot of trouble getting anubhava's answer to work: PHP treats $ inside double-quoted strings as the start of a variable, so preg_match wasn't receiving the intended pattern.
Here is what I used:
// Regular expression
$re = '/((https?|ftp):\/\/)?([a-z0-9+!*(),;?&=.-]+(:[a-z0-9+!*(),;?&=.-]+)?@)?([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))(:[0-9]{2,5})?(\/([a-z0-9+%-]\.?)+)*\/?(\?[a-z+&$_.-][a-z0-9;:@&%=+\/.-]*)?(#[a-z_.-][a-z0-9+$%_.-]*)?/i';
// Match all
preg_match_all($re, $blob, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
// The first element of the array is the full match
The Composer package URL Highlight does a good job of this in PHP:
<?php
use VStelmakh\UrlHighlight\UrlHighlight;
$urlHighlight = new UrlHighlight();
$matches = $urlHighlight->getUrls($string);
?>
If it does not have to be regex, you could always use the validate filters that are in PHP.
filter_var('http://example.com', FILTER_VALIDATE_URL);
filter_var (mixed $variable [, int $filter = FILTER_DEFAULT [, mixed $options ]]);
Types of Filters
Validate Filters
Regex if you want to ensure a URL starts with HTTP/HTTPS:
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
If you do not require the HTTP protocol:
[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
I got the following URL
http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego
and I want to extract
B000NO9GT4
That is the ASIN. So far I can search within the string, but not in the way I need. I looked at the split function and at explode, but can't find a way out. Also, the URLs will differ in length, so I can't hardcode the length either. The only thing that makes sense to me is to split the string so that
http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/
becomes the first part
and
B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego
becomes the second part. From the second part, I should extract B000NO9GT4.
In the same way, I would want to get the product name LEGO-Ultimate-Building-Set-Pieces from the first part.
I am very bad at regex and can't find a way out.
Can somebody guide me on how to do this in PHP?
Thanks
This grabs both pieces of information that you are looking to capture:
$url = 'http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego';
$path = parse_url($url, PHP_URL_PATH);
if (preg_match('#^/([^/]+)/dp/([^/]+)/#i', $path, $matches)) {
echo "Description = {$matches[1]}<br />"
."ASIN = {$matches[2]}<br />";
}
Output:
Description = LEGO-Ultimate-Building-Set-Pieces
ASIN = B000NO9GT4
Short Explanation:
Any expressions enclosed in ( ) will be saved as a capture group. This is how we get at the data in $matches[1] and $matches[2].
The expression ([^/]+) says to match all characters EXCEPT / so in effect it captures everything in the URL between the two / separators. I use this pattern twice. The [ ] actually defines the character class which was /, the ^ in this case negates it so instead of matching / it matches everything BUT /. Another example is [a-f0-9] which would say to match the characters a,b,c,d,e,f and the numbers 0,1,2,3,4,5,6,7,8,9. [^a-f0-9] would be the opposite.
# is used as the delimiter for the expression
^ following the delimiter means match from the beginning of the string.
See www.regular-expressions.info and PCRE Pattern Syntax for more info on how regexps work.
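If the character class part is still fuzzy, this small sketch may help. The 'cafe-42-xyz' string and the variable names are made up for illustration; the path is the one from the question, shortened.

```php
<?php
$path = '/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1';

// ([^/]+) captures up to, but not including, the next slash.
preg_match('#^/([^/]+)/#', $path, $first);
echo $first[1], "\n"; // LEGO-Ultimate-Building-Set-Pieces

// [a-f0-9]+ collects runs of the characters a-f and 0-9 ...
preg_match_all('/[a-f0-9]+/', 'cafe-42-xyz', $hex);
// ... and [^a-f0-9]+ collects runs of everything else.
preg_match_all('/[^a-f0-9]+/', 'cafe-42-xyz', $nonHex);

print_r($hex[0]);    // cafe, 42
print_r($nonHex[0]); // -, -xyz
```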
You can try
$str = "http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego" ;
list(,$desc,,$num,) = explode("/",parse_url($str,PHP_URL_PATH));
var_dump($desc,$num);
Output
string 'LEGO-Ultimate-Building-Set-Pieces' (length=33)
string 'B000NO9GT4' (length=10)
I've been using the following site to test a PHP regex so I don't have to constantly upload:
http://www.spaweditor.com/scripts/regex/index.php
I'm using the following regex:
/(.*?)\.{3}/
on the following string (replacing with nothing):
Non-important data...important data...more important data
and preg_replace is returning:
more important data
yet I expect it to return:
important data...more important data
I thought the ? is the non-greedy modifier. What's going on here?
Your non-greedy modifier is working as expected. But preg_replace replaces all occurrences of the (non-greedy) match with the replacement text ("" in your case). If you want only the first one replaced, you could pass 1 as the optional 4th argument (limit) to the preg_replace function (PHP docs for preg_replace). On the website you linked, this can be accomplished by typing 1 into the text input between the word "Flags" and the word "limit".
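Applied to the string from the question, the difference looks like this (a minimal sketch; the variable names are made up):

```php
<?php
$text = 'Non-important data...important data...more important data';

// Without a limit, every non-greedy match is removed, one after the other.
$all = preg_replace('/(.*?)\.{3}/', '', $text);

// With limit = 1, only the first match is removed.
$firstOnly = preg_replace('/(.*?)\.{3}/', '', $text, 1);

echo $all, "\n";       // more important data
echo $firstOnly, "\n"; // important data...more important data
```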
Just an actual example of @Asaph's solution. In this example you don't need non-greediness because you can specify a limit.
replace just the first occurrence of # in a line with a marker
$line=preg_replace('/#/','zzzzxxxzzz',$line,1);