Finding correct php regex for this complex element - php

I'm trying to get a regex which is able to find the following part in a string.
[TABLE|head,border|{
#TEXT|TEXT|TEXT#
TEXT|TEXT|TEXT
TEXT|TEXT|TEXT
TEXT|TEXT|TEXT
}]
It's from a simple self-made WYSIWYG editor that lets users add tables. The "syntax" for a table should be as simple as the one above.
Now, as there can be many of these table definitions, I need to find all of them with PHP's preg_match_all and replace them with the well-known <table> tag in HTML.
The regex I am trying to use is the following:
/\[TABLE\|(.*)\|\{(.*)\}\]/si
The \x0A stands for a newline; as my app is running on Linux, this is enough (it works fine with simpler regexes).
I use the online regex tester on functions-online.com.
The matches it returns are not really useful. And if I have more than one TABLE definition like the one above, the matches are completely useless: because of the (.*), the match runs from "head,border" all the way to the very last "|" character in the second TABLE definition.
I would like to get a list of matches giving me the complete table command one by one.

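Assuming your code works correctly for an input containing only a single table, this happens because .* is greedy by default: it matches as much as possible, so it runs on to the last closing delimiter in the input. Placing a question mark after each of the two .* quantifiers makes them lazy, which should prevent greediness from being an issue.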
/\[TABLE\|(.*?)\|\{(.*?)\}\]/si
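As a minimal sketch of how the lazy version behaves with preg_match_all: the sample input and the variable names below are made up for illustration and are not taken from the question's application.

```php
<?php
// Hypothetical input with two table definitions separated by other text.
$input = "[TABLE|head,border|{\n#A|B|C#\n1|2|3\n}]\nsome text\n[TABLE|border|{\nX|Y\n}]";

// Lazy quantifiers stop at the first possible "|{" and "}]".
$pattern = '/\[TABLE\|(.*?)\|\{(.*?)\}\]/si';

// PREG_SET_ORDER groups the results per table: [full match, options, body].
$count = preg_match_all($pattern, $input, $tables, PREG_SET_ORDER);

foreach ($tables as $table) {
    echo "options: ", $table[1], "\n";
    echo "rows:\n", trim($table[2]), "\n";
}
```

With PREG_SET_ORDER each entry of $tables describes one complete table command, which is the "one by one" list the question asks for.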

Related

changing www*.com to a clickable URL with REGEX

I'm working on a web page and regex keeps coming up as the best way to handle string manipulation for an issue I'm trying to resolve. Unfortunately, regex is not exactly trivial and I've been having trouble. Any help is appreciated.
I would like to make strings entered from a PHP form into clickable links. I've received help with my first challenge, which was how to make strings starting with http, https or ftp into clickable links:
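function make_links_clickable($message){
    // Wrap each ftp://, http:// or https:// URL in an anchor tag; $1 is the whole matched URL.
    return preg_replace('!(((f|ht)tp(s)?://)[-a-zA-Zа-яА-Я()0-9#:%_+.~#?&;//=]+)!i', '<a href="$1">$1</a>', $message);
}
$message = make_links_clickable($message);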
And this works well. When I look at it (and do some research), the best that I can glean from the syntax is that the first piece matches ftp, http, and https, followed by : and //, along with a wide range of combined patterns. I would like to know how I can:
1) Make links starting with www, or ending with .com/.net/.org/etc clickable (like google.com, or www.google.com - leaving out the http://)
2) Change youtube links like
"https://www.youtube.com/watch?v=examplevideo"
into
"<iframe width="560" height="315" src="//www.youtube.com/embed/examplevideo" frameborder="0" allowfullscreen></iframe>"
I think these two cases are basically doing the same kind of thing, but figuring it out is not intuitive. Any help would be deeply appreciated.
The first regular expression there is made to match almost everything that follows ftp://, http://, https:// that occurs, so it might be best to implement the others as separate expressions since they'll only be matching hostnames.
For number 1, you'll need to decide how strictly you wish to match different TLDs (.com/.net/etc). For example, you can explicitly match them like this:
(www\.)?[a-z0-9\-]+\.(com|net|org)
However, that will only match URLs that end in .com, .net, or .org. If you want all top-level domains and only the valid ones, you'll need to list them all manually at the end of that pattern. Alternatively, you can do something like this,
(www\.)?[a-z0-9\-]+\.[a-z]{2,6}
which will accept anything that looks like a URL and ends with a dot followed by any combination of 2 to 6 letters (long enough for .museum and .travel). However, this will also match strings like "fgs.fds". Depending on your application, you may need to add more characters to [a-z] to support extended alphabets.
Edit (2 Aug 14): As pointed out in the comments below, this won't match TLDs like .co.uk. Here's one that will:
(www\.)?[a-z0-9\-]+\.([a-z]{2,3}(\.?[a-z]{2,3})?)
Instead of any string of two to six characters (following a period), this will match any two to three letters, then another two to three (if present), with or without a dividing period.
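To make the difference between the last two patterns concrete, here is a small sketch. The sample text and variable names are hypothetical, and the ~ delimiters are only a convenience for this example.

```php
<?php
// Hypothetical sample text containing a two-part TLD.
$text = 'visit www.example.co.uk today';

$simple  = '~(www\.)?[a-z0-9\-]+\.[a-z]{2,6}~';
$twoPart = '~(www\.)?[a-z0-9\-]+\.([a-z]{2,3}(\.?[a-z]{2,3})?)~';

preg_match($simple, $text, $m1);
preg_match($twoPart, $text, $m2);

echo $m1[0], "\n"; // the single-TLD pattern stops after ".co"
echo $m2[0], "\n"; // the two-part pattern also takes ".uk"
```

Neither pattern checks that the domain exists; both only test that the text has the shape of a hostname.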
It'd be redundant, but you could instead remove the question mark after www in the second option, then do both tests; that way, you can match any string ending in a common TLD, or a string that begins with "www." and is followed by any characters with one period separating them, such as "gpspps.cobg". It would still match sites that might not actually exist, but at least it would look like a URL.
For the YouTube one, I went a little question mark crazy.
(?i:(?:(?:http(?:s)?://)?(?:www\.)?)?youtu(?:\.be/|be\.com/watch\?(?:[a-z0-9_\-\%\&\=]){0,}?v\=))([a-zA-Z0-9_\-]{11}){0,}?v\=))(?i)([a-zA-Z0-9_\-]{11})
EDIT: I just tried to use the above regex in one of my own projects, but I encountered some errors with it. I changed it a little and I think this version may be better:
(?i:(?:(?:http(?:s)?://)?(?:www\.)?)?youtu(?:\.be/|be\.com/watch\?(?:[a-z0-9_\-\%\&\=]){0,})?)(?:v=)?([a-zA-Z0-9_\-]{11})
For those not familiar with regular expressions: parenthesized parts, ( ...regex... ), are stored as groups, which can be selectively picked out of matched strings. Groups that begin with ?:, like most of the ones up there (for example (?:www\.)), are not captured. Because the end of that regex was left as a normal "captured" group, ([a-zA-Z0-9_\-]{11}), if you pass the $matches argument to functions like preg_match, you can use $matches[1] to get the YouTube ID of the video, 'examplevide', then work with it however you'd like. Note that the regex only matches 11 characters for the ID, which is why the 12-character 'examplevideo' from the question is cut short.
This regex will match pretty much any of the current YouTube URL formats, including unusual letter case and parameters out of their normal order:
http://youtu.be/dQw4w9WgXcQ
https://www.youtube.com/watch?v=dQw4w9WgXcQ
http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=featured
http://www.youtube.com/watch?feature=featured&v=dQw4w9WgXcQ
http://WWW.YouTube.Com/watch?v=dQw4w9WgXcQ
http://YouTube.Com/watch?v=dQw4w9WgXcQ
www.youtube.com/watch?v=dQw4w9WgXcQ
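A rough sketch of how the revised pattern could be used to build the embed markup from the question. The helper name youtube_embed and its signature are invented for this example, and the ~ delimiters are chosen only so the slashes need no escaping.

```php
<?php
// The revised pattern from above, wrapped in ~ delimiters.
$pattern = '~(?i:(?:(?:http(?:s)?://)?(?:www\.)?)?youtu(?:\.be/|be\.com/watch\?(?:[a-z0-9_\-\%\&\=]){0,})?)(?:v=)?([a-zA-Z0-9_\-]{11})~';

// Hypothetical helper: returns embed markup, or null when no video ID is found.
function youtube_embed(string $url, string $pattern): ?string
{
    if (!preg_match($pattern, $url, $matches)) {
        return null;
    }
    $id = $matches[1]; // the only capturing group: the 11-character ID
    return '<iframe width="560" height="315" src="//www.youtube.com/embed/' . $id
        . '" frameborder="0" allowfullscreen></iframe>';
}

echo youtube_embed('http://www.youtube.com/watch?feature=featured&v=dQw4w9WgXcQ', $pattern), "\n";
```

Returning null for non-matching input lets the caller fall back to the plain link handling instead of emitting an empty iframe.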

Basic Regular Expression for

For some reason I always get stuck making anything past extremely basic regular expressions.
I'm trying to make a regular expression that kind of looks like a URL. I only want basic checking.
I would like it to match the following patterns where X is "something".
X://X.X
X://X.X... etc.
X.X
X.X... etc
If the string contains one of these patterns, it is sufficient checking for me. This way a url like www.example.com:8888 will still match. I have tried many different REGEX combinations with preg_match and cannot seem to get any to behave the way I want it to. I have consulted many other related REGEX questions on SO but my readings have not helped me.
Any help? I will be happy to provide more information if you would like but I don't know what else you would need.
It takes practice but here is one that I made using a regex tester (http://www.regextester.com/) to check my pattern:
^.+(:\/\/|\.)([a-zA-Z0-9]+\.)+.+
My approach is to slowly build my pattern from the beginning and add on one piece at a time. This cheat sheet is extremely helpful for remembering what everything is: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
Basically the pattern starts at the beginning of the string and checks for any characters followed by either :// or . then checks for groupings of letters and numbers followed by a . ending with any number of characters.
The pattern could probably be improved with groupings to not pass on invalid characters. But this one was quick and dirty. You could replace the first and last . with the characters that would be valid.
UPDATE
Per the comments here is an updated pattern:
^.+?(:\/\/|\.)?([a-zA-Z0-9]+?\.)+.+
/^(.+:\/\/)?[^.]+\.[^.\/]+([.\/][^.\/]+)*$/
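For a quick check of the last pattern with preg_match, something like the following could be used. The sample strings are invented for illustration and only cover the shapes listed in the question.

```php
<?php
// The last pattern above; an optional "X://" prefix, then at least "X.X".
$pattern = '/^(.+:\/\/)?[^.]+\.[^.\/]+([.\/][^.\/]+)*$/';

$samples = ['http://example.com', 'www.example.com:8888', 'foo.bar/baz', 'example', 'http://example'];

$results = [];
foreach ($samples as $sample) {
    // preg_match returns 1 on a match, 0 on no match.
    $results[$sample] = preg_match($pattern, $sample) === 1;
    echo $sample, ' => ', $results[$sample] ? 'match' : 'no match', "\n";
}
```

The first three samples contain the required dot-separated parts and match; the last two have no dot after the optional scheme and are rejected.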

PHP preg_match or _replace find substring from string, while retaining the match

So I have this comment-esque system I am working on, and apparently I didn't think it all the way through; however, I am locked into my method for the time being and the foreseeable future.
I am currently using
$pattern = '/\[#(\w+):(.*)\]/';
$subject = '[#keyword:string]';
preg_match($pattern, $subject, $matches);
This finds specific info from a particular pattern type, something like [#keyword:string], and it works great. However, while coming up with this method, it failed to dawn on me that the strings I am using it on also contain other stuff, such as a comment. So the overall string would look like '[#keyword:string]<br>Hello World'.
The above bit of code dumps an array for me: the first item is the whole match, the next is the keyword, and the next is the string. That is great, as I use it for something else. However, as I said, I never accounted for the <br>Hello World part. What I need is something very similar to the above that also leaves the <br>Hello World part as a reusable string, because I intend to rejoin the two strings after my process is done. That process replaces the pattern I am looking for with a matching user name or an image, depending on the keyword, by dispatching a function that finds the right name or creates the image tag. I have that part working; what I am missing is stripping the pattern from the string so that, once I get the name(s) or image back, I can rejoin it to the rest of the string.
Well hopefully that makes sense.
Use one of the following patterns:
'/\[#(\w+):(.*?)\]/'
'/\[#(\w+):(.*)\]/U'
Note: the U modifier is PCRE_UNGREEDY, which inverts the greediness of the quantifiers.
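A short sketch of how either pattern could be combined with preg_replace to keep the rest of the string. The subject below is a made-up variation of the question's example with an extra pair of brackets, so that the effect of non-greedy matching is visible.

```php
<?php
// Hypothetical subject: a tag followed by comment text that itself contains "]".
$subject = '[#keyword:string]<br>Hello [World]';

$lazy     = '/\[#(\w+):(.*?)\]/';
$ungreedy = '/\[#(\w+):(.*)\]/U';

preg_match($lazy, $subject, $matches);         // full tag, keyword, string
preg_match($ungreedy, $subject, $sameMatches); // same result via the U modifier

// Remove only the first tag; what is left is the comment text to rejoin later.
$rest = preg_replace($lazy, '', $subject, 1);

echo $matches[1], ' / ', $matches[2], ' / ', $rest, "\n";
```

With the original greedy (.*), the second group would run on to the last ] in the subject, so the comment text would end up inside the match.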

Discard character in matching group

I have a couple of matching groups one after another in a long Regex pattern. Around the middle I have
...(?<number>(?:/(?:digit|num))?\d+|)...
which should match something like /num9, /digit9 or 9 or blank (because I need the named group to appear in the resulting associative array even if it's empty).
The pattern works, but is it possible to discard the / character if one of the first two cases is matched? I tried a positive lookahead, but it seems that you can't use those if you have expressions before the lookahead.
Is what I'm trying to accomplish possible using Regex?
Based on your input, I think that you need to match the / at some point anyway, otherwise your whole regex fails. At the same time you want to ignore it, so it cannot be part of your named group. Therefore, by putting it outside the group and making it optional, while ensuring that a digit is not directly preceded by a /, you get the desired result:
^/?(?<number>(?:(?:digit|num))?(?<!/)\d+|)$
However given your lack of a more complete input and regex, I am not 100% sure this will work for all your cases.
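As a sketch of what the anchored pattern captures for the inputs mentioned in the question (tested in isolation, since the rest of the asker's long pattern is not shown; the ~ delimiters and variable names are choices made for this example):

```php
<?php
// The answer's pattern, wrapped in ~ delimiters because it contains slashes.
$pattern = '~^/?(?<number>(?:(?:digit|num))?(?<!/)\d+|)$~';

$captured = [];
foreach (['/num9', '/digit9', '9', ''] as $input) {
    if (preg_match($pattern, $input, $m)) {
        // The leading slash is matched outside the named group, so it is not captured.
        $captured[$input] = $m['number'];
    }
}
var_dump($captured);
```

The named group is present for all four inputs, including the empty one, because the empty alternative still takes part in the match.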

Confused about the behavior of regex in a url routing script

I just finished learning about regex and I thought that I should put it into something useful, so I created a small url routing script with php and the following regex:
^(?:/(\w+)?)*$
(The PHP code currently doesn't do anything; it just prints out the matching groups from preg_match.)
Currently, given the URL /foobar/foo/bar, the matching groups are the entire string (normal behavior) and the last part of the URL (in this case: bar).
Obviously, this is a problem.
I think this is caused by the use of a single capture group, which only captures the last matching string, but I'm not sure. Any advice on the real cause of this and/or a solution will be greatly appreciated.
Thanks in advance!
You have diagnosed the problem correctly - on each repetition of the surrounding group, the previously matched contents of the capturing group are "overwritten" by the new match.
It's not quite clear what you would have expected to happen. I guess that you would have liked each part of the path to be "remembered" as its own group? This is something you can't do with repeated groups in PHP (only a few regex dialects (Perl 6 and .NET) allow something like this).
In your case, you're probably better off using your regex to validate the URL and then splitting it along the slashes:
$result = preg_split('%/%', $subject);
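Putting the two steps together might look like the following sketch. The variable names and the use of PREG_SPLIT_NO_EMPTY are additions for this example, not part of the original answer.

```php
<?php
$subject = '/foobar/foo/bar';

// The question's pattern: the single capture group keeps only its last repetition.
$valid = preg_match('~^(?:/(\w+)?)*$~', $subject, $m) === 1;

$segments = [];
if ($valid) {
    // PREG_SPLIT_NO_EMPTY drops the empty string before the leading slash.
    $segments = preg_split('%/%', $subject, -1, PREG_SPLIT_NO_EMPTY);
}

echo 'last capture: ', $m[1], "\n";
print_r($segments);
```

Without the PREG_SPLIT_NO_EMPTY flag, the result would start with an empty string, because the subject begins with a slash.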
