regex help... php check entry format - php

I'm using PHP to develop an application, but I am running into some issues with regex...
I found a few sites that explain it, but for some reason it is over my head. Can someone please help explain regex arguments?
I uploaded a sample of what I am working on here...
First, click on the "+" button at top right to get to the add content view.
Basically, I need it so that when you submit this form, PHP checks that the values are formatted correctly.
Domain: this can end in .com, .co, .biz, .info, etc. The user can enter a prefix, as in a URL, and PHP strips it, so the strings that end up in the array are just domain.com:
domain1.com
somedomain.biz
mydomain.co
Redirect: for this one, PHP splits on the ',' so we are left with the IP and the domain key as separate strings. The IP can have 2-3 digits per section, so ###.##.##.### or even ##.##.##.##, and the domain key is a varchar (not so important).
##.##.##.##, domainkey
###.###.###.###, domainkey
Solution for redirect:
(\d{1,3}\.){3}\d{1,3}
/24's: this is similar to the redirect IP, but it will always end in '0/24'
##.##.###.0/24
##.##.###.0/24
Names: This one should be the easiest; it can only be letters, no numbers, any length.
randomname
thisisaname

May I suggest using some software, or even a website, that lets you test your regex, such as:
The Regex Coach
Regexpal
RegExr
Expresso
RegexDesigner
etc
It really depends on how strict you want to get with it and how fancy you want to make your regex.
/((\d{1,3})\.){3}(\d{1,3})(\/\d{2})?/
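As a rough illustration of the four checks described in the question, here is a minimal PHP sketch. The function names are hypothetical, and the patterns are only one possible reading of the formats above (for example, \d{1,3} accepts any 1-3 digits, so 999 would pass).

```php
<?php
// Hypothetical helpers; names and exact strictness are assumptions, not the asker's code.

// Names: letters only, any length.
function isValidName(string $s): bool
{
    return preg_match('/^[a-zA-Z]+$/', $s) === 1;
}

// Dotted IP with 1-3 digits per section (does not check the 0-255 range).
function isValidIp(string $s): bool
{
    return preg_match('/^(\d{1,3}\.){3}\d{1,3}$/', $s) === 1;
}

// Same shape as the IP, but the last section must be exactly "0/24".
function isValidSlash24(string $s): bool
{
    return preg_match('/^(\d{1,3}\.){3}0\/24$/', $s) === 1;
}

// "ip, domainkey": split on the first comma and validate the IP half.
function parseRedirect(string $s): ?array
{
    $parts = array_map('trim', explode(',', $s, 2));
    if (count($parts) !== 2 || !isValidIp($parts[0]) || $parts[1] === '') {
        return null;
    }
    return ['ip' => $parts[0], 'key' => $parts[1]];
}

// Strip an optional scheme, "www." and any path, then check for "name.tld".
function normalizeDomain(string $s): ?string
{
    $s = strtolower(trim($s));
    $s = preg_replace('~^(https?://)?(www\.)?~', '', $s);
    $s = explode('/', $s, 2)[0];
    return preg_match('/^[a-z0-9-]+\.[a-z]{2,}$/', $s) === 1 ? $s : null;
}
```

If stricter IP checking is needed, PHP's built-in filter_var($ip, FILTER_VALIDATE_IP) may be a better fit than a regex.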

Related

How to compare strings based on character similarity in SQL?

I'm working on redirecting people if they type a "not really wrong" URL.
For example, I have a good URL, http://www.website.com/category/foo-bar-if-bar-foo/.
This one works, so if a user enters my website with it, I can retrieve the corresponding article.
But if someone enters my website with a not-really-wrong URL like http://www.website.com/category/foo-bar-foo/, because another website has referenced a wrong URL, I should redirect them to the right one instead of returning a 404 status code...
So how should I do this? And most importantly, should I do this?
I'm currently using Eloquent with Laravel 4.2.
Thank you in advance.
EDIT
I was wrong about Stack Overflow, thanks for your comment. It uses the unique ID of a post.
EDIT 2
I looked at the SOUNDEX function in SQL. It works really well if there is a small difference, like a character or two missing, but if my URL is as broken as in my example, it obviously doesn't work anymore. Thanks anyway, it's going to be useful.
Just thinking off the top of my head, you could create a SQL table (with Full-Text indexing enabled) containing all your paths (it might already exist).
In the event that a 404 is triggered, hijack that and do a MATCH (Full Text Search) and return the path with the highest scoring MATCH (you can also consider using a score threshold to prevent nonsensical matches).

Route #token confusion

Apologies if this has already been asked.
I am working on a project where I am looking to display locations of a business. This can be either by state, or by city (in a state).
I am trying to work with these two routes:
GET /#state
GET /#city-#state
#state works well, but when I try to navigate to a #city-#state page, I get errors because it is trying to load the #state page, and cannot find the required data.
Looking at base.php, I found that the preg_match_all is matching with \w, so it should be ignoring the hyphen(-), but for some reason isn't.
I need the URLs to be in this structure.
Can someone help me notice what I am missing?
Thanks!
I don't think F3 allows you to use a dash to separate tokens in a url; hence why it's always matching the first token (#state).
The regex used to grab tokens is '/#(\w+)/', it wants a slash character to separate tokens.
I would suggest using /#state and /#city/#state.

RegEx extract website url from email address w/ sub-sub-domain

We are trying to extract from an email list a valid url for that organization.
abc#charleston.k12.il.us is easy, but sometimes we have
someone#u40gw.effingham.k12.il.us, where u40gw is a subdomain for internal mail.
Another example is someone#mail.meridian223.org or someone#athletics.msstate.edu
What would be the most efficient way to capture the .edu + the preceding name only, without additional subdomains, or in the case of high schools the whole part k12.il.us plus the preceding name only?
Tried so far:
/#(([a-zA-Z0-9]*)([.])([a-zA-Z0-9]*)|#([a-zA-Z0-9]*)([.])([a-zA-Z0-9]*)([.])([a-zA-Z0-9]*)([.])([a-zA-Z0-9]*)([.])([a-zA-Z0-9]*))/
You can try the following regex pattern:
#.*?([^.]+[.]\w{3}|[^.]+[.]k12[.]il[.]us)$
Here you can replace \w{3} with your list of possible extensions, like org, edu, net, etc. For example:
#.*?([^.]+[.](edu|org|net|info|com)|[^.]+[.]k12[.]il[.]us)$
You can see it working on regexr.com
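For use from PHP, the second pattern could be wrapped roughly as follows. This is only a sketch: the function name is hypothetical, the '#' separator is kept exactly as it appears in the addresses above, and the inner extension group is made non-capturing so that group 1 holds the whole result.

```php
<?php
// Hypothetical helper: returns the organization's domain, or null if nothing matches.
function organizationDomain(string $address): ?string
{
    // "/" is used as the delimiter because the pattern itself contains "#".
    $pattern = '/#.*?([^.]+[.](?:edu|org|net|info|com)|[^.]+[.]k12[.]il[.]us)$/';

    return preg_match($pattern, $address, $m) === 1 ? $m[1] : null;
}
```

Under these assumptions, someone#athletics.msstate.edu would give msstate.edu and someone#u40gw.effingham.k12.il.us would give effingham.k12.il.us.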

Looking for a PHP regex or function to filter variations using . of an email for security

I am getting spam because Gmail allows the use of . in its addresses, so someone like this spammer:
q.i.n.ghu.im.i.n.g.o.u.r#gmail.com
can get through by removing and/or adding another period in his naming structure.
This happens to be on a Joomla install, so I am specifically looking to create a component that I can add to multiple sites, or a simple regex that I can add inline to existing code. Also, is anything being done about this? It amounts to what might be termed a loosely typed email address, which is crazy to me.
If your goal is to match this address against the others that are equivalent to it (because you've already got them blacklisted), then I'd simply normalize the address to its most basic state before storing it. Lowercase it, split it at the #, and if the right side is "gmail.com", remove all dots from the left side and put the halves back together.
start with JOE.SCHMOE#GMAIL.COM
lowercase to joe.schmoe#gmail.com
split to joe.schmoe and gmail.com
since right side is gmail.com, remove dots from left
reassemble to joeschmoe#gmail.com
Now you've got the base address that you can block/ban/whatever.
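The steps above can be sketched in PHP like this. The function name is hypothetical, and the code assumes that the '#' shown in the addresses above stands in for '@' in real addresses.

```php
<?php
// Hypothetical helper: reduce an address to its base form before storing or comparing it.
// Assumption: real addresses use "@" where the examples above show "#".
function normalizeAddress(string $address): string
{
    $address = strtolower(trim($address));

    // Split at the last "@"; leave the input unchanged if there is none.
    $at = strrpos($address, '@');
    if ($at === false) {
        return $address;
    }

    $local  = substr($address, 0, $at);
    $domain = substr($address, $at + 1);

    // Only Gmail is treated as dot-insensitive here.
    if ($domain === 'gmail.com') {
        $local = str_replace('.', '', $local);
    }

    return $local . '@' . $domain;
}
```

Comparing normalizeAddress() of an incoming address against stored, already-normalized entries would then catch the dotted variants.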
You could do something simple like: /^(?:[^#]+\.){5,}[^#]+#(?:[^#]+\.)+[^#]+/
This is just a quick sketch, not meant for validation, but rather a pointer to tell you whether an email is sketchy. The key here is the {5,} quantifier, which says that if the email has 5 or more dots (like a.b.c.d.e.f) it will match, in other words, be flagged as sketchy.
I hope this helps!
Explanation: http://regex101.com/r/lB5vG3
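A minimal way to try that pattern from PHP might look like this. The function name is hypothetical, and the '#' separator is kept as written in the pattern above.

```php
<?php
// Hypothetical helper: true when the part before "#" contains 5 or more dots.
function looksSketchy(string $address): bool
{
    return preg_match('/^(?:[^#]+\.){5,}[^#]+#(?:[^#]+\.)+[^#]+/', $address) === 1;
}
```

This only flags an address for review; it does not say whether the address is valid.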

Extract URL containing /find/ from numerous URL's?

I'm really a major novice at RegEx and could do with some help.
I have a long string containing lots of URLs and other text, and one of the URLs contains /find/ in it, i.e.:
1. http://www.example.com/not/index.html
2. http://www.example.com/sat/index.html
3. http://www.example.com/find/index.html
4. http://www.example.com/rat/mine.html
5. http://www.example.com/mat/find.html
What sort of RegEx would I use to return the URL that is number 3 in that list but not return me number 5 as well? I suppose basically what I'm looking for is a way of returning a whole word that contains a specific set of letters and / in order.
TIA
I would assume you want preg_match("%/find/%",$input); or similar.
EDIT: To get the full line, use:
preg_match("%^.*?/find/.*$%m",$input);
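To actually capture the matching line, pass a third argument to preg_match. A small sketch, assuming the URLs are separated by newlines (the function name is hypothetical):

```php
<?php
// Hypothetical helper: return the first line containing "/find/", or null if none does.
function findUrlLine(string $input): ?string
{
    // With the "m" modifier, ^ and $ match at the start and end of each line.
    return preg_match('%^.*?/find/.*$%m', $input, $matches) === 1 ? $matches[0] : null;
}

$sample = "http://www.example.com/not/index.html\n"
        . "http://www.example.com/find/index.html\n"
        . "http://www.example.com/mat/find.html";

$line = findUrlLine($sample);
```

Because "." does not match a newline by default, the match stays within a single line, and /mat/find.html is skipped since it has no trailing slash after "find".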
I suggest using RegExr to build regular expressions.
You can type in a sample list (like the one above) and use a palette to create a RegExp and test it in real time. The program is available both online and as a downloadable Adobe AIR package.
Unfortunately I cannot access their site now, so I'm attaching the AIR package of the downloadable version.
I really recommend it, since it helped a RegExp newbie like me design even the most complex patterns.
However, for your question, I think that just
\/find\/
goes well if you want to obtain a yes/no result (i.e. if it contains or not /find/), otherwise to obtain the full line use
.*\/find\/.*
In addition to Kolink's answer, in case you wanted to regex match the whole URI:
This is by no means an exhaustive regex for URIs, but this is a good starting point. I threw in a few options at key points, like .com, .net, and .org. In reality you'll have a fairly hard time matching URIs with regular expressions due to the lack of conformity, but you can come very close
The regex from the above link:
/(https?:\/\/)?(www\.)?([a-zA-Z0-9-_]+)\.(com|org|net)\/(find)\/([a-zA-Z0-9-_]+)\.(html|php|aspx)?/is
