Specific regex pattern to stop advertisement - php

I'm using the php function preg_match() to check for a specific pattern.
I've gone through about 50 websites so far, and still haven't figured out how to do this specific pattern.
If there is the word "dot" or "d0t" after anything and before anything that contains "com" or "org", it would catch it.
I'm making something that filters out advertisements. It separately filters out anything except letters, numbers, and underscores; that has to stay separate. It has its own purpose and its own output.
If you can help me figure out how to do this, or link me to anything that I may have missed after 2 hours of googling, I would so greatly appreciate it.
Thanks.

The question is not very clear, but you can try this regex with preg_match:
'~d[0o]t.*?(?:com|org)~i'
This matches the word "dot" or "d0t" when "com" or "org" appears anywhere after it on the same line; the i flag makes the match case-insensitive.
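A minimal sketch of how that pattern could be used as a filter. The helper name looksLikeAd and the sample messages are made up for illustration; they are not from the question. Note the third sample: because the pattern only needs "dot" followed later by "com", ordinary sentences can be flagged too.

```php
<?php
// Hypothetical helper (name is an assumption): true if the text contains
// "dot"/"d0t" followed anywhere later on the same line by "com" or "org".
function looksLikeAd(string $text): bool
{
    return preg_match('~d[0o]t.*?(?:com|org)~i', $text) === 1;
}

// Made-up sample messages.
$samples = [
    'visit example dot com today',
    'cheap stuff at spam D0T org',
    'we dotted every i before the competition', // false positive: "dot" ... "com"
    'nothing to see here',
];

foreach ($samples as $text) {
    echo (looksLikeAd($text) ? 'blocked: ' : 'allowed: ') . $text . PHP_EOL;
}
```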

Related

preg_match regex syntax

Tried different regex generators with no luck.
I have this string that I put in preg_match:
$search_string = "/^:([A-Za-z0-9_\-]+)[#!~a-zA-Z0-9#\.\-]+\s*([A-Z]+)\s*[:]*([\#a-zA-Z0-9\-]+)*\s*[:]*([!\#\-\.A-Za-z0-9 ]+)*/";
It's basically for usernames. Sadly, it fails when the username has an underscore in it; for example, iam_coolguy doesn't work.
How do I add an underscore to this search string?
I can't seem to figure out how regex works.
It's not a duplicate, scrolled past all preg_match threads.
/[a-z]/i seems easy and understandable for me, but my string is too advanced for my knowledge.
Thanks.
If you are just looking to grab something between //'s,
I would just use this regex: \/(.*)\/
but as the others have said, you haven't given any limitations on what the username can and can't contain.
If you need more, say something and I will adjust my answer.
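On the underscore itself: a character class only accepts the characters listed in it, so the fix is to add _ inside the brackets. In the question's pattern the first class already lists _, but the later classes do not; which of them has to accept the username depends on the input format, which the question doesn't show. The two patterns below are simplified stand-ins, not the full pattern from the question.

```php
<?php
// Simplified, hypothetical patterns that differ only by the underscore.
$withoutUnderscore = '/^[a-zA-Z0-9\-]+$/';
$withUnderscore    = '/^[a-zA-Z0-9_\-]+$/';

$name = 'iam_coolguy';

var_dump(preg_match($withoutUnderscore, $name)); // int(0): "_" is not in the class
var_dump(preg_match($withUnderscore, $name));    // int(1): "_" is now allowed
```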

Basic Regular Expression for

For some reason I always get stuck making anything past extremely basic regular expressions.
I'm trying to make a regular expression that kind of looks like a URL. I only want basic checking.
I would like it to match the following patterns where X is "something".
X://X.X
X://X.X... etc.
X.X
X.X... etc
If the string contains one of these patterns, it is sufficient checking for me. This way a url like www.example.com:8888 will still match. I have tried many different REGEX combinations with preg_match and cannot seem to get any to behave the way I want it to. I have consulted many other related REGEX questions on SO but my readings have not helped me.
Any help? I will be happy to provide more information if you would like but I don't know what else you would need.
It takes practice but here is one that I made using a regex tester (http://www.regextester.com/) to check my pattern:
^.+(:\/\/|\.)([a-zA-Z0-9]+\.)+.+
My approach is to slowly build my pattern from the beginning and add on one piece at a time. This cheatsheet is extremely helpful for remembering what everything is: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
Basically, the pattern starts at the beginning of the string and checks for any characters followed by either :// or a dot, then checks for groups of letters and numbers each followed by a dot, and ends with one or more characters of any kind.
The pattern could probably be improved with groupings to not pass on invalid characters. But this one was quick and dirty. You could replace the first and last . with the characters that would be valid.
UPDATE
Per the comments here is an updated pattern:
^.+?(:\/\/|\.)?([a-zA-Z0-9]+?\.)+.+
/^(.+:\/\/)?[^.]+\.[^.\/]+([.\/][^.\/]+)*$/
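A quick way to see how the last pattern above behaves is to run it over a few inputs. This is only a sketch: the helper name looksLikeUrl and the sample strings (other than www.example.com:8888, which comes from the question) are assumptions.

```php
<?php
// Hypothetical helper wrapping the pattern: optional "X://", then at least
// one "X.X", then any further segments separated by "." or "/".
function looksLikeUrl(string $text): bool
{
    return preg_match('/^(.+:\/\/)?[^.]+\.[^.\/]+([.\/][^.\/]+)*$/', $text) === 1;
}

$samples = [
    'http://example.com',
    'www.example.com:8888',
    'example.com',
    'example',           // no dot at all
    'http://localhost',  // scheme but no X.X part
];

foreach ($samples as $text) {
    echo $text . ' => ' . var_export(looksLikeUrl($text), true) . PHP_EOL;
}
```

Because the pattern is anchored with ^ and $, it tests the whole string rather than searching inside a longer text.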

Validate the name of a person in php [duplicate]

I would like to create a regex which validates a name of a person. These should be allowed:
Letters (uppercase and lowercase)
-
spaces
This is pretty easy to create a regex for. The problem is that some people also use special characters in their names. For example, assume a user named gûnther or François. There are a lot of characters like û and ç available and it's hard to list all of these.
Is there an easy way to check for correct human names?
Is there an easy way to check for correct human names?
This has been discussed several times. I'm fairly certain that the only thing people can agree on is that, in order to exist, a name cannot be an empty string, thus:
^.+$
(Yes, I am aware that this is probably not what OP is looking for. I'm just summarizing earlier Q&As.)
/^\pL[\pL '-]*\z/u should do the trick (the u modifier is needed so that the pattern and the name are treated as UTF-8)
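A small sketch of that Unicode-letter approach in PHP, written with the u modifier, since without it the subject is matched byte by byte and multibyte letters such as ç may not match. The helper name isPlausibleName is an assumption, and the code assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper (name is an assumption). The name must start with a
// letter and may continue with letters, spaces, apostrophes, and hyphens.
// With /u, preg_match returns false on invalid UTF-8, which is treated as
// "not a name" here.
function isPlausibleName(string $name): bool
{
    return preg_match('/^\pL[\pL \'-]*\z/u', $name) === 1;
}

$samples = ['François', 'gûnther', "O'Malley", 'Brooks-Schneider', 'R2-D2', ''];

foreach ($samples as $name) {
    echo var_export($name, true) . ' => '
        . var_export(isPlausibleName($name), true) . PHP_EOL;
}
```

This only checks which characters are used; it cannot tell whether the text is really someone's name.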
The short answer is no, there is no easy way. You have touched on the biggest issue. There are so many special cases of accents and extra things hanging off letters that it will become a mess to deal with. Additionally, the expression will break down to something like this
^[CAPITAL_LETTERS][ALL_LETTERS_AND_SYMBOLS]*$
That is not that helpful because "Abcd" fits that and you have no way to know if someone is incorrectly entering info into the field or if it was a crazy Hollywood parent that actually named their kid that or something like Sandwich or Umbrella.
^.+$
Checked @jensgram's answer, but that regex accepts any non-empty string, so it doesn't solve the problem: the string needs to be a name, and with that regex it can be anything.
^[A-Z][a-z]+$
My regex only accepts strings where the first char is uppercase and the following chars are lowercase letters. Also, looking through the other answers, this seems to be the shortest and simplest regex.
I don't know exactly what you are trying to do (validate user name input?) but basically I would keep it simple - fail the validation if the text contains numbers. And even that's probably pretty shaky.
I had the same problem. First I came up with something like
preg_match("/^[a-zA-Z]{1,}([\s-]*[a-zA-Z\s\'-]*)$/", $name)
but then realized that UTF-8 chars used in countries like Sweden, China, etc. (for example Õ or å) would not be allowed. That was important to my site, since it's an international site and I don't want to prevent users from entering their real name.
I thought it might be an easier solution, instead of trying to figure out how to allow names like O'Malley and Brooks-Schneider and Õsmar (made that one up :), to rather catch chars that you don't want them to enter. For me it was basically to avoid XSS JS code being entered. So I use the following regex to detect all chars that might be harmful.
preg_match("/[~!@#\$%\^&\*\(\)=\+\|\[\]\{\};\\:\",\.\<\>\?\/]+/", $name)
That way they can enter any name they want except chars that really aren't part of any name. Hope this might be useful.

Confused about the behavior of regex in a url routing script

I just finished learning about regex and I thought that I should put it into something useful, so I created a small url routing script with php and the following regex:
^(?:/(\w+)?)*$
(The PHP code currently doesn't do anything; it just prints out the matching groups from preg_match.)
Currently, if given the URL /foobar/foo/bar, the matching groups are the entire string (normal behavior) and the last part of the URL (in this case: bar).
Obviously, this is a problem.
I think this is caused by the use of one capture group, which only captures the last matching string, but I'm not sure. Any advice on the real cause of this and/or a solution would be greatly appreciated.
Thanks in advance!
You have diagnosed the problem correctly - on each repetition of the surrounding group, the previously matched contents of the capturing group are "overwritten" by the new match.
It's not quite clear what you would have expected to happen. I guess that you would have liked each part of the path to be "remembered" as its own group? This is something you can't do with repeated groups in PHP (only a few regex dialects (Perl 6 and .NET) allow something like this).
In your case, you're probably better off using your regex to validate the URL and then splitting it along the slashes:
$result = preg_split('%/%', $subject);
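Put together, the validate-then-split idea could look roughly like this. The function name routeSegments is an assumption; PREG_SPLIT_NO_EMPTY is used so that the empty piece before the leading slash is dropped.

```php
<?php
// Hypothetical helper (name is an assumption): returns the path segments,
// or null if the path does not have the expected shape.
function routeSegments(string $path): ?array
{
    // The question's pattern, used only to validate the overall shape.
    if (preg_match('~^(?:/(\w+)?)*$~', $path) !== 1) {
        return null;
    }

    // Split on slashes and drop empty pieces.
    $parts = preg_split('%/%', $path, -1, PREG_SPLIT_NO_EMPTY);

    return $parts === false ? null : $parts;
}

print_r(routeSegments('/foobar/foo/bar'));   // the three segments
var_dump(routeSegments('/foo bar'));         // NULL: the space is not allowed
```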

best method to stop users posting urls

I am looking to implement a system to strip out URLs from text posted by a user.
I know there is no perfect solution and users will still attempt things like:
www dot google dot com
so I know that ultimately any solution will be flawed in some way... all I am looking to do really is reduce the number of people doing it.
Any suggestions, source or approaches appreciated,
Thanks
There are a number of regular expression pattern matchers here. Some of them are quite complex.
I would suggest that running multiple ones may be a good idea.
You need to define exactly what you want to strip out. The broader the pattern, the more false positives you will get. The following example will remove any run of 3 letters, followed by a period, more letters, another period and 2-4 more letters:
$text = preg_replace('/[a-z]{3}\.[a-z]+\.[a-z]{2,4}/i', '', $text);
The other end of strictness might be anything that ends on a period and 2-4 letters (like .com):
$text = preg_replace('/[a-z]+\.[a-z]{2,4}/i', '', $text);
Note that the latter will strip out the last word of a sentence, the full stop and the first word of the next sentence if someone forgets to add a space in between the sentences.
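To make the difference concrete, here is a small sketch that runs both patterns over two made-up sample strings (the variable names and the strings are assumptions). The narrower pattern leaves the ordinary sentence alone, while the broader one removes "works.Try" from it.

```php
<?php
// The two patterns from above.
$narrow = '/[a-z]{3}\.[a-z]+\.[a-z]{2,4}/i';
$broad  = '/[a-z]+\.[a-z]{2,4}/i';

// Made-up sample texts.
$withUrl = 'see www.google.com for details';
$noUrl   = 'It works.Try it'; // missing space after the full stop

foreach ([$withUrl, $noUrl] as $text) {
    echo 'input:  ' . $text . PHP_EOL;
    echo 'narrow: ' . preg_replace($narrow, '', $text) . PHP_EOL;
    echo 'broad:  ' . preg_replace($broad, '', $text) . PHP_EOL;
}
```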
