Detecting specific words in a textarea submission - php

I have a new feature on my site, where users can submit any text (I stopped all HTML entries) via a textarea. The main problem I still have though is that they could type "http://somewhere.com" which is something I want to stop. I also want to blacklist specific words. This is what I had before:
if (strpos($entry, "http://" or ".com" or ".net" or "www." or ".org" or ".co.uk" or "https://") !== true) {
die ('Entries cannot contain links!');
However that didn't work, as it stopped users from submitting any text at all. So my question is simple, how can I do it?

This is a job for Regular Expressions.
What you need to do is something like this:
// A list of words you don't allow
$disallowedWords = array(
'these',
'words',
'are',
'not',
'allowed'
);
// Search for disallowed words.
// The Regex used here should e.g. match 'are', but not match 'care' or 'stare'
foreach ($disallowedWords as $word) {
if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $entry)) {
die("The word '$word' is not allowed...");
}
}
// This variable should contain a regex that will match URLs
// there are thousands out there, take your pick. I have just
// used an arbitrary one I found with Google
$urlRegex = '(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*';
// Search for URLs. preg_match() needs delimiters around the pattern;
// "!" is used here because it does not occur in $urlRegex.
if (preg_match('!' . $urlRegex . '!i', $entry)) {
die("URLs are not allowed...");
}
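The word check above can also be done with a single pattern instead of one preg_match() call per word. The following is only a sketch: findDisallowedWord() is a hypothetical helper name, not something from the answer, and it assumes the word list is a plain array of strings.

```php
<?php
/**
 * Returns the first disallowed word found in $entry (as it appears in
 * the text), or null if none is found. Hypothetical helper for illustration.
 */
function findDisallowedWord(string $entry, array $disallowedWords): ?string
{
    // An empty list would produce a pattern that matches everywhere.
    if (count($disallowedWords) === 0) {
        return null;
    }

    // Escape each word so characters like "." or "+" are taken literally.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $disallowedWords);

    // \b marks a word boundary, so "are" matches but "care" does not.
    $pattern = '/\b(' . implode('|', $escaped) . ')\b/i';

    return preg_match($pattern, $entry, $matches) ? $matches[1] : null;
}

var_dump(findDisallowedWord('These are fine', ['are', 'not'])); // string(3) "are"
var_dump(findDisallowedWord('I care a lot', ['are', 'not']));   // NULL
```

Because \b is used rather than surrounding whitespace, a word at the very start or end of the entry, or next to punctuation, is still caught.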

You must use strpos more than once. With your way, the or expression is evaluated first, which returns true / false, and that result is what gets passed to strpos.
This way it should work:
if (strpos($entry, "http://") !== false || strpos($entry, "https://") !== false || strpos($entry, ".com") !== false)
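To see why the original condition always fired, here is a small sketch (the sample $entry is made up for illustration). The chained or collapses to a single boolean before strpos() is ever called, and strpos() only ever returns an integer position or false, never true.

```php
<?php
// Hypothetical sample input, for illustration only.
$entry = 'Just some harmless text';

// The chained "or" is evaluated first and yields one boolean.
$needle = ("http://" or ".com" or ".net");
var_dump($needle);                     // bool(true)

// strpos() returns an int position or false, never true,
// so a "!== true" comparison always succeeds.
$position = strpos($entry, 'http://');
var_dump($position !== true);          // bool(true)

// Correct test: compare each strpos() result against false.
$hasLink = strpos($entry, 'http://') !== false
        || strpos($entry, '.com') !== false;
var_dump($hasLink);                    // bool(false)
```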

A simple way to do this is to put all the words not allowed into an array and loop through them to check each one.
$banned = array('http://', '.com', '.net', 'www.', '.org'); // Add more
foreach ($banned as $word):
if (strpos($entry, $word) !== false) die('Contains banned word');
endforeach;
The problem with this is that if you get too carried away and start banning the word 'com' or something, other perfectly legal words and phrases that contain the letters 'com' will cause false positives. You could use regular expressions to search for strings that look like URLs, but then you can easily just break them up like I did above. There is no effective way to completely stop people from posting links into a comment. If you don't want them there, you'll ultimately just have to use moderation. Community moderation works very well; look at Stack Overflow, for instance.

Related

PHPs strpos does not work as intended with double quoted string

I'm using the following code to return true or false if a string contains a substring in PHP 8.0.
<?php
$username = "mothertrucker"; // This username should NOT be allowed
$banlistFile = file_get_contents("banlist.txt"); //Contains the word "trucker" in it
$banlist = explode("\n", $banlistFile); // Splits $banlistFile into an array, split by line
if (contains($username, $banlist)) {
echo "Username is not allowed!";
} else {
echo "Username is allowed";
}
function contains($str, array $arr)
{
foreach($arr as $a) { // For each word in the banlist
if (stripos($str, $a) !== false) { // If I change $a to 'trucker', it works. "trucker" does not
return true;
}
}
return false;
}
?>
This is to detect if an inappropriate word is used when creating a username. So for example, if someone enters the username "mothertrucker", and "trucker" is included in the ban list, I want it to deny it.
Right now with this code, If I just type in the word "trucker" as a username, it is found and blocks it. Cool. However if there's more to the string than just "trucker", it doesn't detect it. So the username "mothertrucker" is allowed.
I discovered that if I explicitly type in 'trucker' instead of $a in the stripos function, it works perfectly. However, if I explicitly type in "trucker" (with double quotes), it stops working, and only blocks if that's the only thing the user entered.
So what I'm seeing is that the string $a that I'm passing seems to be interpreted by PHP as a double quoted string, when in order for this to detect it properly, it needs to be a single quoted string. But as far as I can tell, I have no control over how PHP passes the variable.
Can I somehow convert it to a single quoted string? Perhaps the explode command I'm using in line 2 is causing it? Is there another way I can pull the data from a txt document and have it be interpreted as a single quoted string? Hopefully I've made sense with my explanation, but you can copy and paste the code and see it for yourself.
Thanks for any help!
One potential problem would be any whitespace (which includes things like \r) could stop the word matching, so just trimming the word to compare with can tidy that up...
stripos($str, $a)
to
stripos($str, trim($a))
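As a sketch of the same idea applied once, when the list is loaded rather than on every comparison: the file contents below are a made-up stand-in for banlist.txt with Windows line endings, which leave a stray "\r" on each entry after explode("\n", ...).

```php
<?php
// Hypothetical file contents with "\r\n" line endings, standing in for banlist.txt.
$banlistFile = "trucker\r\nspammer\r\n";

// Split on "\n", trim each entry, and drop empty lines.
// (In PHP 8 an empty needle matches at position 0, so an empty
// entry would ban every username.)
$banlist = array_filter(array_map('trim', explode("\n", $banlistFile)), 'strlen');

function contains(string $str, array $arr): bool
{
    foreach ($arr as $a) {
        if (stripos($str, $a) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(contains('mothertrucker', $banlist)); // bool(true)
var_dump(contains('alice', $banlist));         // bool(false)
```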
I do not know what your file actually contains, so I don't know what the result of explode is.
Anyway, my suggestion (depending on how fast this needs to be, how long the banlist file is, and how strict the banning should be) is to not explode the file and just search it as a whole.
<?php
$username = "allow"; // This username should be allowed
$banlist = "trucker\nmotherfucker\n donot\ngoodword";
var_dump(contains($username, $banlist));
function contains($str, $arr)
{
if (stripos($arr, $str) !== false) return true;
else return false;
}
?>
Note that with this approach a username such as good, which should be allowed, is rejected because it appears inside goodword in the file (using my example). If that matters, you should not use stripos on the whole file but instead keep your own loop over the exploded list and compare each entry with strcasecmp.

How can I str_replace partially in PHP in a dynamic string with unknown key content

Working in WordPress (PHP). I want to set strings to the database like below. The string is translatable, so it could be in any language keeping the template codes. For the possible variations, I presented 4 strings here:
<?php
$string = '%%AUTHOR%% changed status to %%STATUS_new%%';
$string = '%%AUTHOR%% changed status to %%STATUS_oldie%%';
$string = '%%AUTHOR%% changed priority to %%PRIORITY_high%%';
$string = '%%AUTHOR%% changed priority to %%PRIORITY_low%%';
To make the string human-readable, for the %%AUTHOR%% part I can change the string like below:
<?php
$username = 'Illigil Liosous'; // could be any unicode string
$content = str_replace('%%AUTHOR%%', $username, $string);
But for status and priority, I have different substrings of different lengths.
Question is:
How can I make those dynamic substring be replaced on-the-fly so that they could be human-readable like:
Illigil Liosous changed status to Newendotobulous;
Illigil Liosous changed status to Oldisticabulous;
Illigil Liosous changed priority to Highlistacolisticosso;
Illigil Liosous changed priority to Lowisdulousiannosso;
Those unsoundable words are to let you understand the nature of a translatable string, that could be anything other than known words.
I think I can proceed with something like below:
<?php
if( strpos($_content, '%%STATUS_') !== false ) {
// proceed to push the translatable status string
}
if( strpos($_content, '%%PRIORITY_') !== false ) {
// proceed to push the translatable priority string
}
But how can I fill inside those conditionals efficiently?
Edit
I might not have been fully clear with my question, hence this update. The issue is not related to using str_replace with arrays.
The issue is, the $string that I need to detect is not predefined. It would come like below:
if($status_changed) :
$string = "%%AUTHOR%% changed status to %%STATUS_{$status}%%";
elseif($priority_changed) :
$string = "%%AUTHOR%% changed priority to %%PRIORITY_{$priority}%%";
endif;
Where they will be filled dynamically with values in the $status and $priority.
So when it comes to str_replace() I will actually use functions to get their appropriate labels:
<?php
function human_readable($codified_string, $user_id) {
if( strpos($codified_string, '%%STATUS_') !== false ) {
// need a way to get the $status extracted from the $codified_string
// $_got_status = ???? // I don't know how.
get_status_label($_got_status);
// the status label replacement would take place here, I don't know how.
}
if( strpos($codified_string, '%%PRIORITY_') !== false ) {
// need a way to get the $priority extracted from the $codified_string
// $_got_priority = ???? // I don't know how.
get_priority_label($_got_priority);
// the priority label replacement would take place here, I don't know how.
}
// Author name replacement takes place now
$username = get_the_username($user_id);
$human_readable_string = str_replace('%%AUTHOR%%', $username, $codified_string);
return $human_readable_string;
}
The function has some missing points where I currently am stuck. :(
Can you guide me a way out?
It sounds like you need to use RegEx for this solution.
You can use the following code snippet to get the effect you want to achieve:
if (preg_match('/%%PRIORITY_(.*?)%%/', $codified_string, $matches)) {
    // $matches[0] is the whole placeholder, $matches[1] the captured key (e.g. "high")
    $replace = get_priority_label($matches[1]);
    $human_readable_string = str_replace($matches[0], $replace, $codified_string);
}
Of course, the above code needs to be changed for STATUS and any other replacements that you require.
In short, the RegEx breaks down as follows:
/
The opening delimiter of the regular expression.
%%PRIORITY_
Is a literal match of those characters.
(
The opening of the capture group. Whatever it matches is stored in the third parameter of preg_match.
.
This matches any character that isn't a new line.
*?
This matches the preceding token zero or more times, in this case any character. The ? makes the match lazy, which is needed because the . would otherwise also match the %% characters.
)
The end of the capture group.
%%
Is a literal match of the closing characters, followed by / as the closing delimiter.
Check out the RegEx in action: https://regex101.com/r/qztLue/1
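Putting the pieces together, one way to handle both STATUS and PRIORITY in a single pass is preg_replace_callback(). This is only a sketch under assumptions: the two label functions below are hypothetical stand-ins for the question's get_status_label() and get_priority_label(), with made-up labels, and human_readable() takes the username directly instead of a user ID so the example stays self-contained.

```php
<?php
// Hypothetical stand-ins for the question's label lookups.
function get_status_label(string $key): string
{
    $labels = ['new' => 'New', 'oldie' => 'Old'];
    return $labels[$key] ?? $key;
}

function get_priority_label(string $key): string
{
    $labels = ['high' => 'High', 'low' => 'Low'];
    return $labels[$key] ?? $key;
}

function human_readable(string $codified_string, string $username): string
{
    // Match %%STATUS_xxx%% or %%PRIORITY_xxx%%; capture the type and the key.
    $result = preg_replace_callback(
        '/%%(STATUS|PRIORITY)_(.*?)%%/',
        function (array $m): string {
            return $m[1] === 'STATUS'
                ? get_status_label($m[2])
                : get_priority_label($m[2]);
        },
        $codified_string
    );

    // Author name replacement takes place last.
    return str_replace('%%AUTHOR%%', $username, $result);
}

echo human_readable('%%AUTHOR%% changed priority to %%PRIORITY_high%%', 'Illigil Liosous');
// Illigil Liosous changed priority to High
```

The callback receives the full match in $m[0] and the two capture groups in $m[1] and $m[2], so no separate strpos() check per placeholder type is needed.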

How would I replace all question marks after the first

So, a lot of my form systems redirect back to the previous page, although, they display a message in the process. The way I display a message is by simply using ?message=messageCode in the URL.
However, if they use the form from a page that already has a query string, it adds a second query string, messing everything up.
Example:
if I were to login from the navigation bar, on the URL "mywebsite.com/something?message=1", it would log in, but redirect to "mywebsite.com/something?message=1?message=2"
This results in no message being displayed.
What I am asking here is, how could I change all of the question marks AFTER the first question mark to ampersands (&)?
Example:
From: mywebsite.com/page?blah=1?something=2?hi=3
To: mywebsite.com/page?blah=1&something=2&hi=3
I have searched around, as well as tried some methods of my own, but nothing seems to work properly.
What you should be doing is build a proper URL, appending ? or & when appropriate.
$url = 'mywebsite.com/something?message=1';
$new_url = sprintf('%s%s%s',
$url,
strpos($url, '?') === false ? '?' : '&',
http_build_query(['message' => 2])
);
Or, first parse the previous URL and merge the query string.
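The parse-and-merge variant could look roughly like this. It is a sketch only: addQueryParams() is a hypothetical helper name, it assumes the URL contains at most one ?, and it does not handle fragments (#...).

```php
<?php
/**
 * Adds or overrides query parameters on a URL.
 * Hypothetical helper; fragments are not handled in this sketch.
 */
function addQueryParams(string $url, array $params): string
{
    // Split into the part before the first "?" and the existing query string.
    $parts = explode('?', $url, 2);
    $base  = $parts[0];

    $existing = [];
    if (isset($parts[1])) {
        parse_str($parts[1], $existing);
    }

    // New values override existing ones with the same name.
    $query = http_build_query(array_merge($existing, $params));

    return $query === '' ? $base : $base . '?' . $query;
}

echo addQueryParams('mywebsite.com/something?message=1', ['message' => 2]), "\n";
// mywebsite.com/something?message=2
echo addQueryParams('mywebsite.com/page?blah=1', ['message' => 2]), "\n";
// mywebsite.com/page?blah=1&message=2
```

Merging has the side effect that an old message code is replaced rather than duplicated, which is probably what a redirect-with-message flow wants.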
Use it like below:-
<?php
$str = 'mywebsite.com/something?message=1?message=2';
$pos = strpos($str,'?'); // find the first occurrence of ?
if ($pos !== false) {
$str = substr($str,0,$pos+1) . str_replace('?','&',substr($str,$pos+1));// replacement of ? to &
}
echo $str;
?>
Output:- https://eval.in/388308

Should I use strstr() to loosely validate urls before passing them to preg_match()?

I am writing a function to parse some videosites urls in order to generate embedding html:
if (strstr($url, 'a.com')) {
$from = 'a';
} elseif (strstr($url, 'b.com')) {
$from = 'b';
} else {
return 'Wrong Video Url!';
}
if ($from == 'a') {
// use preg_match() to retrieve video id to generate embedding html
if (preg_match('#^http://a\.com/id_(\w*?)\.html$#', $url, $matches)) {
// return video embedding html
}
return 'Wrong a.com Video Url!';
}
if ($from == 'b') {
if (preg_match('#^http://b\.com/v_(\w*?)\.html$#', $url, $matches)) {
//return video embedding html
}
return 'Wrong b.com Video Url!';
}
My purpose of using strstr() is reducing calls of preg_match() in some situations, for example if I have b.com urls like this: http://www.b.com/v_OTQ2MDE4MDg.html, I don't have to call preg_match() twice.
But I am still not sure if this kind of practice is good or if there is a better way.
Why not just do an alternation? (At least in this case.)
'#^http://(?:a\.com/id|b\.com/v)_(\w*?)\.html$#'
That's one preg_match, and zero strstrs.
Also, not that it is a big danger in this case, but escaping the dots when they should be dots is generally a good idea; your regexp will match "http://bacom/v_id_xhtml" (with "id_" captured by (\w*?)).
If you can't make "one pattern that fits all" (and it's actually a bad idea if you have many options, because your legibility goes down the drain), use a pattern to extract the site name, then do a switch on it. It will then just be two preg_matches, and zero strstrs, no matter how many patterns you have.
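A rough sketch of the "extract the site name, then switch" idea follows. parseVideoUrl() is a hypothetical helper; the a.com and b.com path formats are taken from the question, while accepting an optional www. prefix and requiring a non-empty ID (\w+ rather than \w*?) are assumptions of this sketch.

```php
<?php
/**
 * Returns ['site' => ..., 'id' => ...] for a recognised video URL, or null.
 * Hypothetical helper for illustration.
 */
function parseVideoUrl(string $url): ?array
{
    // First pattern: pull out the host (ignoring an optional "www.") and the path.
    if (!preg_match('#^http://(?:www\.)?([a-z0-9-]+\.com)/(.*)$#i', $url, $m)) {
        return null;
    }
    $site = strtolower($m[1]);
    $path = $m[2];

    // Pick the site-specific path pattern.
    switch ($site) {
        case 'a.com':
            $pathPattern = '#^id_(\w+)\.html$#';
            break;
        case 'b.com':
            $pathPattern = '#^v_(\w+)\.html$#';
            break;
        default:
            return null;
    }

    // Second pattern: extract the video ID from the path.
    return preg_match($pathPattern, $path, $p)
        ? ['site' => $site, 'id' => $p[1]]
        : null;
}

var_dump(parseVideoUrl('http://www.b.com/v_OTQ2MDE4MDg.html'));
```

Adding another site then means adding one case, and each URL still costs exactly two preg_match() calls.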

Only execute script if entered email is from a specific domain

I am trying to create a script that will only execute its actions if the email address the user enters is from a specific domain. I created a regex that seems to work when testing it via a regex utility, but when it's used in my PHP script, it tells me that valid emails are invalid. In this case, I want any email that is from @secondgearsoftware.com, @secondgearllc.com or asia.secondgearsoftware.com to echo success and all others to be rejected.
$pattern = '/\b[A-Z0-9\._%+-]+@((secondgearsoftware|secondgearllc|euro\.secondgearsoftware|asia\.secondgearsoftware)+\.)+com/';
$email = urldecode($_POST['email']);
if (preg_match($pattern, $email))
{
echo 'success';
}
else
{
echo 'opposite success';
}
I am not really sure what's futzed with the pattern. Any help would be appreciated.
Your regular expression is a bit off (it will allow foo@secondgearsoftwaresecondgearsoftware.com) and can be simplified:
$pattern = '/@((euro\.|asia\.)?secondgearsoftware|secondgearllc)\.com$/i';
I've made it case-insensitive and anchored it to the end of the string.
There doesn't seem to be a need to check what's before the "@" - you should have a proper validation routine for that if necessary, but it seems you just want to check if the email address belongs to one of these domains.
You probably need to use /\b[A-Z0-9\._%+-]+@((euro\.|asia\.)?secondgearsoftware|secondgearllc)\.com/i (note the i at the end) in order to make the regex case-insensitive. I also dropped the +s as they allow for infinite repetition, which doesn't make sense in this case.
Here's an easy to maintain solution using regular expressions
$domains = array(
'secondgearsoftware',
'secondgearllc',
'euro\.secondgearsoftware',
'asia\.secondgearsoftware'
);
preg_match("`@(" .implode("|", $domains). ")\.com$`i", $userProvidedEmail);
Here's a couple of tests:
$tests = array(
'bob@secondgearsoftware.com',
'bob@secondgearllc.com',
'bob@Xsecondgearllc.com',
'bob@secondgearllc.net',
'bob@euro.secondgearsoftware.org',
'bob@euro.secondgearsoftware.com',
'bob@euroxsecondgearsoftware.com',
'bob@asia.secondgearsoftware.com'
);
foreach ( $tests as $test ) {
echo preg_match("`@(" .implode("|", $domains). ")\.com$`i", $test),
" <- $test\n";
}
Result (1 is passing of course)
1 <- bob@secondgearsoftware.com
1 <- bob@secondgearllc.com
0 <- bob@Xsecondgearllc.com
0 <- bob@secondgearllc.net
0 <- bob@euro.secondgearsoftware.org
1 <- bob@euro.secondgearsoftware.com
0 <- bob@euroxsecondgearsoftware.com
1 <- bob@asia.secondgearsoftware.com
I suggest you drop the regex and simply use stristr to check if it matches. Something like this should work:
<?php
// Fill out as needed
$domains = array('secondgearsoftware.com', 'secondgearllc.com');
$email = urldecode($_POST['email']);
$found = false;
for ($i = 0; $i < count($domains); $i++)
{
if ($domains[$i] == stristr($email, $domains[$i]))
$found = true;
}
if ($found) ...
?>
The function stristr returns the part of the e-mail address from where it found a match to the end, which should be the same as the match in this case. Technically there could be something prior to the domain (fkdskjfsdksfks.secondgearsoftware.com), but you can include the "@" in each entry ("@domainneeded.com") to prevent this. This code is also slightly longer, but it is easily extended with new domains without worrying about regex.
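A regex-free variant that avoids the prefix problem altogether is to cut the address at the last "@" and compare the remainder against the list exactly. This is a sketch, not the answerer's code: hasAllowedDomain() is a hypothetical helper, and it does not validate the part before the "@".

```php
<?php
/**
 * True if $email ends in "@<domain>" for one of the allowed domains.
 * Hypothetical helper; the local part is not validated.
 */
function hasAllowedDomain(string $email, array $allowedDomains): bool
{
    // strrchr() returns the part from the last "@" onwards, or false if there is none.
    $tail = strrchr($email, '@');
    if ($tail === false) {
        return false;
    }

    // Drop the "@" and compare case-insensitively via lowercasing.
    $domain = strtolower(substr($tail, 1));

    return in_array($domain, $allowedDomains, true);
}

$allowed = array(
    'secondgearsoftware.com',
    'secondgearllc.com',
    'euro.secondgearsoftware.com',
    'asia.secondgearsoftware.com'
);

var_dump(hasAllowedDomain('bob@asia.secondgearsoftware.com', $allowed)); // bool(true)
var_dump(hasAllowedDomain('bob@Xsecondgearllc.com', $allowed));          // bool(false)
```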
