How to detect usernames in a comment using preg_match?

I am trying to design a comment/reply system like the one on Stack Overflow, where mentioning #username in a comment sends a notification to that user.
As an example, take the comment
$comment="hello #myname and #my-name and #my+name and #my%name and #my&name and #my_name and #my name #my/name and #3535 and #12";
The problem is that my code
if(preg_match('~#([^\s]+)~', $comment, $matches)){
print_r($matches);
}
only finds the username #myname. Is there a way to fix this so that it detects all usernames?
Also, which of the usernames mentioned in the comment above are valid usernames on Stack Overflow? For example, are my-name and my%name valid usernames, and are they detected when they are mentioned in a Stack Overflow comment?
Finally, is it possible to replace every valid username in my comment example with <strong>username</strong>?

The problem with your code is that preg_match() stops at the first match: it returns 1 if the pattern matched, 0 if it did not (or false on error) and never scans the rest of the string, so the remaining usernames are not found. preg_match_all() would collect every match in one call.
Alternatively, you can split the comment into words and run preg_match() on each word in a loop.
This code should get it done:
$comment="hello #myname and #my-name and #my+name and #my%name and #my&name and #my_name and #my name #my/name and #3535 and #12";
$comment_arr = explode(' ', $comment);
// echo '<pre>';
// print_r($comment_arr);
// echo '</pre>';
$usernames = [];
$new_comment_arr = [];
for ($i=0; $i < count($comment_arr) ; $i++)
{
if( preg_match('/^#(.*)/', $comment_arr[$i]) )
{
array_push($usernames, $comment_arr[$i]); // push the usernames
array_push($new_comment_arr, '<strong>'.$comment_arr[$i].'</strong>'); // push the usernames with '<strong>' wrapped around in the new comments array
}
else
array_push($new_comment_arr, $comment_arr[$i]); // push the unmatched words(other words) in the new comments array
}
echo '<pre>';
print_r($new_comment_arr);
print_r($usernames);
echo '</pre>';
$new_comment = implode(' ', $new_comment_arr); // implode the new array
echo $new_comment; // the new comment with '<strong>' wrapped around the usernames
The username #my name shouldn't be allowed, because the space ends the mention: splitting on spaces only captures #my.
If the username ever appears in a URL, the space is encoded and the name becomes #my%20name.
Also, do not allow '/' in a username: if you rewrite URLs, it will be treated as a path separator and can lead to 404 errors.
In my opinion, you should allow only letters, numbers and underscores ('_') in a username.
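A shorter route is to let the regex engine find every mention at once. The following is only a sketch, assuming that valid usernames are limited to letters, digits and underscores; the ~#(\w+)~ pattern, the shortened sample comment and the variable names are illustrative choices, not taken from the question.

```php
<?php
$comment = "hello #myname and #my-name and #my_name and #3535";

// Assumption: a username is one or more letters, digits or underscores.
// ~ is the delimiter so that # can be used literally in the pattern.
$pattern = '~#(\w+)~';

// preg_match_all() collects every match; $matches[1] holds the names without '#'.
preg_match_all($pattern, $comment, $matches);
$usernames = $matches[1];

// preg_replace() wraps every mention in <strong> in a single pass.
$highlighted = preg_replace($pattern, '<strong>#$1</strong>', $comment);

print_r($usernames);
echo $highlighted, PHP_EOL;
```

With this pattern #my-name is detected only as #my, since the hyphen is not an allowed character.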

Why not try a social plugin for comments?
I suggest using the Facebook comments plugin.
For more details, see http://developers.facebook.com/docs/plugins/comments

I think you should collect all the names that start with the # character in an array, so you can use that array for whatever you need, such as looping over it to send a notification to each user.
Here is some code that does this:
<?php
$comment="hello #myname and #my-name and #my+name and #my%name and #my&name and #my_name and #my name #my/name and #3535 and #12";
$keywords = preg_split("/[\s]+/", $comment);
foreach($keywords as $row=>$value){
if(preg_match("/^#/",$value)==0){
unset($keywords[$row]);
}
}
print_r($keywords);
?>

Related

PHP mention system with usernames with space

I wanted to know if it's possible to make a PHP mention system that supports usernames with spaces.
I tried this:
preg_replace_callback('~#([a-zA-Z0-9]+)~', 'mentionUser', htmlspecialchars_decode($r['content']))
My function:
function mentionUser($matches) {
global $db;
$req = $db->prepare('SELECT id FROM members WHERE username = ?');
$req->execute(array($matches[1]));
if($req->rowCount() == 1) {
$idUser = $req->fetch()['id'];
return '<a class="mention" href="members/profile.php?id='.$idUser.'">'.$matches[0].'</a>';
}
return $matches[0];
}
It works, but not for usernames with spaces.
I tried adding \s to the character class. It matches, but too much: preg_replace_callback captures the username together with the following words of the message, so the mention doesn't appear.
Is there any solution?
Thanks!

I know you said that you just removed the ability to add a space, but I still wanted to post a solution. To be clear, I don't necessarily think you should use this code, because it probably is just easier to keep things simple, but I think it should work still.
Your major problem is that almost every mention will incur two lookups, because in "#bob johnson went to the store" the username could be either bob or bob johnson, and there's no way to tell without going to the database. Luckily, caching will greatly reduce this problem.
Below is some code that generally does what you are looking for. I made a fake database using just an array for clarity and reproducibility. The inline code comments should hopefully make sense.
function mentionUser($matches)
{
// This is our "database" of users
$users = [
'bob johnson',
'edward',
];
// First, grab the full match which might be 'name' or 'name name'
$fullMatch = $matches['username'];
// Create a search array where the key is the search term and the value is whether or not
// the search term is a subset of the value found in the regex
$names = [$fullMatch => false];
// Next split on the space. If there isn't one, we'll have an array with just a single item
$maybeTwoParts = explode(' ', $fullMatch);
// Basically, if the string contained a space, also search only for the first item before the space,
// and flag that we're using a subset
if (count($maybeTwoParts) > 1) {
$names[array_shift($maybeTwoParts)] = true;
}
foreach ($names as $name => $isSubset) {
// Search our "database"
if (in_array($name, $users, true)) {
// If it was found, wrap in HTML
$ret = '<span>#' . $name . '</span>';
// If we're in a subset, we need to append back on the remaining string, joined with a space
if ($isSubset) {
$ret .= ' ' . array_shift($maybeTwoParts);
}
return $ret;
}
}
// Nothing was found, return what was passed in
return '#' . $fullMatch;
}
// Our search pattern with an explicitly named capture; ~ is the delimiter so that # can appear literally
$pattern = '~#(?<username>\w+(?:\s\w+)?)~';
// Three tests
assert('hello <span>#bob johnson</span> test' === preg_replace_callback($pattern, 'mentionUser', 'hello #bob johnson test'));
assert('hello <span>#edward</span> test' === preg_replace_callback($pattern, 'mentionUser', 'hello #edward test'));
assert('hello #sally smith test' === preg_replace_callback($pattern, 'mentionUser', 'hello #sally smith test'));

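Try this regex:
/#[a-zA-Z0-9]+( *[a-zA-Z0-9]+)*/g
It finds a # sign first, followed by one or more letters or digits. It then matches zero or more groups, each made of optional spaces followed by one or more letters or digits. The /g flag is regex-tester notation; PHP has no such modifier, so drop it and use preg_match_all() or preg_replace_callback() instead.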
I am assuming the username only contains A-Za-z0-9 and space.
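In PHP this could be tried as follows. This is only a sketch with made-up sample sentences; it also shows that the pattern keeps consuming words until it reaches punctuation, so a lookup against the user list is still needed to decide where the name really ends.

```php
<?php
// Same pattern without /g; preg_match_all() already finds every match.
$pattern = '/#[a-zA-Z0-9]+( *[a-zA-Z0-9]+)*/';

// Punctuation after the name stops the match.
preg_match_all($pattern, 'hi #bob johnson, welcome #amy', $m);
$withPunctuation = $m[0];

// Without punctuation the match swallows the following words as well.
preg_match_all($pattern, 'hello #bob johnson test', $m);
$withoutPunctuation = $m[0];

print_r($withPunctuation);
print_r($withoutPunctuation);
```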

PHP performant search a text for given usernames

I am currently dealing with a performance issue where I cannot find a way to fix it. I want to search a text for usernames mentioned with the # sign in front. The list of usernames is available as PHP array.
The problem is usernames may contain spaces or other special characters. There is no limitation for it. So I can't find a regex dealing with that.
Currently I am using a function which takes the whole line after the # and checks character by character which usernames could still match this mention, until just one username is left that fully matches it. But for a long text with 5 mentions it takes several seconds (!!!) to finish. For more than 20 mentions the script runs endlessly.
I have some ideas, but I don't know if they would work.
Go through the username list (could be more than 1,000 names) and search the text for each #Username without regex, using plain string search. I would say this would be far more inefficient.
When the username is written, check with JavaScript whether it contains a space or punctuation mark, and if so surround it with quotation marks, like #"User Name". I don't like that idea; it looks dirty to the user.
Don't start with one character, but maybe with 4, and if there is no match, go back. This is the same principle as in divide-and-conquer sorting algorithms. It could be difficult to implement and may lead to nothing.
How does Facebook or twitter and any other site do this? Are they parsing the text directly while typing and saving the mentioned usernames directly in the stored text of the message?
This is my current function:
$regular_expression_match = '~(?:^|\\s)#(.+?)(?:\n|$)~'; // ~ as delimiter so that # can appear literally in the pattern
$matches = false;
$offset = 0;
while (preg_match($regular_expression_match, $post_text, $matches, PREG_OFFSET_CAPTURE, $offset))
{
$line = $matches[1][0];
$search_string = substr($line, 0, 1);
$filtered_usernames = array_keys($user_list);
$matched_username = false;
// Loop, make the search string one by one char longer and see if we have still usernames matching
while (count($filtered_usernames) > 1)
{
$filtered_usernames = array_filter($filtered_usernames, function ($username_clean) use ($search_string, &$matched_username) {
$search_string = utf8_clean_string($search_string);
if (strlen($username_clean) == strlen($search_string))
{
if ($username_clean == $search_string)
{
$matched_username = $username_clean;
}
return false;
}
return (substr($username_clean, 0, strlen($search_string)) == $search_string);
});
if ($search_string == $line)
{
// We have reached the end of the line, so stop
break;
}
$search_string = substr($line, 0, strlen($search_string) + 1);
}
// If there is still one in filter, we check if it is matching
$first_username = reset($filtered_usernames);
if (count($filtered_usernames) == 1 && utf8_clean_string(substr($line, 0, strlen($first_username))) == $first_username)
{
$matched_username = $first_username;
}
// We can assume that $matched_username is the longest matching username we have found due to iteration with growing search_string
// So we use it now as the only match (Even if there are maybe shorter usernames matching too. But this is nothing we can solve here,
// This needs to be handled by the user, honestly. There is a autocomplete popup which tells the other, longer fitting name if the user is still typing,
// and if he continues to enter the full name, I think it is okay to choose the longer name as the chosen one.)
if ($matched_username)
{
$startpos = $matches[1][1];
// We need to get the endpos, cause the username is cleaned and the real string might be longer
$full_username = substr($post_text, $startpos, strlen($matched_username));
while (utf8_clean_string($full_username) != $matched_username)
{
$full_username = substr($post_text, $startpos, strlen($full_username) + 1);
}
$length = strlen($full_username);
$user_data = $user_list[$matched_username];
$mentioned[] = array_merge($user_data, array(
'type' => self::MENTION_AT,
'start' => $startpos,
'length' => $length,
));
}
$offset = $matches[0][1] + strlen($search_string);
}
Which way would you go? The problem is the text will be displayed often and parsing it every time will consume a lot of time, but I don't want to heavily modify what the user had entered as text.
I can't find out what's the best way, and even why my function is so time consuming.
A sample text would be:
Okay, #Firstname Lastname, I mention you!
Listen #[TEAM] John, you are a team member.
#Test is a normal name, but #Thât♥ should be tracked too.
And see #Wolfs garden! I just mean the Wolf.
Usernames in that text would be
Firstname Lastname
[TEAM] John
Test
Thât♥
Wolf
So yes, there is no marker that tells me where a name ends. The only certain boundary is the newline.

I think the main problem is that you can't distinguish usernames from the surrounding text, and looking up possibly thousands of usernames in a text is a bad idea. It also leads to further problems, for example John being part of [TEAM] John or JohnFoo.
The usernames need to be separated from the other text. Assuming you're using UTF-8, you could wrap each username in invisible characters: a zero-width space (U+200B, \xE2\x80\x8B) before it and a zero-width non-joiner (U+200C, \xE2\x80\x8C) after it.
The usernames can then be extracted quickly with little effort and, if needed, still verified against the database.
$txt = "
Okay, #\xE2\x80\x8BFirstname Lastname\xE2\x80\x8C, I mention you!
Listen #\xE2\x80\x8B[TEAM] John\xE2\x80\x8C, you are a team member.
#\xE2\x80\x8BTest\xE2\x80\x8C is a normal name, but
#\xE2\x80\x8BThât♥\xE2\x80\x8C should be tracked too.
And see #\xE2\x80\x8BWolfs\xE2\x80\x8C garden! I just mean the Wolf.";
// extract usernames
if(preg_match_all('~#\xE2\x80\x8B\K.*?(?=\xE2\x80\x8C)~s', $txt, $out)){
print_r($out[0]);
}
Array
(
[0] => Firstname Lastname
[1] => [TEAM] John
[2] => Test
[3] => Thât♥
[4] => Wolfs
)
echo $txt;
Okay, #Firstname Lastname, I mention you!
Listen #[TEAM] John‌, you are a team member.
#Test‌ is a normal name, but
#Thât♥‌ should be tracked too.
And see #Wolfs‌ garden! I just mean the Wolf.
You could use any characters you like as delimiters, as long as they are unlikely to occur elsewhere in the text.
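If changing the stored text is not an option, another possibility (a variation on the first idea in the question, not the method of this answer) is to compile the known usernames into a single alternation and let the regex engine scan the text once. A minimal sketch, assuming the list is small enough to fit into one pattern and that names are compared byte for byte as typed, without the utf8_clean_string normalization from the question; $usernames and $text are sample data:

```php
<?php
$usernames = ['Test', 'Wolf', '[TEAM] John', 'Firstname Lastname'];

// Longest names first, so a longer name wins over a shorter one that is its prefix.
usort($usernames, function ($a, $b) {
    return strlen($b) <=> strlen($a);
});

// preg_quote() escapes regex metacharacters such as [ and ] inside the names.
$alternatives = array_map(function ($name) {
    return preg_quote($name, '~');
}, $usernames);
$pattern = '~#(' . implode('|', $alternatives) . ')~';

$text = "Okay, #Firstname Lastname, I mention you!\n"
      . "Listen #[TEAM] John, you are a team member.\n"
      . "#Test is a normal name.\n"
      . "And see #Wolfs garden! I just mean the Wolf.";

// PREG_OFFSET_CAPTURE also returns the byte offset of each match.
preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE);
$found = array_column($m[1], 0);

print_r($found);
```

The pattern only needs to be rebuilt when the user list changes, so it could be cached between requests.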

How can I validate on server side a input text where I have tags separated by commas?

The user will input on a text field something like this:
$list = "hello, internet, connection, wireless";
I'm getting them out with the following:
$tags = explode(',', $list);
The problem is that before I do that, I need to validate that the string received in $_POST actually has that format. Any ideas how I can do that? I'm not an experienced PHP programmer; I usually just work on front-end development.
The user can enter more than 4 tags: it won't always be 4; it can be 5 or 6, with a maximum of 7. The output I want is an array containing all the words in the string. The background is that I'm building a simple helpdesk where users can ask a question, and I return all the results matching the tags found in the question.
So I really need to make sure that when the ADMIN of this helpdesk types the tags, they are in this format:
$data = "internet, connection, wireless, internet-explorer";

It depends on whether the input always contains exactly 4 elements. You can count the separators:
<?php
$list = "hello, internet, connection, wireless";
echo substr_count($list,', '); // prints 3, the number of ", " separators
?>
or you count the elements:
<?php
$list = "hello, internet, connection, wireless";
$result = explode(', ',$list);
if(count($result) ==4 ){
echo "right format";
}
?>
Edit:
If the whitespace after the commas is optional, trim each element:
<?php
$list = "hello, internet, connection, wireless";
$result = array_map('trim',explode(",",$list));
if(count($result) ==4 ){
echo "right format";
}
?>
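Since the number of tags varies between 1 and 7, a fixed count of 4 is not enough. The sketch below splits the list, trims each tag and checks both the count and the characters of every tag. The function name parse_tags and the rule that a tag consists of letters, digits and single hyphens are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns the list of tags, or null if the input is malformed.
function parse_tags(string $list, int $maxTags = 7): ?array
{
    $tags = array_map('trim', explode(',', $list));

    if (count($tags) > $maxTags) {
        return null; // too many tags
    }
    foreach ($tags as $tag) {
        // Assumed tag format: letters/digits, optionally joined by single hyphens.
        // An empty tag (from "a,,b", a trailing comma or an empty input) fails too.
        if (!preg_match('/^[a-z0-9]+(?:-[a-z0-9]+)*$/i', $tag)) {
            return null;
        }
    }
    return $tags;
}

var_dump(parse_tags("internet, connection, wireless, internet-explorer"));
var_dump(parse_tags("hello,, internet"));
```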

Additional elements to URLS?

I'm not sure what the terminology is, but basically I have a site that uses the "tag-it" system, currently you can click on the tags and it takes the user to
topics.php?tags=example
My question is what sort of scripting or coding would be required to be able to add additional links?
topics.php?tags=example&tags=example2
or
topics.php?tags=example+example2
Here is how my site currently links to tags:
header("Location: topics.php?tags={$t}");
or
<?php echo strtolower($fetch_name->tags);?>
Thanks for any hints or tips.

You cannot really pass tags twice as a plain GET parameter (PHP keeps only the last value), but you can pass it as an array:
topics.php?tags[]=example&tags[]=example2
Assuming this is what you want, try
$string = "topics.php?";
foreach($tags as $t)
{
$string .= "tags[]=" . urlencode($t) . "&"; // urlencode() keeps special characters in a tag from breaking the URL
}
$string = substr($string, 0, -1);
We iterate through the array, appending each value to $string. The last line removes the extra & left after the final iteration.
There is another option that looks a bit dirtier but might be better depending on your needs:
$string = "topics.php?tags[]=" . implode("&tags[]=", $tags); // separator first, then the array
Note: make sure the $tags array is not empty; with an empty array the one-liner still emits a single empty tags[]= parameter.
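PHP's built-in http_build_query() can also assemble the query string and takes care of URL-encoding. A small sketch, assuming $tags holds the selected tag names; note that it writes the brackets in encoded form with explicit indexes, which PHP reads back as the same array:

```php
<?php
$tags = ['example', 'example 2'];

// Build the query string; values and brackets are URL-encoded automatically.
$url = 'topics.php?' . http_build_query(['tags' => $tags]);
echo $url, PHP_EOL;

// Simulate what PHP does on the receiving side when it fills $_GET.
parse_str(parse_url($url, PHP_URL_QUERY), $query);
print_r($query['tags']);
```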

topics.php?tags=example&tags=example2
will not work as intended on the back end, because PHP keeps only the last tags value;
you have to put all the data into one variable:
topics.php?tags=example+example2
That looks good, and on the back end you can split it on the + sign:
//topics.php
<?php
...
// PHP decodes + in the query string to a space; urlencode() turns it back into +
$tags = isset($_GET['tags']) ? urlencode($_GET['tags']) : '';
$tags_arr = explode('+', $tags); // array of all tags
$current_tags = ""; //make this accessible in the view;
if($tags){
$current_tags = $tags ."+";
}
//show your data
?>
Edit:
you can create the front-end tag links:
<a href="topics.php?tags=<?php echo $current_tags ;?>horror">
horror
</a>

Can I add variable name within a string?

I am creating an OpenCart extension where the admin can change his email templates using the user interface in the admin panel.
I would like the user to have the option to add variables to his custom email templates. For example, he could enter:
Hello $order['customer_firstname'], your order has been processed.
At this point $order would be undefined; the user is simply defining the message that is to be sent. It would be stored in the database and loaded when the email is to be sent.
The problem is: how do I keep "$order['customer_firstname']" as a literal string and then convert it to a variable when necessary?
Thanks
Peter

If I understand your question correctly, you could do something like this:
The customer has a textarea or similar to input the template
Dear %NAME%, blah blah %SOMETHING%
Then you could have
$values = array('%SOMETHING%' => $order['something'], '%NAME%' => $order['name']);
$str = str_replace(array_keys($values), array_values($values), $str);
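With many variables the placeholder map does not have to be typed out by hand. As a sketch, assuming every key of $order should be exposed as an upper-case %KEY% placeholder (the sample data below is made up), the map can be generated in a loop and applied with strtr():

```php
<?php
// Sample data standing in for the real order record.
$order = ['customer_firstname' => 'Peter', 'order_id' => 1001];
$template = 'Hello %CUSTOMER_FIRSTNAME%, your order %ORDER_ID% has been processed.';

// Build '%KEY%' => value pairs from the order array.
$replacements = [];
foreach ($order as $key => $value) {
    $replacements['%' . strtoupper($key) . '%'] = (string) $value;
}

// strtr() replaces all placeholders in one pass over the template.
$message = strtr($template, $replacements);
echo $message, PHP_EOL;
```

Placeholders that have no entry in the map are simply left in the text, which makes typos in a template easy to spot.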

The user will be using around 40 variables. Is there a way I can set it to do that for each "%VARIABLE%"?
Yes, you can easily do so for each variable with the help of a callback function.
This lets you process each match with a function of your choice, which returns the desired replacement.
$processed = preg_replace_callback("/%(\S+)%/", function($matches) {
$name = $matches[1]; // between the % signs
$replacement = get_replacement_if_valid($name);
return $replacement;
},
$text_to_replace_in
);
From here, you can do anything you like. Dot notation, for example:
function get_replacement_if_valid($name) {
// Pad the result so that a name without a dot does not trigger an undefined-offset warning
list($var, $key) = array_pad(explode(".", $name, 2), 2, null);
if ($var === "order" && $key !== null) {
$order = init_order(); // symbolic: load the order data here
if(array_key_exists($key, $order)) {
return $order[$key];
}
}
return "<invalid key: $name>";
}
This simplistic implementation lets you process placeholders such as %order.name%, substituting them with $order['name'].

You could define your own simple template engine:
function template($text, $context) {
preg_match_all('~%([a-zA-Z0-9]+)\.([a-zA-Z0-9]+)%~', $text, $matches);
for($i = 0; $i < count($matches[0]); $i++) {
$subject = $matches[0][$i]; // full placeholder, e.g. %order.name%
$ctx = $matches[1][$i]; // part before the dot
$key = $matches[2][$i]; // part after the dot
$value = $context[$ctx][$key];
$text = str_replace($subject, $value, $text);
}
return $text;
}
This allows you to transform a string like this:
$text = 'Hello %order.name%. You have %order.percent%% discount. Pay a total ammount of %payment.ammount% using %payment.type%.';
$templated = template($text, array(
'order' => array(
'name' => 'Alex',
'percent' => 20
),
'payment' => array(
'type' => 'VISA',
'ammount' => '$299.9'
)
));
echo $templated;
Into this:
Hello Alex. You have 20% discount. Pay a total ammount of $299.9 using VISA.
This allows you to have any number of variables defined.

If you want to keep the PHP-syntax, then a regex would be appropriate to filter them:
$text = preg_replace(
"/ [$] (\w+) \[ '? (\w+) \'? \] /exi",
"$$1['$2']", # basically a constrained eval
$text
);
Note that it needs to be executed in the same scope where $order is defined. The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP versions use preg_replace_callback instead, which is also more flexible.
You could also allow another syntax this way. For example {order[customer]} or %order.customer% is more common and possibly easier to use than the PHP syntax.
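On PHP 7 and later the same idea can be written with preg_replace_callback. The following is a sketch rather than a drop-in replacement: render_php_style and the $context array are hypothetical names, and unknown variables are left untouched instead of being evaluated.

```php
<?php
// Hypothetical helper: replaces $var['key'] or $var[key] with $context['var']['key'].
function render_php_style(string $text, array $context): string
{
    return preg_replace_callback(
        '/\$(\w+)\[\'?(\w+)\'?\]/',
        function (array $m) use ($context) {
            [$whole, $var, $key] = $m;
            // Only whitelisted data from $context is inserted; nothing is eval'ed.
            return isset($context[$var][$key]) ? (string) $context[$var][$key] : $whole;
        },
        $text
    );
}

$template = "Hello \$order['customer_firstname'], your order has been processed.";
$out = render_php_style($template, ['order' => ['customer_firstname' => 'Peter']]);
echo $out, PHP_EOL;
```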

Variable interpolation only happens in double-quoted string literals in your PHP source, so a template loaded from the database will not be expanded this way. Within such a literal, a quoted array key also needs the curly-brace syntax:
echo "Hello {$order['customer_firstname']}";
Edit: As per the comments, a variation to Prash's answer,
str_replace('%CUSTOMERNAME%', $order['customer_name'], $str);

What you're looking for is:
eval("echo \"" . $input . "\";");
but please, PLEASE don't do that, because that lets the user run any code he wants.
A much better way would be a custom template-ish system, where you provide a list of available values for the user to drop in the code using something like %user_firstname%. Then, you can use str_replace and friends to swap those tags out with the actual values, but you can still scan for any sort of malicious code.
This is why Markdown and similar are popular; they give the user control over presentation of his content while still making it easy to scan for HTML/JS/PHP/SQL injection/anything else they might try to sneak in, because whitelisting is easier than blacklisting.

Perhaps you can have a template like this:
$tpl = "Hello {$order['customer_firstname']}, your order has been processed.";
If $order is set and that key is not null, you can echo $tpl directly and the value of the 'customer_firstname' key appears in the text. The curly braces are what make this work.
