PHP regex hashtag and special chars

PHP regex hashtag and special chars - php

ok, not sure if stupid or just monday.
It's actually quite simple. I have a textbox, in which I enter Text. A word gets marked with a hash (#), which then gets saved to the DB as the hashtag for that sentence.
Now, my funciton looks like this:
public function getHashtag($text)
{
print_r($text);
preg_match_all('/(#\w+)/', $text, $hashTag);
print_r($hashTag);
die();
if (isset($hashTag[0][0])) {
$hashTag = $hashTag[0][0];
return $hashTag;
} else {
return '';
}
}
the print_r are just debug stuff.
All I want to achieve is to get the word with the hash. Works great, EXCEPT if someone enters a Word in french which has àèé or other characters in it.
The output then just stops at the first special char.
#dfsdfaàèé asda sda sd asd aArray ( [0] => Array ( [0] => #dfsdfa ) [1] => Array ( [0] => #dfsdfa ) )
any ideas? :D

Just use this expression /(#[^\s[:punct:]]+)/.
Reads as "A # plus at least one character that is not white-space or punctuation."
The [:punct:] is one of the POSIX character classes.

Related

Extract multiples strings from one variable by using preg_match

I'm having troubles extracting several strings between tags from a variable, in order to echo them inside a span individually.
Thing is, the tags in question are determined by a variable, here is what it looks like :
<?php
$string = "[en_UK]english1[/en_UK][en_UK]english2[/en_UK][fr_FR]francais1[/fr_FR][fr_FR]francais2[/fr_FR][fr_FR]francais3[/fr_FR]";
$lang = "en_UK";
preg_match("/(.[$lang]), (.[\/$lang])/", $string, $outputs_list);
foreach ($outputs_list as $output) {
echo "<span>".$output."/span>";
}
// in this exemple I want to output :
// <span>english1</span>
// <span>english2</span>
?>
It's my first time using preg_match and after trying so many differents things I'm kinda lost right now.
Basically I want to extract every strings contained between the tags [$lang] and [/$lang] (in my exemple $lang = "en_UK" but it will be determined by the user's cookies.
I'd like some help figuring this out if possible,
Thanks

[] in a regular expression makes a character class. I'm not sure what you're trying to do with the .s and , either. Your regex currently says:
Any single character, an e, n, _, U, or K, a , and space, and again any single character, an e, n, _, U, K, but this time also allowing /.
Regex demo: https://regex101.com/r/8pmy89/2
I also believe you were grouping the wrong value. The () go around what you want to capture, know as a capture group.
I think the regex you want is:
\[$lang\](.+?)\[\/$lang\]
Regex demo: https://regex101.com/r/8pmy89/3
PHP Usage:
$string = "[en_UK]english1[/en_UK][en_UK]english2[/en_UK][fr_FR]francais1[/fr_FR][fr_FR]francais2[/fr_FR][fr_FR]francais3[/fr_FR]";
$lang = "en_UK";
preg_match_all("/\[$lang\](.+?)\[\/$lang\]/", $string, $outputs_list);
foreach ($outputs_list[1] as $output) {
echo "<span>".$output."/span>";
}
PHP demo: https://eval.in/686086
Preg_match vs. preg_match_all
Preg_match only returns the first match(es) of a regex. Preg_match_all returns all matches of the regex. The 0 index has what the full regex matched. All subsequent indexes are each capture groups, e.g. 1 is the first capture group.
Simple Demo: https://eval.in/686116
$string = '12345';
preg_match('/(\d)/', $string, $match);
print_r($match);
preg_match_all('/(\d)/', $string, $match);
print_r($match);
Output:
Array
(
[0] => 1
[1] => 1
)
^ is preg_match, below is the preg_match_all
Array
(
[0] => Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
)
[1] => Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
)
)

I made little late to post this answer. #chris already answered your question, so i made the solution broad which might help other users who may have different requirements.
<?php
$string = "[en_UK]english1[/en_UK][en_UK]english2[/en_UK][fr_FR]francais1[/fr_FR][fr_FR]francais2[/fr_FR][fr_FR]francais3[/fr_FR]";
$lang = "en_UK";
echo "User Language -";
preg_match_all("/\[$lang\](.+?)\[\/$lang\]/", $string, $outputs_list);
foreach ($outputs_list[1] as $output) {
echo "<span>".$output.",</span>";
}
echo "<br/>All languages - ";
//if all lang
preg_match_all("/[a-zA-Z0-9]{3,}/", $string, $outputs_list);
foreach ($outputs_list[0] as $output) {
echo "<span>".$output.",</span>";
}
echo "<br/>Type of lang - ";
//for showing all type of lang
preg_match_all("/(\[[a-zA-Z_]+\])*/", $string, $outputs_list);
foreach ($outputs_list[1] as $output) {
echo "<span>".$output."</span>";
}
?>
OUTPUT
User Language -english1,english2,
All languages - english1,english2,francais1,francais2,francais3,
Type of lang - [en_UK][en_UK][fr_FR][fr_FR][fr_FR]

preg match to get text after # symbol and before next space using php

I need help to find out the strings from a text which starts with # and till the next immediate space by preg_match in php
Ex : I want to get #string from this line as separate.
In this example, I need to extract "#string" alone from this line.
Could any body help me to find out the solutions for this.
Thanks in advance!

PHP and Python are not the same in regard to searches. If you've already used a function like strip_tags on your capture, then something like this might work better than the Python example provided in one of the other answers since we can also use look-around assertions.
<?php
$string = <<<EOT
I want to get #string from this line as separate.
In this example, I need to extract "#string" alone from this line.
#maybe the username is at the front.
Or it could be at the end #whynot, right!
dog#cat.com would be an e-mail address and should not match.
EOT;
echo $string."<br>";
preg_match_all('~(?<=[\s])#[^\s.,!?]+~',$string,$matches);
print_r($matches);
?>
Output results
Array
(
[0] => Array
(
[0] => #string
[1] => #maybe
[2] => #whynot
)
)
Update
If you're pulling straight from the HTML stream itself, looking at the Twitter HTML it's formatted like this however:
<s>#</s><b>UserName</b>
So to match a username from the html stream you would match with the following:
<?php
$string = <<<EOT
<s>#</s><b>Nancy</b> what are you on about?
I want to get <s>#</s><b>string</b> from this line as separate. In this example, I need to extract "#string" alone from this line.
<s>#</s><b>maybe</b> the username is at the front.
Or it could be at the end <s>#</s><b>WhyNot</b>, right!
dog#cat.com would be an e-mail address and should not match.
EOT;
$matchpattern = '~(<s>(#)</s><b\>([^<]+)</b>)~';
preg_match_all($matchpattern,$string,$matches);
$users = array();
foreach ($matches[0] as $username){
$cleanUsername = strip_tags($username);
$users[]=$cleanUsername;
}
print_r($users);
Output
Array
(
[0] => #Nancy
[1] => #string
[2] => #maybe
[3] => #WhyNot
)

Just do simply:
preg_match('/#\S+/', $string, $matches);
The result is in $matches[0]

PHP Parsing text to extract two numbers

The text is similar to this: +1191–1405 Holy Damage The numbers can change as well as the Damage type. Like +777-1444 Fire Damage ect.
What I want to do is extract the two numbers. So from the first example I want 1191 and 1405 and I need them to be integers not strings.
I've read up on preg_ stuff and such a bit and can do simple searches and parsing but i'm not quite at this level. I'm guessing I need to extract whatever numbers that are after + but before -, and after - discarding everything else.

Use this:
preg_match('/\+(\d+)-(\d+)/', $text, $match);
Now $match[1] contains the first number, $match[2] contains the second number.

Because I hate pattern matching and avoid it whenever possible:
function getNumbersFromString($str){
$splt = explode(' ', $str); // split by spaces
$sub = substr($splt[0], 1); // get rid of leading +
return explode('-', $sub); // return split by -
}
// Array ( [0] => 777 [1] => 1444 )
print_r(getNumbersFromString('+777-1444 Fire Damage'));
// Array ( [0] => 1191 [1] => 1405 )
print_r(getNumbersFromString('+1191-1405 Holy Damage'));

RegEx for hashtag separated string

I have bunch of strings like this:
a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc
And what I need to do is to split them up based on the hashtag position to something like this:
Array
(
[0] => A
[1] => AAX1AAY222
[2] => B
[3] => BBX4BBY555BBZ6
[4] => C
[5] => MMM1
[6] => D
[7] => ARA1
[8] => E
[9] => ABC
)
So, as you see the character right behind the hashtag is captured plus everything after the hashtag just right before the next char+hashtag.
I've the following RegEx which works fine only when I have a numeric value in the end of each part.
Here is the RegEx set up:
preg_split('/([A-Z])+#/', $text, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
And it works fine with something like this:
C#mmm1D#ara1
But, if I change it to this (removing the numbers):
C#mmmD#ara
Then it will be the result, which is not good:
Array
(
[0] => C
[1] => D
)
I've looked at this question and this one also, which are similar but none of them worked for me.
So, my question is why does it work only if it has followed by a number? and how I can solve it?
Here you can see some of them sample strings which I have:
a#123b#abcc#def456 // A:123, B:ABC, C:DEF456
a#abc1def2efg3b#abcdefc#8 // A:ABC1DEF2EFG3, B:ABCDEF, C:8
a#abcdef123b#5c#xyz789 // A:ABCDEF123, B:5, C:XYZ789
P.S. Strings are case-insensitive.
P.P.S. If you ever thinking what the hell are these strings, they are user submitted answers to a questionnaire, and I can't do anything on them like refactoring as they are already stored and just need to be proceed.
Why Not Using explode?
If you look at my examples you will see that I need to capture the character right before the # as well. If you think it's possible with explode() please post the output as well, thanks!
Update
Should we focus on why /([A-Z])+#/ works only if numbers included? thanks.

Instead of using preg_split(), decide what you want to match instead:
A set of "words" if followed by either <any-char># or <end-of-string>.
A character if immediately followed by #.
$str = 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc';
preg_match_all('/\w+(?=.#|$)|\w(?=#)/', $str, $matches);
Demo
This expression uses two look-ahead assertions. The results are in $matches[0].
Update
Another way of looking at it would be this:
preg_match_all('/(\w)#(\w+)(?=\w#|$)/', $str, $matches);
print_r(array_combine($matches[1], $matches[2]));
Each entry starts with a single character, followed by a hash, followed by X characters until either the end of the string is encountered or the start of a next entry.
The output is this:
Array
(
[a] => aax1aay222
[b] => bbx4bby555bbz6
[c] => mmm1
[d] => ara1
[e] => abc
)

If you still want to use preg_split you can remove the + and it might work as expected:
'/([A-Z])#/i'
Since then you only match the hashtag and ONE alpha character before, and not all them.
Example: http://codepad.viper-7.com/z1kFDb
Edit: Added a case-insensitive flag i in the pattern.

Use explode() rather than Regexp
$tmpArray = explode("#","a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc");
$myArray = array();
for($i = 0; $i < count($tmpArray) - 1; $i++) {
if (substr($tmpArray[$i],0,-1)) $myArray[] = substr($tmpArray[$i],0,-1);
if (substr($tmpArray[$i],-1)) $myArray[] = substr($tmpArray[$i],-1);
}
if (count($tmpArray) && $tmpArray[count($tmpArray) - 1]) $myArray[] = $tmpArray[count($tmpArray) - 1];
edit: I updated my answer to reflect better reading the questions

You can use explode() function that will split the string except the hash signs, like stated in the answers given before.
$myArray = explode("#",$string);
For the string 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc' this returns something like
$myarray = array('a', 'aax1aay22b', 'bbx4bby555bbz6c' ....);
All you need now is to take the last character of each string in array as another item.
$copy = array();
foreach($myArray as $item){
$beginning = substr($item,0,strlen($item)-1); // this takes all characters except the last one
$ending = substr($item,-1); // this takes the last one
$copy[] = $beginning;
$copy[] = $ending;
} // end foreach
This is an example, not tested.
EDIT
Instead of substr($item,0,strlen($item)-1); you might use substr($item,0,-1);.

regex assistance

I am trying to match a semi dynamically generated string. So I can see if its the correct format, then extract the information from it that I need. My Problem is I no matter how hard I try to grasp regex can't fathom it for the life of me. Even with the help of so called generators.
What I have is a couple different strings like the following. [#img:1234567890] and [#user:1234567890] and [#file:file_name-with.ext]. Strings like this pass through are intent on passing through a filter so they can be replaced with links, and or more readable names. But again try as I might I can't come up with a regex for any given one of them.
I am looking for the format: [#word:] of which I will strip the [, ], #, and word from the string so I can then turn around an query my DB accordingly for whatever it is and work with it accordingly. Just the regex bit is holding me back.

Not sure what you mean by generators. I always use online matchers to see that my test cases work. #Virendra almost had it except forgot to escape the [] charaters.
/\[#(\w+):(.*)\]/
You need to start and end with a regex delimeter, in this case the '/' character.
Then we escape the '[]' which is use by regex to match ranges of characters hence the '['.
Next we match a literal '#' symbol.
Now we want to save this next match so we can use it later so we surround it with ().
\w matches a word. Basically any characters that aren't spaces, punctuation, or line characters.
Again match a literal :.
Maybe useful to have the second part in a match group as well so (.*) will match any character any number of times, and save it for you.
Then we escape the closing ] as we did earlier.
Since it sounds like you want to use the matches later in a query we can use preg_match to save the matches to an array.
$pattern = '/\[#(\w+):(.*)\]/';
$subject = '[#user:1234567890]';
preg_match($pattern, $subject, $matches);
print_r($matches);
Would output
array(
[0] => '[#user:1234567890]', // Full match
[1] => 'user', // First match
[2] => '1234567890' // Second match
)
An especially helpful tool I've found is txt2re

Here's what I would do.
<pre>
<?php
$subj = 'An image:[#img:1234567890], a user:[#user:1234567890] and a file:[#file:file_name-with.ext]';
preg_match_all('~(?<match>\[#(?<type>[^:]+):(?<value>[^\]]+)\])~',$subj,$matches,PREG_SET_ORDER);
foreach ($matches as &$arr) unset($arr[0],$arr[1],$arr[2],$arr[3]);
print_r($matches);
?>
</pre>
This will output
Array
(
[0] => Array
(
[match] => [#img:1234567890]
[type] => img
[value] => 1234567890
)
[1] => Array
(
[match] => [#user:1234567890]
[type] => user
[value] => 1234567890
)
[2] => Array
(
[match] => [#file:file_name-with.ext]
[type] => file
[value] => file_name-with.ext
)
)
And here's a pseudo version of how I would use the preg_replace_callback() function:
function replace_shortcut($matches) {
global $users;
switch (strtolower($matches['type'])) {
case 'img' : return '<img src="images/img_'.$matches['value'].'jpg" />';
case 'file' : return ''.$matches['value'].'';
// add id of each user in array
case 'user' : $users[] = (int) $matches['value']; return '%s';
default : return $matches['match'];
}
}
$users = array();
$replaceArr = array();
$subj = 'An image:[#img:1234567890], a user:[#user:1234567890] and a file:[#file:file_name-with.ext]';
// escape percentage signs to avoid complications in the vsprintf function call later
$subj = strtr($subj,array('%'=>'%%'));
$subj = preg_replace_callback('~(?<match>\[#(?<type>[^:]+):(?<value>[^\]]+)\])~',replace_shortcut,$subj);
if (!empty($users)) {
// connect to DB and check users
$query = " SELECT `id`,`nick`,`date_deleted` IS NOT NULL AS 'deleted'
FROM `users` WHERE `id` IN ('".implode("','",$users)."')";
// query
// ...
// and catch results
while ($row = $con->fetch_array()) {
// position of this id in users array:
$idx = array_search($row['id'],$users);
$nick = htmlspecialchars($row['nick']);
$replaceArr[$idx] = $row['deleted'] ?
"<span class=\"user_deleted\">{$nick}</span>" :
"{$nick}";
// delete this key so that we can check id's not found later...
unset($users[$idx]);
}
// in here:
foreach ($users as $key => $value) {
$replaceArr[$key] = '<span class="user_unknown">User'.$value.'</span>';
}
// replace each user reference marked with %s in $subj
$subj = vsprintf($subj,$replaceArr);
} else {
// remove extra percentage signs we added for vsprintf function
$subj = preg_replace('~%{2}~','%',$subj);
}
unset($query,$row,$nick,$idx,$key,$value,$users,$replaceArr);
echo $subj;

You can try something like this:
/\[#(\w+):([^]]*)\]/
\[ escapes the [ character (otherwise interpreted as a character set); \w means any "word" character, and [^]]* means any non-] character (to avoid matching past the end of the tag, as .* might). The parens group the various matched parts so that you can use $1 and $2 in preg_replace to generate the replacement text:
echo preg_replace('/\[#(\w+):([^]]*)\]/', '$1 $2', '[#link:abcdef]');
prints link abcdef

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

PHP regex hashtag and special chars - php

Just use this expression /(#[^\s[:punct:]]+)/. Reads as "A # plus at least one character that is not white-space or punctuation." The [:punct:] is one of the POSIX character classes.

Related

Extract multiples strings from one variable by using preg_match

preg match to get text after # symbol and before next space using php

PHP Parsing text to extract two numbers

RegEx for hashtag separated string

regex assistance

Categories

Resources