Trying to use preg_replace to find words with # in them and replace the whole word with nothing.
<?php
$text = "This is a text #removethis little more text";
$textreplaced = preg_replace('/#*. /', '', $text);
echo $textreplaced;
Should output: This is a text little more text
I have been googling special characters and the like, but I am lost.
Use \w:
$textreplaced = preg_replace('/#[\w]+ /', '', $text);
echo $textreplaced;
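A minimal sketch of the suggestion above, run against the sample string from the question. The helper name removeHashtagWords is hypothetical, and the optional trailing whitespace (\s?) is an assumption added so that a hashtag at the very end of the string is removed too.

```php
<?php
// Hypothetical helper: strips every "#word" token from $text.
function removeHashtagWords(string $text): string
{
    // '#' followed by one or more word characters, plus at most one
    // trailing whitespace character so no double space is left behind.
    return preg_replace('/#\w+\s?/', '', $text);
}

echo removeHashtagWords("This is a text #removethis little more text"), "\n";
// prints: This is a text little more text
```

If a hashtag ends the string, the space before it is kept, so a trim() afterwards may be wanted.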
Only finding one character at a time.
I believe you are only matching the '#' to begin with. To match the whole word, end the pattern at a word boundary with \b, so your final regex should be something like /(#).{2,}?\b/.
The ? is important because quantifiers are greedy by default and grab as many characters as possible.
Just a tip: try your patterns in a tester like regexpal.
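To see what the ? changes, here is a small sketch comparing a greedy and a lazy quantifier in front of \b. The sample string and variable names are made up for illustration.

```php
<?php
$text = "a #one two";

// Greedy: .+ runs to the end of the string, where \b also holds.
preg_match('/#.+\b/', $text, $greedy);

// Lazy: .+? stops at the first position where \b holds.
preg_match('/#.+?\b/', $text, $lazy);

echo $greedy[0], "\n"; // #one two
echo $lazy[0], "\n";   // #one
```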
Related
I receive a plain description text for some Instagram posts and am trying to highlight the hashtags. I use this code:
$caption = preg_replace('/(?<!\S)#([0-9a-zA-Z_.]+)/', '#$1', $caption);
The first problem: this doesn't work with non-Latin characters, like "ş" or "ö". The second problem: this doesn't work with hashtags that have no space between them, for example "#quote#quoteoftheday #myquote", where my regular expression highlights only "#quote" and "#myquote". Can I solve both problems in a single regular expression?
You could remove the (?<!\S) part so it can also match when there is a non-whitespace character in front of the #, and add the Unicode flag /u.
You can shorten 0-9a-zA-Z_ to \w so your expression might look like:
#([\w.]+)
Regex demo
$caption = "#quote#öquoteoftheday #şmyquote";
$caption = preg_replace('/#([\w.]+)/u', '#$1', $caption);
echo $caption;
Result:
#quote#öquoteoftheday #şmyquote
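Since the replacement above prints the caption unchanged, it can be easier to check the pattern by listing what it captures. A small sketch, assuming the caption is valid UTF-8 (with the u modifier, PHP also makes \w match non-ASCII letters):

```php
<?php
// Assumes the input is valid UTF-8; with invalid UTF-8 and the /u flag,
// preg_match_all() returns false instead of a match count.
$caption = "#quote#öquoteoftheday #şmyquote";

preg_match_all('/#([\w.]+)/u', $caption, $matches);

// $matches[1] holds the tag names without the leading '#'.
print_r($matches[1]);
```

All three tags are captured, including the one glued to "#quote".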
I have string like
<li class="video_description"><strong>Description:</strong> hello world, This is test description.</li>
And I want a string like "hello world, This is test description." That string will be dynamic every time.
So, how can I use preg_match here?
It is not a good idea to use regex to parse html in PHP.
I would suggest to use simple_html_dom as it is simple and suits your situation
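simple_html_dom is a third-party library. As a sketch of the same parser-based idea, here is a version that swaps in PHP's bundled DOMDocument and DOMXPath instead (so this is not simple_html_dom's API). It assumes the dom extension is enabled and that the element keeps the video_description class from the question.

```php
<?php
$html = '<li class="video_description"><strong>Description:</strong> hello world, This is test description.</li>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about the HTML fragment
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select only the text nodes that are direct children of the <li>,
// which skips the label inside <strong>.
$nodes = $xpath->query('//li[@class="video_description"]/text()');

$description = '';
foreach ($nodes as $node) {
    $description .= $node->nodeValue;
}
$description = trim($description);

echo $description, "\n";
```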
With all the disclaimers about using regex to parse html, if you want a regex, you can use this:
>\s*\K(?:(?!<).)+(?=</li)
See the match in the Regex Demo.
Sample PHP Code
$regex = '~>\s*\K(?:(?!<).)+(?=</li)~';
preg_match_all($regex, $yourstring, $matches);
print_r($matches[0]);
Explanation
>\s* matches a closing > and optional spaces
The \K tells the engine to drop what was matched so far from the final match it returns
(?:(?!<).)+ matches any chars that do not start a tag
The lookahead (?=</li) asserts that what follows is </li
Another solution
<li.*<\/strong>\s?(.*)<\/li>
Usage:
$string = '<li class="video_description"><strong>Description:</strong> hello world, This is test description.</li>';
$pattern = '/<li.*<\/strong>\s?(.*)<\/li>/';
if (preg_match($pattern, $string, $matches)) {
    echo "Match was found: " . $matches[1]; // text captured after </strong>
}
I have some text like this :
$text = "Some thing is there http://example.com/جميع-وظائف-فى-السليمانية
http://www.example.com/جميع-وظائف-فى-السليمانية nothing is there
Check me http://example.com/test/for_me first
testing http://www.example.com/test/for_me the url
Should be test http://www.example.com/翻译-英语教师-中文教师-外贸跟单
simple text";
I need to preg_match the URL, but they are of different languages.
So, I need to get the URL itself, from each line.
I was doing like this :
$text = preg_replace("/[\n]/", " <br>", $text);
$lines = explode("<br>", $text);
foreach($lines as $textLine){
if (preg_match("/(http\:\/\/(.*))/", $textLine, $match )) {
// some code
// Here I need the url
}
}
My current regex is /(http\:\/\/(.*))/, please suggest how I can make this compatible with the URLs in different languages?
A regular expression like this may work for you?
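In my test it worked with the example text you gave; however, it is not very advanced. It will simply select all characters after http:// or https:// until a whitespace character occurs (space, new line, tab, etc.). Note that the g flag below is regex-tester notation: PHP has no g modifier, so use preg_match_all with only the i flag there.
/(https?\:\/\/(?:[^\s]+))/gi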
Here is a visual example of what would be matched from your sample string:
http://regex101.com/r/bR0yE9
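In PHP the same idea might look like the sketch below. preg_match_all stands in for the tester's g flag, and the u modifier is an assumption that the text is UTF-8. The sample is shortened from the question.

```php
<?php
$text = "Some thing is there http://example.com/جميع-وظائف-فى-السليمانية
Check me http://example.com/test/for_me first
Should be test http://www.example.com/翻译-英语教师-中文教师-外贸跟单
simple text";

// preg_match_all() plays the role of the "g" flag; "u" treats the
// pattern and subject as UTF-8 (an assumption about the input encoding).
preg_match_all('~https?://\S+~iu', $text, $matches);

// $matches[0] lists every URL, one entry per match.
print_r($matches[0]);
```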
You don't need to work line by line, you can search directly:
if (preg_match_all('~\bhttp://\S+~', $text, $matches))
print_r($matches);
Where \S means "any character that is not whitespace". There is no special internationalisation problem.
Note: if you want to replace all newlines with <br/> afterwards, I suggest using $text = preg_replace('~\R~', '<br/>', $text);, because \R handles several types of newline, whereas \n matches only Unix newlines.
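A quick sketch of the \R point, using a made-up string with mixed line endings:

```php
<?php
// Mixed line endings: Windows (\r\n), Unix (\n) and old Mac (\r).
$text = "first\r\nsecond\nthird\rfourth";

// \R matches \r\n as a single newline, so no doubled <br/> appears.
$withBreaks = preg_replace('~\R~', '<br/>', $text);

echo $withBreaks, "\n"; // first<br/>second<br/>third<br/>fourth
```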
I am trying to convert user's posts (text) into hashtag clickable links, using PHP.
From what I found, hashtags should only contain alpha-numeric characters.
$text = 'Testing#one #two #three.test';
$text = preg_replace('/#([0-9a-zA-Z]+)/i', '#$1', $text);
It places links on all of them (#one #two #three), but I think #one should not be converted, because it is right next to another alphanumeric character. How can I adjust the regex to fix that?
The 3rd one is also OK, it matches just #three, which I think is correct.
You could modify your regex to include a negative lookbehind for a non-whitespace character, like so:
(?<!\S)#([0-9a-zA-Z]+)
Working regex example:
http://regex101.com/r/mR4jZ7
PHP:
$text = preg_replace('/(?<!\S)#([0-9a-zA-Z]+)/', '#$1', $text);
Edit:
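And to make the expression compatible with other languages (non-English characters), match any Unicode letter with \p{L}; in PHP, add the u modifier so the pattern and subject are read as UTF-8: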
(?<!\S)#([0-9\p{L}]+)
Working example:
https://regex101.com/r/Pquem3/1
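Put together in PHP, this could look like the sketch below. The function name linkHashtags and the /tag/ URL scheme are assumptions made up for the example, since the link markup is not shown in the snippets above.

```php
<?php
// Hypothetical helper: wraps standalone hashtags in links.
function linkHashtags(string $text): string
{
    return preg_replace_callback(
        // '#' must not follow a non-whitespace character;
        // the tag itself is ASCII digits or any Unicode letters.
        '/(?<!\S)#([0-9\p{L}]+)/u',
        function (array $m): string {
            // "/tag/" is an assumed URL scheme, not from the question.
            $href = '/tag/' . rawurlencode($m[1]);
            return '<a href="' . $href . '">#' . $m[1] . '</a>';
        },
        $text
    );
}

echo linkHashtags('Testing#one #two #three.test'), "\n";
```

Here #one is left alone because it follows "g", while #two and #three are linked and ".test" stays outside the link.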
A Unicode-aware pattern that is safe for HTML-encoded text (it skips entities such as &#39;) and also matches joined hashtags: ~(?<!&)#([\pL\d]+)~u
Here are some tags like #tag1 #tag2#tag3 etc.
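A small sketch of what that pattern picks up. The input string is made up; it contains one numeric HTML entity so the (?<!&) check has something to skip.

```php
<?php
// "&#39;" is an HTML entity, not a hashtag; the lookbehind rejects
// any '#' that directly follows '&'.
$html = "It&#39;s here: #tag1 #tag2#tag3";

preg_match_all('~(?<!&)#([\pL\d]+)~u', $html, $matches);

// $matches[1] holds the tag names without the leading '#'.
print_r($matches[1]);
```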
I finally found a solution that works like the hashtag-to-URL handling on Facebook and other sites; it may help you too. This code also works with Unicode. I tested it with some Bangla text. I think it will work for any language, but let me know whether other languages work as well.
$str = '#Your Text #Unicode #ফ্রিকেলস বা #তিল মেলানিনের #অতিরিক্ত উৎপাদনের জন্য হয় যা #সূর্যালোকে #বাড়ে';
$regex = '/(?<!\S)#([0-9a-zA-Z\p{L}\p{M}]+)/mu';
$text = preg_replace($regex, '#$1', $str);
echo $text;
To catch the second and third hashtags without the first one, you need to specify that the hashtag should start at the beginning of the line or be preceded by one or more whitespace characters, as follows:
$text = 'Testing#one #two #three.test';
$text = preg_replace('/(^|\s+)#([0-9a-zA-Z]+)(\b|$)/', '$1#$2', $text);
The \b in the third group defines a word boundary, which allows the pattern to match #three when it is immediately followed by a non-word character.
Edit: MElliott's answer above is more efficient, for the record.
I'm attempting to remove noise words from a string, and I have what I believe is a good algorithm for it, but I'm running into a snag. Before I do my preg_replace I remove all punctuation except the apostrophe ('). Then I put it through this preg_replace:
$content = preg_replace('/\b('.implode('|', self::$noiseWords).')\b/','',$content);
Which works great, except for words that do indeed have that ' character. preg_replace seems to be treating that as a boundary character. This is a problem for me.
Is there a way I can get around this? A different solution perhaps?
Thanks!
Here is the example I'm using:
$content = strtolower(strip_tags($content));
$content = preg_replace("/(?!['])\p{P}/u", "", $content);// remove punctuation
echo $content;// i've added striptags for editing as well should still workyep it doesnbsp
$content = preg_replace("/\b(?<!')(".implode('|', self::$noiseWords).")(?!')\b/",'',$content);
$contentArray = explode(" ", $content);
print_r($contentArray);
On the 3rd line you'll see the comment of what $content is right before the preg_replace
And though I'm assuming you can guess what my noiseWords array looks like, here's just a small fraction of it:
$noiseWords = array("a", "able","about","above","abroad","according","accordingly","across",
"actually","adj","after","afterwards","again",......)
You can use a negative lookbehind and a negative lookahead to make sure you're not "around" a quote character:
$regex = "/\b(?<!')(".implode('|', self::$noiseWords).")(?!')\b/";
Now your regex will not match anything that is immediately preceded or followed by a single quote.
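Here is a self-contained sketch of that regex in use. The function removeNoiseWords and its short word list are hypothetical stand-ins for self::$noiseWords; the preg_quote call and the whitespace clean-up are additions that are not part of the answer above.

```php
<?php
// Hypothetical stand-in for the class method using self::$noiseWords.
function removeNoiseWords(string $content, array $noiseWords): string
{
    // Escape each word so regex metacharacters in the list stay literal.
    $alternatives = implode('|', array_map(
        function (string $w): string { return preg_quote($w, '/'); },
        $noiseWords
    ));

    // Skip candidates that touch an apostrophe on either side.
    $regex = "/\\b(?<!')(" . $alternatives . ")(?!')\\b/";
    $content = preg_replace($regex, '', $content);

    // Collapse the gaps left by removed words (an added convenience).
    return trim(preg_replace('/\s+/', ' ', $content));
}

echo removeNoiseWords(
    "i can't talk about a dog i've seen",
    ['a', 'i', 'about', 't', 've']
), "\n";
// prints: can't talk dog i've seen
```

Without the two lookarounds, the same word list would also strip the "t" from "can't" and both halves of "i've".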