Extracting and removing URL from a block of text

I have this block of text:
$text = 'This just happened outside the store http://somedomain.com/2012/12/store there might be more text afterwards...';
It needs to be converted to:
$result['text_1'] = 'This just happened outside the store';
$result['text_2'] = 'there might be more text afterwards...';
$result['url'] = 'http://somedomain.com/2012/12/store';
This is my current code. It does detect the URL, but I can only remove it from the text; I still need the URL value separately in an array:
$string = preg_replace('/https?:\/\/[^\s"<>]+/', '', $text);
//returns "This just happened outside the store there might be more text afterwards..."
Any ideas? Thanks!
Temporary solution (can this be optimized?)
$text = 'This just happened outside the store http://somedomain.com/2012/12/store There might be more text afterwards...';
preg_match('/https?:\/\/[^\s"<>]+/',$text,$url);
$string = preg_split('/https?:\/\/[^\s"<>]+/', $text);
$text = preg_replace('/\s\s+/','. ',implode(' ',$string));
echo ''.$text.'';

Do you need to store it in a variable, or do you just need it inside the a href?
How about this?
<?php
$text = 'This just happened outside the store http://somedomain.com/2012/12/store There might be more text afterwards...';
$pattern = '#(.*?)(https?://.*?) (.*)#';
$ret = preg_replace( $pattern, '$3', $text );
var_dump( $ret );
$1, $2, and $3 correspond to the 1st, 2nd, and 3rd parenthesized groups.
The output would be:
There might be more text afterwards...

You could split your string on the regex using preg_split, which gives you an array:
$result = preg_split('/(https?:\/\/[^\s"<>]+)/', $the_string, -1, PREG_SPLIT_DELIM_CAPTURE);
// $result[0] = preamble
// $result[1] = url
// $result[2] = possible afters
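
To get from those pieces to the exact array the question asks for, the parts can be trimmed and given names. This is only a minimal sketch: splitAroundUrl() is a hypothetical helper name, and it assumes the text contains at most one URL.

```php
<?php
// Sketch: build the requested array from the preg_split() pieces.
// splitAroundUrl() is a made-up helper name, not a PHP function.
// Assumption: the text contains at most one URL.
function splitAroundUrl(string $text): array
{
    // The capturing group plus PREG_SPLIT_DELIM_CAPTURE keeps the URL in the result.
    $parts = preg_split('/(https?:\/\/[^\s"<>]+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    return [
        'text_1' => trim($parts[0] ?? ''), // text before the URL
        'url'    => $parts[1] ?? '',       // the URL itself (empty if none was found)
        'text_2' => trim($parts[2] ?? ''), // text after the URL
    ];
}

$result = splitAroundUrl('This just happened outside the store http://somedomain.com/2012/12/store there might be more text afterwards...');
print_r($result);
```

The trim() calls remove the spaces that were next to the URL in the original text.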

Related

PHP: regex to replace a#3 in string

I want to make links using shortcuts following the pattern: controller/#/id. For example: a#3 must be rewritten to /actions/view/3, and t#28 must be a link to /tasks/view/28. I think preg_replace is an "easy" way to achieve this, but I'm not that good with regular expressions and I don't know how to "reuse" the digits from the search-string within the result. I think I need something like this:
$search = array('/a#\d/', '/t#\d/');
$replace = array('/actions/view/$1', '/tasks/view/$1');
$text = preg_replace($search, $replace, $text);
Can someone point me in the right direction?

You can "reuse" the numbers from the search strings using capturing groups, denoted by parentheses ().
Try this:
$text = "a#2 a#3 a#5 a#2 t#34 t#34 t#33 t#36";
$search = array('/\ba#(\d+)\b/', '/\bt#(\d+)\b/');
$replace = array('/actions/view/$1', '/tasks/view/$1');
$text = preg_replace($search, $replace, $text);
var_dump($text);
/**
OUTPUT-
string '/actions/view/2 /actions/view/3 /actions/view/5 /actions/view/2 /tasks/view/34 /tasks/view/34 /tasks/view/33 /tasks/view/36' (length=123)
**/
The above answer works, but if you need to add more search values, you can store the prefixes in a separate array and use preg_replace_callback. This does the same thing, but now you only need to add more (alphabetic) keys to the array and they will be replaced accordingly. Try something like this:
$arr = array(
    "a" => "/actions/view/",
    "t" => "/tasks/view/"
);
$text = preg_replace_callback("/\b([a-z]+)#(\d+)\b/", function ($matches) use ($arr) {
    // Leave shortcuts with an unknown prefix untouched.
    if (!isset($arr[$matches[1]])) {
        return $matches[0];
    }
    // $matches[1] is the prefix, $matches[2] is the number.
    return $arr[$matches[1]] . $matches[2];
}, $text);
var_dump($text);
/**
OUTPUT-
string '/actions/view/2 /actions/view/3 /actions/view/5 /actions/view/2 /tasks/view/34 /tasks/view/34 /tasks/view/33 /tasks/view/36' (length=123)
**/

Since the number itself is not changed, you can use strtr (as long as the shortcuts are not too ambiguous):
$trans = array('a#' => '/actions/view/', 't#' => '/tasks/view/');
$text = strtr($text, $trans);
If you can use this, it will be faster than processing the string twice with a regex.
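
The ambiguity caveat can be made concrete with a small comparison. The sample text below is made up for illustration: strtr replaces every "a#" it finds, including the one inside "data#7", while the word-boundary regex from the earlier answer leaves it alone.

```php
<?php
// Made-up sample text: "data#7" is not meant to be a shortcut.
$text  = 'see a#3 and data#7';
$trans = array('a#' => '/actions/view/', 't#' => '/tasks/view/');

// strtr replaces every occurrence of the keys, even inside other words.
$viaStrtr = strtr($text, $trans);

// The \b word boundary only matches "a#3" when "a" starts a word.
$viaRegex = preg_replace(
    array('/\ba#(\d+)\b/', '/\bt#(\d+)\b/'),
    array('/actions/view/$1', '/tasks/view/$1'),
    $text
);

echo $viaStrtr, "\n"; // see /actions/view/3 and dat/actions/view/7
echo $viaRegex, "\n"; // see /actions/view/3 and data#7
```

So strtr is a reasonable choice only if "a#" and "t#" can never appear inside other words in the input.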

php preg_match_all preg_replace array issue

I'm working on a bb-code replacement function when a user wants to post a smiley.
The problem is that if someone uses a bb-code smiley that doesn't exist, the result is an empty post, because the browser cannot display the (non-existent) emoticon.
Here's my code so far:
// DO [:smiley:]
$convert_smiley = preg_match_all('/\[:(.*?):\]/i', $string, $matches);
if( $convert_smiley )
{
$string = preg_replace('/\[:(.*?):\]/i', "<i class='icon-smiley-$1'></i>", $string, $convert_smiley);
}
return $string;
The bb-code for a smiley usually looks like [:smile:] or like [:sad:] or like [:happy:] and so on.
The code above works well until someone posts a bb-code that doesn't exist, so what I am asking for is a fix for non-existent smileys.
Is it possible, for example, to create an array like array('smile', 'sad', 'happy') so that only bb-codes matching an entry in this array are converted?
So, after the fix, posting [:test:] or just [::] should not be converted and should be posted as the original text, while [:happy:] will be converted.
Any ideas? Thanks!

I put your possible smileys in non-capturing groups, joined with the alternation symbol (|), in the regexp:
<?php
$string = 'looks like [:smile:] or like [:sad:] or like [:happy:] [:bad-smiley:]';
$string = preg_replace('/\[:((?:smile)|(?:sad)|(?:happy)):\]/i', "<i class='icon-smiley-$1'></i>", $string);
print $string;
Output:
looks like <i class='icon-smiley-smile'></i> or like <i class='icon-smiley-sad'></i> or like <i class='icon-smiley-happy'></i> [:bad-smiley:]
[:bad-smiley:] is ignored.
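
Since the question asks for an array of allowed smileys, the same alternation could also be generated from such an array instead of being typed by hand. A minimal sketch, assuming $validSmileys is the list you maintain (the variable names are illustrative):

```php
<?php
// Assumed whitelist; in a real application this is your own list.
$validSmileys = array('smile', 'sad', 'happy');

// Escape each name for use inside a regex, then join with "|".
$alternatives = implode('|', array_map(function ($name) {
    return preg_quote($name, '/');
}, $validSmileys));

// Results in /\[:(smile|sad|happy):\]/i
$pattern = '/\[:(' . $alternatives . '):\]/i';

$string = 'looks like [:smile:] or [:test:] or [::] or [:happy:]';
$string = preg_replace($pattern, "<i class='icon-smiley-$1'></i>", $string);

echo $string;
```

Unknown codes such as [:test:] and the empty [::] do not match the pattern, so they stay in the text as typed.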

A simple workaround:
$string = "[:clap:]";
$convert_smiley = preg_match_all('/\[:(.*?):\]/i', $string, $matches);
$emoticons = array("smile", "clap", "sad"); // array of supported smileys
// Note: this only checks the first smiley found in the string.
if ($convert_smiley && in_array($matches[1][0], $emoticons)) {
    // smiley exists
    $string = preg_replace('/\[:(.*?):\]/i', "<i class='icon-smiley-$1'></i>", $string, $convert_smiley);
} else {
    // smiley doesn't exist: leave the string unchanged
}

Well, the first issue is that you are setting $convert_smiley to the return value of preg_match_all() (the number of matches found) instead of parsing the results. Here is how I reworked your code:
// Test strings.
$string = ' [:happy:] [:frown:] [:smile:] [:foobar:]';
// Set a list of valid smileys.
$valid_smileys = array('smile', 'sad', 'happy');
// Do a `preg_match_all` against the smileys.
preg_match_all('/\[:(.*?):\]/i', $string, $matches);
// Check if there are matches ($matches[1] holds the captured names).
if (!empty($matches[1])) {
    // Loop through the results.
    foreach ($matches[1] as $smiley_value) {
        // Validate them against the valid smiley list.
        $pattern = $replacement = '';
        if (in_array($smiley_value, $valid_smileys)) {
            // preg_quote() keeps the name from being read as regex syntax.
            $pattern = sprintf('/\[:%s:\]/i', preg_quote($smiley_value, '/'));
            $replacement = sprintf("<i class='icon-smiley-%s'></i>", $smiley_value);
            $string = preg_replace($pattern, $replacement, $string);
        }
    }
}
echo 'Test Output:';
echo htmlentities($string);
Just note that I chose to use sprintf() for the formatting of content & set $pattern and $replacement as variables. I also chose to use htmlentities() so the HTML DOM elements can easily be read for debugging.
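
The loop above runs one preg_replace per match. As an alternative, preg_replace_callback can do the validation and the replacement in a single pass. This is only a sketch of that swapped-in technique: convertSmileys() is a hypothetical function name, and lower-casing the name before the lookup is my own assumption about the desired behaviour.

```php
<?php
// Sketch: validate each [:name:] inside the callback, in one pass.
// convertSmileys() is a hypothetical name, not from the original code.
function convertSmileys(string $string, array $validSmileys): string
{
    $result = preg_replace_callback('/\[:(.*?):\]/', function (array $m) use ($validSmileys) {
        $name = strtolower($m[1]); // assumption: [:Happy:] should count as happy
        if (in_array($name, $validSmileys, true)) {
            return "<i class='icon-smiley-{$name}'></i>";
        }
        return $m[0]; // unknown smiley: keep the original text
    }, $string);

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $string;
}

$converted = convertSmileys(' [:happy:] [:frown:] [:smile:] [:foobar:] [::]', array('smile', 'sad', 'happy'));
echo htmlentities($converted);
```

Returning $m[0] for unknown names is what keeps [:frown:], [:foobar:] and [::] in the post as plain text.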

Remove text inside of text from a larger string with PHP

What I'm trying to do is remove the text inside a 'shortcode', if it exists. For example:
Here's some content [shortcode]I want this text removed[/shortcode] Some more content
should be changed to:
Here's some content [shortcode][/shortcode] Some more content
It seems like a pretty simple thing to do but I can't figure it out.. =/
The shortcode will only show up once in the entire string.
Thanks in advance for help.

Try this:
$var = "Here's some content [shortcode]I want this text removed[/shortcode] Some more content";
$startTag = "[shortcode]";
$endTag = "[/shortcode]";
$pos1 = strpos($var, $startTag) + strlen($startTag);
$pos2 = strpos($var, $endTag);
$result = substr_replace($var, '', $pos1, $pos2-$pos1);

It's very easy to do with preg_replace(). For your purpose, use /\[shortcode\].*\[\/shortcode\]/ as pattern.
$replace = "[shortcode][/shortcode]";
$filteredText = preg_replace("/\[shortcode\].*\[\/shortcode\]/", $replace, $yourContent);
See http://php.net/manual/en/function.preg-replace.php for more details.
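
One thing to watch with this pattern: by default `.` does not match newlines, so a shortcode whose text spans several lines is not matched at all. A small sketch of the difference, using made-up sample content; the `s` modifier makes `.` match newlines, and the lazy `.*?` stops at the first closing tag:

```php
<?php
// Made-up sample content with a line break inside the shortcode.
$content = "Intro [shortcode]line one\nline two[/shortcode] outro";
$replace = "[shortcode][/shortcode]";

// Without the "s" modifier, "." stops at the newline, so nothing matches.
$withoutDotAll = preg_replace('/\[shortcode\].*\[\/shortcode\]/', $replace, $content);

// With "s", "." also matches newlines; ".*?" is lazy and stops at the first closing tag.
$withDotAll = preg_replace('/\[shortcode\].*?\[\/shortcode\]/s', $replace, $content);

var_dump($withoutDotAll === $content); // bool(true): unchanged
echo $withDotAll, "\n";                // Intro [shortcode][/shortcode] outro
```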

One can use strpos() to find the positions of [shortcode] and [/shortcode] in your string, and then replace the text between them with an empty string via substr_replace().

If you do not want to bother with regular expressions:
If you do have the [shortcode] tag inside the string, then it is really no problem; just combine two substr calls:
substr($string, 0, strpos($string, '[shortcode]') + 11) . substr($string, strpos($string, '[/shortcode]'))
where the first substr keeps everything up to and including the opening tag (11 is the length of '[shortcode]') and the second appends the rest of the string, starting at the closing tag.
see here:
http://www.php.net/manual/en/function.substr.php
http://www.php.net/manual/en/function.strpos.php

Use a regex in PHP to get rid of it:
preg_replace($shortcodePattern, '', $yourText, 1) // arguments: pattern, replacement, subject, limit

$string = "[shortcode]I want this text removed[/shortcode]";
$regex = "#\[shortcode\].*\[\/shortcode\]#i";
$replace = "[shortcode][/shortcode]";
$newString = preg_replace ($regex, $replace, $string, -1 );

$content = "Here's some content [shortcode]I want this text removed[/shortcode] Some more content to be changed to Here's some content [shortcode][/shortcode] Some more content";
print preg_replace('#(\[shortcode\])(.*?)(\[/shortcode\])#', "$1$3", $content);
Yields:
Here's some content [shortcode][/shortcode] Some more content to be changed to Here's some content [shortcode][/shortcode] Some more content

Replace names in text with links

I want to replace names in a text with a link to their profile.
$text = "text with names in it (John) and Jacob.";
$namesArray = array("John", "John Plummer", "Jacob" /* etc. */);
$linksArray = array("<a href='/john_plom'>%s</a>", "<a href='/john_plom'>%s</a>", "<a href='/jacob_d'>%s</a>" /* etc. */);
// %s should stay the same as the name found in $text.
If necessary, I can change the arrays.
I currently use the two arrays with str_replace, like this: $text = str_replace($namesArray, $linksArray, $text);
But the replacement should also work for a name with a "." or ")" or anything like that at its end or beginning. How can I get the replacement to work on text like this?
The output should be "text with names in it (<a.....>John</a>) and <a ....>Jacob</a>."

Here is an example for a single name, you would need to repeat this for every element in your array:
$name = "Jacob";
$url = "<a href='/jacob/'>$1</a>";
$text = preg_replace("/\b(".preg_quote($name, "/").")\b/", $url, $text);
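
Repeating this per name can go wrong when one name contains another ("John" and "John Plummer"), or when a later pass matches inside a link inserted earlier. One possible way around both is a single pass with preg_replace_callback over an alternation of all names, longest first. This is only a sketch: the $profiles map and the linkNames() helper are assumptions made for illustration.

```php
<?php
// Hypothetical map of names to profile URLs.
$profiles = array(
    'John'         => '/john_plom',
    'John Plummer' => '/john_plom',
    'Jacob'        => '/jacob_d',
);

// linkNames() is a made-up helper name.
function linkNames(string $text, array $profiles): string
{
    $names = array_keys($profiles);
    // Longest names first, so "John Plummer" is tried before "John".
    usort($names, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    $alternatives = implode('|', array_map(function ($name) {
        return preg_quote($name, '/');
    }, $names));

    // One pass over the text: inserted links are never scanned again.
    $result = preg_replace_callback('/\b(' . $alternatives . ')\b/', function ($m) use ($profiles) {
        return "<a href='" . $profiles[$m[1]] . "'>" . $m[1] . "</a>";
    }, $text);

    return $result ?? $text;
}

$linked = linkNames('text with names in it (John) and Jacob.', $profiles);
echo $linked;
```

The match is case-sensitive, so every matched name is guaranteed to be a key of $profiles.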

Try something like
$name = 'John';
$new_string = preg_replace('/[^ \t]?'.$name.'[^ \t]/', $link, $old_string);
PHP's preg_replace accepts mixed pattern and subject, in other words, you can provide an array of patterns like this and an array of replacements.

Done, and no regex:
$text = "text with names in it (John) and Jacob.";
$name_link = array("John" => "<a href='/john_plom'>",
"Jacob" => "<a href='/jacob'>");
foreach ($name_link as $name => $link) {
$tmp = explode($name, $text, 2); // limit of 2 so text after a second occurrence is not dropped
if (count($tmp) > 1) {
$newtext = array($tmp[0], $link, $name, "</a>",$tmp[1]);
$text = implode($newtext);
}
}
echo $text;
The links will never change for each given input, so I'm not sure whether I understood your question. But I have tested this and it works for the given string. To extend it just add more entries to the $name_link array.

Look into regular expressions, with something like preg_replace().
preg_replace('/\((' . implode('|', $names) . ')\)/', 'link_to_$1', $text);
Note that this solution takes the array of names, not just one name.

Can I use regex for this?

Is this possible with regex?
I have a file, and whenever a '#' is found in it, the '#' together with the name that follows it should be replaced with the contents of the file with that name.
File1: "this text is found in file1"
File2: "this file will contain text from file1: #file1".
File2 after regex: "this file will contain text from file1: this text is found in file1".
I wish to do this with PHP, and I've heard that the preg functions are better than the ereg ones, but whatever works is fine with me =)
Thanks a lot!
EDIT:
It has to be programmed, so that it looks through file2 without knowing which files to concatenate before it has gone through all occurrences of a # :)

PHP's native functions strpos and str_replace are better to use when you're searching through larger files or strings. ;)

First of all, the grammar of your templating is not a very good one, because the parser cannot be sure where the file name ends.
My suggestion would be to change to a syntax whose boundaries are easier to detect, like {#:filename}.
Anyhow, the code I give below follows your question.
<?php
// RegEx Utility functions -------------------------------------------------------------------------
function ReplaceAll($RegEx, $Processor, $Text) {
// Make sure the processor can be called
if(!is_callable($Processor))
throw new Exception("\"$Processor\" is not a callable.");
// Do the Match
preg_match_all($RegEx, $Text, $Matches, PREG_OFFSET_CAPTURE + PREG_SET_ORDER);
// Do the replacment
$NewText = "";
$MatchCount = count($Matches);
$PrevOffset = 0;
for($i = 0; $i < $MatchCount; $i++) {
// Get each match and the full match information
$EachMatch = $Matches[$i];
$FullMatch = is_array($EachMatch) ? $EachMatch[0] : $EachMatch;
// Full match is each match if no grouping is used in the regex
// Full match is the first element of each match if grouping is used in the regex.
$MatchOffset = $FullMatch[1];
$MatchText = $FullMatch[0];
$MatchTextLength = strlen($MatchText);
$NextOffset = $MatchOffset + $MatchTextLength;
// Append the non-match and the replace of the match
$NewText .= substr($Text, $PrevOffset, $MatchOffset - $PrevOffset);
$NewText .= $Processor($EachMatch);
// The next prev-offset
$PrevOffset = $NextOffset;
}
// Append the rest of the text
$NewText .= substr($Text, $PrevOffset);
return $NewText;
}
function GetGroupMatchText($Match, $Index) {
if(!is_array($Match))
return $Match[0];
$Match = $Match[$Index];
return $Match[0];
}
// Replacing by file content -----------------------------------------------------------------------
$RegEx_FileNameInText = "/#([a-zA-Z0-9]+)/"; // Group #1 is the file name
$ReplaceFunction_ByFileName = "ReplaceByFileContent";
function ReplaceByFileContent($Match) {
$FileName = GetGroupMatchText($Match, 1); // Group #1 is the file name
// $FileContent = file_get_contents($FileName); // Get the content of the file
$FileContent = "{# content of: $FileName}"; // Dummy content for testing
return $FileContent; // Returns the replacement
}
// Main --------------------------------------------------------------------------------------------
$Text = " === #file1 ~ #file2 === ";
echo ReplaceAll($RegEx_FileNameInText, $ReplaceFunction_ByFileName, $Text);
This returns === {# content of: file1} ~ {# content of: file2} ===.
The program replaces every regex match with the value returned by the given callback function.
In this case, the callback function is ReplaceByFileContent, which extracts the file name from group #1 of the regex.
I believe my code is self-documenting, but if you have any questions, you can ask me.
Hope it helps.

Much cleaner:
<?php
$content = file_get_contents('content.txt');
$m = array();
preg_match_all('`#([^\s]+)(\s|\Z)`ism', $content, $m, PREG_SET_ORDER);
foreach($m as $match){
$innerContent = file_get_contents($match[1]);
$content = str_replace('#'.$match[1], $innerContent, $content);
}
// done!
?>
regex tested with: http://www.spaweditor.com/scripts/regex/index.php
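
For comparison, the same idea can also be written with PHP's built-in preg_replace_callback, which does the matching and the replacing in one pass. This is only a sketch: expandIncludes() and the $loadFile callback are hypothetical names, and an in-memory array stands in for the real files so the example runs on its own. With real files, the loader could wrap file_get_contents() after checking that the name is one you allow.

```php
<?php
// Sketch: replace each #name with the content returned by a loader callback.
// expandIncludes() and $loadFile are made-up names for illustration.
function expandIncludes(string $text, callable $loadFile): string
{
    $result = preg_replace_callback('/#([a-zA-Z0-9]+)/', function (array $m) use ($loadFile) {
        $content = $loadFile($m[1]);
        // Unknown file: leave the "#name" marker as it is.
        return $content === null ? $m[0] : $content;
    }, $text);

    return $result ?? $text;
}

// In-memory stand-in for the files on disk.
$files = array('file1' => 'this text is found in file1');
$loader = function (string $name) use ($files) {
    return $files[$name] ?? null;
};

$expanded = expandIncludes('this file will contain text from file1: #file1', $loader);
echo $expanded;
```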
