I'm creating a simple comment system connected to the Steam API. Every Steam user signed in to my website can post automatically. I'm now changing some functions to replace things like URLs.
My question is: when a user posts something like
"Hello I'm nice, have a look at http://www.cute.com"
how do I automatically turn the URL into a link, without removing the http:// from the string?
Maybe something like this?
<?php
$str = "helloo im nice, have a look http://www.cute.com";
echo preg_replace("/http:\/\/(.+)\.(.+)\.(.+)/", "<a href='http://$1.$2.$3'>$1.$2.$3</a>", $str);
?>
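This converts a URL such as http://www.cute.com into an anchor (an a tag). Note that (.+) is greedy, so any text that follows the URL on the same line is captured as well.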
Alternative added
Alternatively, it might be a good idea to add support for https as well. In which case the following might be useful.
<?php
$str = "helloo im nice, have a look http://www.cute.com";
echo preg_replace("/http(s?):\/\/(.+)\.(.+)\.(.+)/", "<a href='http$1://$2.$3.$4'>http$1://$2.$3.$4</a>", $str);
?>
This takes advantage of the ? quantifier, which means "zero or one of the preceding character". In this case the preceding character is "s", so both "http" and "https" match.
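As a small check of the optional "s", the sketch below tests a few schemes against https?. The function name hasWebScheme is made up for this illustration and is not part of the answer.

```php
<?php
// Hypothetical helper: true when the string starts with http:// or https://.
function hasWebScheme(string $url): bool
{
    // "s?" accepts zero or one "s"; ~ is used as the delimiter so the
    // slashes in "://" need no escaping.
    return preg_match('~^https?://~', $url) === 1;
}

var_dump(hasWebScheme('http://www.cute.com'));   // bool(true)
var_dump(hasWebScheme('https://www.cute.com'));  // bool(true)
var_dump(hasWebScheme('httpss://www.cute.com')); // bool(false)
```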
Explanation
This uses RegEx (regular expressions) to find the URL and build the anchor.
The first parameter of the preg_replace function takes the RegEx (I like to test mine here: http://regexr.com/).
In PHP, a RegEx must be wrapped in a pair of delimiters. The forward slash is the most common choice and is the one used here. The bits in between are as follows.
http: is simply selecting a string that starts with "http:"
\/\/ is called "escaping" and that will select two forward slashes. Since the forward slash is the delimiter here (it marks the start and end of the RegEx), literal forward slashes need to be escaped so that PHP doesn't think the RegEx has ended sooner.
(.+) The brackets are also special characters (though not escaped) and they are known as "capture groups". They let me see what is between the "http://" and the ".com" (or whatever extension is used). The full stop (or period or ".") character matches any single character except a newline.
\. Further on the escaping. Since full stop is used as a special character, we have to escape this one. What that means so far is that we are selecting "http://" then anything and then stopping at a full stop.
(.+) Last but not least is the final capture group. This again selects anything from the string, so that we have our final capture group and the RegEx is complete.
Quantifiers:
? means "zero or one of the preceding character". This means that /tests?/ would match test and tests since s is the preceding character: in the first example there are 0 of it and in the second there is 1.
+ means "one or more of the preceding character". In this case we are saying one or more of anything, which means we expect at least one character to be provided.
The second parameter is our replace part.
In short, the $1, $2 and $3 sections reference the capture groups (the brackets) from the above RegEx, in order.
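For comparison, here is a sketch of a slightly more defensive variant. It is not the code from the answer above: the helper name linkify is made up, \S+ is swapped in for the dot-separated groups so that the match stops at the first whitespace, and the text is escaped first on the assumption that it comes straight from users.

```php
<?php
// Hypothetical helper: turns http/https URLs in plain text into anchors.
function linkify(string $text): string
{
    // Escape first so user-supplied markup cannot end up in the page.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // \S+ stops at the first whitespace, so text after the URL is left alone.
    // $0 in the replacement refers to the whole match.
    return preg_replace('~https?://\S+~', '<a href="$0">$0</a>', $safe);
}

echo linkify('helloo im nice, have a look http://www.cute.com and more');
```

One known limitation of this sketch: punctuation typed directly after a URL (a trailing full stop, for example) ends up inside the link.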
Some further reading
The PHP function I used
More information on Regular Expressions
RegEx capture groups
$string = 'helloo im nice, have a look http://www.cute.com';
$string = str_replace('http://', '', $string);
echo $string;
Related
I have a script that downloads the latest newsletter from a group inbox on a spare touchscreen in our office. It works fine, but people keep accidentally unsubscribing us so I want to hide the unsubscribe link from the email.
preg_replace seems like it would work because I can set up a pattern that simply removes any link with the word "unsubscribe" in it. I validated the pattern below using the tool at http://regex101.com/ , and it even picks up variations like "manage subscription" as well. It is OK if the odd legitimate link with the word subscribe also gets removed: there won't be many and it's only for internal use.
However, when I execute I get an error.
Here's my code:
line 53: $pat='<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>';
line 54: $themail[bodycontent]= preg_replace($pat, ' ',$themail[bodycontent]);
and I get this error:
preg_replace() [function.preg-replace]: Unknown modifier ']' in /home/trev/public_html/bigscreen/screen-functions.php on line 54
It must be something really simple like an unescaped char but I have gone code blind and can't for the life of me see it.
How do I get this pattern:
<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>
to run in a simple php script?
Thanks
You haven't used any delimiters so it's treating the < character as the delimiter
Try something like this instead
$pat='#<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>#';
You have no delimiter. Or rather you do, but it's not the one you meant. PCRE is interpreting your first < as the opening delimiter (you can use matching brackets as delimiters - in fact, I use parentheses to help remind myself that the entire match is index 0). Then it sees the first > as the ending delimiter. Anything after that should be a modifier, but of course ] is not a modifier.
Wrap your regex with (...) to give it a proper set of delimiters.
$themail[bodycontent] should be either $themail['bodycontent'] or $themail[$bodycontent].
It's trying to parse bodycontent] ... as the array index.
Patterns used in preg_match need to be enclosed by a pair of delimiter characters.
For example, a / or a ~ at the start and end of the string.
Anything outside of these delimiters at the end of the string is considered to be a regex "modifier".
Your example doesn't have delimiters, so PHP is wrongly assuming that the < character is the opening delimiter. It therefore sees the next > character as the closing delimiter, and anything after that as a modifier. Obviously all that stuff is supposed to be inside the pattern and isn't valid as modifiers, which is why PHP is complaining.
Solution: Add a pair of delimiter characters:
$pat='~<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>~';
      ^                                                   ^
      add this                                  ...and this
(It doesn't have to be ~; you can choose your own delimiter character to suit your needs. The best one to use is one that doesn't occur in your pattern, although you can escape it if it does.)
Start and end the pattern with a slash /:
$pat='/<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>/';
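To see the delimited pattern at work, here is a minimal sketch that runs it over a small HTML fragment. The function name stripSubscriptionLinks and the sample markup are invented for the illustration; the pattern itself is the one from the question, wrapped in ~ delimiters.

```php
<?php
// Hypothetical helper: replaces every link whose text contains "ubscri"
// with a single space, as in the question.
function stripSubscriptionLinks(string $html): string
{
    // Same pattern as in the question, now wrapped in ~ ... ~ delimiters.
    $pat = '~<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>~';
    return preg_replace($pat, ' ', $html);
}

// Made-up sample body with one unsubscribe link and one ordinary link.
$body = '<p>News</p><a href="http://example.com/u">Unsubscribe</a>'
      . '<a href="http://example.com/">Home</a>';

echo stripSubscriptionLinks($body);
```

Only the unsubscribe link is replaced; the other link is left as it was because its text does not contain "ubscri".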
I'm reading a page into a variable and I would like to disable all links that do not contain the word "remedy" in the address. The code I have so far grabs all the links including ones with "remedy". What am I doing wrong?
$page = preg_replace('~<a href=".*?(?!remedy).*?".*?>(.*?)</a>~i', '<font color="#808080">$1</font>', $page);
-- solution --
$page = preg_replace('~<a href="(.(?!remedy))*?".*?>(.*?)</a>~i', '<font color="#808080">$2</font>', $page);
Try ~<a href="(.(?!remedy))*?".*?>(.*?)</a>~i
As for what you are doing wrong: a regex will match whenever there is any possible way to do so. Your pattern '~<a href=".*?(?!remedy).*?".*?>(.*?)</a>~i' does not say that remedy may not appear anywhere in the attribute. It only says that, somewhere in the attribute, there must be a position that is not immediately followed by remedy, and the two .*? can always shift to find such a position. That is why every URL matches, including the ones that contain remedy.
I would probably use this:
<a href="(?:(?!remedy)[^"])*"[^>]*>([^<]*)</a>
The most interesting part is this:
"(?:(?!remedy)[^"])*"
Each time the [^"] is about to consume another character, it yields to the lookahead so it can confirm that it's not the first character of the word remedy. Using [^"] instead of . prevents it from looking at anything beyond the closing quote. I also took the liberty of replacing your .*?s with negated character classes. This serves the same purpose, keeping the match "corralled" in the area where you want it to match. It's also more efficient and more robust.
Of course, I'm assuming the <a> element's content is plain text, with no more elements nested inside it. In fact, that's just one of many simplifying assumptions I've made. You can't match HTML with regexes without them.
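Plugged into preg_replace, the pattern above could be used roughly like this. This is a sketch under the same simplifying assumptions (href is the first attribute, the link content is plain text); the function name disableNonRemedyLinks and the sample links are made up.

```php
<?php
// Hypothetical helper: greys out links whose href does not contain "remedy".
function disableNonRemedyLinks(string $page): string
{
    // (?:(?!remedy)[^"])* consumes the href one character at a time and
    // refuses to step onto the start of the word "remedy".
    $pattern = '~<a href="(?:(?!remedy)[^"])*"[^>]*>([^<]*)</a>~i';
    return preg_replace($pattern, '<font color="#808080">$1</font>', $page);
}

// Made-up sample input.
$page = '<a href="/remedy/1">Keep</a> <a href="/other">Drop</a>';
echo disableNonRemedyLinks($page);
```

Because the non-capturing group (?:...) is used, the link text is $1 here rather than $2.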
I have a web script that creates a HTML page into a PHP string, then delivers it to the user. All of the pages are generated by index.php, with a unique url.
domain.host.com/index.php?loadpage=/BLAH
The homepage is static HTML, but every other page is dynamically generated into this PHP string. It may seem like I'm rambling; I'm just trying to give as much info as possible. I have created some JavaScript code to modify the link URL:
BLAH Link
This basically shows the nice neat link in the status bar, but the JavaScript sends it to the URL I want (I have no need to modify the URL bar, as this is in an iframe).
These links are fine on the static page. But the dynamically generated page that's in the PHP string is a little harder. I need to search through a string for every occurrence of:
href="?loadpage=/ [WILDCARD] " title=
and replace it with:
href="http://domain.com/ [WILDCARD] " onclick="location.href='?loadpage=/ [WILDCARD] '; return false;" title=
This seems very complicated to me and I think it could be done with ereg / preg match / replace, but I have no clue about regex.
In short, I need some way of searching through a PHP string that contains the full page HTML, and replacing the first string with the second on every occurrence of a link with '?loadpage=/'. But each link will have a different [WILDCARD], so I presume the script will need to find every occurrence, save the [WILDCARD] to a variable, then do the replacement and insert the saved value from the first URL.
EDIT.
Just to clarify what the original link looks like:
<a id="random" href="?loadpage=/BLAH" title="BLAH Title"></a>
This is why I am only searching from the href attribute.
You are right, what you need is a regex. (Your need for a wildcard replace is the clue). This answer is not supposed to be a complete solution, just give you an idea how regexes work. I will leave it to you to integrate this with php (try preg_match_all)
This is the pattern you want to match:
"\?loadpage=\/([^"]*)"
The \ is an escape for characters that have special meaning in regexes.
So ignoring the escapes this is
"?loadpage=/ //the start of the string up to the wildcard part
() // capturing parentheses, indicating a part that
// you want to access in the replace string
[^"]* // any number of occurrences of any character that is NOT doublequote
// ^ is the negation symbol
// * indicates "zero or more occurrences"
followed by...
" doublequote character
Now you need a replacement string. For this you just need to know that your (capture parentheses) allow you to recall that part of the match. In most regex flavours you can capture these to a series of numbered variables, usually represented as $1, $2, $3... or \1, \2, \3... In your case you only have one capture variable to deal with.
So your replacement string could look like
"http://domain.com/$1/" onclick="location.href='?loadpage=/$1'; return false"
In perl you would put the whole thing together like this:
$string =~ s|"\?loadpage=\/([^"]*)"|"http://domain.com/$1/" onclick=\"location.href='?loadpage=/$1'\; return false"|g;
Note that you don't need to escape your quotemarks. This may differ in php.
As you will see it easily gets very cryptic. regular-expressions.info is a useful online reference.
just so you know what you are looking at (you won't need to do this in php)...
=~ is the perl regex operator (you won't use this in php, take a look at the preg_match documentation)
then you have the form
s|match_pattern|replace_pattern|g;
where s indicates replacement (as opposed to simple matching)
g indicates global matching (otherwise process will stop on first match)
||| are the separators. Usually written /// but then you would have to escape all of your URL //s, which doubles the illegibility.
But this is now too much Perl-specific detail; read the PHP regex docs!
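For reference, a PHP version of the same substitution might look like the sketch below. It uses preg_replace, which already replaces every match, so no g flag is needed. The function name rewriteLoadpageLinks is made up, domain.com is the placeholder host from the question, and the replacement follows the target string given in the question (including href= and without the extra trailing slash).

```php
<?php
// Hypothetical helper: rewrites href="?loadpage=/X" into a clean URL plus
// an onclick handler that loads the original ?loadpage=/X address.
function rewriteLoadpageLinks(string $html): string
{
    // ~ as the delimiter avoids having to escape the slashes in the pattern.
    $pattern = '~href="\?loadpage=/([^"]*)"~';

    // $1 is the captured wildcard part; \' is a literal quote in a PHP
    // single-quoted string.
    $replacement = 'href="http://domain.com/$1" '
                 . 'onclick="location.href=\'?loadpage=/$1\'; return false;"';

    return preg_replace($pattern, $replacement, $html);
}

echo rewriteLoadpageLinks('<a id="random" href="?loadpage=/BLAH" title="BLAH Title"></a>');
```

The sketch assumes the wildcard part never contains a single quote, since that would break the generated onclick attribute.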
I've been searching for hours trying to find a solution to this. I am trying to determine whether the REQUEST URI is legitimate and break it down from there.
$samplerequesturi = "/variable/12345678910";
To determine if it is legit, the first section variable is only letters and is variable in length. The second section is numbers, which should have 11 total. My problem is escaping the forward slash so it is matched in the uri. I've tried:
preg_match("/^[\/]{1}[a-z][\/]{1}[0-9]{11}+$/", $samplerequesturi)
preg_match("/^[\\/]{1}[a-z][\\/]{1}[0-9]{11}+$/", $samplerequesturi)
preg_match("/^#/#{1}[a-z]#/#{1}[0-9]{11}+$/", $samplerequesturi)
preg_match("/^|/|{1}[a-z]|/|{1}[0-9]{11}+$/", $samplerequesturi)
Among others which I can't remember now.
The request usually errors out:
preg_match(): Unknown modifier '|'
preg_match(): Unknown modifier '#'
preg_match(): Unknown modifier '['
Edit:
I guess I should state that the REQUEST URI is already known. I'm trying to prove the whole string to make sure it isn't a bogus string ie to make sure there the 1st set is only lower case letters, and the 2nd set is only 11 numbers.
/ is not the only thing you can use as a delimiter. In fact, you can use almost any non-alphanumeric character. Personally I like to use () because it reminds me that the first item of the result array is the entire match and it also never needs escaping in the pattern.
preg_match("(^/([a-z]+)/(\d+)$)i",$samplerequesturi,$out);
var_dump($out);
That should do it.
If you want to use regex (which I don't think is necessary in this case, since simply splitting on "/" should be fine):
$samplerequesturi = "/variable/12345678910";
preg_match("#^/([A-Za-z]+)/(\d+)$#", $samplerequesturi, $out);
echo $out[1];
echo $out[2];
should get you going
Your problem may be that you are using the / forward-slash as a regex delimiter (at the start and end of the regex expression). Switch to using a character other than the forward-slash, such as a # hash symbol or any other symbol which will never need to appear in this particular expression. Then you won't need to escape the forward-slash character at all in the expression.
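Putting the delimiter advice together with the rules stated in the question (lower-case letters only, then exactly 11 digits), a stricter check could be sketched as follows. The function name parseRequestUri is made up, and the D modifier is an addition so that $ cannot match before a trailing newline.

```php
<?php
// Hypothetical helper: returns [name, digits] for a valid URI, or null.
function parseRequestUri(string $uri): ?array
{
    // # is the delimiter, so / needs no escaping.
    // [a-z]+ allows only lower-case letters; \d{11} requires exactly 11 digits.
    if (preg_match('#^/([a-z]+)/(\d{11})$#D', $uri, $m) === 1) {
        return [$m[1], $m[2]];
    }
    return null;
}

var_dump(parseRequestUri('/variable/12345678910')); // both parts returned
var_dump(parseRequestUri('/variable/123'));         // NULL: too few digits
var_dump(parseRequestUri('/Variable/12345678910')); // NULL: upper-case letter
```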
I have been working on this problem for several days and it's starting to drive me crazy. I'm comfortable using regular expressions but this one thing seems to be escaping me.
I need to match a string between a set of characters if they exist otherwise it should match to the end of the line.
For example:
I'm just trying to get "content" out of the following example:
$str1="title:content #description";
$str2="title:content";
preg_match("/:(.*?)[(#)|(:)|(\*)]?$/",$str1,$content);
preg_match("/:(.*?)[(#)|(:)|(\*)]?$/",$str2,$content);
$str1 outputs:"content #description"
$str2 outputs:"content"
note: the strings may be in a different order or may not have a special character (#,:,or *) in it or they might have one so there's no "end of string" character that will be common besides "end of line".
I've tried every combination I can think of to make the entire "or" statement conditional, and read a ton of posts with similar but not quite the same questions.
You can write:
preg_match("/:(.*?)(?:[#:*]|$)/", $str1, $content);
(note that the match ends at one of #:* or end-of-string, using |; your version has the [#:*] optional, but makes the end-of-string mandatory.)
or simply:
preg_match("/:([^#:*]*)/", $str1, $content);
(meaning "a colon, followed by zero or more characters that aren't in #:*").
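As a quick check of the second pattern against both strings from the question, here is a small sketch. The function name extractContent is made up, and the trim() call is an addition to drop the space left in front of the #.

```php
<?php
// Hypothetical helper: text after the first colon, up to the next
// #, : or * (or the end of the string if none of them appears).
function extractContent(string $str): ?string
{
    if (preg_match('/:([^#:*]*)/', $str, $m) === 1) {
        // trim() is an addition: it removes the space before "#description".
        return trim($m[1]);
    }
    return null; // no colon in the string
}

var_dump(extractContent('title:content #description')); // string(7) "content"
var_dump(extractContent('title:content'));              // string(7) "content"
```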