How to prevent re-replacing by the second regex? - php

I run my input through two regexes:
// replace a URL with a link which is like this pattern: [LinkName](LinkAddress)
$str = preg_replace("/\[([^][]*)]\(([^()]*)\)/", "<a href='$2' target='_blank'>$1</a>", $str);
// replace a regular URL with a link
$str = preg_replace("/(\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&##\/%?=~_|!:,.;]*[-a-z0-9+&##\/%=~_|])/i", "<a href='$1' target='_blank'>untitled</a>", $str);
Now there is a problem (a kind of collision). Regular URLs work fine, but pattern-based URLs do not: the first regex turns the pattern into a link, and the second regex then turns that link's href attribute value into a link again.
How can I fix it?
Edit: Following the comments, how can I use a single regex instead of those two (with preg_replace_callback)? Honestly, I tried, but it doesn't work for either kind of URL.
Is combining them even possible? Their outputs aren't identical: the first one has a LinkName, while the second one uses the constant string untitled as its LinkName.

$str = preg_replace_callback('/\[([^][]*)]\(([^()]*)\)|(\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&##\/%?=~_|!:,.;]*[-a-z0-9+&##\/%=~_|])/i',
    function($matches) {
        if (isset($matches[3])) {
            // group 3 is only set when the regular-URL alternative matched
            return "<a href='".$matches[3]."' target='_blank'>untitled</a>";
        } else {
            // otherwise the [LinkName](LinkAddress) alternative matched
            return "<a href='".$matches[2]."' target='_blank'>".$matches[1]."</a>";
        }
    }, $str);
echo $str;
One way would be to do it like this: merge your two expressions with the alternation operator |. Then, in the callback function, check whether the third capture group is set (isset($matches[3])). If it is, your second regular expression matched and you replace a regular URL; otherwise you replace the [LinkName](LinkAddress) form with its link and link text.
I hope this is clear and helps.
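To see the single-pass idea on sample input, here is a minimal sketch. The function name linkify, the named groups, the sample string, and the htmlspecialchars() escaping are my own additions for illustration and are not part of the answer above; the pattern itself is the combined one from the answer.

```php
<?php
// Hypothetical helper: converts both link styles in a single pass, so the
// href produced for [LinkName](LinkAddress) is never scanned a second time.
function linkify(string $text): string
{
    $pattern = '/\[(?<name>[^][]*)]\((?<href>[^()]*)\)'
             . '|(?<bare>\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&##\/%?=~_|!:,.;]*[-a-z0-9+&##\/%=~_|])/i';

    return preg_replace_callback($pattern, function (array $m): string {
        // A non-empty 'bare' group means the regular-URL alternative matched.
        if (!empty($m['bare'])) {
            $href = $m['bare'];
            $name = 'untitled';
        } else {
            $href = $m['href'];
            $name = $m['name'];
        }
        // Escaping is an assumption added here, not part of the original answer.
        return "<a href='" . htmlspecialchars($href, ENT_QUOTES) . "' target='_blank'>"
             . htmlspecialchars($name, ENT_QUOTES) . '</a>';
    }, $text);
}

echo linkify('See [docs](http://example.com/a) or www.example.org'), "\n";
```

Because the regex engine consumes the whole [LinkName](LinkAddress) construct as one match, the address inside it is not available to the regular-URL alternative afterwards.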

Related

PHP- Parsing words from a string without spaces?

My webpage has a variable, $currentPage. It holds the PHP token name of the page I'm currently on.
Example: All categories under the user section have names such as:
uAdminNew, uAdminEdit, etc.
I would like a way to strip the uAdmin prefix, determine the last word (New or Edit), and call functions from there.
My navigation system works through these names, so I can't change them; otherwise I would make them easier to parse, for example by adding delimiters.
Is this something only regex can solve, or is there a simpler solution I'm missing? If it requires regex, could you explain or provide a link showing how I would use it to test against a specific list of strings? I'm very new to it.
For example, like this:
$str = 'uAdminEdit';
$ar = preg_match('/([A-Z][^A-Z]+$)/', $str, $m);
echo $m[1]; // Edit
Does the pagename always start with uAdmin? If so, you could split the string by "uAdmin" with explode():
$page = 'uAdminEdit';
echo explode('uAdmin', $page)[1]; //Output: Edit
Or simply remove "uAdmin" with str_replace():
$page = 'uAdminEdit';
echo str_replace('uAdmin', '', $page); //Output: Edit
If you just want the section after uAdmin, use a regex capture group:
$sub = 'uAdminEdit';
preg_match('/uAdmin(.*)/', $sub, $matches);
echo $matches[1]; // Edit
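Since the question also asks about testing against a specific list of strings, here is a small sketch that combines the last-word regex with a whitelist check. The helper name lastWord, the $allowed list, and the uAdminPurge token are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical helper: returns the trailing capitalized word of a page token,
// or null when the token does not end in one.
function lastWord(string $token): ?string
{
    // [A-Z] starts the word; [^A-Z]*$ runs to the end without another capital.
    return preg_match('/[A-Z][^A-Z]*$/', $token, $m) ? $m[0] : null;
}

// Hypothetical whitelist of actions the site knows how to handle.
$allowed = ['New', 'Edit'];

foreach (['uAdminNew', 'uAdminEdit', 'uAdminPurge'] as $currentPage) {
    $action = lastWord($currentPage);
    $status = in_array($action, $allowed, true) ? 'handled' : 'unknown';
    echo "$currentPage => $action ($status)\n";
}
```

Unlike the explode() and str_replace() variants, this does not depend on the prefix being exactly uAdmin, only on the last word starting with a capital letter.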

find match of 1st word and last

I have a URL that looks somewhat like this:
for-sale/stuff/state/used-bla-bla2-bla3-bla4-(bla5)---f10-85934.html
I'm trying to validate the format in my function using this regex:
if (preg_match('/(?:^|(?:\-))(\w+)/g', $pathInfo, $matches)) {
echo $digit = $matches[0];
}
$pathInfo is the URL given above.
Basically I want to check that:
the directory is for-sale/stuff/
the file name (used-bla-bla2-bla3-bla4-(bla5)---f10-85934.html) starts with either used or new and ends with an integer followed by .html
no spaces are present.
After validating, I want to get the ID, which in this case is 85934.
It seems like you want something like this:
'~^for-sale/stuff/\S+/(?:used|new)\S*?(\d+)\.html$~'
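As a rough sketch of how that pattern could be used to validate the path and pull out the ID in one step (the function name extractId and the second, failing sample path are hypothetical):

```php
<?php
// Hypothetical helper: returns the numeric ID, or null when the path
// does not have the expected format.
function extractId(string $pathInfo): ?int
{
    // \S forbids whitespace; the lazy \S*? lets (\d+) take the final digit run.
    $pattern = '~^for-sale/stuff/\S+/(?:used|new)\S*?(\d+)\.html$~';
    return preg_match($pattern, $pathInfo, $m) ? (int) $m[1] : null;
}

var_dump(extractId('for-sale/stuff/state/used-bla-bla2-bla3-bla4-(bla5)---f10-85934.html')); // int(85934)
var_dump(extractId('for-sale/other/state/used-1.html'));                                     // NULL
```

Using ~ as the delimiter avoids having to escape the slashes in the path.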
I'd suggest this sample piece of code and the following regex:
$re = "~\\bfor\\-sale\\/stuff\\/[^<> ]*?\\/(?:used|new)[^/ ]*?\\-(\\d+)\\.html\\b~";
$str = "\n";
preg_match_all($re, $str, $matches);
Regex: \bfor\-sale\/stuff\/[^<> ]*?\/(?:used|new)[^/ ]*?\-(\d+)\.html\b
I assume you have several URLs to validate in a variable string of text, so I suggest using \b. I also assume the URL is inside some tag, so I'd use [^<> ]*? to limit the match to the inside of that tag.
The ID will be in the first capturing group (captured by \d+).
Spaces are also disallowed: [^<> ]*?, [^/ ]*?.

preg_replace with Regex - find number-sequence in URL

I'm a regex-noobie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I want to achieve is getting the number sequence (aka Job-ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
echo $matches[0]; //=> 146370543
This matches a run of characters that are neither hyphens nor dots and that is followed by .aspx, which is checked with the lookahead (?=\.aspx).
You can just use preg_match (you don't need preg_replace, since you don't want to change the original string) and capture the number before the .aspx. Assuming .aspx is always at the end, the simplest way I can think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains the array of matches. Its first element holds the text matched by the complete regex, which is 146370543.aspx in this example. Its second element holds the group captured by the parentheses around [0-9]+.
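The question mentions that parameters such as "&mobile=true" may follow ".aspx", in which case a pattern anchored with $ would not match. A possible variation, sketched here with a hypothetical helper name jobId, drops the anchor and uses a word boundary instead:

```php
<?php
// Hypothetical helper: returns the digits directly before ".aspx", or null.
function jobId(string $url): ?string
{
    // No $ anchor, so trailing parameters are tolerated;
    // \b rejects longer extensions such as ".aspxfoo".
    return preg_match('/(\d+)\.aspx\b/i', $url, $m) ? $m[1] : null;
}

$url = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx';
echo jobId($url), "\n";
echo jobId($url . '&mobile=true'), "\n";
```

Both calls should print the same Job-ID, and other numbers earlier in the URL are skipped because they are not directly followed by ".aspx".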
You can get the opposite by using this regex:
(\D*)\d+(.*)
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
And if you just want the number from that URL, you can use this regex:
(\d+)

How to get text without some word (an ampersand issue)

I have a string like this: Hello #"user name". Where are you from, #"user name"?
I need to get the string between the " characters (user name), but I don't know how to do it.
I tried something like this: /#("(.*)"|(.[^ ]*))\s*/ but it doesn't work correctly.
First off, one possible regular expression that grabs the data you need is #"(.+?)", which matches any data within quotes preceded by #, and captures the data inside. Now that you've added the regex you've tried, I'm betting that the issue is that your expression is greedy: the regex engine tries to grab the longest match possible, so returns all of #"user name". Where are you from, #"user name". Adding the ? makes the expression lazy, so it will grab the shorter match.
Since you're interested in the content inside, I'm guessing that your final goal is to replace those strings with various types of user data dynamically, so one approach would be preg_replace_callback:
function user_data($matches) {
    $key = $matches[1];
    // return the user data for a $key like "user name";
    // as a placeholder, this returns the key itself
    return $key;
}
$output = preg_replace_callback('/#"(.+?)"/', 'user_data', $input);
Try looking at strstr(): http://www.php.net/manual/en/function.strstr.php. You might also need to explode() on the whitespace that follows and take the first item of the resulting array.
If there is only one #"..." per string, something like this should work
$matches = array();
preg_match("/#\"(.+?)\"/i", $inputstring, $matches);
echo($matches[1]);
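When the string contains more than one #"..." placeholder, as in the question's example, preg_match_all can collect all of them. The sketch below also shows a callback that substitutes values from a lookup table; the $users array and the name Alice are made up for illustration.

```php
<?php
$input = 'Hello #"user name". Where are you from, #"user name"?';

// The lazy quantifier makes each match stop at the nearest closing quote.
preg_match_all('/#"(.+?)"/', $input, $matches);
print_r($matches[1]); // both captured keys

// Hypothetical lookup table mapping placeholder keys to user data.
$users = ['user name' => 'Alice'];

$output = preg_replace_callback('/#"(.+?)"/', function (array $m) use ($users): string {
    // Unknown keys are left untouched by returning the full match.
    return $users[$m[1]] ?? $m[0];
}, $input);

echo $output, "\n"; // Hello Alice. Where are you from, Alice?
```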
Try this; if it doesn't work, escape the " in the pattern:
/\#\&quote\;([\w\s]{0,})\&quote\;/

preg_replace on the matches of another preg_replace

I have a feeling that I might be missing something very basic. Anyway, here's the scenario:
I'm using preg_replace to convert ===inputA===inputB=== to <a href="inputB">inputA</a>
This is what I'm using:
$new = preg_replace('/===(.*?)===(.*?)===/', '<a href="$2">$1</a>', $old);
It's working fine, but I also need to restrict inputB further, like this:
preg_replace('/[^\w]/', '', every Link or inputB);
So basically, in the first snippet, I need to process the $2 so that it contains only \w characters, as in the second snippet. The final result should be like this:
Convert ===The link===link's page=== to <a href="linkspage">The link</a>
I have no idea how to do this, what should I do?
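Although there already is an accepted answer: this is what the /e modifier or preg_replace_callback() are for. Note that the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so only the preg_replace_callback() variant works on current PHP versions: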
echo preg_replace(
    '/===(.*?)===(.*?)===/e',
    '"$1"',
    '===inputA===in^^putB===');
//Output: inputA
Or:
function _my_url_func($vars) {
    return ''.$vars[2].'';
}
echo preg_replace_callback(
    '/===(.*?)===(.*?)===/',
    '_my_url_func',
    '===inputA===inputB===');
//Output: inputB
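For completeness, here is a minimal sketch of the callback approach applied to the task in the question, assuming the goal is an anchor tag whose href is the second input reduced to \w characters. The function name linkFromMarkup and the exact output markup are assumptions, not taken from the answer above.

```php
<?php
// Hypothetical helper: turns ===text===target=== into a link and strips
// every non-word character from the target inside the callback.
function linkFromMarkup(string $old): string
{
    return preg_replace_callback('/===(.*?)===(.*?)===/', function (array $m): string {
        // Second pass: keep only \w characters in the captured target.
        $target = preg_replace('/[^\w]/', '', $m[2]);
        return '<a href="' . $target . '">' . $m[1] . '</a>';
    }, $old);
}

echo linkFromMarkup("===The link===link's page==="), "\n";
// <a href="linkspage">The link</a>
```

The inner preg_replace runs once per match, so only the captured target is cleaned while the link text stays as written.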
Try preg_match with the first pattern to get the two matches into variables, and then use preg_replace() on the one that needs further checks.
Why don't you extract the matches with the first regex (preg_match), process those results, and then put them back into HTML form?
