Using preg_replace to modify link to file - php

How to use preg_replace to replace some of a link, but keep the original link as text?
I tried using https://www.phpliveregex.com/#tab-preg-replace, but preg_replace is far too complex for my level of knowledge.
In short I would like to transform this:
!f:\cases\case\20190813_case.pdf!
To this:
<a href='file://server-files/data/cases/case/20190813_case.pdf'>f:\cases\case\20190813_case.pdf</a>
That way the user sees the network drive as a letter, but the link actually points to the server name.
$string = '!f:\cases\case\20190813_case.pdf!';
$string = str_ireplace("F:\\", "file://server-files/Data/", $string);
$string = preg_replace("/\!(.*?)\!/", "<a href='$1'>$1</a>", $string);
This gives:
<a href='file://server-files/Data/cases\case\20190813_case.pdf'>file://server-files/Data/cases\case\20190813_case.pdf</a>
It works, but I would like the link text to keep the original form, like this:
<a href='file://server-files/Data/cases\case\20190813_case.pdf'>f:\cases\case\20190813_case.pdf</a>
Does anyone know if it is possible?
Is it also possible to skip the str_ireplace and do it all in the preg_replace call?
EDIT
The actual text looks like this (I had to anonymize some parts):
We have submitted a sketch proposal for a new headquarters for XXXXX XXXXXXXX.
The folder can be seen here !F:\A-sager\XXXXXXXX - nyt domicil\8-Forslag\D-Sendt\fremlagt for bygherren\20190813 domicil.pdf!
The project is not yet public.
The text is URL-encoded and stored in an XML file.

There is no reason to use regular expressions for simple string replacements. I'm not saying you shouldn't get over that barrier and learn them; they're just not really needed here.
<?php
$str = '!f:\cases\case\20190813_case.pdf!';
$str1 = substr($str, 1, strlen($str) -2);
$str2 = substr($str, 4, strlen($str) -5);
echo "<a href='file://{$str2}'>{$str1}</a>";
//<a href='file://cases\case\20190813_case.pdf'>f:\cases\case\20190813_case.pdf</a>
//if slashes are wrong...
var_dump(str_replace('\\', '/', $str1)) ;//see const DIRECTORY_SEPARATOR
//string(31) "f:/cases/case/20190813_case.pdf"
PHP has a string function for about everything you could ever need.
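Putting those string functions together with the server prefix from the question gives a small regex-free helper. This is a sketch, assuming every token has the form !<letter>:\path! with a single drive letter; driveLink() is a hypothetical name, and the file://server-files/Data/ prefix is taken from the question.

```php
<?php
// Sketch: the substr()/str_replace() idea above, extended with a server prefix.
// driveLink() is a hypothetical helper name, not an existing PHP function.
function driveLink(string $token, string $prefix = 'file://server-files/Data/'): string
{
    $text = trim($token, '!');                       // strip the ! delimiters
    $path = substr($text, 3);                        // drop drive letter, colon and backslash
    $href = $prefix . str_replace('\\', '/', $path); // backslashes -> forward slashes
    return "<a href='{$href}'>{$text}</a>";
}

echo driveLink('!f:\cases\case\20190813_case.pdf!'), "\n";
// <a href='file://server-files/Data/cases/case/20190813_case.pdf'>f:\cases\case\20190813_case.pdf</a>
```

The link text is left exactly as typed, so the user still sees the drive letter.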
Update: You stated that there can be multiple links in one "string" (in a question since deleted), but you haven't provided an example of the format. Assuming ! is the delimiter and you want to use PCRE, try:
<?php
$str = '!f:\cases\case\20190813_case1.pdf!!f:\cases\case\20190813_case2.pdf!!f:\cases\case\20190813_case3.pdf!';
preg_match_all('#!(.*?)!#', $str, $matches);
var_dump($matches[1]);
There are often many ways to accomplish the same basic string manipulation (strtok, explode, etc).
Seeing your update, it sounds like you can use an XML parser, iterate over the nodes, and apply the examples above (specifically the regular expression) to isolate each link. Watch out for false positives if the text itself contains exclamation marks. Ask if you get stuck on anything else specific, and good luck!
In general, I'd aim for whatever code is clearest and most concise, i.e. readable.

I suggest:
$str = <<<'EOD'
We have submitted a sketch proposal for a new headquarters for XXXXX XXXXXXXX.
Mappen kan ses her !F:\A-sager\XXXXXXXX - nyt domicil\8-Forslag\D-Sendt\fremlagt for bygherren\20190813 domicil.pdf!
The project is not yet public.
EOD;
echo preg_replace_callback('~!f:(.*?)!~i', function ($m) {
return '<a href="file://server-files/Data'
. strtr(rawurlencode($m[1]), ['%5C'=> '/'])
. '">f:' . $m[1] . '</a>';
}, $str);
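A variant of the callback approach, as a sketch: the version above always prints a lowercase f:, so this one captures the drive letter and keeps it as typed, percent-encodes the path, and HTML-escapes both the href and the link text. Using [^!]* instead of .*? also lets a path that wraps onto the next line still match. Because preg_replace_callback() visits every !...! token, several links in one string are handled too. linkifyDrivePaths() is a hypothetical name, and the prefix is assumed to map the whole drive.

```php
<?php
// Sketch, not the answer's exact code. linkifyDrivePaths() is a hypothetical
// name; the default prefix is the one from the question.
function linkifyDrivePaths(string $text, string $prefix = 'file://server-files/Data'): string
{
    return preg_replace_callback(
        '~!([a-z]):(\\\\[^!]*)!~i', // letter, colon, then a backslash path up to the closing !
        function (array $m) use ($prefix): string {
            // "\A b\x.pdf" -> "%5CA%20b%5Cx.pdf" -> "/A%20b/x.pdf"
            $href = $prefix . str_replace('%5C', '/', rawurlencode($m[2]));
            $label = $m[1] . ':' . $m[2];   // original drive letter, unchanged
            return '<a href="' . htmlspecialchars($href) . '">' . htmlspecialchars($label) . '</a>';
        },
        $text
    );
}

echo linkifyDrivePaths('See !F:\A-sager\20190813 domicil.pdf! for details.'), "\n";
// See <a href="file://server-files/Data/A-sager/20190813%20domicil.pdf">F:\A-sager\20190813 domicil.pdf</a> for details.
```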

Related

Laravel <br> to newline

I know I can do a simple replace to convert <br> tags to newlines, but I have a parsing problem because the <br> tags I receive are not bare: they carry attributes.
<br style=\"color: rgb(83, 83, 83); font-family: \" helvetica=\"\" ...
The back end is not mine, so there is no point discussing good or bad coding here; I am just wondering whether there is a way to replace those tags with plain newlines.
Something like nl2br(), but in reverse.
EDIT:
I don't see much point in showing code when I already know why what I've tried doesn't work, but here goes:
public function removeSingleHtmlFormatting($single)
{
$single->short_description = str_replace("<br>", "\r\n", $single->short_description);
$single->short_description = strip_tags($single->short_description);
$single->short_description = preg_replace("/&nbsp;/", " ", $single->short_description);
}
Of course, the replace doesn't work, because there is no literal "<br>" string to replace. I have no idea where to start parsing it.
instead of
str_replace("<br>", "\r\n", $single->short_description);
try
preg_replace("/<br.*>/U", "\r\n", $single->short_description);
The U (ungreedy) modifier makes .* stop at the first >, so the pattern matches a <br> tag together with any attributes inside it, not only a bare <br>.
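If you'd rather not rely on the U modifier, a negated character class does the same job and is a little stricter. Below is a sketch of a reverse nl2br(); br2nl() is a hypothetical helper name, and it assumes no attribute value contains a literal >.

```php
<?php
// Sketch: turn any <br> tag (bare, self-closing, or with attributes) into a
// newline. [^>]* stops at the first ">", so no ungreedy modifier is needed,
// and \b keeps the pattern from matching longer tag names such as <bring>.
function br2nl(string $html, string $newline = "\r\n"): string
{
    return preg_replace('~<br\b[^>]*>~i', $newline, $html);
}

echo br2nl('one<br>two<BR/>three<br style="color: rgb(83, 83, 83);">four', "\n"), "\n";
```

In removeSingleHtmlFormatting() it would be called on short_description before strip_tags(), which would otherwise delete the tags without leaving newlines.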

Regex to select URL except when = is directly in front of it

I'm trying to use a regex to find and replace all URLs in a forum system. This works, but it also matches URLs inside BBCode, which shouldn't happen.
My code is as follows:
<?php
function make_links_clickable($text){
return preg_replace('!(([^=](f|ht)tp(s)?://)[-a-zA-Zа-яА-Я()0-9#:%_+.~#?&;//=]+)!i', '$1', $text);
}
//$text = "https://www.mcgamerzone.com<br>http://www.mcgamerzone.com/help/support<br>Just text<br>http://www.google.com/<br><b>More text</b>";
$text = "#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa";
echo "<b>Unparsed text:</b><br>";
echo $text;
echo "<br><br>";
echo "<b>Parsed text:</b><br>";
echo make_links_clickable($text);
?>
All URLs that occur in BBCode follow an = character, so I don't want to select anything that is preceded by =.
I basically have that working, but it also selects one extra character in front of the string that should be selected.
I'm not very familiar with regex. The final output of my code is this:
<b>Unparsed text:</b><br>
#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa<br>
<br>
<b>Parsed text:</b><br>
#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa
You can match and skip [url=...] like this:
\[url=[^\]]*](*SKIP)(?!)|(((f|ht)tps?://)[-a-zA-Zа-яёЁА-Я()0-9#:%_+.\~#?&;/=]+)
That way, you will only match the URLs outside the [url=...] tag.
IDEONE demo:
function make_links_clickable($text){
return preg_replace('~\[url=[^\]]*](*SKIP)(?!)|(((f|ht)tps?://)[-a-zA-Zа-яёЁА-Я()0-9#:%_+.\~#?&;/=]+)~iu', '$1', $text);
}
$text = "#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa";
echo "<b>Parsed text:</b><br>";
echo make_links_clickable($text);
You can use a negative lookbehind (?<!=) instead of your negated class [^=]. It asserts that the match is not immediately preceded by =, without consuming that character, so the extra leading character is no longer part of the match.
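Applied to the question's function, the lookbehind version might look like this sketch. The <a href="$1">$1</a> replacement is an assumption (the question uses a bare '$1'), and the character class is the question's with the duplicate # and / removed.

```php
<?php
// Sketch: the question's function with [^=] replaced by the negative
// lookbehind (?<!=). The lookbehind only checks the previous character,
// so that character is no longer part of the match.
// Assumption: the <a href="$1">$1</a> replacement template.
function make_links_clickable(string $text): string
{
    return preg_replace(
        '~(?<!=)((?:f|ht)tps?://[-a-zA-Zа-яА-Я()0-9#:%_+.\~?&;/=]+)~iu',
        '<a href="$1">$1</a>',
        $text
    );
}

$text = 'see [url=https://example.com/a]here[/url] and https://example.com/b end';
echo make_links_clickable($text), "\n";
```

The URL inside [url=...] is skipped because its first character follows =, and no later starting position inside it can begin with ht or f followed by tp.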

php regex to get middle of string

I convert an HTML page to plain text in order to find a numeric value.
In all that HTML mess, I need to find a string like this one:
C) Debiti33.197.431,90I - Di finanziamento
I need the number 33.197.431,90 (this number changes on every HTML page I parse).
Is there any regex to achieve this? For example:
STARTS WITH 'C) Debiti' ENDS WITH 'I - Di finanziamento' GETS the middle string that can be whatever.
Whenever I try, I get empty results; I don't know much about regex.
Can you please help me?
Thank you very much.
You could try the regex below:
^C\) Debiti\K.*?(?=I - Di finanziamento$)
The PHP code would be:
<?php
$mystring = "C) Debiti33.197.431,90I - Di finanziamento";
$regex = '~^C\) Debiti\K.*?(?=I - Di finanziamento$)~';
if (preg_match($regex, $mystring, $m)) {
$yourmatch = $m[0];
echo $yourmatch; //=> 33.197.431,90
}
?>
This should work. Read the section "Want to Be Lazy? Think Twice."
(?<=\bC\) Debiti)[\d.,]+(?=I - Di finanziamento\b)
Sample code:
$re = "/(?<=\\bC\\) Debiti)[\\d.,]+(?=I - Di finanziamento\\b)/i";
$str = "C) Debiti33.197.431,90I - Di finanziamento";
preg_match($re, $str, $matches);
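Wrapped in a helper, the lookaround pattern also finds the number when the marker sits inside a longer plain-text dump of the page, which the ^...$ version above cannot do. A sketch; extractDebiti() is a hypothetical name, and it returns null when nothing matches.

```php
<?php
// Sketch: the lookbehind/lookahead pattern from the answer above, wrapped in
// a helper. extractDebiti() is a hypothetical name.
function extractDebiti(string $text): ?string
{
    $re = '/(?<=\bC\) Debiti)[\d.,]+(?=I - Di finanziamento\b)/i';
    return preg_match($re, $text, $m) ? $m[0] : null;
}

// No ^ or $ anchors, so the marker may sit anywhere in the page text.
$page = "...text before... C) Debiti33.197.431,90I - Di finanziamento ...text after...";
echo extractDebiti($page), "\n"; // 33.197.431,90
```

Restricting the middle to [\d.,]+ instead of a lazy .*? means the match cannot run past the number into unrelated text.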

PHP regex on weird situations

I'm trying to scrape a website using regex, but the site isn't written in well-formatted HTML. In fact, the HTML is horrible and barely structured at all. I've managed to tackle most of it, but the problem I'm encountering now is that in some emails, a span is wrapped around a random part of the address, like so:
****.*******#g<span class="tournamenttext">mail.com</span>
************<span class="tournamenttext">#yahoo.com</span>
<span class="tournamenttext">**********#mail.com</span>
*******#gmail.com
Is there a way to retrieve the emails with all this inconsistency?
$string ='****.*******#g<span class="tournamenttext">mail.com</span>
************<span class="tournamenttext">#yahoo.com</span>
<span class="tournamenttext">**********#mail.com</span>
*******#gmail.com';
$pattern = "/<\/?span[^>]*>/";
$string = preg_replace($pattern, "", $string);
After that, $string contains only the email addresses:
****.*******#gmail.com
************#yahoo.com
**********#mail.com
*******#gmail.com
In your case, the code would look like this:
// $text[1]->innertext contains something like:
// <em>Local (Open) Tournament.</em> ****.*******#g<span class="tournamenttext">mail.com</span>
// Firstly clear spans
$pattern = "/<\/?span[^>]*>/";
$text[1]->innertext = preg_replace($pattern, "", $text[1]->innertext);
// Preg Match mail
$email_regex = '/[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,})/i'; // Just an example email match regex (needs delimiters; no ^...$ anchors, since the address sits inside other text)
preg_match($email_regex, $text[1]->innertext, $theMatch);
echo '<pre>' . print_r($theMatch, true) . '</pre>';
You could simply remove all span tags by replacing </?span[^>]*> with nothing and try your favourite email address finder on the result.
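Both steps together, as a sketch: strip the span tags, then collect every address with preg_match_all(). This assumes the # in the question's samples stands in for @, and the address pattern is deliberately simple rather than a full validator; extractEmails() and the sample addresses are made up for illustration.

```php
<?php
// Sketch: remove every <span ...> and </span>, then collect address-like
// strings. extractEmails() is a hypothetical name.
function extractEmails(string $html): array
{
    $plain = preg_replace('~</?span[^>]*>~i', '', $html);
    preg_match_all('~[a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+~i', $plain, $m);
    return $m[0];   // all full matches, in document order
}

$html = 'jane.doe@g<span class="tournamenttext">mail.com</span>
john<span class="tournamenttext">@yahoo.com</span>
<span class="tournamenttext">info@mail.com</span>';
print_r(extractEmails($html));
```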

Remove all text between <hr> and <embed> tag?

<hr>I want to remove this text.<embed src="stuffinhere.html"/>
I tried using regex but nothing works.
Thanks in advance.
P.S. I tried this: $str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str)
You'll get a lot of advice to use an HTML parser for this kind of thing. You should do that.
The rest of this answer is for when you've decided that an HTML parser is too slow, doesn't handle ill-formed (i.e. standard in-the-wild) HTML, or is a pain in the ass to integrate into a system you don't control. I created the following small test script:
$str = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str);
var_dump($str);
//outputs
string(35) "<hr><embed src="stuffinhere.html"/>"
and it did remove the text, so I'd check your source documents and any other PHP code around your RegEx. Most likely you're not feeding preg_replace the string you think you are. My best guess is that your source document uses different case (e.g. <EMBED>), or that there are line breaks between the <hr> and the <embed>; also note that (<hr>) will not match a self-closing <hr />. Try the following regular expression instead.
$str = '<hr>I want to remove
this text.
<EMBED src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#si', '$1$2', $str);
var_dump($str);
//outputs
string(35) "<hr><EMBED src="stuffinhere.html"/>"
The "i" modifier makes the search case-insensitive. The "s" modifier makes . match every character, including \r and \n, so the match can span multiple lines.
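The same pattern can be loosened a little further so the opening tag may also be written <hr/> or <hr />. A sketch; removeBetweenHrAndEmbed() is a hypothetical name, and the \b after embed keeps it from matching a longer tag name.

```php
<?php
// Sketch: accept <hr>, <hr/> and <hr /> in any case (i), let .*? cross line
// breaks (s), and keep both tags while dropping the text between them.
function removeBetweenHrAndEmbed(string $html): string
{
    return preg_replace('~(<hr\s*/?>).*?(<embed\b)~si', '$1$2', $html);
}

echo removeBetweenHrAndEmbed("<HR />I want to remove\nthis text.\n<embed src=\"stuffinhere.html\"/>"), "\n";
```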
But use a proper parser if you can. Seriously.
I think the code is self-explanatory and pretty easy to understand since it does not use regex (and it might be faster)...
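$start='<hr>';
$end='<embed src="stuff...';
$str=' html here... ';
// Returns [position, length] of the text between $t1 and the next $t2,
// searching from $offset, or false if either marker is missing.
function between($t1,$t2,$page,$offset=0) {
$p1=stripos($page,$t1,$offset);
if($p1===false) {
return false;
}
$from=$p1+strlen($t1);
$p2=stripos($page,$t2,$from);
if($p2===false) {
return false; // no closing marker: stop instead of looping forever
}
return [$from,$p2-$from];
}
$offset=0;
while(($hit=between($start,$end,$str,$offset))!==false) {
// Cut out the text between the markers, then keep searching after this pair
// (an already-empty pair no longer stops or loops the search).
$str=substr_replace($str,'',$hit[0],$hit[1]);
$offset=$hit[0];
}
// do something with $str here...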
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed.*?>)#', '$1$2', $text);
echo $text;
If you want to hard-code the src of the embed tag:
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed src="stuffinhere.html"/>)#', '$1$2', $text);
echo $text;
