php remove results of regex from file content - php

I want to remove everything my regex matches from a file. How should I do this?
$pattern = "/<a href=\"(.*?)a>/s";
$html = file_get_contents('content.html');
$check = preg_match_all($pattern,$html,$match);
foreach($match[1] as $result)
{
// what should I put here
}

Try this.
This pattern matches the opening <a href="..."> tag, and preg_replace replaces it with an empty string.
$pattern = "/<a href=\"(.+?)\">/s";
$subject = '<a href="content.html">'; // or: $subject = file_get_contents('content.html');
$check = preg_replace($pattern, '', $subject); // preg_replace(pattern, replacement, subject)

I want to remove the result of my regex
Use preg_replace instead of preg_match_all.
$pattern = '~<a href=(.*?)</a>~'; // only match on same line
$html = file_get_contents('content.html');
$check = preg_replace($pattern, '', $html);
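Neither answer writes the cleaned text back to the file, which is what the question asks for. A minimal sketch of the full round trip, assuming a temporary file stands in for content.html and using a slightly broader pattern (the s flag lets a link span several lines):

```php
<?php
// Hypothetical sample file standing in for content.html
$file = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($file, '<p>Keep</p><a href="x.html">drop</a><p>this</p>');

// Read, strip every <a href=...>...</a>, and write the result back
$html  = file_get_contents($file);
$clean = preg_replace('~<a\s+href=.*?</a>~is', '', $html);
if ($clean !== null) {            // preg_replace returns null on error
    file_put_contents($file, $clean);
}

$result = file_get_contents($file);
unlink($file);
echo $result;
```

Checking for null before writing avoids emptying the file if the pattern fails to compile or hits a backtracking limit.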

Related

Find URL in string and turn into a link

I'm using the code given on this page to look through a string and turn the URL into an HTML link.
It works quite well, but there is a little issue with the "replace" part of it.
The problem occurs when I have almost identical links. For example:
https://example.com/page.php?goto=200
and
https://example.com/page.php
Everything is fine with the first link, but the second one creates an <a> tag inside the first <a> tag.
First run
https://example.com/page.php?goto=200
Second
https://example.com/page.php?goto=200">https://example.com/page.php?goto=200</a>
This happens because str_replace also replaces the URL inside the HTML link that was just created.
How do I avoid this?
<?php
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// Loop through all matches
foreach($url[0] as $newLinks){
if(strstr( $newLinks, ":" ) === false){
$link = 'http://'.$newLinks;
}else{
$link = $newLinks;
}
// Create Search and Replace strings
$search = $newLinks;
$replace = '<a href="'.$link.'">'.$link.'</a>';
$string = str_replace($search, $replace, $string);
}
}
//Return result
return $string;
}
?>
You need to add a whitespace token \s at the start of your regex and remove the \b (a word boundary), so that a URL is only matched when it is preceded by whitespace.
Your regex can be written as:
$reg_exUrl = "/(?i)\s((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
Check this one: https://regex101.com/r/YFQPlZ/1
I have changed the replace part a bit, since I couldn't get the suggested regex to work.
Maybe it can be done better, but I'm still learning :)
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $matches)) {
$links = array();
// Loop through all matches
foreach($matches[0] as $key => $newLinks){
if(strstr( $newLinks, ":" ) === false){
$url = 'https://'.$newLinks;
}else{
$url = $newLinks;
}
// Build the link, then swap the first remaining occurrence of the URL for a placeholder
$links[$key] = '<a href="'.$url.'">'.$url.'</a>';
$newLinks = '/'.preg_quote($newLinks, '/').'/';
$string = preg_replace($newLinks, '{'.$key.'}', $string, 1);
}
// Swap each placeholder for its finished link
foreach ($links as $key => $link) {
$string = str_replace('{'.$key.'}', $link, $string);
}
}
//Return result
return $string;
}
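Another way to avoid the nested links is to do the replacement in a single pass with preg_replace_callback, so text that has already been replaced is never searched again. This is only a sketch: the function name linkifyOnce and the simplified pattern are assumptions for illustration, not the long pattern from the question.

```php
<?php
// Single-pass linkification: each URL is matched once and replaced in place.
function linkifyOnce(string $text): string
{
    // Simplified pattern (assumption): http(s) URLs or www.-prefixed hosts
    $pattern = '~\b(?:https?://|www\.)[^\s<>"]+~i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url  = $m[0];
        // Add a scheme when the match starts with www.
        $href = (stripos($url, 'http') === 0) ? $url : 'https://' . $url;
        return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
             . htmlspecialchars($url, ENT_QUOTES) . '</a>';
    }, $text);
}

echo linkifyOnce('See https://example.com/page.php?goto=200 and https://example.com/page.php');
```

Because the callback receives each match separately, two URLs that share a prefix cannot interfere with each other. The simplified pattern will also swallow trailing punctuation such as a full stop, which the longer pattern in the question tries to exclude.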

Preg_replace replace dashes with spaces between tags

I have some HTML and would like to replace the dashes with spaces, but only between specific tags.
function getTextBetweenTags($string, $tagname) {
$pattern = "/<$tagname ?.*>(\d*)[-*](\d*)<\/$tagname>/";
$replace = " ";
$string = preg_replace($pattern, $replace, $string);
}
CODE EXAMPLE:
<div class="xxx">
start
World
Fantastic-yyy-zz
peter-hey
</div>
RESULT: It would be fine if 'peter-hey' lost its dash too, but what matters most is the text inside the tags.
<div class="xxx">
start
World
Fantastic yyy zz
peter-hey
</div>
You DO NOT need regular expressions for this task:
$contents = '<div class="xxx">
start
World
Fantastic-yyy-zz
peter-hey
</div>';
$doc = new DOMDocument();
$doc->loadXML($contents);
$tagName = 'a';
$tags = $doc->getElementsByTagName($tagName);
foreach ($tags as $tag) {
$newValue = str_replace('-', ' ', $tag->nodeValue);
$tag->nodeValue = $newValue;
}
echo $doc->saveHTML();
Demo: http://ideone.com/rI6k8b
@zerkms, thank you for your help and patience. I tried it almost exactly as you described, but it shows a warning and doesn't change anything.
Warning: DOMDocument::loadXML(): Extra content at the end of the document in Entity
CODE:
function process(&$vars) {
$theme = get_theme();
if ($vars['elts']['#xxx'] == 'main') {
$vars['bread'] = $theme->page['bread'];
/*add code*/
$doc = new DOMDocument();
$doc->loadXML($vars['bread']);
$tagName = 'a';
$tags = $doc->getElementsByTagName($tagName);
foreach ($tags as $tag) {
$newValue = str_replace('-', ' ', $tag->nodeValue);
$tag->nodeValue = $newValue;
}
echo $doc->saveHTML();
/*end add code*/
}
}
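That warning usually means the string is not a single well-formed XML document, for example a fragment with more than one top-level element. A hedged sketch of one workaround, assuming a made-up breadcrumb fragment: wrap the fragment in a container and parse it with loadHTML, which tolerates fragments, then print only the container's children.

```php
<?php
// Hypothetical breadcrumb fragment with two top-level elements (no single root)
$fragment = '<a href="/a">Fantastic-yyy-zz</a> / <span>peter-hey</span>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML('<div>' . $fragment . '</div>');
libxml_clear_errors();

// Replace dashes only inside <a> elements
foreach ($doc->getElementsByTagName('a') as $tag) {
    $tag->nodeValue = str_replace('-', ' ', $tag->nodeValue);
}

// Serialize just the children of the wrapper div, not the whole document
$wrapper = $doc->getElementsByTagName('div')->item(0);
$result  = '';
foreach ($wrapper->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result;
```

In the process() function above, the result would be assigned back to $vars['bread'] rather than echoed.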
@zerkms, I'm accepting your answer because it really is correct. I also found some interesting regex-based alternatives:
CODE TO FIND INFO
$tagname = 'a';
$pattern = "/<$tagname ?.*>(.*)\-+(.*)<\/$tagname>/";
$matches = "";
preg_match($pattern, $contents, $matches);
CODE TO CHANGE: Since I only have a small fragment, I don't need to check that the tag is 'a'.
$pattern = "/>(.*)\-+(.*)\-+(.*)</";
$replace = ">$1 $2 $3<";
$res = preg_replace($pattern, $replace, $contents);
//$contents is my string with the code.
Hope it really helps someone.
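The pattern above only handles text with exactly two groups of dashes. If a regex is preferred over the DOM, a sketch that handles any number of dashes is to match the whole element and run str_replace on the inner text in a callback. The function name dashesToSpacesInTag is made up for illustration, and it assumes the tags are not nested inside themselves.

```php
<?php
// Replace dashes with spaces inside the text of one tag type only.
function dashesToSpacesInTag(string $html, string $tagName): string
{
    $tag     = preg_quote($tagName, '~');
    // Group 1: opening tag, group 2: inner text, group 3: closing tag
    $pattern = '~(<' . $tag . '\b[^>]*>)(.*?)(</' . $tag . '>)~is';

    return preg_replace_callback($pattern, function (array $m): string {
        return $m[1] . str_replace('-', ' ', $m[2]) . $m[3];
    }, $html);
}

echo dashesToSpacesInTag('<a href="/x-y">Fantastic-yyy-zz</a> peter-hey', 'a');
```

Only group 2 is modified, so dashes in attributes such as href and in text outside the tag are left alone.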

Using preg_replace to get id

How do I get the ID from a URL with preg_replace?
This is the link:
http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id
How do I get the ID? In this case it would be: 5b87f8eaa7c20f79c3257eb3ec0a35e0
In this case I recommend not using preg_match at all (and preg_replace is for replacing something, not for extracting it).
Simply use
$array = explode('/',$_SERVER['REQUEST_URI']);
$id = $array[2]; // "/photo/<id>/id" splits into '', 'photo', '<id>', 'id'
If you must use preg_match:
$array = array();
preg_match('#^/photo/([0-9a-f]{32})/id$#',$_SERVER['REQUEST_URI'],$array);
$id = $array[1];
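If the ID is the last segment of the URL, you can get it easily by using strripos to find the last / in the URL. Note that for the example URL, which ends in /id, this returns the literal string id.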
$url = $_SERVER['REQUEST_URI'];
if (($pos = strripos($url, '/')) !== false) {
$id = substr($url, $pos + 1);
}
else {
trigger_error('You must supply a valid photo ID');
}
If you would like to just extract that id string, you can use:
$id_url = "http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id";
$pattern = "/photo\/([a-zA-Z0-9]*)/";
preg_match($pattern, $id_url, $output_array);
echo $output_array[1];
Or, to make the replacement:
$id_url = "http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id";
$pattern = "/photo\/([a-zA-Z0-9]*)/";
$replacement = "your replacement";
$replaced_url = preg_replace($pattern, $replacement, $id_url);
echo $replaced_url;
PHP Live Regex - a useful tool for testing your patterns
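When the full URL is available as a string rather than in $_SERVER['REQUEST_URI'], a regex-free sketch is to take the path with parse_url and split it into segments. It assumes the ID is always the second path segment, as in the example URL.

```php
<?php
$url = 'http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id';

// Path only: "/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id"
$path = parse_url($url, PHP_URL_PATH);

// Trim the outer slashes so the first segment is "photo", not an empty string
$segments = explode('/', trim($path, '/'));

// Assumption: the ID is the second segment
$id = $segments[1] ?? null;
echo $id;
```

Trimming the slashes first avoids the empty leading element that explode produces for a path starting with /.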

PHP grabbing content between two strings

// get CONTENT from united domains footer
$content = file_get_contents('http://www.uniteddomains.com/index/footer/');
// remove spaces from CONTENT
$content = preg_replace('/\s+/', '', $content);
// match all tld tags
$regex = '#target="_parent">.(.*?)</a></li><li>#';
preg_match($regex, $source, $matches);
print_r($matches);
I want to match all of the TLDs.
Each TLD is preceded by target="_parent">. and followed by </a></li><li>
I want to end up with an array like array('africa','amsterdam','bcn', etc.)
What am I doing wrong here?
NOTE: The second step, removing all the whitespace, is just to simplify things.
Here's a regular expression that will do it for that page.
\.\w+(?=</a></li>)
PHP
$content = file_get_contents('http://www.uniteddomains.com/index/footer/');
preg_match_all('/\.\w+(?=<\/a><\/li>)/m', $content, $matches);
print_r($matches);
Here are the results:
.africa, .amsterdam, .bcn, .berlin, .boston, .brussels, .budapest, .gent, .hamburg, .koeln, .london, .madrid, .melbourne, .moscow, .miami, .nagoya, .nyc, .okinawa, .osaka, .paris, .quebec, .roma, .ryukyu, .stockholm, .sydney, .tokyo, .vegas, .wien, .yokohama, .africa, .arab, .bayern, .bzh, .cymru, .kiwi, .lat, .scot, .vlaanderen, .wales, .app, .blog, .chat, .cloud, .digital, .email, .mobile, .online, .site, .mls, .secure, .web, .wiki, .associates, .business, .car, .careers, .contractors, .clothing, .design, .equipment, .estate, .gallery, .graphics, .hotel, .immo, .investments, .law, .management, .media, .money, .solutions, .sucks, .taxi, .trade, .archi, .adult, .bio, .center, .city, .club, .cool, .date, .earth, .energy, .family, .free, .green, .live, .lol, .love, .med, .ngo, .news, .phone, .pictures, .radio, .reviews, .rip, .team, .technology, .today, .voting, .buy, .deal, .luxe, .sale, .shop, .shopping, .store, .eus, .gay, .eco, .hiv, .irish, .one, .pics, .porn, .sex, .singles, .vin, .vip, .bar, .pizza, .wine, .bike, .book, .holiday, .horse, .film, .music, .party, .email, .pets, .play, .rocks, .rugby, .ski, .sport, .surf, .tour, .video
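As for what went wrong in the question's snippet: it passes $source (which is never defined) instead of $content, it uses preg_match (which stops at the first match), and the unescaped . matches any character. A sketch of the corrected approach, assuming a small made-up sample of the markup instead of the live page:

```php
<?php
// Hypothetical sample of the footer markup; the real page is not fetched here
$content = '<ul>'
    . '<li><a href="/africa" target="_parent">.africa</a></li>'
    . '<li><a href="/amsterdam" target="_parent">.amsterdam</a></li>'
    . '<li><a href="/bcn" target="_parent">.bcn</a></li>'
    . '</ul>';

// Escaped dot, and no trailing <li> requirement so the last item matches too
$regex = '#target="_parent">\.(.*?)</a>#';

// preg_match_all collects every match; group 1 holds the TLD without the dot
preg_match_all($regex, $content, $matches);
print_r($matches[1]);
```

$matches[0] holds the full matches and $matches[1] the captured names, which is the array the question asks for.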
Using the DOM is cleaner:
$doc = new DOMDocument();
@$doc->loadHTMLFile('http://www.uniteddomains.com/index/footer/');
$xpath = new DOMXPath($doc);
$items = $xpath->query('/html/body/div/ul/li/ul/li[not(@class)]/a[@target="_parent"]/text()');
$result = '';
foreach($items as $item) {
$result .= $item->nodeValue;
}
$result = explode('.', $result);
array_shift($result);
print_r($result);

Strip single ended tags from string (img, hr, etc)

function stripSingleEndedTag($content, $allowed = array()){
$allowed = (array)$allowed; // the cast has no effect unless it is assigned back
$singletags = array('<meta>','<img>','<input>','<hr>','<br>','<link>','<isindex>','<base>','<meta>','<nextid>','<bork>');
$stripthese = arrayfilterout ($singletags, $allowed);
$stripthese = str_replace(array('<','>'),'',$stripthese);
$pattern = '/<('. implode('|', $stripthese) .')[^>]+\>/i';
$content = preg_replace($pattern, "", $content);
return $content;
}
What I've got here will strip out a single ended tag like <bork /> but only if there is some character after 'bork' and before '>'. <bork > and <bork/> are stripped, but not <bork>.
BTW, I can't use strip_tags().
You can use [^>]* instead of [^>]+, so that zero characters between the tag name and > are also allowed:
$pattern = '/\<('. implode('|', $stripthese) .')[^>]*\>/i';
Original Answer:
Do you want to get rid of <bork /> and <bork/>, but not <bork> and <bork >?
It looks like what you want is:
$pattern = '/<('. implode('|', $stripthese) .').*\/>/i';
Update: fix for the greedy match, using either of these:
$pattern = '/<('. implode('|', $stripthese) .').*?\/>/i';
$pattern = '/<('. implode('|', $stripthese) .')[^>]*\/>/i';
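One thing none of the patterns above guard against is a tag name that is a prefix of another, for example br matching the start of a longer tag name. A sketch of the whole function with a word boundary after the name, assuming bare tag names (img rather than <img>) for the allow list and array_diff in place of the undefined arrayfilterout helper:

```php
<?php
// Strip single-ended tags except the allowed ones; names are given without < and >.
function stripVoidTags(string $content, array $allowed = []): string
{
    $voidTags = ['meta', 'img', 'input', 'hr', 'br', 'link', 'base'];
    $strip    = array_diff($voidTags, $allowed);
    if (count($strip) === 0) {
        return $content;   // nothing left to strip
    }

    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '/');
    }, $strip));

    // \b stops "br" from matching the start of a longer tag name
    $pattern = '/<(?:' . $names . ')\b[^>]*>/i';

    return preg_replace($pattern, '', $content);
}

echo stripVoidTags('a<br>b<hr />c<img src="x.png">d<broken>e', ['img']);
```

Here <br> and <hr /> are removed, <img> is kept because it is allowed, and <broken> is untouched because of the word boundary.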
