Automatically link words using PHP - php

I have thousands of PHP pages that include a header and footer using PHP, like this:
<?php include("header.php"); ?> //STATIC CONTENT HERE <?php include("footer.php"); ?>
I want to implement automatic keyword linking for certain keywords in the static text, but I can only add PHP code to my header or footer files.

This can be a complex operation. The steps:
Get all the files whose words you want to replace (glob)
Specify an array (or arrays) for the "find" and "replace" criteria
Iterate over the files returned by glob replacing the text as you go (preg_replace)
Write the new text (file_put_contents)
This sample code replaces all words in the $words array with a link to http://www.wordlink.com/<yourword>. If you need a different link for each word you'll need to specify $replace as an array using $1 where you want the searched word to appear in the replacement (and change $replace in the regex to $replace[$i]).
Also, the glob function below looks for all html files in the specified $filesDir directory. If you need something different you're going to have to manually edit the glob path yourself. Finally, the regular expression used only replaces whole words. i.e. if you wanted to replace the word super, the word superman will not have the word super replaced in the middle.
Oh, and the replace is NOT case sensitive as per the i modifier at the end of the pattern.
// specify where your static html files live
$filesDir = '/path/to/my/html/files/';
// specify where to save your updated files
$newDir = '/location/of/new/files/';
// get an array of all the static html files in $filesDir (glob returns full paths)
$fileList = glob($filesDir . '*.html');
$words = array('super', 'awesome');
$replace = '<a href="http://www.wordlink.com/$1">$1</a>';
// iterate over the html files.
foreach ($fileList as $filePath) {
    // glob already returned the full path, so only the file name is reused
    $newPath = $newDir . basename($filePath);
    $html = file_get_contents($filePath);
    // replace every word in turn; preg_quote escapes regex characters and the delimiter
    for ($i = 0; $i < count($words); $i++) {
        $pattern = '#\b(' . preg_quote($words[$i], '#') . ')\b#i';
        $html = preg_replace($pattern, $replace, $html);
    }
    file_put_contents($newPath, $html);
    echo "$newPath file written\n";
}
Obviously, you need write-access to the new folder location. I would not recommend overwriting your original files. Translation:
Always backup before doing anything crazy.
P.S. the regexes are not UTF-8 safe, so if you're dealing with international characters you'll need to edit the regex pattern as well.
P.P.S. I'm really being kind here because SO is not a code-for-free site. Don't even think about commenting something like "it doesn't work" when I try it :) If it doesn't fit your specifications, feel free to peruse the php manual for the functions involved.
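As a rough illustration of the UTF-8 point above, here is a minimal sketch of a Unicode-aware whole-word replacement. The function name link_word_utf8 and its parameters are made up for this example; it assumes the HTML is valid UTF-8 and swaps \b for explicit lookarounds combined with the u modifier.

```php
<?php
// Hypothetical helper, not part of the answer above: a UTF-8-aware
// variant of the whole-word, case-insensitive replacement.
function link_word_utf8(string $html, string $word, string $url): string
{
    // The lookarounds stand in for \b: no letter, digit or underscore
    // may touch the word on either side. The u modifier makes PCRE treat
    // the pattern and the subject as UTF-8.
    $pattern = '#(?<![\p{L}\p{N}_])(' . preg_quote($word, '#') . ')(?![\p{L}\p{N}_])#iu';

    $result = preg_replace_callback($pattern, function ($m) use ($url) {
        $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        return '<a href="' . $href . '">' . $m[1] . '</a>';
    }, $html);

    // preg_replace_callback returns null on error (e.g. invalid UTF-8);
    // fall back to the untouched input in that case.
    return $result ?? $html;
}
```

Like the sample above, this only replaces whole words, so a word that merely starts with the keyword is left alone.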

This is just an idea. I did a quick test and it seems to work:
<?php
include("header.php");
ob_start();
?>
//STATIC CONTENT HERE
<?php
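$contents = ob_get_contents();
ob_end_clean();
// now all of your STATIC CONTENT is in the $contents variable,
// so you can run preg_replace on it to add your links, for example:
// $contents_with_my_links = preg_replace($pattern, $replacement, $contents);
$contents_with_my_links = $contents; // placeholder: no links added yet
echo $contents_with_my_links;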
include("footer.php");
?>
In practice, put the ob_start() call at the end of your header.php and the code that reads and echoes the buffer at the top of your footer.php.
OK. It's just an idea that solves the problem. As rdlowrey said, this may be inefficient, but if you need to replace keywords dynamically (with database-based links, for instance) then this could be a good solution.
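To make the buffering idea a little more concrete, here is a minimal sketch of a linking function that could be applied to the captured buffer. The name link_keywords and the keyword-to-URL array are assumptions made for this example; it only touches text between tags, so attributes are left alone, but it does not check whether the text is already inside a link.

```php
<?php
// Hypothetical helper, not from the answers above: links whole-word
// keywords in the text between tags and leaves the tags untouched.
function link_keywords(string $html, array $links): string
{
    if ($links === []) {
        return $html;
    }
    // Lower-case the keys so that lookups are case-insensitive.
    $map = array_change_key_case($links, CASE_LOWER);
    $quoted = array_map(function ($w) {
        return preg_quote((string) $w, '#');
    }, array_keys($map));
    $pattern = '#\b(' . implode('|', $quoted) . ')\b#i';

    // Split into text (even indexes) and tags (odd indexes).
    $parts = preg_split('#(<[^>]*>)#', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = preg_replace_callback($pattern, function ($m) use ($map) {
                $url = htmlspecialchars($map[strtolower($m[1])], ENT_QUOTES, 'UTF-8');
                return '<a href="' . $url . '">' . $m[1] . '</a>';
            }, $part);
        }
    }
    return implode('', $parts);
}

// Possible wiring (assumed file layout):
//   last line of header.php:   ob_start();
//   first line of footer.php:  echo link_keywords(ob_get_clean(), $links);
```

Matching all keywords with one pattern in a single pass avoids re-linking words that appear inside URLs inserted by an earlier replacement.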

Related

How to Grab the Name of the 2nd Folder Without the Path

The last question was marked as a duplicate, so I'm reopening it: $_SERVER['REQUEST_URI'] isn't what I'm looking for because it displays the entire path.
I need to display just the name of the 2nd folder, without the path, without forward slashes and without the page name.
Here is the structure of the URL:
http://example.com/sub/THISFOLDER/page.php
The domain will change, so I'm looking for a solution that will work for any domain as long as it targets the 2nd folder.
What I want to do is something like this:
if THISFOLDER is named folder1 then { include("header2.php"); }
To fetch the current folder name use this method:
$arr = explode('/', dirname(__FILE__));
$whatyouneed = $arr[count($arr)-1];
<?php
$str = 'http://example.com/sub/THISFOLDER/page.php';
$parts = parse_url($str);
$folders = explode('/', $parts['path']);
var_dump($folders[2]);
Output:
string(10) "THISFOLDER"
I used parse_url so it will work easily regardless of the exact url structure.
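A minimal sketch of how the parse_url approach could be wrapped for reuse, assuming the last path segment is the page name (or empty when the URL ends in a slash). The function name folder_at is made up for this example.

```php
<?php
// Hypothetical helper: returns the n-th folder (1-based) of a URL or
// request path, or null if there is no such folder.
function folder_at(string $urlOrPath, int $n): ?string
{
    $path = parse_url($urlOrPath, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    $segments = explode('/', ltrim($path, '/'));
    // Drop the last segment: the page name, or '' after a trailing slash.
    array_pop($segments);
    return $segments[$n - 1] ?? null;
}

// Possible use for the conditional include from the question:
//   if (folder_at($_SERVER['REQUEST_URI'], 2) === 'folder1') {
//       include("header2.php");
//   }
```

Because only the path component is inspected, the result does not depend on the domain.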
If you always want to get the last folder before the PHP page (even if it is not the second), you can use this code:
<?php
$thisPath = $_SERVER['PHP_SELF'];
// group 1: everything before the folder, group 2: the folder, group 3: the page name
$pattern = '/^(\S*)\/(\w+)\/([^\/]+)$/';
$replacement = '$2';
echo preg_replace($pattern, $replacement, $thisPath);
?>
Sorry, I don't have a PHP instance spun up to actually test this, but the way it should work is this:
Looking at the pattern from back to front:
The last group matches the page name, and the escaped slash \/ in front of it matches the separator. The middle group matches the normal word characters of the folder name, followed by another escaped slash. The first group matches any non-whitespace characters to cover the rest of the path. You want the middle portion, which is $2.
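As an alternative without a regex, the same "last folder before the page" lookup can be sketched with dirname and basename. The function name last_folder is an assumption for this example.

```php
<?php
// Hypothetical helper: returns the folder that directly contains the page.
function last_folder(string $path): string
{
    // dirname() drops the page name, basename() keeps the final folder.
    return basename(dirname($path));
}

// Possible use: echo last_folder($_SERVER['PHP_SELF']);
```

For a page that sits directly in the document root there is no containing folder, so the result is an empty string.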

Show another site in my page and change all links throught mine (like proxy)

I want to build a proxy-like page (not an actual proxy). As far as I know, I need to rewrite every URL in src, href, link and so on: styles and images should be fetched from the right place, and links should go through my page, which reads $_GET["url"] and then serves the next page.
I have tried to preg_replace() each element, but I'm not very good with it: it works on one website, but on another the CSS does not load, for example.
My first question: are there any PHP classes or scripts that make this easy? (I have been googling for hours.)
If not, please help me with the following code:
<?php
$url = $_GET["url"];
$text = file_get_contents($url);
$data = parse_url($url);
$url=$data['scheme'].'://'.$data['host'];
$text = preg_replace('|<iframe [^>]*[^>]*|', '', $text);
$text = preg_replace('/<a(.*?)href="([^"]*)"(.*?)>/','<a $1 href="http://my.site/?url='.$url.'$2" $3>',$text);
$text = preg_replace('/<link(.*?)href="(?!http:\/\/)([^"]+)"(.*?)/', "<link $1 href=\"".$url."/\\2\"$3", $text);
$text = preg_replace('/src="(?!http:\/\/)([^"]+)"/', "src=\"".$url."/\\1\"", $text);
$text = preg_replace('/background:url\(([^"]*)\)/',"background:url(".$url."$1)", $text);
echo $text;
?>
For replacement no. 4 (src), I need to skip values that start with a double slash, because they can look like src="//somethingdomain" and must not be rewritten.
I also need to skip replacement no. 2 when the href already points to the same domain; otherwise the result looks like need.site/news.need.site/324244.
Is it also possible to pass a form's action through my script? For example, a Google search query.
One more small problem: one website used to open correctly, but after I opened it hundreds of times through this script I now get unknown symbols (without any div, body, etc.): ��S�n�#��. I tried converting the encoding (UTF-8, ANSI), but the symbols just change.
Maybe they banned me?
function link_replace($url,$myurl) {
$content = file_get_contents($url);
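// absolute links: pass them through the proxy page as they are
$content = preg_replace('#href="(http)(.*?)"#is', 'href="'.$myurl.'?url=$1$2"', $content);
// relative links: (?!http) skips anything starting with http, then the base URL is prepended
$content = preg_replace('#href="(?!http)(.*?)"#is', 'href="'.$myurl.'?url='.$url.'$1"', $content);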
return $content;
}
echo link_replace($url,$myurl);
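For the src case from the question, a negative lookahead can skip both absolute and protocol-relative (//) values. This is only a sketch: the function name absolutize_src is made up, $base is assumed to be the scheme and host of the fetched page, and other schemes such as data: are not handled.

```php
<?php
// Hypothetical helper: prefixes relative src values with the base URL.
function absolutize_src(string $html, string $base): string
{
    // (?!https?://|//) rejects values starting with http://, https:// or //.
    $pattern = '#src="(?!https?://|//)([^"]+)"#i';
    $result = preg_replace_callback($pattern, function ($m) use ($base) {
        return 'src="' . rtrim($base, '/') . '/' . ltrim($m[1], '/') . '"';
    }, $html);
    return $result ?? $html;
}
```

The same lookahead could be reused for href and link, with the domain check added inside the callback.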
I'm not absolutely sure, but I guess the result is just compressed, e.g. with gzip. Try removing the Accept-Encoding headers while proxying the request.
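A minimal sketch of that suggestion, assuming the page is fetched with file_get_contents and that the zlib extension is available for the fallback. Both function names are made up for this example.

```php
<?php
// Hypothetical helper: fetches a page while asking the server not to
// compress the response. Returns false if the request fails.
function fetch_uncompressed(string $url)
{
    $context = stream_context_create([
        'http' => ['header' => "Accept-Encoding: identity\r\n"],
    ]);
    $body = file_get_contents($url, false, $context);
    return $body === false ? false : maybe_gunzip($body);
}

// Hypothetical helper: decodes the body if it still starts with the
// gzip magic bytes (some servers compress regardless of the header).
function maybe_gunzip(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0 && function_exists('gzdecode')) {
        $decoded = @gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}
```

If the garbage persists after this, compression is probably not the cause.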

How can i replace a line of php code with preg_replace

I have the following line of php code in some of my pages
<?php include("contactform.php"); ?>
I have a crude CMS where I exchange lines of code for user-manageable tags. My hope is to convert this line of code into [contact] so that people can add or remove it at their leisure. This is how far I've got:
$file = preg_replace('#<?php include("contactform.php"); ?>#i', "[contact]", $file);
$file looks something like this...
<h1 class="title">Title</h1>
<p>Text</p>
<?php include("contactform.php"); ?>
So the PHP code has not been stripped out by the server, as we are editing the file and not viewing it.
I'm pretty new to PHP, so I guess I'm being really stupid. Is there a way to do this?
If you want to do 1:1 string replacements, then use the simpler str_replace
$file = str_replace('<?php include("contactform.php"); ?>', "[contact]", $file);
With preg_replace you need to escape meta characters like ?, ( and . with backslashes:
$file = preg_replace('#<\?php include\("contactform\.php"\); \?>#i', "[contact]", $file);
A regex only provides an advantage if you want to make the match more resilient, to variations in whitespace for example. Use \s+ instead of literal spaces in that case.
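A whitespace-tolerant version might look like the sketch below. The function name tagify_contact_include is made up for this example, and the pattern still assumes double quotes around the file name.

```php
<?php
// Hypothetical helper: swaps the contact form include for the [contact]
// tag, tolerating extra or missing whitespace inside the PHP block.
function tagify_contact_include(string $file): string
{
    // \s+ requires at least one whitespace character, \s* allows none.
    $pattern = '#<\?php\s+include\s*\(\s*"contactform\.php"\s*\)\s*;\s*\?>#i';
    return preg_replace($pattern, '[contact]', $file) ?? $file;
}
```

For the exact line from the question, str_replace remains the simpler choice.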

PHP: finding, replacing, shortening, and prettifying user links with <a> tags, ellipses, and link icons

When a user enters a URL, e.g. http://www.google.com, I would like to be able to parse that text using PHP, find any links, and replace them with <a> tags that include the original URL as an HREF.
In other words, http://www.google.com will become
<a href="http://www.google.com">http://www.google.com</a>
I'd like to be able to do this for all URLs of these forms (with .com interchangeable with any TLD):
http://www.google.com
www.google.com
google.com
docs.google.com
What's the most performant way to do this? I could try writing some really fancy regex, but I doubt that's the best method available to me.
For bonus points, I'd also like to prepend http:// to any URL lacking it, and strip the display text itself down to something of the form http://www.google.com/reallyLongL... and display an external link icon afterwards.
Trying to find links in the format domain.com is going to be a pain in the butt. It would require keeping track of all TLDs and using them in the search.if you didn't, the end of the last sentence I typed and the beginning of this sentence would be a link to http://search.if. Even if you did, .in is a valid TLD and a common word.
I'd recommend telling your users they have to begin links with www. or http://, then writing a simple regex to capture them and add the links.
www.google.com
This is not a URL, it's a hostname. It's generally not a good idea to start marking up bare hostnames in arbitrary text, because in the general case any word or sequence of dot-separated words is a perfectly valid hostname. That means you end up with horrible hacks like looking for leading www. (and you'll get questions like “why can I link to www.stackoverflow.com but not stackoverflow.com?”) or trailing TLDs (which gets more and more impractical as more new TLDs are introduced; “why can I link to ncm.com but not ncm.museum?”), and you'll often mark up things that aren't supposed to be links.
I could try writing some really fancy regex
Well I can't see how you'd do it without regex.
The trick is coping with markup. If you can have <, & and " characters in the input, you mustn't let them into HTML output. If your input is plain text, you can do that by calling htmlspecialchars() before applying a simple replacement on a pattern like that in nico's answer.
(If the input already contains markup, you've got problems and you'd probably need an HTML parser to determine which bits are markup, so that you avoid adding more markup inside them. Similarly, if you're doing more processing after this, inserting more tags, those steps may have the same difficulty. In ‘bbcode’-like languages this often leads to bugs and security problems.)
Another problem is trailing punctuation. It's common for people to put a full stop, comma, close bracket, exclamation mark etc after a link, which aren't supposed to be part of the link but which are actually valid characters. It's useful to strip these off and not put them in the link. But then you break Wiki links that end in ), so maybe you want to not treat ) as a trailing character if there's a ( in the link, or something like that. This sort of thing can't be done in a simple regex replace, but you can in a replacement callback function.
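The escaping and trailing-punctuation handling described above could be sketched roughly as follows, assuming plain-text input and only http:// and https:// URLs. The function name linkify and the exact set of trailing characters are choices made for this example.

```php
<?php
// Hypothetical helper: escapes plain text and turns http(s) URLs into
// links, keeping trailing punctuation outside the link.
function linkify(string $text): string
{
    // Odd indexes hold the captured URLs, even indexes the text between.
    $parts = preg_split('#(\bhttps?://[^\s<>"]+)#i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $out .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
            continue;
        }
        $url = $part;
        $trail = '';
        while ($url !== '' && strpos('.,;:!?)', substr($url, -1)) !== false) {
            // Keep a closing bracket that is balanced by an opening one
            // inside the URL (wiki-style links).
            if (substr($url, -1) === ')' && substr_count($url, '(') >= substr_count($url, ')')) {
                break;
            }
            $trail = substr($url, -1) . $trail;
            $url = substr($url, 0, -1);
        }
        $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $out .= '<a href="' . $href . '">' . $href . '</a>' . $trail;
    }
    return $out;
}
```

Escaping each piece separately, instead of escaping the whole string first, keeps entities such as &quot; from being mistaken for part of a URL.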
HTML Purifier has a built-in linkify function to save you all the headaches.
Its other features are also simply too useful to pass up if you're dealing with any kind of user input that you also have to display.
Some not-so-fancy regexes that should work:
/\b(https?:\/\/[^\s+\"\<\>]+)/i
/\b(www\.[^\s+\"\<\>]+)/i
Note that the last two forms in your list (google.com and docs.google.com) would be impossible to handle correctly, as you cannot distinguish google.com from something like this.Where I finish one sentence and don't put a space after the full stop.
As for shortening the URLs, having your URL in $url:
if (strlen($url) > 20) // Or whatever length you like
{
$shortURL = substr($url, 0, 20)."…";
}
else
{
$shortURL = $url;
}
echo '<a href="'.$url.'" >'.$shortURL.'</a>';
From http://www.exorithm.com/algorithm/view/markup_urls
function markup_urls ($text)
{
// split the text into words
$words = preg_split('/([\s\n\r]+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$text = "";
// iterate through the words
foreach($words as $word) {
// chopword = the portion of the word that will be replaced
$chopword = $word;
$chopword = preg_replace('/^[^A-Za-z0-9]*/', '', $chopword);
if ($chopword <> '') {
// linkword = the text that will replace chopword in the word
$linkword='';
// does it start with http://abc. ?
if (preg_match('/^(http:\/\/)[a-zA-Z0-9_]{2,}.*/', $chopword)) {
$chopword = preg_replace('/[^A-Za-z0-9\/]*$/', '', $chopword);
$linkword = '<a href="'.$chopword.'">'.$chopword.'</a>';
// does it equal abc.def.ghi ?
} else if (preg_match('/^[a-zA-Z]{2,}\.([a-zA-Z0-9_]+\.)+[a-zA-Z]{2,}(\/.*)?/', $chopword)) {
$chopword = preg_replace('/[^A-Za-z0-9\/]*$/', '', $chopword);
$linkword = '<a href="http://'.$chopword.'">'.$chopword.'</a>';
// does it start with abc@def.ghi ?
} else if (preg_match('/^[a-zA-Z0-9_\.]+\@([a-zA-Z0-9_]{2,}\.)+[a-zA-Z]{2,}.*/', $chopword)) {
$chopword = preg_replace('/[^A-Za-z0-9]*$/', '', $chopword);
$linkword = '<a href="mailto:'.$chopword.'">'.$chopword.'</a>';
}
// replace chopword with linkword in word (if linkword was set)
if ($linkword <> '') {
$word = str_replace($chopword, $linkword, $word);
}
}
// append the word
$text = $text.$word;
}
return $text;
}
I got this working exactly the way I want here:
<?php
$input = <<<EOF
http://www.example.com/
http://example.com
www.example.com
http://iamanextremely.com/long/link/so/I/will/be/trimmed/down/a/bit/so/i/dont/mess/up/text/wrapping.html
EOF;
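function trimlong($match)
{
$url = $match[0];
$display = $url;
if ( strlen($display) > 30 ) {
$display = substr($display,0,30)."...";
}
// bare www. matches need a scheme, otherwise the browser treats the href as a relative path
$href = (stripos($url, 'http://') === 0) ? $url : 'http://'.$url;
return '<a href="'.$href.'">'.$display.'</a> <img src="http://static.goalscdn.com/img/external-link.gif" height="10" width="11" />';
}
// trimlong is a plain function, so it is passed by name
$output = preg_replace_callback('#(http://|www\\.)[^\\s<]+[^\\s<,.]#i',
'trimlong',$input);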
echo $output;

php replace a pattern

Suppose a file contains patterns such as
sumthing.c: and
asdfg.c: and many more, all matching the *.c: pattern.
How can I replace each of them with the text yourinput and save the file using PHP?
The pattern is *.c
Thanks.
You can read the contents of the file into a PHP string using file_get_contents, do the *.c to yourinput replacement in the string and write it back to the file using file_put_contents:
$filename = '...'; // name of your input file.
$file = file_get_contents($filename) or die();
$replacement = '...'; // the yourinput thing you mention in the question
$file = preg_replace('/\b\w+\.c:/',$replacement,$file);
// file_put_contents takes the file name first, then the data
file_put_contents($filename,$file) or die();
You can use PHP's str_replace or preg_replace (in case it's a regex pattern). Check the syntax of these two functions and replace the *.c with your input.
The .c pattern should be something like /\w+\.c:/
First open the file and get its content:
$content = file_get_contents($path_to_file);
Then modify the content:
$content = preg_replace('/.*\.c/', 'yourinput', $content);
Finally save the result back to the file:
file_put_contents($path_to_file, $content);
Note: You may want to change the regexp, because as written it matches the '.c' string and everything before it on the line. Maybe '/[a-zA-Z]*\.c/' is what you want.
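Putting the read, replace and write steps together, a minimal sketch could look like this. The function names are made up for this example, and the pattern assumes the names consist of word characters followed by .c and a colon.

```php
<?php
// Hypothetical helper: replaces every "name.c:" label in a string.
function replace_c_labels(string $text, string $replacement): string
{
    // A callback inserts the replacement literally, so characters such
    // as $ or \ in it are not treated as backreferences.
    $result = preg_replace_callback('/\b\w+\.c:/', function () use ($replacement) {
        return $replacement;
    }, $text);
    return $result ?? $text;
}

// Hypothetical helper: applies the replacement to a file in place.
function replace_c_labels_in_file(string $path, string $replacement): bool
{
    $text = file_get_contents($path);
    if ($text === false) {
        return false;
    }
    return file_put_contents($path, replace_c_labels($text, $replacement)) !== false;
}
```

As with the keyword-linking answer further up, it is worth working on a copy of the file until the pattern is confirmed to match only what you expect.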
