PHP Matching URLs and using preg_replace_callback()


I use the following to find all URLs inside $content:
preg_match_all( '/(http[s]?:[^\s]*)/i', $content, $links ); // returns the match count; the URLs end up in $links[0]
But this relies on the http:// part, as in http://www.google.com/some/path.
My questions are:
1 - How can I modify it so that it also matches links that start with only www, e.g. www.google.com?
2 - The main aim is to find the links and replace them with a value returned from another function. I tried preg_replace_callback(), but it is not working (I am probably using it wrong):
$content = preg_replace_callback(
"/(http[s]?:[^\s]*)/i",
"my_callback",
$content);
function my_callback(){
// do a lot of stuff independently of preg_replace
// ... appending to $output with .= ...
return $output;
}
Now, in my logic (which is probably wrong), every match in $content would be replaced by $output. What am I doing wrong?
(Please, no anonymous functions: I am testing on an old server.)
EDIT I: after the comments, trying to clarify with more details:
function o99_simple_parse($content){
$content = preg_replace_callback( '/(http[s]?:[^\s]*)/i', 'o99_simple_callback', $content );
return $content;
}
The callback:
function o99_simple_callback($url){
// How do I get the URL that is actually the match? And $width?
$url = esc_url_raw( $link );
$url_name = parse_url($url);
$url_name = $description = $url_name['host'];// get rid of http://..
$url = 'http://something' . urlencode($url) . '?w=' . $width ;
return $url; // what I really need to replace the match with
}
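A minimal sketch of what the callback needs, assuming PHP without closures: preg_replace_callback() calls the callback with one argument, the array of matches, so the matched URL is $matches[0]. The class name O99_Linker, the extra $width parameter and the http://something/ target are hypothetical placeholders; inside WordPress you could additionally pass $matches[0] through esc_url_raw() in the callback.

```php
<?php
// Hypothetical helper class: a static property carries $width into the
// callback, so no closure is needed (works on PHP < 5.3 as well).
class O99_Linker {
    public static $width = 0;

    // preg_replace_callback() passes the match array:
    // $matches[0] is the whole match, $matches[1] the first group.
    public static function callback($matches) {
        $url = $matches[0];
        // "http://something/" is a placeholder target, as in the question.
        return 'http://something/' . urlencode($url) . '?w=' . self::$width;
    }
}

function o99_simple_parse($content, $width) {
    O99_Linker::$width = $width;
    return preg_replace_callback(
        '/(https?:[^\s]*)/i',
        array('O99_Linker', 'callback'), // static method used as the callback
        $content
    );
}
```

Each match is replaced by whatever the callback returns for that match, so the callback must build its result from $matches rather than from a variable set elsewhere.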

To modify the regex you already have to allow URLs that begin with www, you'd simply write this:
/((http[s]?:|www[.])[^\s]*)/i
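Compared with your original pattern, the only additions are the extra opening parenthesis and the |www[.]) alternative.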
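A quick way to check the extended pattern, as a sketch; find_urls() and normalise_url() are hypothetical helper names. A bare www. hit has no scheme, so parse_url() would report it as a path rather than a host; prefixing http:// first is one way to handle that. Note that the pattern can also match www. in the middle of a word (e.g. awww.example), so a \b in front of the www alternative may be worth adding.

```php
<?php
// Extended pattern from the answer: a match starts at "http:"/"https:" or at "www."
function find_urls($content) {
    preg_match_all('/((http[s]?:|www[.])[^\s]*)/i', $content, $links);
    return $links[0]; // the full matches, in order
}

// Hypothetical helper: give bare "www." matches a scheme so parse_url() sees a host.
function normalise_url($url) {
    if (stripos($url, 'www.') === 0) {
        $url = 'http://' . $url;
    }
    return $url;
}
```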

Related

Apply php function to shortcode

I'll start by saying I'm fairly new to coding, so I'm probably going about this the wrong way.
Basically, I've got the PHP function below, which shows the title of the page a URL points to instead of the plain web address. So instead of www.google.com, it would show Google.
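<?php
function get_title($url){
    $str = file_get_contents($url);
    if ($str !== false && strlen($str) > 0) {
        $str = trim(preg_replace('/\s+/', ' ', $str)); // supports line breaks inside <title>
        // lazy (.*?) stops at the first </title>; [^>]* allows attributes on <title>
        if (preg_match('/<title[^>]*>(.*?)<\/title>/i', $str, $title)) { // ignore case
            return $title[1];
        }
    }
    return ''; // page could not be fetched or has no <title>
}
?>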
This works, but to use it I have to call it like this:
echo get_title("http://www.google.com/");
However, this only works with a predefined URL. What I currently have on my site is a shortcode in an HTML widget:
<a href='[rwmb_meta meta_key="link_1"]'>[rwmb_meta meta_key="link_1"]</a>
This shortcode takes a URL entered by the user in the WordPress backend and displays it on the frontend as a link. However, I want to apply the get_title function to the shortcode above so that, instead of the web address, it shows the page title.
Is this possible?
Thanks in advance.

For the name of a URL (its host), you can use parse_url($url, PHP_URL_HOST);
An easier way would be to keep an array of links, for example:
$links[] = 'some1 url here';
$links[] = 'some2 url here';
Then just loop over your $links array with the function:
foreach ($links as $link) echo get_title($link);
See https://metabox.io/docs/get-meta-value/ and try:
$files = rwmb_meta( 'info' ); // Since 4.8.0
$files = rwmb_meta( 'info', 'type=file' ); // Prior to 4.8.0
if ( !empty( $files ) ) {
foreach ( $files as $file ) {
echo $file['url'];
}
}
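One way to combine the two answers, as a sketch: register your own shortcode that reads the Meta Box value with rwmb_meta() and prints the page title as the link text, falling back to the host from parse_url() when no <title> is found. The shortcode name link_title and the helper link_label() are hypothetical, the field is assumed to return a plain URL string, and add_shortcode() is wrapped in function_exists() only so the file can be loaded outside WordPress. Fetching the remote page on every page view is slow, so caching the title is worth considering.

```php
<?php
// Pick a link label: the page's <title> if there is one, else the URL's host.
function link_label($url, $html) {
    $html = preg_replace('/\s+/', ' ', (string) $html); // <title> may span lines
    if (preg_match('/<title[^>]*>(.*?)<\/title>/i', $html, $m) && trim($m[1]) !== '') {
        return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }
    $host = parse_url($url, PHP_URL_HOST);
    return $host ? $host : $url;
}

// Hypothetical shortcode: [link_title meta_key="link_1"]
function link_title_shortcode($atts) {
    $atts = shortcode_atts(array('meta_key' => 'link_1'), $atts);
    $url  = rwmb_meta($atts['meta_key']); // assumes a plain URL/text field
    if (empty($url) || !is_string($url)) {
        return '';
    }
    $html = @file_get_contents($url); // fetched on every render: consider caching
    return '<a href="' . esc_url($url) . '">' . esc_html(link_label($url, $html)) . '</a>';
}

// Guard only so the file can also be loaded outside WordPress.
if (function_exists('add_shortcode')) {
    add_shortcode('link_title', 'link_title_shortcode');
}
```

In the HTML widget, the link would then be written as [link_title meta_key="link_1"] instead of the hand-built <a> tag.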

Wordpress filter: `content_save_pre` hook fails to replace content only with `preg_replace` function

I need to replace code block fencing within post_content before it is saved. The post content is written in Markdown locally, pushed to GitHub and then to WordPress.
I need the Markdown fencing ```js <some code> ``` to be replaced with [js] <some code> [/js] before saving to WordPress.
See my working repl: https://repl.it/KDz2/1 My function works perfectly fine outside of Wordpress.
WordPress is invoking the function, but for some reason the replacement fails. I know this because I can get a simple str_replace to work just fine within WordPress.
Issue:
preg_replace fails to return the replaced content within the WordPress filter. No errors are thrown. Why is this failing?
For reference, my functions.php file includes:
add_filter( 'content_save_pre', 'markdown_code_highlight_fence');
function markdown_code_highlight_fence( $content ) {
$newContent = preg_replace('/^ *(`{3,}|~{3,}) *(\S+)? *\n([\s\S]+?)\s*\1 *(?:\n+|$)/m', '
[${2}]
$3
[\\\${2}]
', $content);
return $newContent;
}
I also tried this:
function markdown_code_highlight_fence( $content ) {
$newContent = preg_replace_callback('/^ *(`{3,}|~{3,}) *(\S+)? *\n([\s\S]+?)\s*\1 *(?:\n+|$)/m', function($match){
$lang = $match[2] == '' ? 'js' : $match[2];
return '
['.$lang.']'
.' '.
$match[3]
.' '.
'[\\'.$lang.']'; }, $content);
return $newContent;
}

I'm not sure why preg_replace isn't working within WordPress. If anyone can shed some light on this, please do.
In the interim, I have a working solution as follows:
add_filter( 'content_save_pre', 'markdown_code_highlight_fence_replace', 1, 1);
function markdown_code_highlight_fence_replace( $content ) {
preg_match_all('/`{3,}(\S+)?/', $content, $matches);
foreach ($matches[1] as $key=>$match) {
if($match === '') continue;
$content = preg_replace('/`{3,}/', '[/'.$match.']', $content, 2);
$content = str_replace('[/'.$match.']'.$match, '['.$match.']', $content);
}
return $content;
}
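A sketch of a fence converter that tolerates both \n and \r\n line endings. One possible cause worth checking, not confirmed in the question, is that content reaching content_save_pre uses \r\n line endings, which the question's ` *\n` right after the language name cannot match. The pattern below uses \R, which matches \n, \r\n or \r. It also writes the closing tag as [/lang]: the replacement strings in the question actually produce [\js], with a backslash. The function names are hypothetical, defaulting to js follows the question's callback, and add_filter() is guarded only so the file loads outside WordPress.

```php
<?php
// Turn ``` / ~~~ fenced blocks into [lang]...[/lang] shortcodes.
// \R matches \n, \r\n or \r, so CRLF line endings are handled too.
function markdown_fence_to_shortcode($content) {
    return preg_replace_callback(
        '/^ *(`{3,}|~{3,}) *(\S+)? *\R([\s\S]+?)\R *\1 *(?=\R|\z)/m',
        'markdown_fence_to_shortcode_cb',           // named function, no closure
        $content
    );
}

function markdown_fence_to_shortcode_cb($match) {
    // Default to "js" when the fence has no language, as in the question's callback
    $lang = (isset($match[2]) && $match[2] !== '') ? $match[2] : 'js';
    // Closing tag uses a forward slash: [/js]
    return '[' . $lang . "]\n" . $match[3] . "\n[/" . $lang . ']';
}

// Guard only so the file can also be loaded outside WordPress.
if (function_exists('add_filter')) {
    add_filter('content_save_pre', 'markdown_fence_to_shortcode');
}
```

The line break after the closing fence is left in place (the lookahead does not consume it), so surrounding paragraphs keep their spacing.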

PHP Fatal error: Cannot use object of type simple_html_dom as array

I am working on a web scraping application using simple_html_dom. I need to extract all the images in a web page. The possibilities are:
images in <img> tags;
images referenced from CSS in a <style> tag on the same page;
images set through an inline style on a <div> or some other tag.
I can scrape all the images by using the following code.
function download_images($html, $page_url , $local_url){
foreach($html->find('img') as $element) {
$img_url = $element->src;
$img_url = rel2abs($img_url, $page_url);
$parts = parse_url($img_url);
$img_path= $parts['path'];
$url_to_be_change = $GLOBALS['website_server_root'].$img_path;
download_file($img_url, $GLOBALS['website_local_root'].$img_path);
$element->src=$url_to_be_change;
}
$css_inline = $html->find("style");
$matches = array();
preg_match_all( "/url\((.*?)\)/", $css_inline, $matches, PREG_SET_ORDER );
foreach ( $matches as $match ) {
$img_url = trim( $match[1], "\"'" );
$img_url = rel2abs($img_url, $page_url);
$parts = parse_url($img_url);
$img_path= $parts['path'];
$url_to_be_change = $GLOBALS['website_server_root'].$img_path ;
download_file($img_url , $GLOBALS['website_local_root'].$img_path);
$html = str_replace($img_url , $url_to_be_change , $html );
}
return $html;
}
$html = download_images($html , $page_url , $dir); // working fine
$html = str_get_html ($html);
$html->save($dir. "/" . $ff);
Please note that I am also modifying the HTML after downloading the images.
Downloading works fine, but when I try to save the HTML, I get the following error:
PHP Fatal error: Cannot use object of type simple_html_dom as array
Important: it works perfectly fine if I don't use str_replace and the second loop.
Fatal error: Cannot use object of type simple_html_dom as array in /var/www/html/app/framework/cache/includes/simple_html_dom.php on line 1167

Guess №1
I see a possible mistake here:
$html = str_get_html($html);
Looks like you pass an object to str_get_html(), while it expects a string. Let's fix that by passing the document's markup as a string (save() with no file name returns it; plaintext would strip all the tags):
$html = str_get_html($html->save());
We can only guess what the $html variable contains when it reaches this piece of code.
Guess №2
Or maybe we just need to use another variable in function download_images to make your code correct in both cases:
function download_images($html, $page_url , $local_url){
foreach($html->find('img') as $element) {
$img_url = $element->src;
$img_url = rel2abs($img_url, $page_url);
$parts = parse_url($img_url);
$img_path= $parts['path'];
$url_to_be_change = $GLOBALS['website_server_root'].$img_path ;
download_file($img_url , $GLOBALS['website_local_root'].$img_path);
$element->src=$url_to_be_change;
}
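// find() returns an array of <style> elements; join them into one string for the regex
$css_inline = implode('', $html->find("style"));
// start from the (already modified) markup as a string, so a string is returned even with no matches
$result_html = $html->save();
$matches = array();
preg_match_all( "/url\((.*?)\)/", $css_inline, $matches, PREG_SET_ORDER );
foreach ( $matches as $match ) {
$img_url = trim( $match[1], "\"'" );
$img_url = rel2abs($img_url, $page_url);
$parts = parse_url($img_url);
$img_path= $parts['path'];
$url_to_be_change = $GLOBALS['website_server_root'].$img_path ;
download_file($img_url , $GLOBALS['website_local_root'].$img_path);
// keep replacing in the result, so earlier replacements are not lost
$result_html = str_replace($img_url , $url_to_be_change , $result_html );
}
return $result_html;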
}
$html = download_images($html , $page_url , $dir); // working fine
$html = str_get_html ($html);
$html->save($dir. "/" . $ff);
Explanation: if there are no matches (the $matches array is empty), we never enter the second loop, which is why $html still holds the same object it had at the beginning of the function. This is a common mistake when you use the same variable in a place where you need two different variables.

As the error message states, you are dealing with an Object where you should have an array.
You could try typecasting your object:
$array = (array) $yourObject;
That should solve it.

I had this error and solved it (in my case) by using return $html->save(); at the end of the function.
I can't explain why two instances with different variable names, scoped in different functions, caused this error. I guess this is how the "simple html dom" class works.
So, to be clear: try calling $html->save() before you do anything else with the result.
I hope this information helps somebody :)
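A sketch of the second loop that sidesteps the problem described in the question: $html->find('style') returns an array of elements, but preg_match_all() expects a string, so the CSS is likely never searched at all; the loop is then skipped and $html comes back still an object, which str_get_html() cannot take. Also, because $img_url has been made absolute by rel2abs(), str_replace() may not find the relative URL actually written in the CSS. The version below works on each <style> element's innertext and returns a string via save(). css_image_urls() and rewrite_style_images() are hypothetical names; rel2abs(), download_file() and the $GLOBALS roots are the question's own helpers and are not defined here.

```php
<?php
// Collect url(...) references from a CSS string, quotes stripped, duplicates removed.
function css_image_urls($css) {
    preg_match_all('/url\(\s*([^)]*?)\s*\)/i', $css, $matches, PREG_SET_ORDER);
    $urls = array();
    foreach ($matches as $match) {
        $urls[] = trim($match[1], "\"'");
    }
    return array_values(array_unique($urls));
}

// Sketch of the second loop working per <style> element. rel2abs(),
// download_file() and the $GLOBALS roots are the question's own helpers.
function rewrite_style_images($html, $page_url) {
    foreach ($html->find('style') as $style) {   // find() returns an array
        $css = $style->innertext;                // the CSS text as a string
        foreach (css_image_urls($css) as $raw_url) {
            $img_url  = rel2abs($raw_url, $page_url);
            $img_path = parse_url($img_url, PHP_URL_PATH);
            download_file($img_url, $GLOBALS['website_local_root'] . $img_path);
            // replace the URL exactly as written in the CSS (possibly relative)
            $css = str_replace($raw_url, $GLOBALS['website_server_root'] . $img_path, $css);
        }
        $style->innertext = $css;
    }
    return $html->save(); // no file name: save() just returns the markup string
}
```

Since the result is already a string, it can be written with file_put_contents($dir . "/" . $ff, ...) directly, without passing it through str_get_html() again.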

Greek URL conversion and trimming unwanted numbers and symbols

This problem is a little complicated for me, since I'm new to PHP and encoding.
My site uses UTF-8 encoding.
After a lot of tests, I found a partial solution. I use this kind of code:
function chr_conv($str)
{
$a = array('%CE%B2', '%CE%B3', '%CE%B4', '%CE%B5' /* , ... */); // percent-encoded patterns
$b = array('a', 'b', 'c', 'd' /* , ... */); // replacement characters
return str_replace($a, $b, $str);
}
function replace_old($str)
{
$a1 = array('index.php', '/http://' /* , ... */);
$a2 = array('', '' /* , ... */); // replacement strings
return str_replace($a1, $a2, $str);
}
function sanitize($url)
{
$url= replace_old(replace_old($url));
$url = strtolower($url);
$url = preg_replace('/[0-9]/', '', $url);
$url = preg_replace('/[?]/', '', $url);
$url = substr($url,1);
return $url;
}
function wbz404_process404()
{
$options = wbz404_getOptions();
$urlRequest = $_SERVER['REQUEST_URI'];
$url = chr_conv($urlRequest);
$requestedURL = replace_old(replace_old($url));
$requestedURL .= wbz404_SortQuery($urlParts);
//Get URL data if it's already in our database
$redirect = wbz404_loadRedirectData($requestedURL);
echo sanitize($requestedURL);
echo "<br>";
echo $requestedURL;
echo "<br>";
}
When the incoming URL is:
/content.php?147-%CE%A8%CE%AC%CF%81%CE%B9-%CE%BC%CE%B5-%CF%80%CF%81%CE%AC%CF%83%CE%B1%28%CE%A7%CE%BF%CF%8D%CE%BC%CF%80%CE%BB%CE%B9%CE%BA%29
I get:
/content.php?147-psari-me-prasa-choumplik
I want only:
/psari-me-prasa-choumplik
without the content.php?147- before it.
But the most important problem is that I get an endless loop instead of the correct URL.
What am I doing wrong?
Keep in mind that an .htaccess solution won't work, since I have a lighttpd server, not Apache.

I am assuming it's not always ?147- that you need to skip, but always everything up to the first hyphen. In that case, add the following before the echo:
$requestedURL = '/' . substr($requestedURL, strpos($requestedURL, '-') + 1);
This finds the position of the first hyphen, adds one to skip the hyphen itself, and keeps everything from there to the end of the string, with the leading / put back in front.
If it's always /content.php?147-, replace strpos($requestedURL, '-') + 1 with the number 17, the length of that prefix.
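A different approach, sketched under stated assumptions: decode the whole request URI with rawurldecode() and transliterate with strtr(), instead of replacing percent-encoded byte pairs one by one, then strip the old prefix with a regex. old_url_to_slug() and greek_translit_map() are hypothetical; the map covers only the letters in this example, and the pattern assumes old URLs always look like /content.php?<digits>-<title>. Returning null for anything else gives the 404 handler a way to skip redirecting. One common cause of an endless loop is redirecting to a URL that the handler then rewrites again, but the question does not show the redirect code, so that is only a guess.

```php
<?php
// Hypothetical, deliberately partial Greek-to-Latin map (only the letters in
// this example); extend it to the full alphabet. Save this file as UTF-8.
function greek_translit_map() {
    return array(
        'Ψ' => 'ps', 'Χ' => 'ch', 'ά' => 'a', 'α' => 'a', 'ε' => 'e',
        'ι' => 'i', 'κ' => 'k', 'λ' => 'l', 'μ' => 'm', 'ο' => 'o',
        'π' => 'p', 'ρ' => 'r', 'σ' => 's', 'ύ' => 'u',
    );
}

// "/content.php?147-<percent-encoded Greek title>" -> "/<latin-slug>".
// Returns null when the URI is not an old-style URL, so the caller can skip
// redirecting instead of rewriting an already-clean URL again.
function old_url_to_slug($request_uri) {
    $path = rawurldecode($request_uri);                      // %CE%A8... -> Ψ...
    // Assumption: old URLs always look like /content.php?<digits>-<title>
    $path = preg_replace('#^/content\.php\?\d+-#', '', $path, 1, $count);
    if ($count === 0) {
        return null;
    }
    $slug = strtolower(strtr($path, greek_translit_map())); // Greek -> Latin
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);        // "(", ")" etc. -> "-"
    return '/' . trim($slug, '-');
}
```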

RegEx to convert URLs in text into clickable ones with custom anchor text [duplicate]

Possible Duplicate:
Need a good regex to convert URLs to links but leave existing links alone
This is my sample input:
http://www.website.com/1/
Click here http://www.website.com/2/ or visit the website: http://www.website.com/3/
or http://www.website.com/4/
http://www.website.com/5/
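I want a PHP function that converts the URLs inside the text into <a> tags, like so:
<a href="http://www.website.com/1/">http://www.website.com/1/</a>
Click <a href="http://www.website.com/2/">here</a> or visit the website: <a href="http://www.website.com/3/">http://www.website.com/3/</a>
or <a href="http://www.website.com/4/">http://www.website.com/4/</a>
<a href="http://www.website.com/5/">http://www.website.com/5/</a>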
There is a catch on line 2: if the URL is preceded by the word here then the word should be used as the anchor text instead. I need to do this in PHP. I think preg_replace with /e switch might help me accomplish this task but I am not sure. This is the (borrowed) regex I've used so far:
preg_replace("#(^|[\n ])([\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*)#is", "\\1<a href=\"\\2\">\\2</a>", $ret);
// ^---- I've tried adding "|here "
// But I cannot get the order of \\1 and \\2 right
Please advise.

"But I cannot get the order of \1 and \2 right"
Capturing groups are numbered in the order of their opening brackets, so the first opening bracket always belongs to $1. If you don't want that, use named groups.
For your problem you can try this regex
(?:(here)\s*|\b)(\w+?://[\w\#$%&~/.\-;:=,?#\[\]+]*)
It will have "here" in $1 and the link in $2. If "here" is not found then $1 is empty.
See it here on Regexr
So you then need to choose the replacement based on the content of $1. If it is empty, replace the match with
<a href="$2">$2</a>
otherwise with
<a href="$2">$1</a>
I think this should be possible using preg_replace_callback().
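A minimal sketch of that preg_replace_callback() idea, using a named callback instead of a closure. here_link_cb() and linkify_with_here() are hypothetical names. The pattern is the answer's, with ! as delimiter (so the # characters inside the class don't end the pattern) and a \b added before "here", an assumption on my part so that a word like "there" is not treated as "here".

```php
<?php
// Callback for the answer's regex: $m[1] is "here" (or ""), $m[2] is the URL.
function here_link_cb($m) {
    $url  = htmlspecialchars($m[2], ENT_QUOTES);
    $text = ($m[1] !== '') ? htmlspecialchars($m[1], ENT_QUOTES) : $url;
    return '<a href="' . $url . '">' . $text . '</a>';
}

function linkify_with_here($text) {
    // The answer's pattern with "!" as delimiter and \b added before "here".
    return preg_replace_callback(
        '!(?:\b(here)\s*|\b)(\w+?://[\w\#$%&~/.\-;:=,?#\[\]+]*)!i',
        'here_link_cb',
        $text
    );
}
```

When "here" is absent, PHP still fills $m[1] with an empty string because group 2 matched, which is what the callback tests for.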

I found this. It sounds interesting, though I have NOT tested it myself yet (I'm doing it now).
The class goes like this:
class MakeItLink {
    // Callback for preg_replace_callback(): $matches[1] is an optional "(", $matches[2] the URL
    protected static function _link_www( $matches ) {
        $url = $matches[2];
        $url = MakeItLink::cleanURL( $url );
        if( empty( $url ) ) {
            return $matches[0];
        }
        return "{$matches[1]}<a href='{$url}'>{$url}</a>";
    }
    public static function cleanURL( $url ) {
        if( $url == '' ) {
            return $url;
        }
        $url = preg_replace( "|[^a-z0-9-~+_.?#=!&;,/:%@$*'()\\x80-\\xff]|i", '', $url );
        $url = str_replace( array( "%0d", "%0a" ), '', $url );
        $url = str_replace( ";//", "://", $url );
        /* If the URL doesn't appear to contain a scheme, we
         * presume it needs http:// prepended (unless it is a relative
         * link starting with / or a php file).
         */
        if(
            strpos( $url, ":" ) === false
            && substr( $url, 0, 1 ) != "/"
            && !preg_match( "|^[a-z0-9-]+?\.php|i", $url )
        ) {
            $url = "http://{$url}";
        }
        // Replace ampersands and single quotes with HTML entities
        $url = preg_replace( "|&([^#])(?![a-z]{2,8};)|", "&#038;$1", $url );
        $url = str_replace( "'", "&#039;", $url );
        return $url;
    }
    public static function transform( $text ) {
        $text = " {$text}"; // leading space so the lookbehind can match at the start
        $text = preg_replace_callback(
            '#(?<=[\s>])(\()?([\w]+?://(?:[\w\\x80-\\xff\#$%&~/\-=?@\[\](+]|[.,;:](?![\s<])|(?(1)\)(?![\s<])|\)))*)#is',
            array( 'MakeItLink', '_link_www' ),
            $text
        );
        // Undo links that ended up nested inside existing links
        $text = preg_replace( '#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', "$1$3</a>", $text );
        $text = trim( $text );
        return $text;
    }
}
It's very easy to use: just load up the text you want to search for links and call the transform method:
$text = MakeItLink::transform( $text );
All of this code came out of WordPress, which is licensed under the GPL.
