Extend PHP regex to cover "srcset" and "style" attributes

I've created a WordPress plugin that turns all links into protocol-relative URLs (removing http: and https:) based on the tags and attributes that I list in the $tag and $attribute variables. This is part of the function. To save space, the rest of the code can be found here.
$content_type = NULL;
# Check for 'Content-Type' headers only
foreach ( headers_list() as $header ) {
if ( strpos( strtolower( $header ), 'content-type:' ) === 0 ) {
$pieces = explode( ':', strtolower( $header ) );
$content_type = trim( $pieces[1] );
break;
}
}
# If the content-type is 'NULL' or 'text/html', apply rewrite
if ( is_null( $content_type ) || substr( $content_type, 0, 9 ) === 'text/html' ) {
$tag = 'a|base|div|form|iframe|img|link|meta|script|svg';
$attribute = 'action|content|data-project-file|href|src|srcset|style';
# If 'Protocol Relative URL' option is checked, only apply change to internal links
if ( $this->option == 1 ) {
# Remove protocol from home URL
$website = preg_replace( '/https?:\/\//', '', home_url() );
# Remove protocol from internal links
$links = preg_replace( '/(<(' . $tag . ')([^>]*)(' . $attribute . ')=["\'])https?:\/\/' . $website . '/i', '$1//' . $website, $links );
}
# Else, remove protocols from all links
else {
$links = preg_replace( '/(<(' . $tag . ')([^>]*)(' . $attribute . ')=["\'])https?:\/\//i', '$1//', $links );
}
}
# Return protocol relative links
return $links;
This works as intended, but it doesn't work on these examples:
<!-- Within the 'style' attribute -->
<div class="some-class" style='background-color:rgba(255,255,255,0);background-image:url("http://placehold.it/300x200");background-position:center center;background-repeat:no-repeat'>
<!-- Within the 'srcset' attribute -->
<img src="http://placehold.it/600x300" srcset="http://placehold.it/500 500x, http://placehold.it/100 100w">
However, the code partially works for these examples.
<div class="some-class" style='background-color:rgba(255,255,255,0);background-image:url("http://placehold.it/300x200");background-position:center center;background-repeat:no-repeat'>
<img src="http://placehold.it/600x300" srcset="//placehold.it/500 500x, http://placehold.it/100 100w">
I've played around with adding additional values to the $tag and $attribute variables, but that didn't help. I assume I need to update the rest of my regex to cover these two additional attributes? Or is there a different way to approach it, such as DOMDocument?

I was able to simplify the code by doing the following:
$content_type = NULL;
# Check for 'Content-Type' headers only
foreach ( headers_list() as $header ) {
if ( strpos( strtolower( $header ), 'content-type:' ) === 0 ) {
$pieces = explode( ':', strtolower( $header ) );
$content_type = trim( $pieces[1] );
break;
}
}
# If the content-type is 'NULL' or 'text/html', apply rewrite
if ( is_null( $content_type ) || substr( $content_type, 0, 9 ) === 'text/html' ) {
# Remove protocol from home URL
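$website = $_SERVER['HTTP_HOST'];
# str_replace() treats 'https?://' as a literal string, so use a regex for the internal links
$links = preg_replace( '|https?://' . preg_quote( $website, '|' ) . '|i', '//' . $website, $links );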
$links = preg_replace( '|https?://(.*?)|', '//$1', $links );
}
# Return protocol relative links
return $links;
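The pattern in the question only sees a URL that sits directly after the opening quote of an attribute, which is likely why the second "srcset" candidate and the url(...) inside "style" are skipped. A minimal sketch of one way around that, assuming it is acceptable to rewrite every URL inside the listed attributes: match the whole attribute value first, then strip the protocol from every URL inside it. The function name strip_protocol_in_attributes and its default attribute list are hypothetical, not part of the plugin.

```php
<?php
// Hypothetical helper (not from the plugin): strips "http:" / "https:" from
// every URL found inside the listed attributes.
function strip_protocol_in_attributes( $html, $attributes = 'href|src|srcset|style' ) {
    // Capture the attribute name, its opening quote, and the value up to the
    // same closing quote (\2), so url("...") inside style='...' stays intact.
    $pattern = '/\b(' . $attributes . ')=(["\'])(.*?)\2/is';

    return preg_replace_callback( $pattern, function ( $m ) {
        // Replace every protocol in the value, not only a leading one.
        $value = preg_replace( '/https?:\/\//i', '//', $m[3] );
        return $m[1] . '=' . $m[2] . $value . $m[2];
    }, $html );
}

$html = '<img src="http://placehold.it/600x300" srcset="http://placehold.it/500 500w, https://placehold.it/100 100w">';
echo strip_protocol_in_attributes( $html ), "\n";
```

Because the inner replacement runs over the whole attribute value, each comma-separated "srcset" candidate is handled. Restricting the rewrite to particular tags or to internal links would need an extra check inside the callback.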

Related

Wordpress: Automatically change URLs in the_content section

The solution from here isn't solving our problem.
I already have a solution that changes all links in a form field in our theme. It uses a separate array for every affiliate network, like $example_network_1 and $example_network_2, with a PHP foreach for each network.
Now I need a solution that uses these same arrays for the content of a WordPress post.
The code below works for one network, but it causes a 404 error for YouTube videos. We paste the URL of a YouTube video and WordPress automatically generates an embedded video. With the following code, the iframe shows a 404 error or something similar.
We need a solution that works for more than one network.
I am very thankful for any help!
$example_network_1 = array(
array('shop'=>'shop1.com','id'=>'11111'),
array('shop'=>'shop2.co.uk','id'=>'11112'),
);
$example_network_2 = array(
array('shop'=>'shop-x1.com','id'=>'11413'),
array('shop'=>'shop-x2.net','id'=>'11212'),
);
add_filter( 'the_content', 'wpso_change_urls' ) ;
function wpso_found_urls( $matches ) {
global $example_network_1,$example_network_2;
foreach( $example_network_1 as $rule ) {
if (strpos($matches[0], $rule['shop']) !== false) {
$raw_url = trim( $matches[0], '"' ) ;
return '"https://www.network-x.com/click/'. $rule['id'].'lorem_lorem='.rawurlencode($raw_url ) . '"';
}
/*
foreach( $example_network_2 as $rule ) {
if (strpos($matches[0], $rule['shop']) !== false) {
$raw_url = trim( $matches[0], '"' ) ;
return '"https://www.network-y.com/click/'. $rule['id'].'lorem='.rawurlencode($raw_url ) . '"';
}
*/
}
}
function wpso_change_urls( $content ) {
global $example_network_1, $example_network_2;
return preg_replace_callback( '/"+(http|https)(\:\/\/\S*'. $example_network_1 ['shop'] .'\S*")/', 'wpso_found_urls', $content ) ;
// return preg_replace_callback( '/"+(http|https)(\:\/\/\S*'. $example_network_2 ['shop'] .'\S*")/', 'wpso_found_urls', $content ) ;
}

autoembed is hooked at the_content with priority 8 on wp-includes/class-wp-embed.php:39
Try to lower the priority of the the_content filter so that the URL replacement happens before the embed, something like this:
add_filter( 'the_content', function ( $content ) {
/*
* Here, we define the replacements for each site in the network.
* '1' = main blog
* '2' = site 2 in the network, and so on
*
* To add more sites, just add the key number '3', etc
*/
$network_replacements = [
'1' => [
[ 'shop' => 'shop1.com', 'id' => '11111' ],
[ 'shop' => 'shop2.co.uk', 'id' => '11112' ],
],
'2' => [
[ 'shop' => 'shop-x1.com', 'id' => '11413' ],
[ 'shop' => 'shop-x2.net', 'id' => '11212' ],
]
];
// Early bail: Current blog ID does not have replacements defined
if ( ! array_key_exists( get_current_blog_id(), $network_replacements ) ) {
return $content;
}
$replacements = $network_replacements[ get_current_blog_id() ];
return preg_replace_callback( '/"+(http|https)(\:\/\/\S*' . $replacements['shop'] . '\S*")/', function( $matches ) use ( $replacements ) {
foreach ( $replacements as $rule ) {
if ( strpos( $matches[0], $rule['shop'] ) !== false ) {
$raw_url = trim( $matches[0], '"' );
return '"https://www.network-x.com/click/' . $rule['id'] . 'lorem_lorem=' . rawurlencode( $raw_url ) . '"';
}
}
}, $content );
}, 1, 1 );
This is not a copy-and-paste solution, but it should get you going. You might need to tweak your "preg_replace_callback" code, but you said it was working, so I just left it as it was.
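One possible tweak to the "preg_replace_callback" part, offered as a sketch rather than a tested fix. In the code above, $replacements is a list of rules, so $replacements['shop'] is not defined and the pattern ends up matching any quoted URL. A callback that returns nothing then replaces that match with an empty string, which would also blank the src of an embedded iframe. The version below builds the pattern from all shop domains and returns the original match when no rule applies. The function name wpso_rewrite_shop_urls is hypothetical; the rule list is copied from the question.

```php
<?php
// Hypothetical standalone version of the replacement step (no WordPress needed).
function wpso_rewrite_shop_urls( $content, array $rules ) {
    // Escape each shop domain and join them into one alternation.
    $shops = array();
    foreach ( $rules as $rule ) {
        $shops[] = preg_quote( $rule['shop'], '/' );
    }
    $pattern = '/"(https?:\/\/[^"\s]*(?:' . implode( '|', $shops ) . ')[^"\s]*)"/i';

    return preg_replace_callback( $pattern, function ( $m ) use ( $rules ) {
        foreach ( $rules as $rule ) {
            if ( stripos( $m[1], $rule['shop'] ) !== false ) {
                return '"https://www.network-x.com/click/' . $rule['id']
                    . 'lorem_lorem=' . rawurlencode( $m[1] ) . '"';
            }
        }
        // No rule matched: hand back the original text instead of nothing.
        return $m[0];
    }, $content );
}

$rules = array(
    array( 'shop' => 'shop1.com', 'id' => '11111' ),
    array( 'shop' => 'shop2.co.uk', 'id' => '11112' ),
);
$content = '<a href="https://www.shop1.com/item?x=1">a</a> <iframe src="https://www.youtube.com/embed/abc"></iframe>';
echo wpso_rewrite_shop_urls( $content, $rules ), "\n";
```

With the pattern limited to known shop domains, URLs of other sites are never matched in the first place, so the filter priority matters less.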

If preventing the wp auto-embed works, then just add this line to your theme functions.php
remove_filter( 'the_content', array( $GLOBALS['wp_embed'], 'autoembed' ), 8 );

I wrote this solution without testing it. Your code is hard to test without your site, but I think the problem is with the regex, and callbacks are hard to debug. My version is below.
First step: change your structure. I assume the domains are unique, so a one-dimensional array is more useful.
$domains = array(
'shop1.com'=>'11111',
'shop2.co.uk'=>'11112',
'shop-x1.com'=>'11413',
'shop-x2.net'=>'11212',
);
Next:
$dangerouschars = array(
'.'=>'\.',
);
function wpso_change_urls( $content ) {
global $domains,$dangerouschars;
foreach($domains as $domain=>$id){
$escapedDomain = str_replace(array_keys($dangerouschars),array_values($dangerouschars), $domain);
if (preg_match_all('/=\s*"(\s*https?:\/\/(www\.)?'.$escapedDomain.'[^"]+)\s*"\s+/mi', $content, $matches)){
// $matches[0] - ="https://example.com"
// $matches[1] - https://example.com
for($i = 0; $i<count($matches[0]); $i++){
$matchedUrl = $matches[1][$i];
$url = rawurlencode($matchedUrl);
//is very important to replace with ="..."
$content = str_replace($matches[0][$i], "=\"https://www.network-x.com/click/{$id}lorem_lorem={$url}\" ", $content);
}
}
}
return $content;
}

How to fix "Warning: preg_match() [function.preg-match]: Compilation failed: nothing to repeat at offset 1" in WordPress

When I installed WooCommerce on a WordPress page I got the chance to manage a little while ago, I started getting these errors whenever I go to a subpage:
Warning: preg_match() [function.preg-match]: Compilation failed: nothing to repeat at offset 1 in /var/www/watertours.dk/public_html/wp-includes/class-wp.php on line 222
Warning: preg_match() [function.preg-match]: Compilation failed: nothing to repeat at offset 1 in /var/www/watertours.dk/public_html/wp-includes/class-wp.php on line 223
It even shows up in the dashboard occasionally.
I have found this guide which I have already tried several times:
step 0: if possible, backup your WP installation folder.
step 1: temporary disable all the plugins (important step)
step 2: in WordPress admin dashboard, go to Settings -> Permalinks
step 3: remember or note down somewhere what you have in the custom permalinks field: http://awesomescreenshot.com/0534epzk0c
step 4: temporary enable (switch to) the default permalink: http://awesomescreenshot.com/0f74epyi15 Click Save Changes button.
step 5: verify the website is working now (not everything, because the plugins are disabled, but the preg_match error should be gone)
step 6: switch back to the custom permalinks setting you had at step 3
step 7: enable back all the plugins
The error should be gone.
It works for a little while (two minutes or so) and then those two errors start popping up again.
I am thinking of just remaking the WordPress site from the ground up since it is quite a mess anyway. But if anyone has a solution, I would be more than grateful. :)
EDIT:
/**
* Parse request to find correct WordPress query.
*
* Sets up the query variables based on the request. There are also many
* filters and actions that can be used to further manipulate the result.
*
* @since 2.0.0
*
* @global WP_Rewrite $wp_rewrite
*
* @param array|string $extra_query_vars Set the extra query variables.
*/
public function parse_request( $extra_query_vars = '' ) {
global $wp_rewrite;
/**
* Filters whether to parse the request.
*
* @since 3.5.0
*
* @param bool $bool Whether or not to parse the request. Default true.
* @param WP $this Current WordPress environment instance.
* @param array|string $extra_query_vars Extra passed query variables.
*/
if ( ! apply_filters( 'do_parse_request', true, $this, $extra_query_vars ) ) {
return;
}
$this->query_vars = array();
$post_type_query_vars = array();
if ( is_array( $extra_query_vars ) ) {
$this->extra_query_vars = & $extra_query_vars;
} elseif ( ! empty( $extra_query_vars ) ) {
parse_str( $extra_query_vars, $this->extra_query_vars );
}
// Process PATH_INFO, REQUEST_URI, and 404 for permalinks.
// Fetch the rewrite rules.
$rewrite = $wp_rewrite->wp_rewrite_rules();
if ( ! empty( $rewrite ) ) {
// If we match a rewrite rule, this will be cleared.
$error = '404';
$this->did_permalink = true;
$pathinfo = isset( $_SERVER['PATH_INFO'] ) ? $_SERVER['PATH_INFO'] : '';
list( $pathinfo ) = explode( '?', $pathinfo );
$pathinfo = str_replace( '%', '%25', $pathinfo );
list( $req_uri ) = explode( '?', $_SERVER['REQUEST_URI'] );
$self = $_SERVER['PHP_SELF'];
$home_path = trim( parse_url( home_url(), PHP_URL_PATH ), '/' );
$home_path_regex = sprintf( '|^%s|i', preg_quote( $home_path, '|' ) );
// Trim path info from the end and the leading home path from the
// front. For path info requests, this leaves us with the requesting
// filename, if any. For 404 requests, this leaves us with the
// requested permalink.
$req_uri = str_replace( $pathinfo, '', $req_uri );
$req_uri = trim( $req_uri, '/' );
$req_uri = preg_replace( $home_path_regex, '', $req_uri );
$req_uri = trim( $req_uri, '/' );
$pathinfo = trim( $pathinfo, '/' );
$pathinfo = preg_replace( $home_path_regex, '', $pathinfo );
$pathinfo = trim( $pathinfo, '/' );
$self = trim( $self, '/' );
$self = preg_replace( $home_path_regex, '', $self );
$self = trim( $self, '/' );
// The requested permalink is in $pathinfo for path info requests and
// $req_uri for other requests.
if ( ! empty( $pathinfo ) && ! preg_match( '|^.*' . $wp_rewrite->index . '$|', $pathinfo ) ) {
$requested_path = $pathinfo;
} else {
// If the request uri is the index, blank it out so that we don't try to match it against a rule.
if ( $req_uri == $wp_rewrite->index ) {
$req_uri = '';
}
$requested_path = $req_uri;
}
$requested_file = $req_uri;
$this->request = $requested_path;
// Look for matches.
$request_match = $requested_path;
if ( empty( $request_match ) ) {
// An empty request could only match against ^$ regex
if ( isset( $rewrite['$'] ) ) {
$this->matched_rule = '$';
$query = $rewrite['$'];
$matches = array( '' );
}
} else {
foreach ( (array) $rewrite as $match => $query ) {
// If the requested file is the anchor of the match, prepend it to the path info.
if ( ! empty( $requested_file ) && strpos( $match, $requested_file ) === 0 && $requested_file != $requested_path ) {
$request_match = $requested_file . '/' . $requested_path;
}
if ( preg_match( "#^$match#", $request_match, $matches ) || // Line 222
preg_match( "#^$match#", urldecode( $request_match ), $matches ) ) { // Line 223
if ( $wp_rewrite->use_verbose_page_rules && preg_match( '/pagename=\$matches\[([0-9]+)\]/', $query, $varmatch ) ) {
// This is a verbose page match, let's check to be sure about it.
$page = get_page_by_path( $matches[ $varmatch[1] ] );
if ( ! $page ) {
continue;
}
$post_status_obj = get_post_status_object( $page->post_status );
if ( ! $post_status_obj->public && ! $post_status_obj->protected
&& ! $post_status_obj->private && $post_status_obj->exclude_from_search ) {
continue;
}
}
// Got a match.
$this->matched_rule = $match;
break;
}
}
}
if ( isset( $this->matched_rule ) ) {
// Trim the query of everything up to the '?'.
$query = preg_replace( '!^.+\?!', '', $query );
// Substitute the substring matches into the query.
$query = addslashes( WP_MatchesMapRegex::apply( $query, $matches ) );
$this->matched_query = $query;
// Parse the query.
parse_str( $query, $perma_query_vars );
// If we're processing a 404 request, clear the error var since we found something.
if ( '404' == $error ) {
unset( $error, $_GET['error'] );
}
}
// If req_uri is empty or if it is a request for ourself, unset error.
if ( empty( $requested_path ) || $requested_file == $self || strpos( $_SERVER['PHP_SELF'], 'wp-admin/' ) !== false ) {
unset( $error, $_GET['error'] );
if ( isset( $perma_query_vars ) && strpos( $_SERVER['PHP_SELF'], 'wp-admin/' ) !== false ) {
unset( $perma_query_vars );
}
$this->did_permalink = false;
}
}
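The two warnings come from the lines marked "Line 222" and "Line 223", where each rewrite rule is wrapped as "#^$match#". Offset 1 is the position right after the "^", which suggests (this is an assumption, not something confirmed in the thread) that one stored rule starts with a quantifier such as "*", "+" or "?". A small sketch for locating such a rule: compile every rule the same way and collect the ones that fail. The function name find_broken_rewrite_rules and the sample rules are hypothetical; on the real site the array would be the one returned by $wp_rewrite->wp_rewrite_rules().

```php
<?php
// Hypothetical helper: returns the rewrite rules that PCRE cannot compile.
function find_broken_rewrite_rules( array $rules ) {
    $broken = array();
    foreach ( $rules as $match => $query ) {
        // Wrap the rule the same way parse_request() does: "#^$match#".
        // preg_match() returns false (not 0) when the pattern does not compile;
        // the @ only hides the warning while testing.
        if ( @preg_match( '#^' . $match . '#', '' ) === false ) {
            $broken[ $match ] = $query;
        }
    }
    return $broken;
}

// Sample data only: the second rule is deliberately invalid (unclosed group).
$rules = array(
    'shop/(.+?)/?$' => 'index.php?product=$matches[1]',
    '(unclosed/?$'  => 'index.php?pagename=broken',
);
print_r( array_keys( find_broken_rewrite_rules( $rules ) ) );
```

Knowing which rule is broken usually points at the plugin or permalink setting that registered it, which may explain why the error returns a few minutes after the permalinks are re-saved.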

Cannot find where a redirect is specified

I've recently become the maintainer of a Wordpress site (I'm completely new to wordpress) and I'm having some difficulty determining where a redirect is specified.
I've checked the .htaccess file, and there's nothing specified in there. As far as I can tell, the rewrite rules aren't the cause.
I've tried deleting the page being redirected from and re-creating it, and the redirect still occurs.
My question is: where can you specify a redirect? I've run out of ideas of where to look.

One of my clients wanted a custom URL like https://www.qsleap.com/gmat/resources. As you know, in WordPress every request is caught by index.php. The request filter catches the request and calls the page.
Read this code; it may give you an idea.
function permalinks_customizer_request_before($query ){
$uri=$_SERVER['REQUEST_URI'];
$match = preg_match( '/(gmat|gre|sat|lsat|cat)(\/resources\/tags\/)(.*)\/(articles|videos|concept-notes|qna)/', $uri, $matches );
// $match = preg_match('/(gmat|gre|sat|lsat|cat)/\resources/\tags/stanford-gsb/\articles|videos|concept-notes)/?$', $uri,$matches);
if($match){
$url = parse_url( get_bloginfo( 'url' ) );
$url = isset( $url['path']) ? $url['path'] : '';
$request = ltrim( substr( $_SERVER['REQUEST_URI'], strlen( $url ) ), '/' );
$request = ( ( $pos = strpos( $request, '?' ) ) ? substr( $request, 0, $pos ) : $request );
if ( ! $request )
return $query;
$original_url="?page_name=tags&exam=".$matches[1]."&post_tag=".$matches[3]."&post_type=".$matches[4];
if ( $original_url !== null ) {
$original_url = str_replace('//', '/', $original_url);
if ( ( $pos = strpos( $_SERVER['REQUEST_URI'], '?' ) ) !== false ) {
$queryVars = substr( $_SERVER['REQUEST_URI'], $pos + 1 );
$original_url .= ( strpos( $original_url, '?' ) === false ? '?' : '&') . $queryVars;
}
$oldRequestUri = $_SERVER['REQUEST_URI'];
$oldQueryString = $_SERVER['QUERY_STRING'];
$_SERVER['REQUEST_URI'] = '/' . ltrim( $original_url, '/' );
$_SERVER['QUERY_STRING'] = ( ( $pos = strpos( $original_url, '?' ) ) !== false ? substr( $original_url, $pos + 1 ) : '' );
parse_str( $_SERVER['QUERY_STRING'], $queryArray );
$oldValues = array();
global $wp;
$wp->parse_request();
$query = $wp->query_vars;
if ( is_array( $queryArray ) ) {
foreach ( $queryArray as $key => $value ) {
$oldValues[$key] = $_REQUEST[$key];
$_REQUEST[$key] = $_GET[$key] = $value;
$query[$key]=$value;
}
}
$_SERVER['REQUEST_URI'] ='';
$_SERVER['QUERY_STRING']='';
}
}
return $query;
}
add_filter( 'request','permalinks_customizer_request_before',0);
function wp_url_rewrite_templates() {
if ( get_query_var( 'page_name' ) && get_query_var( 'page_name' ) == 'tags' ) {
add_filter( 'template_include', function() {
$template= dirname( __FILE__ ) . '/page-tags.php';
return $template;
});
}
}
add_action( 'template_redirect', 'wp_url_rewrite_templates' ,4 );

I think the easiest way for you to remove the redirects will be with this plugin.
https://redirection.me/
After you install and activate it, go to Tools > Redirection in the WordPress admin.
You'll see a list of redirects, and you can add or remove any of them.

Prevent recursion on replacing elements

First of all: no, this is not a duplicate. I know that there are several ways to search for elements in an HTML page, but that is not really my problem.
I will outline my problem:
For reasons I cannot change, my PHP code is called 2-3 times on every page rendering.
My code crawls the HTML content for specific words and replaces them with a link.
To achieve this I am using https://github.com/sunra/php-simple-html-dom-parser .
This is my source:
foreach ( $dom->find( 'text' ) as $element ) {
//$config['exclusions'] is an array like ['a', 'img']
if ( !in_array( $element->parent()->tag, $config[ 'exclusions' ] ) ) {
foreach ( $markers as $marker ) {
$text = $marker[ 'text' ];
$url = $marker[ 'url' ];
$tip = strip_tags( $marker[ 'excerpt' ] );
$tooltip = ( $tooltip ? "data-uk-tooltip title='$tip'" : "" );
$tmpval = "tmpval-$i";
$element->innertext = preg_replace(
'/\b' . preg_quote( $text, "/" ) . '\b/i',
"<a href='$url' $hrefclass target='$target' $tmpval>\$0</a>",
$element->innertext,
1
);
$element->innertext = str_replace( $tmpval, $tooltip, $element->innertext );
$i++;
}
}
}
The problem is: If the $tooltip contains a word that matches a marker, this word is being replaced. So the result is <a href='foo.html' target='_self' data-uk-tooltip title='<a href='bar.html'...'>\$0</a> which destroys the markup of the page.
So my question: How can I prevent this?

Use lookbehind:
$element->innertext = preg_replace(
'/(?<!\w=[\'"])\b' . preg_quote( $text, '/' ) . '\b/i',
"<a href='$url' $hrefclass target='$target' $tmpval>\$0</a>",
$element->innertext,
1
);
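The lookbehind only protects a word that directly follows title=' or title=". A different approach, sketched here under the assumption that the text never contains the control characters \x01 and \x02: replace each hit with an opaque token first, and swap the tokens for the final markup only after all markers have run, so text that was inserted is never scanned again. The function name link_markers is hypothetical, and $hrefclass and $target from the question are left out for brevity.

```php
<?php
// Hypothetical helper: links each marker once without rescanning inserted markup.
function link_markers( $text, array $markers ) {
    $links = array(); // token => final <a> markup
    $n     = 0;

    foreach ( $markers as $marker ) {
        $n++;
        // The token holds no word characters, so no \bword\b pattern can match it.
        $token = "\x02" . str_repeat( "\x01", $n ) . "\x02";

        $text = preg_replace_callback(
            '/\b' . preg_quote( $marker['text'], '/' ) . '\b/i',
            function ( $m ) use ( $marker, $token, &$links ) {
                // Escape quotes so the tooltip cannot break out of the attribute.
                $tip = htmlspecialchars( strip_tags( $marker['excerpt'] ), ENT_QUOTES );
                $links[ $token ] = "<a href='" . $marker['url']
                    . "' data-uk-tooltip title='" . $tip . "'>" . $m[0] . '</a>';
                return $token;
            },
            $text,
            1 // same limit as in the question: first occurrence only
        );
    }

    // Swap all tokens for their markup in a single pass.
    return $links ? strtr( $text, $links ) : $text;
}

$markers = array(
    array( 'text' => 'foo', 'url' => 'foo.html', 'excerpt' => 'about bar' ),
    array( 'text' => 'bar', 'url' => 'bar.html', 'excerpt' => 'x' ),
);
echo link_markers( 'foo and bar', $markers ), "\n";
```

This keeps a single call self-consistent. It does not detect links created by an earlier call on the same page, so the parent-tag exclusion list from the question is still needed for that.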

Cut a string/url to always get a final string/url with a specific data and it's value in php

I have a URL that contains the parameter "&key".
The "&key" parameter can be near the beginning or near the end of the URL.
Ex1= http://xxxxx.com?c1=xxx&c2=xxx&c3=xxx&key=xxx&c4=xxx&f1=xxx
Ex2= http://xxxxx.com?c1=xxx&key=xxx&c2=xxx&c3=xxx&c4=xxx&f1=xxx
What I would like to get every time is the URL up to and including the key element and its value.
R1: http://xxxxx.com?c1=xxx&c2=xxx&c3=xxx&key=xxx
R2: http://xxxxx.com?c1=xxx&key=xxx
Here is what I have done:
$lp_sp_ad_publisher = "http://xxxxx.com?c1=xxx&c2=xxx&c3=xxx&key=xxxc4=xxxf1=xxx";
$lp_sp_ad_publisher_cut_link = explode("&", $lp_sp_ad_publisher_cut[1]); // tab
$lp_sp_ad_publisher_cut_link_final = $lp_sp_ad_publisher_cut_link[0]; // http://xxxxx.com?c1=xxx
$counter = 1;
// finding &key inside $lp_sp_ad_publisher_cut_link_final
while ((strpos($lp_sp_ad_publisher_cut_link_final, '&key')) !== false);
{
$lp_sp_ad_publisher_cut_link_final .= $lp_sp_ad_publisher_cut_link[$counter];
echo 'counter: ' . $counter . ' link: ' . $lp_sp_ad_publisher_cut_link_final . '<br/>';
$counter++;
}
I'm only looping once all the time. I guess the while loop isn't refreshing with the inside new value. Any solution?

EDIT: Sorry, I misunderstood the question.
This is tricky because the URL key and value can be anything, so it might be safer to break down the URL using a combination of parse_url() and parse_str(), then put the URL back together, leaving off the part you don't want. Something like this:
function cut_url( $url='', $key='' )
{
$output = '';
$parts = parse_url( $url );
$query = array();
if( isset( $parts['scheme'] ) )
{
$output .= $parts['scheme'].'://';
}
if( isset( $parts['host'] ) )
{
$output .= $parts['host'];
}
if( isset( $parts['path'] ) )
{
$output .= $parts['path'];
}
if( isset( $parts['query'] ) )
{
$output .= '?';
parse_str( $parts['query'], $query );
}
foreach( $query as $qkey => $qvalue )
{
$output .= $qkey.'='.$qvalue.'&';
if( $qkey == $key ) break;
}
return rtrim( $output, '&' );
}
Usage:
$input = 'https://www.xxxxx.com/test/path/index.php?c1=xxx&c2=xxx&key=xxx&c3=xxx&c4=xxx&f1=xxx';
$output = cut_url( $input, 'key' );
Output:
https://www.xxxxx.com/test/path/index.php?c1=xxx&c2=xxx&key=xxx
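As for why the loop in the question runs its body only once: the ";" directly after the while (...) condition makes the loop body empty, so the braces that follow are an ordinary block that executes one time. The condition also tests !== false where === false is needed, and the pieces are appended without the "&" that explode() removed. A minimal sketch of the string-based approach with those points changed; the function name cut_after_key is hypothetical.

```php
<?php
// Hypothetical helper: keeps everything up to and including "&key=value".
function cut_after_key( $url, $key = 'key' ) {
    $pieces = explode( '&', $url );
    $result = array_shift( $pieces ); // e.g. http://xxxxx.com?c1=xxx

    // Keep appending pieces until the wanted parameter is part of the result.
    while ( $pieces && strpos( $result, '&' . $key . '=' ) === false ) {
        $result .= '&' . array_shift( $pieces );
    }
    return $result;
}

echo cut_after_key( 'http://xxxxx.com?c1=xxx&c2=xxx&c3=xxx&key=xxx&c4=xxx&f1=xxx' ), "\n";
echo cut_after_key( 'http://xxxxx.com?c1=xxx&key=xxx&c2=xxx&c3=xxx&c4=xxx&f1=xxx' ), "\n";
```

This sketch assumes the key is never the first parameter (directly after the "?"). In that case, or when the key is missing, the whole URL is returned, which is one reason the parse_url() version above is the safer choice.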

If the intention is to always ensure that the parameter key and its associated value appear at the end of the string, how about something like:
$tmp=array();$key='';
$parts=explode( '&', parse_url( $_SERVER['REQUEST_URI'], PHP_URL_QUERY ) );
foreach( $parts as $pair ) {
list( $param,$value )=explode( '=',$pair );
if( $param=='key' )$key=$pair;
else $tmp[]=$pair;
}
$query = implode( '&', array( implode( '&', $tmp ), $key ) );
echo $query;
or,
parse_str( $_SERVER['QUERY_STRING'], $pieces );
foreach( $pieces as $param => $value ){
if( $param=='key' ) $key=$param.'='.$value;
else $tmp[]=$param.'='.$value;
}
$query = implode( '&', array( implode( '&', $tmp ), $key ) );
update
I'm puzzled that you were "not getting the good result"!
consider the url:
https://localhost/index.php?sort=0&dir=false&tax=23&cost=99&aardvark=creepy&key=banana&tree=large&ac=dc&limit=1000#569f945674935
The above would output:
sort=0&dir=false&tax=23&cost=99&aardvark=creepy&tree=large&ac=dc&limit=1000&key=banana
so the key=banana gets placed last using either method above.
