Basic if/then with PHP

Okay, so I set up this thing so that I can print out the page that people came from, and then put dummy tags on certain pages. Some pages have commented-out "linkto" tags with text in between them.
My problem is that some of my pages don't have "linkto" text. When I link to this page from one of those, I want it to grab everything between "<title>" and "</title>". How can I change the eregi() call so that, if it turns up empty, it grabs the title instead?
Here is what I have so far. I know I just need some kind of if/then, but I'm a rank beginner. Thank you in advance for any help:
<?php
$filesource = $_SERVER['HTTP_REFERER'];
$a = fopen($filesource, "r"); //fopen("html_file.html","r");
$string = fread($a, 1024);
fclose($a);
?>
<?php
$outdata = '';
if (eregi("<linkto>(.*)</linkto>", $string, $out)) {
$outdata = $out[1];
}
//echo $outdata;
$outdatapart = explode(" ", $outdata);
echo $outdatapart[0];
?>

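Here you go: if the first preg_match() fails to match, the $outdata assignment never happens, because the if block is not executed. If it matches but there's nothing between the tags, $outdata is assigned an empty string. In both cases !$outdata is true, so we can fall back to a second match on the title tag instead. Setting $outdata to an empty string first avoids an "undefined variable" notice when the first match fails.
$outdata = '';
if (preg_match('#<linkto>(.*?)</linkto>#is', $string, $link_match)) {
    $outdata = $link_match[1];
}
if (!$outdata && preg_match('#<title>(.*?)</title>#is', $string, $title_match)) {
    $outdata = $title_match[1];
}
I also changed the (.*) in the match to (.*?), which means "don't be greedy", and switched from eregi() to preg_match(), because only PCRE supports the non-greedy form. The i modifier keeps the match case-insensitive like eregi(), and s lets . match across line breaks. In the (.*) form, if you had $string set to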
<title>Page Title</title> ...
... <iframe><title>A second title tag!</title></iframe>
The regex would match
Page Title</title> ... ... <iframe><title>A second title tag!
Because it tries to match as much as possible, as long as the text lies between any <title> and any later </title>. In the (.*?) form, the match does what you'd expect: it matches
Page Title
And stops as soon as it is able.
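To see the difference concretely, here is a small sketch that runs both patterns on the example string above. It uses preg_match() because the lazy *? quantifier is a PCRE feature; POSIX patterns such as those used by eregi() don't offer it. The s modifier is an addition so that . can cross the line break between the two title tags.

```php
<?php
// Compare a greedy (.*) and a lazy (.*?) capture on the same input.
$string = "<title>Page Title</title> ...\n... <iframe><title>A second title tag!</title></iframe>";

preg_match('#<title>(.*)</title>#s', $string, $greedy);
preg_match('#<title>(.*?)</title>#s', $string, $lazy);

echo $greedy[1], "\n"; // runs from the first <title> to the last </title>
echo $lazy[1], "\n";   // Page Title
```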
...
As an aside, this thing is an interesting scheme, but why do you need it? Pages can link to other pages and pass parameters via the query string:
...
Then somescript.php can access the prevpage parameter via the $_GET['prevpage'] superglobal variable.
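A minimal sketch of the receiving side, assuming the link looks like somescript.php?prevpage=about. The helper name previousPage() and the 'unknown' default are hypothetical. The value is escaped before output because anything in the query string is user input, and rawurlencode() is used when building the link.

```php
<?php
// Hypothetical helper: read "prevpage" from a query array, with a default.
// Taking the array as a parameter (instead of using $_GET directly) keeps it testable.
function previousPage(array $query, $default = 'unknown')
{
    if (isset($query['prevpage']) && $query['prevpage'] !== '') {
        // Escape before echoing into HTML.
        return htmlspecialchars($query['prevpage'], ENT_QUOTES, 'UTF-8');
    }
    return $default;
}

// In somescript.php:
echo previousPage($_GET), "\n";

// On the referring page, build the link with the value URL-encoded.
$link = 'somescript.php?prevpage=' . rawurlencode('My Page');
echo $link, "\n"; // somescript.php?prevpage=My%20Page
```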
Would that solve your problem?

The POSIX regex extension (ereg() etc.) was deprecated in PHP 5.3.0 and removed entirely in PHP 7.0, so you're better off using the PCRE functions (preg_match() and friends).
The PCRE functions are also faster, binary-safe, and support more features, such as non-greedy matching.
Just a pointer.
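For illustration, here is one way the linkto-then-title fallback could look with preg_match(). The function name linkText() is hypothetical, and trimming whitespace so that a linkto tag containing only spaces counts as empty is a design choice, not something from the thread.

```php
<?php
// i = case-insensitive (like eregi), s = "." also matches newlines,
// *? = non-greedy, so each capture stops at the first closing tag.
function linkText($html)
{
    if (preg_match('#<linkto>(.*?)</linkto>#is', $html, $m) && trim($m[1]) !== '') {
        return trim($m[1]);
    }
    // No linkto tag, or an empty one: fall back to the title.
    if (preg_match('#<title>(.*?)</title>#is', $html, $m)) {
        return trim($m[1]);
    }
    return '';
}

echo linkText('<html><head><title>Contact us</title></head></html>'), "\n"; // Contact us
echo linkText('<title>Home</title><!--<linkto>About</linkto>-->'), "\n";    // About
```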

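You need if/else:
if (preg_match('#<linkto>(.*?)</linkto>#i', $string, $out)) { // preg_match() replaces eregi(), removed in PHP 7
    $outdata = $out[1];
} else {
    // No linkto tag: just grab the title.
    preg_match('#<title>(.*?)</title>#i', $string, $out);
    $outdata = isset($out[1]) ? $out[1] : '';
}
Perhaps you should have done a quick Google search to find this very simple answer.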

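Just add another if test before you assign the match to $outdata, and fall back to the title when nothing usable was found:
$outdata = '';
if (preg_match('#<linkto>(.*?)</linkto>#i', $string, $out)) {
    if ($out[1] != "") {
        $outdata = $out[1];
    }
}
if ($outdata == "") {
    // No linkto tag, or an empty one: look in the title.
    if (preg_match('#<title>(.*?)</title>#i', $string, $out)) {
        $outdata = $out[1];
    }
}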

Related

Search a Specific Word or a Sentence on a Web Page (https) using php

What would be the right php code to search for a specific word in a specific https URL and if it exists, return a message?
For example...
URL: https://www.example.com/ ,
Search for word: "illustrative" ,
Return: "Found"
I've seen some questions about this, but couldn't find the exact answer to my question, I'd be glad if anyone could help.
Thx
Maybe this can be a good starting point:
function page_contains($link, $word){
    // strpos() returns 0 if the word is at the very start, so compare with !== false
    return strpos(file_get_contents($link), $word) !== false ? 'Found' : 'Not found';
}
echo page_contains('https://www.example.com/', 'illustrative');
Searching for a word inside HTML is unfortunately not as trivial as a needle-in-a-haystack search. You have to account for the fact that the HTML markup itself may contain your search word or phrase, producing a false positive.
For example, consider the following HTML document:
<html>
<head>
<meta content="illustrative productions">
<script>
var illustrative = true;
</script>
<style>
.illustrative {
background-color: #fff;
}
</style>
</head>
<body>
<h1 class="illustrative">Hello World</h1>
<p>Your search word never appears here.</p>
</body>
</html>
If we did a simple strpos() search on this document we'd get a false positive even though the word we're searching for would never actually show up on the rendered page in a browser.
So the first problem is that we'd have to parse the HTML document first and extract only the text nodes in the document to search. This can be achieved simply with DOMDocument like so...
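function findWord(string $url, string $searchWord): bool {
    $html = file_get_contents($url);
    if ($html === false || $html === '') {
        return false; // download failed or empty page
    }
    $dom = new DOMDocument;
    libxml_use_internal_errors(true); // suppress warnings about malformed HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    $htmlContent = $dom->getElementsByTagName('body')->item(0);
    if ($htmlContent === null) {
        return false; // document has no <body>
    }
    $text = $htmlContent->textContent;
    return strpos($text, $searchWord) !== false;
}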
// First let's start with a URL we know the word exists
$url = "https://www.merriam-webster.com/dictionary/illustrative";
// Gives us "Found"
if (findWord($url, "illustrative")) {
echo "Found";
} else {
echo "Not Found";
}
// Now let's try a URL we know the word doesn't exist
$url = "https://php.net/";
// Gives us "Not Found"
if (findWord($url, "illustrative")) {
echo "Found";
} else {
echo "Not Found";
}
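To try the idea without network access, here is a sketch of the same DOMDocument approach applied to an HTML string (a trimmed version of the example document above). The name textContains() is hypothetical. It shows strpos() on the raw markup reporting a hit while the text-only search does not.

```php
<?php
// Same approach as findWord(), but takes HTML directly so it runs offline.
function textContains($html, $searchWord)
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings about sloppy HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return false;
    }
    return strpos($body->textContent, $searchWord) !== false;
}

$html = '<html><head><style>.illustrative{color:#fff}</style></head>'
      . '<body><h1 class="illustrative">Hello World</h1>'
      . '<p>Your search word never appears here.</p></body></html>';

var_dump(strpos($html, 'illustrative') !== false); // bool(true): false positive in the markup
var_dump(textContains($html, 'illustrative'));     // bool(false)
var_dump(textContains($html, 'Hello'));            // bool(true)
```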
Keep in mind that this solution will also find partial matches: if you search for "pan" and the word "pancake" appears in the document, it will still trigger.
It also doesn't account for things like lemmatisation, where you search for the root of a word so that you find all of its inflections. For example, a search for "illustrative" would also return results for "illustration" and "illustrate", along with their plural forms. This technique is common in search engines because search words and phrases may appear in a document in many inflected forms; indexing every possible inflection, or storing a dictionary of all of them, would be too costly, so words are stemmed or lemmatized instead to make searches more accurate.
Finally, this particular search is case-sensitive, so if you want a case-insensitive search, use stripos() instead of strpos().
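If partial matches are a problem, one option (a sketch, not the only way) is a whole-word, case-insensitive regex. The function name containsWord() is hypothetical. \b is a word boundary, the i modifier ignores case, and preg_quote() escapes any regex metacharacters in the search word. This still does no stemming or lemmatisation.

```php
<?php
// Whole-word, case-insensitive search: "pan" no longer matches inside "pancake".
function containsWord($text, $word)
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_match($pattern, $text) === 1;
}

var_dump(containsWord('I had a pancake.', 'pan'));      // bool(false)
var_dump(containsWord('Put it in the PAN.', 'pan'));    // bool(true)
var_dump(stripos('I had a pancake.', 'pan') !== false); // bool(true): partial match
```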

How to grab a word from a page

I was wondering how to grab some text from an external page using PHP.
I think that preg_match() can help but I can't get how to use it.
The text in the page is the following:
dragontail-5.7.2.tgz
and I need to grab only
5.7.2
Thank you for helping.
Check this out:
https://regex101.com/r/cF8mS1/1
/([0-9.]+)/gm
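It means: "match one or more characters that are either a digit or a dot." On regex101 the g flag asks for every match and m turns on multiline mode. PHP's preg_match() does not accept a g modifier (it warns "Unknown modifier" and returns false; use preg_match_all() if you need every match), and m isn't needed here, so leave both off in PHP.
The last thing to do is strip any leading or trailing "." from the match, so:
if (preg_match('/([0-9.]+)/', $input, $matches)) {
    $result = trim($matches[1], '.');
} else {
    $result = null;
}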
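An alternative sketch that avoids the trim() step: require digit groups separated by single dots, so the dot before "tgz" is never captured. The function name extractVersion() is hypothetical, and the pattern assumes versions always have at least two numeric parts.

```php
<?php
// \d+ followed by one or more ".digits" groups; (?:...) is a non-capturing group.
function extractVersion($text)
{
    if (preg_match('/\d+(?:\.\d+)+/', $text, $matches)) {
        return $matches[0];
    }
    return null;
}

echo extractVersion('dragontail-5.7.2.tgz'), "\n"; // 5.7.2
```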

Create URL with only A-Z characters that includes variable and extension

I am trying to create file links based a variable which has a "prefix" and an extension at the end.
Here's what I have:
$url = "http://www.example.com/mods/" . ereg("^[A-Za-z_\-]+$", $title) . ".php";
Example of what I want it to output (assuming $title = "testing";):
http://www.example.com/mods/testing.php
What it currently outputs:
http://www.example.com/mods/.php
Thanks in advance!
Perhaps this is what you need:
$title = "testing";
if(preg_match("/^[A-Za-z_\-]+$/", $title, $match)){
$url = "http://www.example.com/mods/".$match[0].".php";
}
else{
// Think of something to do here...
}
Now $url is http://www.example.com/mods/testing.php.
Do you want to keep only letters (plus underscores and hyphens) and remove all other characters from the URL?
In this case the following should work:
$title = ...
$fixedtitle=preg_replace("/[^A-Za-z_-]/", "", $title);
$url = "http://www.example.com/mods/".$fixedtitle.".php";
The negated character class [^...] removes everything you do not want.
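One caveat with stripping characters is that a title made only of disallowed characters leaves nothing behind, which brings back the ".../mods/.php" URL. A sketch with a guard for that case; the function name buildModUrl() and returning null are assumptions.

```php
<?php
// Keep letters, underscores and hyphens; return null if nothing is left.
function buildModUrl($title)
{
    $fixedtitle = preg_replace('/[^A-Za-z_-]/', '', $title);
    if ($fixedtitle === '') {
        return null;
    }
    return 'http://www.example.com/mods/' . $fixedtitle . '.php';
}

echo buildModUrl('testing'), "\n";   // http://www.example.com/mods/testing.php
echo buildModUrl('My Mod 2!'), "\n"; // http://www.example.com/mods/MyMod.php
var_dump(buildModUrl('123'));        // NULL
```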
OK, first it's important to realize that ereg() is deprecated (it was removed entirely in PHP 7.0), so to prevent errors down the road you should use preg_match() instead.
Secondly, both ereg() and preg_match() return the status of the match, not the match itself. So
ereg("^[A-Za-z_\-]+$", $title)
returns 1 if there's a match and you didn't pass it a third variable to store the matches in, and false if there's no match. If you do pass that variable, ereg() returns the length of the matched string instead.
That explains why it's displaying
http://www.example.com/mods/.php
The match failed, and false becomes an empty string when concatenated. On a successful match it would output
http://www.example.com/mods/1.php
which still isn't what you want. You need to pass another variable to the function to store the matches found. If the match is successful (which you can check using the function's return value), that variable will be an array of the matches.
Note that by default preg_match() returns only the first match, but it still fills an array (which can be used to get isolated portions of the match), whereas preg_match_all() matches multiple occurrences.
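A quick sketch of those return values with preg_match(): the return value is a status (1 for a match, 0 for none), while the matched text lands in the third argument. The last line shows why a failed ereg(), which returned false, produced ".../.php" when concatenated.

```php
<?php
$title = 'testing';
$result = preg_match('/^[A-Za-z_\-]+$/', $title, $matches);
var_dump($result);     // int(1)
var_dump($matches[0]); // string(7) "testing"

$noMatch = preg_match('/^[A-Za-z_\-]+$/', 'testing;', $none); // ";" is not allowed
var_dump($noMatch);    // int(0)
var_dump($none);       // empty array

// false converts to an empty string when concatenated.
var_dump('mods/' . false . '.php'); // string(9) "mods/.php"
```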
See http://www.php.net/manual/en/function.preg-match.php for more details.
Your regex looks more or less correct, so the proper code should look something like:
$title = 'testing'; //making sure that $title is what we think it is
if (preg_match('/^[A-Za-z_\-]+$/',$title,$matches)) {
$url = "http://www.example.com/mods/" . $matches[0] . ".php";
} else {
//match failed, put error code in here
}

Regular Expression {POST:name}

I know a bit about regular expressions but really want to learn more, and now I'm trying to make a function that detects all {} in my content (from a database) and checks what is between the braces. If there is a POST or GET with a name (format: {POST:name} or {GET:name}), I would like to replace it with that value.
Example:
When I have a form with the following inputs:
Name
Email
Message
And then in the value attribute I type: {POST:Name}
Then the script must detect {POST:Name} and replace it with the string in $_POST['name']. I already searched on Google, but found so much that I don't know what to use.
Now i have:
<?php
preg_match_all("/{(POST|GET):[.*](})/", $content, $matches, PREG_SET_ORDER);
foreach($matches AS $match)
{
if(isset($_POST[$match]))
$content = str_replace('{POST:'.$match, $_POST[$match], $content);
else
$content = str_replace('{GET:'.$match, $_GET[$match], $content);
}
?>
But this doesn't work.
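You should use preg_replace() (more precisely preg_replace_callback(), since the replacement needs the variable's value) rather than str_replace().
With it you no longer need your first condition, and you can do the same thing in a single call. A plain preg_replace() with '$_$1[$2]' as the replacement would insert that text literally instead of the value.
http://fr2.php.net/preg_replace
<?php $content = preg_replace_callback('#\{(POST|GET):([^}]+)\}#', function ($m) {
$src = ($m[1] === 'POST') ? $_POST : $_GET; return isset($src[$m[2]]) ? $src[$m[2]] : ''; }, $content); ?>
My regex may not be perfect, but something like this should work.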
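A slightly fuller sketch as a reusable function. The name replacePlaceholders(), passing $_POST and $_GET in as arguments, restricting names to letters, digits and underscores, and escaping with htmlspecialchars() are all assumptions; escaping matters here because the values end up inside HTML value attributes. Array keys are case-sensitive, so {POST:Name} would not find $_POST['name'].

```php
<?php
// Replace {POST:name} / {GET:name} placeholders with (escaped) request values.
// Unknown or non-string values become an empty string; other braces are left alone.
function replacePlaceholders($content, array $post, array $get)
{
    return preg_replace_callback(
        '/\{(POST|GET):([A-Za-z0-9_]+)\}/',
        function ($m) use ($post, $get) {
            $source = ($m[1] === 'POST') ? $post : $get;
            $value = (isset($source[$m[2]]) && is_string($source[$m[2]])) ? $source[$m[2]] : '';
            return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        },
        $content
    );
}

$html = '<input name="name" value="{POST:name}"> <input value="{GET:page}">';
echo replacePlaceholders($html, $_POST, $_GET), "\n";
```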

Reading php files with special tags in php

I have a file which reads as follows
<<row>> 1|test|20110404<</row>>
<<row>> 1|test|20110404<</row>>
<<row>> and <</row>> indicate the start and end of a line. I want to read the line between these tags and also check whether the tags are present.
The first thing you need to do is locate the position of this "tag". The strpos() function does just that.
$tag_pos=strpos('<> 1|test|20110404<> <> 1|test|20110404<>', '<>');
if ($tag_pos===false) {
//The tag was not found!
} else {
//$tag_pos equals the numeric position of the first character of your tag
}
If these are truly lines, an efficient way to get them all is just to split on <>.
$lines=explode('<>', '<> 1|test|20110404<> <> 1|test|20110404<>');
$lines=array_filter($lines); //Removes blank strings from array
You could improve this by adding a callback function to the array_filter() call that uses trim() to remove any whitespace and then see if it is blank or not.
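That improvement could look like this sketch: the callback keeps only entries that are non-empty after trim(), so a piece consisting of a single space between two tags is dropped as well. Trimming the survivors and reindexing with array_values() are extra choices, not part of the original suggestion.

```php
<?php
$lines = explode('<>', '<> 1|test|20110404<> <> 1|test|20110404<>');

// Keep entries that still contain something after trimming whitespace.
$lines = array_filter($lines, function ($line) {
    return trim($line) !== '';
});

// Trim the remaining entries and renumber the keys from 0.
$lines = array_values(array_map('trim', $lines));
print_r($lines);
```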
Edit: Great, I see that your "tags" were missing from your post. Since your start and end tags do not match, the code above will be of little use to you. Let me try again...
function strbetweenstrs($source, $tag1, $tag2, $casesensitive = true) {
    $results = array();
    $whatsleft = $source;
    while ($whatsleft !== '') {
        // Find the opening tag; stripos() is the case-insensitive variant.
        $pos1 = $casesensitive ? strpos($whatsleft, $tag1) : stripos($whatsleft, $tag1);
        if ($pos1 === false) {
            break;
        }
        $start = $pos1 + strlen($tag1);
        // Find the closing tag somewhere after the opening tag.
        $pos2 = $casesensitive ? strpos($whatsleft, $tag2, $start) : stripos($whatsleft, $tag2, $start);
        if ($pos2 === false) {
            break;
        }
        $results[] = substr($whatsleft, $start, $pos2 - $start);
        $whatsleft = substr($whatsleft, $pos2 + strlen($tag2));
    }
    return $results;
}
Note that I haven't tested this thoroughly, but you get the general idea. There is probably a much more efficient way to go about it.
Creating your own format is not so hard, but creating a script to read it can be difficult.
The advantage of using standardized formats is that most programming languages already have support for them. For example:
XML: You can use the simplexml_load_string() function and it can make you navigate easily through your content.
$str = '<?xml version="1.0" encoding="utf-8"?>
<data>
<row>1|test|20110404</row>
<row>1|test|20110404</row>
</data>';
$xml = simplexml_load_string($str);
Now you can access your data
echo $xml->row[0];
echo $xml->row[1];
I'm sure you get the idea.
There is also very good support for JSON (JavaScript Object Notation) via the json_decode() function.
Check it on php.net for more details.
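For comparison, a sketch of the same data as JSON. Splitting each row into named fields (id, name, date) is an assumption about what the pipe-separated values mean; passing true to json_decode() returns associative arrays instead of objects.

```php
<?php
// The same two rows stored as JSON objects with hypothetical field names.
$json = '[{"id": 1, "name": "test", "date": "20110404"},
          {"id": 1, "name": "test", "date": "20110404"}]';

$rows = json_decode($json, true);
if ($rows === null) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

echo $rows[0]['name'], "\n"; // test
echo $rows[1]['date'], "\n"; // 20110404
```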
I would suggest using preg_match():
preg_match('#<<row>>(.*?)<</row>>#', $line, $matches);
if (!empty($matches))
{
    // line was found
    print_r($matches[1]); // will contain the content between the start and end row tags
}
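To handle a whole file at once rather than line by line, preg_match_all() collects every row. In this sketch the file contents are inlined as a string; in practice you would read them with file_get_contents() (the file name would be your own). Splitting each row on "|" is an assumption about how the fields are used.

```php
<?php
$contents = "<<row>> 1|test|20110404<</row>>\n<<row>> 1|test|20110404<</row>>\n";

$rows = array();
// Non-greedy (.*?) stops at the first <</row>>; s lets a row span lines.
if (preg_match_all('#<<row>>(.*?)<</row>>#s', $contents, $matches)) {
    foreach ($matches[1] as $row) {
        $rows[] = explode('|', trim($row));
    }
}
print_r($rows);
```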
