PHP: Check URL against array of badwords, and output any found


I have a database which contains the results of logged network traffic, and I'm building a simple log viewer frontend in PHP. I need to check each URL and see if it contains any of the 'bad words' in an external 'badwords.txt' file, then echo which ones, and color the table row red (the easy bit!)
Thus far I have loaded the badwords.txt file into an array, and fetched the URL from the database. Here is the portion of my code where I am trying to get a positively identified 'bad' url. I ideally want to output which badwords were found, but have simplified everything to just try and get it to work for now.
// Load the badwords file into an array
$words = file('badwords.txt');
//$row[3] is the URL fetched from the database
$testURL = $row[3];
foreach ($words as $phrase) {
if (strrpos($testURL, $phrase)) {
echo "FOUND";
}
}
This is not working for me, and never outputs FOUND, even when the url definitely contains a bad word. I have checked that my $words array is populated correctly with all the badwords, and I have checked that the $testURL is not empty etc.
Can anyone help please? :) I'd be really grateful for any assistance - I have read through so many StackExchange posts on similar topics, but none seem to work for my case.
Thank you!

Each element of the $words array still ends with a newline character, so strrpos() never finds a match. Strip the newlines when loading the file:
$words = file('badwords.txt', FILE_IGNORE_NEW_LINES);

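Also, strrpos() returns the numeric position of the needle (which can be 0) or boolean false when the needle is not found, so the result should be compared against false explicitly:
<?php
// Load the badwords file into an array, dropping trailing newlines and blank lines
$words = file('badwords.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// $row[3] is the URL fetched from the database
$testURL = $row[3];
foreach ($words as $phrase) {
    // strrpos() returns false when not found; position 0 is a valid match
    if (strrpos($testURL, $phrase) !== false) {
        echo "FOUND";
    }
}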
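The question also asks which bad words were found, not just whether any matched. A minimal sketch of that follows, assuming a case-insensitive match is wanted; the helper name find_bad_words, the sample words, and the sample URL are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper: return every bad word contained in $url.
function find_bad_words(string $url, array $words): array
{
    $found = [];
    foreach ($words as $word) {
        $word = trim($word);               // drop stray newlines/whitespace
        if ($word === '') {
            continue;                      // skip blank lines from the word list
        }
        if (stripos($url, $word) !== false) {  // case-insensitive search
            $found[] = $word;
        }
    }
    return $found;
}

// In the real script this list would come from file('badwords.txt').
$words = ["casino\n", "warez\n", "\n", "torrent"];
$hits  = find_bad_words('http://example.com/free-WAREZ/casino.html', $words);

if ($hits !== []) {
    // A non-empty result is also the signal to color the table row red.
    echo 'FOUND: ' . implode(', ', $hits) . PHP_EOL;
}
```

Trimming inside the helper makes it tolerant of word lists loaded without FILE_IGNORE_NEW_LINES.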

Related

Want to display few words before and end of search keyword

For example I have a string in database table
"This is a test and I want to test everything on my website using
testing tool. I need help of my people to search the new things on
portal but I can not find any source of help"
This data is in my MySQL table.
Now, when I search for a word or string in this content, I want to display the 20 words before and the 20 words after the search string.
For example if I search my people
Then I should get result as following
website using testing tool. I need help of my people to search the new things
Or if I search portal then it should give result as following
I need help of my people to search the new things on portal but I can not find any source of help
I tried using a MySQL LIKE query, but it shows the full content.

You can use mysql LIKE statement and then use php explode function to get the single sentence. Do a strpos checking on the exploded pieces to see whether you are taking the correct result.
$keyword = "portal";
$result_str = "This is a test and i want to test everything on my website using testing tool. I need help of my people to search the new things on portal but i can not find any source of help";
$suggestions = explode(".", $result_str);
$match = "";
foreach ($suggestions as $value) {
    // Keep the first sentence that contains the keyword, then stop looking
    if (strpos($value, $keyword) !== false) {
        $match = trim($value);
        break;
    }
}
echo $match;
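The snippet above returns the whole sentence, while the question asks for a fixed number of words on each side of the search string. One possible sketch of that is below; the helper name excerpt_around is hypothetical, and it assumes words are separated by whitespace and that a case-insensitive match is acceptable.

```php
<?php
// Hypothetical helper: return up to $n words on each side of the first match.
function excerpt_around(string $text, string $keyword, int $n = 20): string
{
    $pos = stripos($text, $keyword);
    if ($keyword === '' || $pos === false) {
        return '';
    }
    // Split the text before and after the match into words.
    $before = preg_split('/\s+/', substr($text, 0, $pos), -1, PREG_SPLIT_NO_EMPTY);
    $after  = preg_split('/\s+/', substr($text, $pos + strlen($keyword)), -1, PREG_SPLIT_NO_EMPTY);

    $parts = array_merge(
        $n > 0 ? array_slice($before, -$n) : [],   // last $n words before the match
        [substr($text, $pos, strlen($keyword))],   // the match as it appears in the text
        array_slice($after, 0, max($n, 0))         // first $n words after the match
    );
    return implode(' ', $parts);
}

echo excerpt_around('a b c d e f g', 'd', 2), PHP_EOL;
```

If the keyword matches in the middle of a longer word, the remainder of that word is treated as a separate word; handling that would need a word-boundary match instead of stripos().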

PHP website data mining Preg_Match Undefined Offset

I'm working on a PHP project for school. The task is to build a website to grab and analyze data from another website. I have the framework set up, and I am able to grab certain data from the desired site, but I can't seem to get the syntax right for other data that I need to obtain.
For example, the site that I am currently analyzing is a page for a specific item returned from a search of Amazon.com (e.g. search amazon.com for "iPad" and pick the first result). I am able to grab the title of the product's page, but I need to grab the review count and the price, and therein lies the issue. I'm using preg_match to get the title (works fine), but I'm not able to get the reviews or the price. I continue to get the Undefined Offset error, which I've discovered means that there is nothing being returned that matches the given criterion. Simply checking to see whether something has been returned will not help me, since I need to obtain these data for my analysis. The <span>s that I'm trying to mine are unique on the page, so there is only one instance of each.
The Page Source for my product page contains the following snippets of HTML that I need to grab. (The website can, and needs to be able to handle, anything, but for this example, I searched "iPad").
<span id="priceblock_ourprice" class="a-size-medium a-color-price">$397.74</span>
I need the 397.74.
<span id="acrCustomerReviewText" class="a-size-base">1,752 customer reviews</span>
I need the 1,752.
I've tried all combinations of escape characters, wildcards, etc., but I can't seem to get beyond the Undefined Offset error. An example of my code is as follows where $link is the URL, and $f is an empty array in which I want to store the result (Note: There is NOT a space after the '<' in "< span..." It just erased everything up to the "...(.*)..." when I typed it as "< span..." without the space):
preg_match("#\< span id\=\"priceblock\_ourprice\" class\=\"a\-size\-medium a\-color\-price\"\>(.*)\<\/span\>#", file_get_contents($link), $f);
$price=$f[1]; //Offset error occurs on this line
echo $price;
Please help. I've been beating my head against this for the past two days now. I'm hoping I'm just doing something stupid. This is my first experience with preg_match and data mining. Thank you very much in advance for your time and assistance.

Code
As stated by @cabellicar123, you shouldn't use regex with html.
I believe what you are looking for is strpos() and substr(). It should look something like this:
function get_content($string, $begintag, $endtag) {
    if (strpos($string, $begintag) !== False) {
        $location = strpos($string, $begintag) + strlen($begintag);
        $leftover = substr($string, $location);
        $contents = substr($leftover, 0, strpos($leftover, $endtag));
        return $contents;
    }
}
// Usage (Change the variables):
$str = file_get_contents('http://www.amazon.com/OLB3-Official-League-Recreational-Ball/dp/B004KOBRMC/');
$beg = '<b class="priceLarge">$';
$end = '</b>';
echo get_content($str, $beg, $end);
I've provided a working example which would return the price of the object on the page, in this case, the price of a rawlings baseball.
Explanation
I'll go through the code, line by line, and explain every piece.
function get_content($string, $begintag, $endtag)
$string is the string being searched through (in this case an amazon page), $begintag is the opening tag of the element being searched for, and $endtag is the closing tag of that element. NOTE: This will only use the first instance of the opening tag, more than that will be ignored.
if (strpos($string, $begintag) !== False)
Checks if the beginning tag actually exists. Note the !== False; that's because strpos can return 0, which evaluates to False.
$location = strpos($string, $begintag) + strlen($begintag);
strpos() returns the position of the first instance of $begintag in $string, so the length of $begintag must be added to that position to get the location of the end of $begintag.
$leftover = substr($string, $location);
Now that we have the $location of the opening tag, we need to narrow the $string down by setting $leftover to the part of the $string after $location.
$contents = substr($leftover, 0, strpos($leftover, $endtag));
This gets the position of the $endtag in $leftover, and stores everything before that $endtag in $contents.
As for the last few lines of code, they are specific to this example and just need to be changed to fit the circumstances.
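Since the advice here is to avoid regex on HTML, another option is to parse the page and look the elements up by their id. The following is only a sketch, assuming PHP's DOM extension is available; the helper name text_by_id is hypothetical, and the inline HTML stands in for file_get_contents($link) so the example runs without a network request. The two ids are the ones quoted in the question.

```php
<?php
// Hypothetical helper: return the trimmed text of the element with the given id, or null.
function text_by_id(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Real-world pages are rarely well-formed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $id contains no double quotes.
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;   // element not present: no "undefined offset" to trip over
    }
    return trim($nodes->item(0)->textContent);
}

// Sample markup standing in for the downloaded product page.
$html = '<html><body>'
      . '<span id="priceblock_ourprice" class="a-size-medium a-color-price">$397.74</span>'
      . '<span id="acrCustomerReviewText" class="a-size-base">1,752 customer reviews</span>'
      . '</body></html>';

$price   = ltrim((string) text_by_id($html, 'priceblock_ourprice'), '$');
$reviews = (int) str_replace(',', '', (string) text_by_id($html, 'acrCustomerReviewText'));

echo $price, PHP_EOL;
echo $reviews, PHP_EOL;
```

Returning null for a missing element lets the caller decide what to do when the page layout differs from what was expected.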

array_search/in_array can't find string

I have the following code
$checkarray = unserialize(file_get_contents('serialized.txt'));
var_dump($checkarray);
foreach($checkarray as $_index => $_image )
{
echo strval($_index)." = ".var_dump($_image)."<br>";
}
var_dump(array_search('pirates_of_love_and_kingdoms.jpg',$checkarray));
var_dump(in_array('pirates_of_love_and_kingdoms.jpg',$checkarray));
the contents of 'serialized.txt' can be found here (http://textuploader.com link, not a download link, will need to copy and paste into new file if you want to use it)
The first var_dump outputs the array; because I have xdebug installed, the entire array is not shown, but I'm not looking to get that fixed, since it just confirms that the file was imported and unserialized correctly. The loop outputs everything in the array, confirming that every value is a string (thanks to xdebug). The final two var_dumps output the results of the functions.
When I run my code, both var_dumps output false; however, if I use the browser to search for the text I do find it, so I know it's in the array.
I know that array_search returns the key if the needle is found, that in_array returns true if the needle is found, and that both return false if the needle can't be found. However, I do not get how neither can find it when I can confirm it's output in the loop and present in my serialized.txt file at the same index.
I have already checked the basics (white space, new lines, casing) in both what is output on the screen and in the file. Can anyone explain to me what I have done wrong?

The line breaks at the end of each line are causing problems for you. Do a trim inside your foreach:
$checkarray = unserialize(file_get_contents('serialized.txt'));
var_dump($checkarray);
foreach($checkarray as $_index => $_image )
{
$checkarray[$_index] = trim($_image);
echo strval($_index)." = ".var_dump($_image)."<br>";
}
var_dump(array_search('pirates_of_love_and_kingdoms.jpg',$checkarray));
var_dump(in_array('pirates_of_love_and_kingdoms.jpg',$checkarray));
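The same cleanup can be done in one step with array_map(), and json_encode() is a handy way to make hidden line breaks visible while debugging. This is a sketch with made-up sample data standing in for the unserialized file; the keys and the first file name are assumptions.

```php
<?php
// Stand-in for unserialize(file_get_contents('serialized.txt')):
// values that carry invisible trailing line breaks.
$checkarray = [
    10 => "castle_of_sand.jpg\r\n",
    11 => "pirates_of_love_and_kingdoms.jpg\n",
];

// json_encode() shows the escape sequences that a browser would hide.
echo json_encode($checkarray[11]), PHP_EOL;

// Before cleaning, an exact-match lookup fails.
$before = in_array('pirates_of_love_and_kingdoms.jpg', $checkarray, true);

// With a single input array, array_map() preserves the keys,
// so array_search() still reports the original index.
$cleaned = array_map('trim', $checkarray);

$key   = array_search('pirates_of_love_and_kingdoms.jpg', $cleaned, true);
$after = in_array('pirates_of_love_and_kingdoms.jpg', $cleaned, true);

var_dump($before, $key, $after);
```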

Need to pull a list of image url's from a Mysql database based on a subdomain, and store them as a list

I have a MySQL forum database in which I need to query all the posts for a specific set of images on a particular URL.
The URLs are of images hosted on a subdomain that we don't have access to, like this: "http://images.website.com/images/randomnumberhere.jpg".
I need a MySQL query to pull these out and process them into a list which we can later loop through to grab them all and move them. (I've got this part handled.)
I'm a PHP/MySQL programmer, but this feels like a regex problem and I'm not so great with that yet.
The issue is we don't have a list of the images, and each name is a big long random number (so far as I can see). So what I need is to match a string like "images.website.com/images/(randomnumbers).jpg" and then put the matches into a list.

You could get all fancy pants-like and use regular expressions,
but you could also try a simple
SELECT * FROM image_table WHERE image_source LIKE '%images.website.com/images/%'
Is this what you are looking for?

If you're looking to pull all of the text from the database, and then use PHP to create a list of the images, try something like this:
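$image_list = array();
while ($row = $sql->fetch_array())
{
    $text = $row['text'];
    /* Changed to preg_match_all; dots are escaped so they match literally */
    if (preg_match_all("/http:\/\/images\.website\.com\/images\/[0-9]+\.(jpg|jpeg|png|gif)/i", $text, $matches))
    {
        // $matches[0] is an array of full matches, so merge it into the flat list
        $image_list = array_merge($image_list, $matches[0]);
    }
}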
Nothing fancy, and I didn't test it, but it should work. That's a hardcoded regex that matches the URL you're looking for specifically. You may want to modify it so that it can match multiple domains from an array, or something, but it should get you started.
EDIT: Should have mentioned that you could then loop through the $image_list array to display the images, or whatever you're going to do with them.
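To try the pattern without a database, the matching step can be pulled into a small function and run against sample post bodies. This is a sketch only: the function name extract_image_urls and the sample posts are invented for illustration, and duplicates are removed on the assumption that each image only needs to be moved once.

```php
<?php
// Hypothetical helper: collect unique image URLs from a list of post bodies.
function extract_image_urls(array $texts): array
{
    // '#' delimiters avoid having to escape every slash in the URL.
    $pattern = '#https?://images\.website\.com/images/[0-9]+\.(?:jpe?g|png|gif)#i';
    $urls = [];
    foreach ($texts as $text) {
        if (preg_match_all($pattern, $text, $matches)) {
            // $matches[0] holds every full match found in this text.
            $urls = array_merge($urls, $matches[0]);
        }
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($urls));
}

$posts = [
    'Look: http://images.website.com/images/123456789.jpg and http://images.website.com/images/42.png',
    'Again http://images.website.com/images/123456789.jpg, plus http://other.example.com/images/7.jpg',
];

$list = extract_image_urls($posts);
print_r($list);
```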

$rand = rand(); // or however you want to generate the random number; since you didn't give ranges, it might be an array as well
$string = "images.website.com/images/$rand.jpg";
Or even a simple loop:
for ($i = 1; $i < 100; $i++) {
echo $string = "images.website.com/images/$i.jpg";
}
It's up to you how you use it with your database.

Searching for a link in a website and displaying it PHP

Hello, I'm a newbie in PHP. I am trying to make a search function using PHP, but only inside the website, without any database.
Basically, if I search for a string such as "Health", it should display the lines
The Joys of Health
Healthy Diets
This snippet is the only thing I could find that, if properly coded, would output the "lines" I want:
$myPage = array("directory.php","pages.php");
$lines = file($myPage[n]);
echo $lines[n];
I haven't tried it yet to see if it works, but before I do I want to ask if there is any better way to do this.
If my files have too many lines, won't it stress out the server?

The file() function will return an array. You should use file_get_contents() instead, as it returns a string.
Then, use regular expressions to find specific text within a link.

Your goal is fine, but the method you're thinking about is not. The file() function reads a file line by line and puts each line into an array. This assumes the HTML is laid out in a human-readable, line-oriented fashion, which is not always the case. However, if you're the one providing the HTML and you make sure the structure is perfectly defined, OK. Here is the example you provided, but complete (take into account that it's the 'wrong' way of solving your problem, but if you want to follow that pattern, it's OK):
function pagesearch($pages, $string) {
    $tags = [];
    if (!empty($pages) && !empty($string)) {
        foreach ($pages as $page) {
            // file() returns an array of lines, or false on failure
            if ($lines = file($page)) {
                foreach ($lines as $line) {
                    // Compare with false: a match at position 0 is still a match
                    if (mb_strpos($line, $string) !== false) {
                        $tags[$page][] = $line;
                    }
                }
            }
        }
    }
    return $tags;
}
This will return you an array with all the pages you referenced with all occurrences of the word you look for, separated by page. As I said, it's not the way you want to solve this, but it's a way.
Hope that helps
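For the "Health" example in the question, a variant of the same line-by-line idea could strip the HTML tags and ignore case, so that only the visible text of matching lines is returned. The sketch below is one way to do that; the function name search_pages is hypothetical, and a temporary file stands in for directory.php so the example is self-contained.

```php
<?php
// Hypothetical variant of the line-by-line search: case-insensitive, HTML tags stripped.
function search_pages(array $pages, string $needle): array
{
    $hits = [];
    if ($needle === '') {
        return $hits;
    }
    foreach ($pages as $page) {
        $lines = @file($page, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ($lines === false) {
            continue;   // unreadable or missing file
        }
        foreach ($lines as $line) {
            $text = trim(strip_tags($line));   // keep only the visible text
            if ($text !== '' && stripos($text, $needle) !== false) {
                $hits[$page][] = $text;
            }
        }
    }
    return $hits;
}

// Demo: a temporary file plays the role of one of the site's pages.
$page = tempnam(sys_get_temp_dir(), 'page');
file_put_contents(
    $page,
    "<a href=\"joys.php\">The Joys of Health</a>\n"
    . "<a href=\"diet.php\">Healthy Diets</a>\n"
    . "<a href=\"about.php\">About us</a>\n"
);

$result = search_pages([$page], 'health');
print_r($result[$page]);
unlink($page);
```

Because file() loads each page fully into memory on every search, this remains suitable only for a small number of modest files, which matches the concern about server load raised in the question.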

Because you do not want to use any database and because the term database is very broad and includes the file-system you want to do a search in some database without having a database.
That makes no sense. In your case one database at least is the file-system. If you can accept the fact that you want to search a database (here your html files) but you do not want to use a database to store anything related to the search (e.g. some index or cached results), then what you suggest is basically how it is working: A real-time, text-based, line-by-line file-search.
Sure it is very rudimentary but as your constraint is "no database", you have already found the only possible way. And yes it will stress your server when used because real-time search is expensive.
Otherwise normally Lucene/Solr is used for the job but that is a database and a server even.
