Need to clean database of spam

Need to clean database of spam - php

So a couple things. First, being sick, I can't seem to focus right to get this figured out like I should, and secondly, it's flat out got me stumped on how to deal with this.
So I have a client who has an old site built on old code. There were some extreme vulnerabilities in the code that allowed for injections and attacks - which happened. Since I've come onto the project, I have tightened things up considerably and haven't really had issues. But I just found something that appears to be a lingering issue from previous hacks.
So in the database they have a field called 'copy' which is intended to store content of an article in. Ok fine, not the best of names, but it's there. So here's the issue. Since the hack, there are some 52k rows that have the word "viagra" in them. So when I look closer at the copy field and the code in a view source, this is what I find:
for the little kids in the neighborhood.<div style="display: none;">
Basically the opened and closed div tags that have a style set as seen above. So it doesn't visually render on the page but when you view the source or... "search engine spiders" come by, they see it. I couldn't figure out for the life of me why the .php files that got uploaded into the article_image directory were being indexed in Webmaster Tools - til tonight. Now I know why.
So here's what I need. Because each row in the database (52k of them) have what's given as an example (the <div style...>) part, and they all appear after the content that was there originally, I need something that I can add to a loop that will clean the crap out of the copy field so it cleans up the mess. I could take the str_replace method way - but that's too long and no guarantee that i would get all the stuff.
So - any suggestions?

Try this: (assuming "content" is the name of the column with the article content)
UPDATE `copy` SET `content`=
SUBSTR(`content` FROM 1 FOR LOCATE('<div style="display: none;">',`content`))
WHERE `content` LIKE '%<div style="display: none;">%';
Since you have indicated that these injections are always the last thing in an article, this will wipe them out pretty well. I strongly suggest taking a backup copy first, though!

Related

Running preg_replace on html code taking too long

At the risk of getting redirected to this answer (yes, I read it and spent the last 5 minutes laughing out loud at it), allow me to explain this issue, which is just one in a list of many.
My employer asked me to review a site written in PHP, using Smarty for templates and MySQL as the DBMS. It's currently running very slowly, taking up to 2 minutes (with a entirely white screen through it all, no less) to load completely.
Profiling the code with xdebug, I found a single preg_replace call that takes around 30 seconds to complete, which currently goes through all the HTML code and replaces each URL found to its SEO-friendly version. The moment it completes, it outputs all of the code to the browser. (As I said before, that's not the only issue -the code is rather old, and it shows-, but I'll focus on it for this question.)
Digging further into the code, I found that it currently looks through 1702 patterns with each appropriate match (both matches and replacements in equally-sized arrays), which would certainly account for the time it takes.
Code goes like this:
//This is just a call to a MySQL query which gets the relevant SEO-friendly URLs:
$seourls_data = $oSeoShared->getSeourls();
$url_masks = array();
$seourls = array();
foreach ($seourls_data as $seourl_data)
{
if ($seourl_data["url"])
{
$url_masks[] = "/([\"'\>\s]{1})".$site.str_replace("/", "\/", $seourl_data["url"])."([\#|\"'\s]{1})/";
$seourls[] = "$1".MAINSITE_URL.$seourl_data["seourl"]."$2";
}
}
//After filling both $url_masks and $seourls arrays, then the HTML is parsed:
$html_seo = preg_replace($url_masks, $seourls, $html);
//After it completes, $html_seo is simply echo'ed to the browser.
Now, I know the obvious answer to the problem is: don't parse HTML with a regexp. But then, how to solve this particular issue? My first attempt would probably be:
Load the (hopefully, well-formed) HTML into a DOMDocument, and then get each href attribute in each a tag, like so.
Go through each node, replacing the URL found for its appropriate match (which would probably mean using the previous regexps anyway, but on a much-reduced-size string)
???
Profit?
but I think it's most likely not the right way to solve the issue.
Any ideas or suggestions?
Thanks.

As your goal is to be SEO-friendly, using canonical tag in the target pages would tell the search engines to use your SEO-friendly urls, so you don't need to replace them in your code...

Oops ,That's really tough, bad strategy from the beginning , any way that's not your fault,
i have 2 suggestion:-
1-create a caching technique by smarty so , first HTML still generated in 2 min >
second HTMl just get from a static resource .
2- Don't Do what have to be done earlier later , so fix the system ,create a database migration that store the SEO url in a good format or generate it using titles or what ever, on my system i generate SEO links in this format ..
www.whatever.com/jobs/722/drupal-php-developer
where i use 722 as Id by parsing the url to get the right page content and (drupal-php-developer) is the title of the post or what ever
3 - ( which is not a suggestion) tell your client that project is not well engineered (if you truly believe so ) and need a re structure to boost performance .
run

Dynamic hierarchical pages and SEO

Cheers everyone!
Please bear with me, I really did do some research on this, but I couldn't come to a final solution, hence I'm here to hear your opinions.
What I want to build is a small i18n-CMS with dynamic hierarchical pages such as:
domain.tld/en/I/am/a/path
I want to find the least performance intense way that allows me to have beautiful, SEO and human-friendly URLs.
I use a Closure-Table, so two tables in the database, one for the pagenodes and one for the pathtree plus another table for the localised page, that references a certain pagenode (three in total).
My different solutions so far:
Sure I could make an algorithm, that goes through all the different request segments and checks if there is an English "path" under an "a" under an "am" under an "I", but this seems very unwise considering a multitude of page-hits.
Or is it?
Positive: I wouldn't need to save the path anywhere, because it would be calculated. So moving pages around wouldn't need to recalculate the path and save it again.
I could simply save the whole path to the database, as VARCHAR(2000) or something and then just check if there is a page with path "I/am/a/path" in English language and get that one.
This seems to be rather messy.
As I do it now. Currently I add an "ID" at the end of my path. Such as:
domain.tld/en/I/am/a/path.1
So if you enter "domain.tld/en.1" you get forwarded to the one with the right slug. But here again I need to save the slug to the database, for each single page.
Also I would love to get rid of the id (could I do this with mod-rewrite and .htaccess?)
Any more insights on this one? As I'm not a webdeveloper, so I'm not really sure regarding performance.
Kindest regards,
Meren

It seems to me that page request will happen a million times more often than an editor changing a page address. So I would definitely go with the save-to-db option. What you can do is create an extra field in which you save the 'slug' for that page, in combination with .htaccess you can redirect pages from the 'slug' addresses. For example in http://www.fuuu.com/futest-fu , 'futest-fu' is a slug which could be rewritten to an ID number (or anything you would want it to be). Amongst others, Wordpress works this way. Check out this discussion for some insights: http://wordpress.org/support/topic/where-are-the-permalinks-slug-stored-in-the-database

Using PHP to replace a line in a flat-file database

There are quite a few different threads about this similar topic, yet I have not been able to fully comprehend a solution to my problem.
What I'd like to do is quite simple, I have a flat-file db, with data stored like this -
$username:$worldLocation:$resources
The issue is I would like to have a submit data html page that would update this line based upon a search of the term using php
search db for - $worldLocation
if $worldLocation found
replace entire line with $username:$worldLocation:$updatedResources
I know there should be a fairly easy way to get this done but I am unable to figure it out at the moment, I will keep trying as this post is up but if you know a way that I could use I would greatly appreciate the help.
Thank you

I always loved c, and functions that came into php from c.
Check out fscanf and fprintf.
These will make your life easier while reading writing in a format. Like say:
$filehandle = fopen("file.txt", "c");
while($values = fscanf($filehandle, "%s\t%s\t%s\n")){
list($a, $b, $c) = $values;
// do something with a,b,c
}
Also, there is no performance workaround for avoiding reading the entire file into memory -> changing one line -> writing the entire file. You have to do it.
This is as efficient as you can get. Because you most probably using native c code since I read some where that php just wraps c's functions in these cases.

You like the hard way so be it....
Make each line the same length. Add space, tab, capital X etc to fill in the blanks
When you want to replace the line, find it and as each line is of a fixed length you can replace it.
For speed and less hassle use a database (even SQLLite)

If you're committed to the flat file, the simplest thing is iterating through each line, writing a new file & changing the one that matches.
Yeah, it sucks.
I'd strongly recommend switching over to a 'proper' database. If you're concerned about resources or the complexity of running a server, you can look into SQLite or Berkeley DB. Both of these use a database that is 'just a file', removing the issue of installing and maintaining a DB server, but still you the ability to quickly & easily search, replace and delete individual records. If you still need the flat file for some other reason, you can easily write some import/export routines.
Another interesting possibility, if you want to be creative, would be to look at your filesystem as a database. Give each user a directory. In each directory, have a file for locations. In each file, update the resources. This means that, to insert a row, you just write to a new file. To update a file, you just rewrite a single file. Deleting a user is just nuking a directory. Sure, there's a bit more overhead in slurping the whole thing into memory.
Other ways of solving the problem might be to make your flat-file write-only, since appending to the end of a file is a trivial operation. You then create a second file that lists "dead" line numbers that should be ignored when reading the flat file. Similarly, you could easily "X" out the existing lines (which, again, is far easier than trying to update lines in a file that might not be the same length) and append your new data to the end.
Those second two ideas aren't really meant to be practical solutions as much as they are to show you that there's always more than one way to solve a problem.

ok.... after a few hours work..this example woorked fine for me...
I intended to code an editing tool...and use it for password update..and it did the
trick!
Not only does this page send and email to user (sorry...address harcoded to avoid
posting aditional code) with new password...but it also edits entry for thew user
and re-writes all file info in new file...
when done, it obviously swaps filenames, storing old file as usuarios_old.txt.
grab the code here (sorry stackoverflow got VERY picky about code posting)
https://www.iot-argentina.xyz/edit_flat_databse.txt

Is that what you are location for :
update `field` from `table` set `field to replace` = '$username:$worldlocation:$updatesResources' where `field` = '$worldLocation';

Store data using a txt file

This question might seem strange but I have been searching for an answer for a long time and I couldn't find any.
Let's suppose you have a blog and this blog has many post entries just like any other blog. Now each post can have simple user comments. No like buttons or any other resource that would require data management. Now the query is: Can I store user comments on a single text file? Each post will be associated to a text file that holds the comments. So, if I have n posts I'll have n text files.
I know I can perfectly do this, but I have never seen it anywhere else and no one is talking about it. For me this seems better than storing all coments from all posts in a single mysql table but I don't know what makes it so bad that no one has implemented it yet.

Storing comments in text files associated with corresponding post? Lest see if it's good idea.
Okay adding new comments easy - write new text to the file. But what about format of your data? CSV? Ok then you would have to parse it before rendering.
Paging. If you have a lot of comments you may consider creating paging navigation for it. It can be done easily, sure. But you would need to open the file and read all the records to extract say 20.
Approve your comments. Someone posted new comment. You place it with pending status. So.. In admin panel you need to find those marked comments and process then accordingly - save or remove. Do you think it's convinient with text files? The same if use decided to remove its comment himself.
Reading files if you have many comments and many posts will be slower the it would be in case of database.
Scalability. One day you deside to extend you comments functionality to let one comment to respond to another. How would you do it with text files? Or example from comments by nico: "In 6 months time, when you will want to add a rating field to the comments... you'll have a big headache. Or, just run a simple ALTER query".
This is just for beggining. Someone may add something.

Well, there are good reasons why this isn't done. I can't possibly name them all, but the first things that come to mind:
Efficiency
Flexibility
Databases are much more efficient and flexible than plain text files. You can index, search and assign keys to individual comments and edit and delete any comments based on their key.
Furthermore, you'd get a huge pile of text files if the blog is quite big. While in itself that's not a problem, if you all save them in one directory, it can grow out of proportion and really increase the access time needed to find and open a specific text file.

Multiple column articles in Joomla

I've got a client that requires that an article be displayed in two, sometimes three, columns in Joomla. I am fairly sure they won't be happy with having to edit 3 articles for 3 columns so the splitting would have to be done automatically.
I've done something similar before where it'll split a chunk of HTML into n columns, but have no real idea how to accomplish this within Joomla itself.
Any ideas gratefully recieved!

An alternative approach:
Use Javascript to split up the Article in several column in the browser. Here I could imagine a full-automated approach could work.
Advantages (over the first approach):
As Javascript can know, which height the paragraphs actually have in the browser (unlinke PHP), you could find the optimum split more accurately.
This can be implemented in the template php-File: you tell the template to include the js-File. So it could be made context-dependent,E.g.: If the left column is collapsed (because there are no modules in it), tell the JavaScript-File to initialize to 3 columns, else 2 columns.
However, have in mind that it should rest usable for those who have Javascript disabled.

This doesn't seem to be easy.
At first thought this should be an CSS attribute, but if it exists, it is part of CSS 3.0 and as such only understood by modern browsers (if at all). But I didn't find any way to do this in CSS.
So you actually have to modify your HTML code. I would propose the following:
A Button (editor-xtd plugin) that splits the article into several parts, each one for one column, showing a dotted line in the editor box (similar to the "read-more"-Button).
E.g. it inserts in the article: (you will have to define hr.column in /templates/system/css/editor.css).
A (content) plugin that creates the multiple colum-style,
E.g. replacing the hr-Tag with table or floating divs.
This way, it is half-automized, without mangeling in the Joomla! files but only adding to extensions to it.

the CSS 3 rules for multi-columns are:
-column-width
-column-gap
-column-rule
-column-count
with the vendor label (-moz, -webkit) before.
More info at http://www.css3.info/preview/multi-column-layout/
I would use css and tell the people with Explorer to change browser! (i'm jocking of course)
Otherwise javascript is the way like said before. This script should do (not tested) http://13thparallel.com/archive/column-script/

This should be done through the template, some PHP coding is involved.
One of our clients asked us to do the exact same thing before, and we have done it through template. Note that for very small articles we increased the font in order to split the article into 3 columns.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.