Search database for offensive words [closed] - php

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I am looking at running a search on my database at set intervals for a list of words I consider offensive (because I am an authoritarian dictator and I hate free speech - I rule with an iron fist).
How would I most efficiently search my database for a list of keywords? The two columns I intend to search are indexed as Fulltext.
If anyone knows of a list of offensive words that would be useful too.
A note to those who ridicule my attempts at censorship:
I will have two systems in place. The first is a report function which is checked daily by admins. The second tool to combat the dissenters is this one. All it needs to be is a word search, so that the admin may check through and decide whether the content is offensive or not.

MySQL alone won't give you the tools for an accurate search. Take this example: suppose your word list contains
freedom
Since you are a dictator you don't want it to appear, but clever users will write fr33dom, which reads the same. You now have three ways to handle this:
You place in your list the word plus every derivation you can imagine
You run the search with LIKE in your MySQL query, but that gets slow when you hit the thousands, even with fulltext indexes
You index your content using Lucene
I would go for the third, since Lucene is the best choice for performing searches, and since you are looking for words I imagine you are dealing with text, so this might help more than you think. Lucene can help you find words similar to freedom but not identical to it, so you shouldn't miss much, and your rule is guaranteed!
There are Lucene extensions usable from PHP via Zend Framework; you can find them easily on Google.
Best of luck in your dictatorial efforts!
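As a rough sketch of how the fr33dom problem can be handled without Lucene: normalise common leet-speak substitutions before matching. The table and column names in the commented query (posts, title, body) are assumptions, not anything from the question.

```php
<?php
// Collapse common leet-speak substitutions so "fr33dom" matches "freedom".
function normalize_leetspeak(string $text): string
{
    $map = ['0' => 'o', '1' => 'i', '3' => 'e', '4' => 'a', '5' => 's', '7' => 't', '@' => 'a', '$' => 's'];
    return strtr(strtolower($text), $map);
}

// Check a piece of text against a word list after normalisation.
function contains_banned_word(string $text, array $banned): bool
{
    $clean = normalize_leetspeak($text);
    foreach ($banned as $word) {
        if (strpos($clean, normalize_leetspeak($word)) !== false) {
            return true;
        }
    }
    return false;
}

// The periodic scan itself could use the fulltext index, e.g.:
//   SELECT id, title FROM posts
//   WHERE MATCH(title, body) AGAINST ('freedom liberty' IN BOOLEAN MODE);
// and then re-check the candidate rows with contains_banned_word() above.
```

The substitution map is deliberately small; extend it with whatever spellings your users invent, keeping in mind the point made above that derivations are effectively infinite.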

Here's your starting list!
http://onlineslangdictionary.com/lists/most-vulgar-words/
Check the site for more.
Idea: load their list into a table, then screen your database against it.
Or load their list, treat every entry as a blocked keyword, and reject matching input on entry.
Then use SQL wildcards within words to check for variants, e.g. freedom or fr**dom.
The problem: such derivations are infinite.

The link below leads to the list of 2200 bad words in 12 languages. MySQL dump, JSON, XML or CSV options are available.
https://github.com/turalus/openDB
Import this dump into your own database and then query for any occurrence.
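One way to run that check from PHP is to turn the imported word list into a boolean-mode fulltext query. This is only a sketch; the table and column names (bad_words.word, posts, title, body) are guesses, so check what the dump actually creates.

```php
<?php
// Build a boolean-mode fulltext query string from the imported word list.
// Each entry is quoted so multi-word phrases are matched exactly.
function build_boolean_query(array $words): string
{
    $terms = array_map(fn($w) => '"' . addslashes($w) . '"', $words);
    return implode(' ', $terms);
}

// Usage with PDO (untested sketch, hypothetical table names):
//   $words = $pdo->query('SELECT word FROM bad_words')->fetchAll(PDO::FETCH_COLUMN);
//   $stmt  = $pdo->prepare('SELECT id FROM posts WHERE MATCH(title, body) AGAINST (? IN BOOLEAN MODE)');
//   $stmt->execute([build_boolean_query($words)]);
```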

Related

Automated Script to Visit URLs [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 5 years ago.
I am creating a simple web application using PHP, CodeIgniter and Google Books (on a Windows 7 XAMPP localhost environment).
I have a MySQL list of books (a few hundred) and their corresponding ISBN numbers. When a user views a book / visits a URL for the first time, an API call is made to Google Books and the title, author and description of the book are saved to my database.
Ideally I'd like to populate the database myself and not rely on the user. So, I was thinking of visiting each URL manually. However, there are a lot of items!
Is there a script I can use for such a task? I was hoping to run the script once every 5 minutes over a 24-hour period.
My URL is in the following format:
/items/itemView/1 // <-- book 1
/items/itemView/2 // <-- book 2
/items/itemView/3 // <-- book 3
// etc
// etc
Thanks
Short Answer:
A storage API exists so you don't have to catalogue everything.
Long Answer:
It sounds like what you are trying to do is take the API and scour through every single entry and record them for your own purposes.
While this can usually be done fairly simply, instead of telling you how to do it, I'm going to tell you why you shouldn't.
An API to a huge database exists so that you don't have to store it all, as the resources required can be absolutely huge, usually more than most enthusiasts would even have.
It's better to have it as you do now: cache what is visited on the chance it is visited again, and periodically compare any records you DO keep against their source, so that you don't serve an out-of-date record (another pitfall of local caching).
I hope this helps at least show you why people tend not to duplicate large data sources.
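That said, if you do decide to pre-warm the cache yourself, the loop is small. A minimal sketch, assuming the URL pattern from the question; the actual fetch is left as a comment since it depends on your host and scheduling:

```php
<?php
// Build the list of item URLs to visit; the /items/itemView/N pattern
// is taken from the question, the base URL is a placeholder.
function item_urls(string $base, int $count): array
{
    $urls = [];
    for ($i = 1; $i <= $count; $i++) {
        $urls[] = rtrim($base, '/') . '/items/itemView/' . $i;
    }
    return $urls;
}

// Untested fetch sketch (run from cron, or spaced out over the day):
//   foreach (item_urls('http://localhost', 300) as $url) {
//       file_get_contents($url); // triggers the app's own API call + save
//       sleep(1);                // be gentle on the Google Books quota
//   }
```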

how to create a better search algorithm than just simple match and search [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
What algorithm is best suited for the following situation?
Suppose the user enters in the search box: Dell Computers
But this exact term doesn't exist in the database; what exists is: Dell
or just: Computers
So how/what algorithm can work for the above scenario?
Steps required:
1) Find out whether an exact match exists for "Dell Computers"
2) If not, then check for each word, like "Dell" and "Computers"
Moreover, I want to implement this in PHP. Any ideas how to do it?
This has been done extensively in the area of Full text searching. Look at Lucene, ElasticSearch, MySQL Full-Text Search, or PostgreSQL Full Text Search.
The basic idea is to create a trie of single keywords pointing to the resulting set of articles/documents, then look up each word separately and do a set intersection of the results to find articles matching both - and fall back on the individual result sets if there are no good intersections.
Add to that stemming of the lookup words, and you're on your way to reimplementing Lucene and friends.
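The exact-then-per-word strategy with an intersection and a union fallback can be sketched in plain PHP over a toy keyword index (a stand-in for what the engines named above do at scale; the index layout here is invented for illustration):

```php
<?php
// Toy index: term => list of matching document ids. In production,
// Lucene/ElasticSearch/MySQL fulltext replaces this array.
function search(array $index, string $query): array
{
    $words = preg_split('/\s+/', strtolower(trim($query)));

    // Step 1: try the exact phrase as a single key.
    $phrase = implode(' ', $words);
    if (isset($index[$phrase])) {
        return $index[$phrase];
    }

    // Step 2: look up each word and intersect the result sets.
    $sets = [];
    foreach ($words as $w) {
        if (isset($index[$w])) {
            $sets[] = $index[$w];
        }
    }
    if (!$sets) {
        return [];
    }
    $hits = array_shift($sets);
    foreach ($sets as $set) {
        $hits = array_values(array_intersect($hits, $set));
    }

    // Fallback: if the intersection is empty, return the union of the
    // individual result sets instead.
    if (!$hits) {
        foreach ($words as $w) {
            $hits = array_merge($hits, $index[$w] ?? []);
        }
        $hits = array_values(array_unique($hits));
    }
    return $hits;
}
```

Add stemming to the lookup words (so "Computers" also hits "Computer") and ranking of the union results, and you are, as noted, reimplementing Lucene.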

How do Lucene/Sphinx/Solr work? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I have a website in Phalcon and I'm trying to add a search engine to it. The content, however, is not in a DB and is in flat files.. located in app/views/.
I've never implemented a search engine, but from what I gather it seems like Lucene or Solr/Sphinx is what I need.
Do these tools offer the option to parse my website ala HTTrack, thus creating the index and necessary absolute URI hyperlinks?
How do I go about specifying what portion of the HTML files I want to be parsed? How do they interact with ignoring certain areas ( eg HTML, JS )?
Lucene is first and foremost an index. That's not even a database; it's just the index portion of the database, if you will. It's highly configurable in what it indexes and how, and in what data should be retained in its original format and what can be discarded once it has been indexed.
You create a schema first, just like you create a database schema. However, in the case of Lucene that schema defines what kind of tokenisers and filters to use to create the index for your fields. You then feed your documents into it to let it populate the index. That's up to you; there are several different APIs that let you feed data in. A "web crawler" is not one of them, so it won't go out and find your data automatically.
You can then query the index in various ways to retrieve documents you have fed in before. That's it in a nutshell.
Lucene is pretty much exclusively the index engine, which is about tokenising and transforming text and other data into an index that can be queried quickly. It's the part that lets you query for "manufacturer of widgets" and return a document with the text "widget manufacturers", if you have tweaked your indexing and querying accordingly. Solr is an appliance wrapped around Lucene that adds an HTTP-based API and some other niceties. Both are still somewhat low-level tools you can use to build a search engine; neither is an out-of-the-box "search engine" like Google by any means.
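In miniature, the index-building step described above (tokenise, filter, map terms to document ids) looks like the sketch below. Real engines add stemming, scoring and on-disk storage; the token filter here (drop tokens of two characters or fewer) is just an illustrative choice:

```php
<?php
// Minimal indexer: tokenise each document and map each term to the ids
// of the documents containing it.
function build_index(array $docs): array
{
    $index = [];
    foreach ($docs as $id => $text) {
        // Tokenise: lowercase, split on non-alphanumerics, drop short tokens.
        $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_unique($tokens) as $t) {
            if (strlen($t) > 2) {
                $index[$t][] = $id;
            }
        }
    }
    return $index;
}
```

Note that without a stemming filter, "widget" and "widgets" land in separate index entries, which is exactly why the tokeniser/filter configuration mentioned above matters for queries like "manufacturer of widgets".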

State list of world with country-code [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
Where can I find a list of all the states of the world with their country code (ISO2 or ISO3)?
I have to insert all these states into my database. It would be great if the list is available as .sql.
I need the state list of the world with country codes, not the country list.
Does anyone know where I can find it?
Thanks in advance.
http://www.timdavis.com.au/data/
There is a link to an Excel spreadsheet with all the country and state information.
Wikipedia has the full list of both 2-letter and 3-letter country codes:
http://en.wikipedia.org/wiki/ISO_3166-1
With regards to keeping it in a local DB table, note that this list does occasionally change as nations are created, renamed or merged. Although changes are not very frequent, you do need to keep the list up to date and, just as importantly, decide what you're going to do with codes which become obsolete (i.e. if you've got cross-references to it from other tables, you can't just delete a record without making the cross-references invalid).
[EDIT]
You comment that you're looking for a state list.
This phrase "state list" is confusing. Are you using the word "state" as it's used in the US? Other countries would refer to those areas as provinces, regions, counties, cantons, or a range of other terms.
More importantly, very few countries have codes for their individual regions.
For example, the UK is broken into counties such as Yorkshire, Hampshire and Surrey, but there aren't any codes that map to these names. There are short abbreviated versions of some of the names (ie 'Hants'=='Hampshire'), but they're colloquial abbreviations; certainly not official. There are also UK postcodes which do provide codes for areas, but these do not map to named counties. And other countries don't even have that.
This has annoyed me too. My issue is I can never find states to match countries, so I created this for people to use. It's in YAML format, and I generated Python and JavaScript versions. I also generated an alternate array layout.
https://github.com/niall-oc/minimax
Search google for: php country abbreviations list
Here's the first link: http://27.org/isocountrylist/
http://www.timdavis.com.au/data/
Interesting data for countries & states.
Download the Excel file from there, correct the state names, and import it into your database.
You can get the SQL Scripts from this post:
SQL Scripts

Copyright content API [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
I am looking to see if there is an automatic "copyright content" API that we can use. I know Attributor has a paid service, but I'm wondering if there is something that effectively does a Google search for a portion of the content, to check if whole sentences have been copied from elsewhere.
Basically, we have several bloggers who write for us, and we want to check if any of the articles have been partially or completely copied from another source on the web. Manually, I would select a few sample sentences and paste them into Google (using quotation marks) to see if I get any exact matches.
Is there a free API / service that you guys are aware of?
I was actually reading something about this a few days back and someone mentioned a service called Copyscape that has an API in its premium service - not free though
I am using Copyleaks API. Nice well built API that allows you to query URL or upload a file to check for plagiarism online.
Copyleaks homepage
Have a good day!
I heard about a new service - PlagSpotter. You can read about their API here. However, I don't think it's free now.
One thing you can do is use a free search API (e.g. Yahoo BOSS).
The idea is that a couple of snippets of text are searched for and the results are evaluated either manually or automatically.
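The snippet-picking half of that workflow is easy to automate. A sketch that selects a few sentences from an article and wraps each in quotes for an exact-phrase search; which search API consumes the queries is left open:

```php
<?php
// Pick up to $n sentences from an article and wrap each in quotes so a
// search engine looks for the exact phrase. Longer sentences are
// preferred, since short ones match everywhere.
function quoted_sample_queries(string $article, int $n = 3): array
{
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($article), -1, PREG_SPLIT_NO_EMPTY);
    usort($sentences, fn($a, $b) => strlen($b) <=> strlen($a));
    $picked = array_slice($sentences, 0, $n);
    return array_map(fn($s) => '"' . $s . '"', $picked);
}
```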
Honestly, the idea sounds great, but I do not think that there is anything like that out there, and there won't be in the next few years. The reason is probably how to define which sentences are copyright-protected and which ones are not (every single line from a book or magazine could be copyright-protected). Even if this decision could be properly made, it would lead to an enormous database, and requests would take a long time.
