How to store mostly-empty data? - php

I want to set up a system that allows for, say, 200 different translations per post. However most translations wouldn't exist, so there'd be a lot of empty datasets. How much of a performance and storage hit is it if I save every language (including empty ones) as a specific column? I.E.
English | Arabic | Mandarin | Russian | French | German
Potato | | | | Pomme de Terre |
Orange | | | | Orange |
Peach | | | | |
I wouldn't cycle through the whole list very often, I'd use a session variable or usersetting and then load directly from that column if it exists, with a fallback to a default language, and perhaps after that a full search.
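// $row is assumed to hold the fetched post row (hypothetical name)
if (isset($row['french'])) {
    echo $row['french'];
} elseif (isset($row['english'])) {
    echo $row['english'];
} else {
    // echo links to any non-null language
}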
I'd assume that, if I tell the server which column to go to, the overhead in terms of processing would be negligible? I also assume that an empty cell would be negligible in terms of storage? However I don't know for sure, and it could potentially be a huge mistake.
The reason I'd want to work like this is so I could assign language codes, instead of every installed instance having a different order (e.g. english|french|german|mandarin versus english|mandarin|german|french).
To prevent XY-problems, here's a more global formulation:
I want to set up a system that allows for many languages, but I expect that in most cases only one or two are used. What would be an efficient way to store this?

Keyword: Relational database.
You will want to use multiple tables.
Let's say that the default language is English; then your "words" table will implicitly contain the English words.
Words:
Id | Word
1 | Potato
2 | Orange
Languages:
Id | Name
1 | Norwegian
2 | Danish
Translations:
Word | Language | Translated
1 | 1 | Potet
2 | 1 | Oransje
1 | 2 | Kartoffel
2 | 2 | Appelsin
Then you can do (pseudo sql, you can look up the language and word ids first, or use a more advanced query):
SELECT Translated FROM Translations WHERE Word = (the word id) and Language = (the language id)
This comes with the benefit that it's very simple to list all the languages you support, all the Words you support, and also all translated words for a specific language (or, find all NON translated words for a language).
A specific query for translating "Potato" into "Danish" would look like:
SELECT Translated FROM Translations
JOIN Words ON Words.Id = Translations.Word
JOIN Languages ON Languages.Id = Translations.Language
WHERE
Languages.Name = 'Danish' AND Words.Word = 'Potato'
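To tie this back to the fallback the question asks for (use the requested language if it exists, otherwise the default), here is a minimal sketch. It uses Python with an in-memory SQLite database purely as a runnable stand-in for PHP and MySQL. The `translate` helper, the composite primary key, and the extra 'Peach' row (taken from the question's table) are assumptions for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Words (Id INTEGER PRIMARY KEY, Word TEXT NOT NULL);
    CREATE TABLE Languages (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
    CREATE TABLE Translations (
        Word INTEGER NOT NULL REFERENCES Words(Id),
        Language INTEGER NOT NULL REFERENCES Languages(Id),
        Translated TEXT NOT NULL,
        PRIMARY KEY (Word, Language)  -- assumed: one translation per word and language
    );
    INSERT INTO Words VALUES (1, 'Potato'), (2, 'Orange'), (3, 'Peach');
    INSERT INTO Languages VALUES (1, 'Norwegian'), (2, 'Danish');
    INSERT INTO Translations VALUES
        (1, 1, 'Potet'), (2, 1, 'Oransje'), (1, 2, 'Kartoffel'), (2, 2, 'Appelsin');
""")


def translate(word, language):
    """Return the translation, or the default-language word if none is stored."""
    row = conn.execute(
        """
        SELECT COALESCE(t.Translated, w.Word)
        FROM Words w
        LEFT JOIN Languages l ON l.Name = ?
        LEFT JOIN Translations t ON t.Word = w.Id AND t.Language = l.Id
        WHERE w.Word = ?
        """,
        (language, word),
    ).fetchone()
    # No row at all means the word itself is unknown.
    return row[0] if row else None


print(translate("Potato", "Danish"))  # translated row exists
print(translate("Peach", "Danish"))   # no translation stored, falls back to the default
```

Only translations that exist take up a row, so a post with one or two languages costs one or two rows no matter how many languages are defined.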

Related

Grouping items in PHP database query

I think this is a fairly simple problem, though it's a little difficult to explain.
Let's start with a list of state symbols - state flags, flowers, birds, etc. The rows in my MySQL database match each symbol with the URL at which its page appears. So if field Designation = 'flower', then the URL will be arizona/flower, new-york/flower, or something like that.
However, some states have multiple symbols in various categories; e.g. two state birds, three state flowers, etc. Making it still more confusing, I haven't yet figured out if I should describe all of a state's flower symbols on one page or make a unique page for each flower symbol.
For the time being, I'm playing it by ear, as indicated by the following database data. In this example, Arizona has two flower symbols, each discussed on the same page (URL = arizona/flower).
+----------+-----------------+-------------+
| State-ID | URL | Designation |
+----------+-----------------+-------------+
| us-az | arizona/flower | flower |
| us-az | arizona/flower | flower |
| us-az | arizona/tree | tree |
| us-fl | florida/mammal | mammal |
| us-fl | florida/mammal2 | mammal |
| us-fl | florida/mammal2 | mammal |
| us-fl | florida/mammal3 | mammal |
| us-fl | florida/bird | bird |
| us-fl | florida/bird2 | bird |
+----------+-----------------+-------------+
However, Florida has FOUR official mammals. The first one (the Florida panther) is discussed at mysite/florida/mammal. The two marine mammals are discussed at mysite/florida/mammal2, and the state horse is featured at mysite/florida/mammal3.
So here's my question: How can I write a query that 1) distinguishes between single designations (e.g. Arizona's state tree) and multiple designations (e.g. Arizona's state flowers) and 2) also tells me if the multiple designations are linked to a single URL or multiple URLs?
It will take me a while to iron out all the kinks, but, for now, it would be very helpful if I had a query that listed ONLY multiple URLs. For example, it wouldn't even display Arizona's state tree. The query could serve as sort of a snapshot of my list of symbols, helping me identify all the multiple designations and which of them are linked to single URLs vs multiple URLs.
I'm working with PHP and MySQL on a Mac.
P.S. I should point out that there are additional fields, including one that gives symbols a specific value (e.g. 'Florida panther', 'manatee', 'dolphin', 'Cracker Horse').
SELECT `State-ID`, Designation, COUNT(*) all_count, COUNT(DISTINCT URL) url_count
FROM YourTable
GROUP BY `State-ID`, Designation
url_count will tell you if they're linked to the same URL or different ones. You can add HAVING url_count > 1 to restrict the results only to items that link to multiple URLs.
You can use the HAVING clause to filter on aggregate functions
SELECT `State-ID`, Designation, COUNT(*) AS cnt
FROM stateTable
GROUP BY `State-ID`, Designation
HAVING COUNT(*) > 1
That will give you all State-ID,Designation combinations that have more than one row in the table. If you also group by URL you can determine if they share a URL.
Basically you should read up on grouping and aggregation in MySQL: 12.15.1 GROUP BY (Aggregate) Functions and MySQL SELECT Syntax
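As a quick check of the grouping approach against the sample rows from the question, here is a small sketch. It runs on Python's built-in SQLite rather than MySQL, so the hyphenated column name is quoted with double quotes instead of backticks, and the table name `symbols` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE symbols ("State-ID" TEXT, URL TEXT, Designation TEXT)')
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?)",
    [
        ("us-az", "arizona/flower", "flower"),
        ("us-az", "arizona/flower", "flower"),
        ("us-az", "arizona/tree", "tree"),
        ("us-fl", "florida/mammal", "mammal"),
        ("us-fl", "florida/mammal2", "mammal"),
        ("us-fl", "florida/mammal2", "mammal"),
        ("us-fl", "florida/mammal3", "mammal"),
        ("us-fl", "florida/bird", "bird"),
        ("us-fl", "florida/bird2", "bird"),
    ],
)

# Keep only designations with more than one row, and count their distinct URLs.
rows = conn.execute(
    """
    SELECT "State-ID", Designation,
           COUNT(*) AS all_count,
           COUNT(DISTINCT URL) AS url_count
    FROM symbols
    GROUP BY "State-ID", Designation
    HAVING COUNT(*) > 1
    ORDER BY "State-ID", Designation
    """
).fetchall()

for state, designation, all_count, url_count in rows:
    kind = "one shared URL" if url_count == 1 else f"{url_count} different URLs"
    print(f"{state} {designation}: {all_count} rows, {kind}")
```

Arizona's tree drops out because it has a single row; the Arizona flowers show up with one shared URL, and the Florida mammals with three different ones.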

Multiple languages on website from mysql

Is it a bad idea to store translations in the database and display the translated words based on user settings? I imagined something like this:
mysql
lang | greeting
spanish | ola
english | hi
for example
$sql=mysql_query("SELECT * FROM lang where lang='spanish'");
and then just echo that value in proper div
You can have 1 table for your languages
id_lang | label
1 | English
2 | Spanish
3 | French
And then each of your other tables can have a second key column, like this for example:
traduction_table
id | id_lang | value
1 | 1 | Hi
1 | 2 | Ola
1 | 3 | Salut
You need a composite primary key on (id, id_lang).
Then you just have to write your SQL queries accordingly, like:
select * from traduction_table where ID = 1 and id_lang = ??
where ?? is replaced by the ID of the user's language.
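A rough sketch of that lookup with the composite key in place, again using Python and SQLite only so that it runs without a server. It assumes that `id` identifies the text and is shared by all of its translations; the `lookup` helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE traduction_table (
        id INTEGER NOT NULL,       -- assumed: same id for every translation of one text
        id_lang INTEGER NOT NULL,
        value TEXT NOT NULL,
        PRIMARY KEY (id, id_lang)  -- at most one translation per text and language
    );
    INSERT INTO traduction_table VALUES (1, 1, 'Hi'), (1, 2, 'Ola'), (1, 3, 'Salut');
""")


def lookup(text_id, lang_id):
    """Fetch one translation; the language is bound as a parameter, not pasted into the SQL."""
    row = conn.execute(
        "SELECT value FROM traduction_table WHERE id = ? AND id_lang = ?",
        (text_id, lang_id),
    ).fetchone()
    return row[0] if row else None


print(lookup(1, 2))
```

Binding the language as a parameter also avoids building the query by string concatenation from the user's settings.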
In my experience, it is best to maintain what I call "interface" texts (English "cancel", Spanish "cancelar", ...) in .po files (the gettext standard).
On the other hand, all texts that the user is likely to edit should be stored in the database. For example, if the site needs to publish a description in different languages, and this description is edited and maintained by the user rather than by the programmer, then you should store the translations in the database. And yes, something like the table you described is a good idea.
This way you don't need to add columns for extra languages.
id | text | lang
------------------------
1 | ola |spanish
2 | hi |english

php & mysql: use a table for a filter list for another table

I have two mysql tables. One is a bad words list, the other is the table to compare against the bad words list. Essentially I want to filter out and return a list of rows with domains that do not have ANY occurrence of a word in the bad words table. A few sample tables:
bad words list
+----------+------------------+
| id | words |
+----------+------------------+
| 1 | porn |
| 2 | sex |
+----------+------------------+
table of domains to compare
+----------+------------------+
| id | domain |
+----------+------------------+
| 56 | google.com |
| 57 | sex.com |
+----------+------------------+
I want to return results such as
+----------+------------------+
| id | domain |
+----------+------------------+
| 56 | google.com |
+----------+------------------+
A thing to note is that these tables have nothing in common, so I'm not even sure this is the best method. I was using a comparison function in PHP but that seemed to be way too slow over hundreds of thousands of rows to search.
It is possible to do this in MySQL, like this:
SELECT
d.*
FROM
domains d
LEFT JOIN
words w ON(d.domain LIKE CONCAT('%',w.word,'%') )
GROUP BY
d.domain
HAVING
COUNT(w.id) < 1
but it is not optimal and will get slower and slower with more records in both tables.
Data like this typically needs to be pre-calculated at insertion time rather than at fetch time. You should add a column to Domains, something like "bad_words boolean default null".
null would mean "don't know", which in some contexts could be interpreted as "unsafe to show".
false means "no bad words" and true means "contains bad words".
Every time the list of bad words is updated, the column is reset to null for all rows and a background job processes them again, probably in a language other than SQL.
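One way the pre-calculated flag could look is sketched below, with Python and SQLite standing in for the background worker and for MySQL. The table layout, `refresh_flags`, and `add_bad_word` are assumptions for illustration; the matching is a plain substring check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE domains (
        id INTEGER PRIMARY KEY,
        domain TEXT NOT NULL,
        bad_words INTEGER DEFAULT NULL  -- NULL = don't know, 0 = clean, 1 = contains bad words
    );
    INSERT INTO words VALUES (1, 'porn'), (2, 'sex');
    INSERT INTO domains (id, domain) VALUES (56, 'google.com'), (57, 'sex.com');
""")


def refresh_flags(conn):
    """Background step: classify every domain whose flag is still unknown."""
    bad = [w.lower() for (w,) in conn.execute("SELECT word FROM words")]
    pending = conn.execute(
        "SELECT id, domain FROM domains WHERE bad_words IS NULL"
    ).fetchall()
    for domain_id, domain in pending:
        flagged = any(w in domain.lower() for w in bad)
        conn.execute(
            "UPDATE domains SET bad_words = ? WHERE id = ?", (int(flagged), domain_id)
        )
    conn.commit()


def add_bad_word(conn, word):
    """Changing the word list invalidates every flag until the next refresh."""
    conn.execute("INSERT INTO words (word) VALUES (?)", (word,))
    conn.execute("UPDATE domains SET bad_words = NULL")
    conn.commit()


refresh_flags(conn)
# At fetch time the filter is a simple comparison that an index can serve.
clean = conn.execute("SELECT id, domain FROM domains WHERE bad_words = 0").fetchall()
print(clean)
```

Rows with an unknown flag are simply not returned by the `bad_words = 0` filter, which matches the "unsafe to show" reading of null.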

Searching a database with forgiveness.

I have a database of 30k elements, each one a game name.
I'm currently using:
"SELECT * FROM gamelist WHERE name LIKE '%$search%' OR aliases LIKE '%$search%' LIMIT 10"
for the searchbar.
However, it can get really picky about things like:
'Animal Crossing: Wild World'
See, many users won't know that ':' is in that phrase, so when they start searching:
Animal Crossing Wild World
It won't show up.
How can I have the sql be more forgiving?
Replace the non alphanumeric characters in the search parameter with wildcards so Animal Crossing Wild World becomes %Animal%Crossing%Wild%World% and filter on that.
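A minimal sketch of that wildcard idea, in Python with SQLite as a stand-in for PHP and MySQL. The helper name `to_like_pattern` is made up, and the second sample game is only there to show a non-match.

```python
import re
import sqlite3


def to_like_pattern(search):
    """Turn 'Animal Crossing Wild World' into '%Animal%Crossing%Wild%World%'."""
    # Keeping only runs of letters and digits also drops any % or _ typed by the user.
    words = re.findall(r"[A-Za-z0-9]+", search)
    if not words:
        return "%"
    return "%" + "%".join(words) + "%"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gamelist (name TEXT, aliases TEXT)")
conn.executemany(
    "INSERT INTO gamelist VALUES (?, ?)",
    [("Animal Crossing: Wild World", "ACWW"), ("Wild Arms", None)],
)

pattern = to_like_pattern("Animal Crossing Wild World")
matches = conn.execute(
    "SELECT name FROM gamelist WHERE name LIKE ? OR aliases LIKE ? LIMIT 10",
    (pattern, pattern),
).fetchall()
print(pattern, matches)
```

Note that the words still have to appear in the same order as in the stored name, and a leading wildcard keeps the database from using an ordinary index on the column.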
I would suggest you make another table which contains keywords, like
+---------+------------+
| game_id | keyword_id |
+---------+------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
+---------+------------+
 
+------------+--------------+
| keyword_id | keyword_name |
+------------+--------------+
| 1 | animal |
| 2 | crossing |
| 3 | wild |
| 4 | world |
+------------+--------------+
After that you can easily explode the user-given text into keywords and search for them in the database, which will give you the IDs of the possible games he/she was looking for.
Oh, and remove special symbols, like ":" or "-", so you don't need multiple keywords for the same phrase.
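Here is one possible shape for the keyword tables and the lookup, sketched in Python with SQLite so it can be run as is. The `games` and `game_keywords` table names and the `tokenize`, `add_game`, and `search` helpers are assumptions; ranking by the number of matching keywords is one reasonable choice, not the only one.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (game_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE keywords (
        keyword_id INTEGER PRIMARY KEY,
        keyword_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE game_keywords (
        game_id INTEGER NOT NULL REFERENCES games(game_id),
        keyword_id INTEGER NOT NULL REFERENCES keywords(keyword_id),
        PRIMARY KEY (game_id, keyword_id)
    );
""")


def tokenize(text):
    """Lowercase, drop special symbols such as ':' or '-', return unique words."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))


def add_game(name):
    """Store the game and link it to one keyword row per distinct word."""
    game_id = conn.execute("INSERT INTO games (name) VALUES (?)", (name,)).lastrowid
    for word in tokenize(name):
        conn.execute("INSERT OR IGNORE INTO keywords (keyword_name) VALUES (?)", (word,))
        conn.execute(
            "INSERT INTO game_keywords "
            "SELECT ?, keyword_id FROM keywords WHERE keyword_name = ?",
            (game_id, word),
        )
    return game_id


def search(text, limit=10):
    """Return (name, hits) pairs, best match first."""
    words = tokenize(text)
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    return conn.execute(
        f"""
        SELECT g.name, COUNT(*) AS hits
        FROM game_keywords gk
        JOIN keywords k ON k.keyword_id = gk.keyword_id
        JOIN games g ON g.game_id = gk.game_id
        WHERE k.keyword_name IN ({placeholders})
        GROUP BY g.game_id
        ORDER BY hits DESC, g.name
        LIMIT ?
        """,
        (*words, limit),
    ).fetchall()


add_game("Animal Crossing: Wild World")
add_game("Wild Arms")
print(search("Animal Crossing Wild World"))
```

Unlike the wildcard approach, word order does not matter here, and partial matches still come back, just ranked lower.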
The following is from MySQL LIKE %string% not quite forgiving enough. Anything else I can use? by the user M_M:
If you're using MyISAM, you can use full text indexing. See this tutorial
If you're using a different storage engine, you could use a third party full text engine like sphinx, which can act as a storage engine for mysql or a separate server that can be queried.
With MySQL full text indexing, a search on A J Kelly would match AJ Kelly (not to confuse matters, but A, J and AJ would be ignored as they are too short by default, and it would match on Kelly). Generally, full-text search is much more forgiving (and usually faster than LIKE '%string%') because it allows partial matches, which can then be ranked on relevance.
You can also use SOUNDEX to make searches more forgiving by indexing the phonetic equivalents of words and search them by applying SOUNDEX on your search terms and then using those to search the index. With soundex mary, marie, and marry will all match, for example.
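To make the phonetic idea concrete, here is a small sketch of the classic four-character American Soundex in Python. It is an illustration of the algorithm, not MySQL's `SOUNDEX()` function, whose output may differ in detail (for example in length).

```python
def soundex(word):
    """Classic American Soundex: first letter plus three digits."""
    codes = {}
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    ):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""

    first = letters[0]
    result = first.upper()
    prev = codes.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = codes.get(ch, "")  # vowels and y have no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


for name in ("mary", "marie", "marry"):
    print(name, soundex(name))
```

Storing the code in its own indexed column at insert time, and comparing it with the code of the search term, avoids computing it for every row on every search.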
You can try MATCH() ... AGAINST() if your MySQL storage engine is MyISAM or InnoDB (InnoDB supports full-text indexes from MySQL 5.6 on):
http://dev.mysql.com/doc/refman/5.6/en/fulltext-search.html
http://dev.mysql.com/doc/refman/5.6/en/fulltext-boolean.html
Your resulting SQL will be like this:
SELECT * from gamelist WHERE MATCH (name, aliases) AGAINST ('$search' IN BOOLEAN MODE) LIMIT 10
The behavior of the search is more like the boolean search used in search engines.

Find Similar Descriptions in Database PHP/MySQL

We are building a help desk application for running our service company, and I am trying to figure out how to assist the call center people in assigning a category based on the problem description from the customer.
My primary idea is to compare the description the customer gave to prior descriptions, and to use the category most commonly assigned to the matching prior service calls.
Any ideas how to do it?
My description field is a blob field as some descriptions are quite long. I would prefer to find a way to do this that requires the least system resources.
Thanks for any input :)
Mike
I'm a person of custom code; I don't feel the job is done right if you use big, bloated systems, so take this with a grain of salt if you are not wanting to code this yourself. However, this might not be as hard as you're making it; yes, I would definitely go with a tagging system. However, it doesn't have to be so complicated.
Here's how I would handle it:
First, make a database with 3 tables; one for categories, tags, and 'links' (links between categories and tags).
Then, create a PHP function that initializes an array (empty works just fine) and pushes new (lowercased) words if they don't exist. An example of this might be:
<?php
// Pass the new description to this function. It returns the unique
// lowercase words found in the description.
function getCategory($description)
{
    // Lowercase it all
    $description = strtolower($description);
    // Kill anything that isn't a number, a letter, or whitespace by turning it
    // into a space. Edit the pattern however you'd like; just don't take out
    // the spaces, we need them!
    $description = preg_replace('~[^a-z0-9\s]+~', ' ', $description);
    // Kill extra whitespace
    $description = trim(preg_replace('~\s+~', ' ', $description));
    // Now the description should just contain words with a single space in between them.
    // Let's break them up.
    $dict = explode(" ", $description);
    // And find the unique ones (array_values reindexes the result from 0)...
    $dict = array_values(array_unique($dict, SORT_STRING));
    // If you wanted to, you could trim either common words you specify,
    // or any words under, say, 4 characters. Up to you!
    return $dict;
}
?>
Next, populate your database how you want; make a few categories and some tags, and then link them together (if you want to get fancy, switch the MySQL engine to InnoDB and make relationships. Makes things a bit quicker!)
Table `Categories`
|-------------------------|
| Column: Category |
| Rows: |
| Food |
| Animals |
| Plants |
| |
|-------------------------|
Table `Tags`
|-------------------------|
| Column: Tag |
| Rows: |
| eat |
| hamburger |
| meat |
| leaf |
| stem |
| seed |
| fur |
| hair |
| claws |
| |
|-------------------------|
Table `Links`
|-------------------------|
| Columns: tag, category |
| Rows: |
| eat, Food |
| hamburger, Food |
| meat, Food |
| leaf, Food |
| leaf, Plants |
| stem, Plants |
| fur, Animals |
| ... |
|-------------------------|
If you give the tags and categories integer IDs and use InnoDB foreign keys, the links table only needs to store a pair of IDs per row instead of repeating the text. Each link row still takes up some space, but very little, and the foreign keys keep the links consistent with the tags and categories they point to.
Now, for the kicker, a clever mysql query to the database, which follows these steps:
For each category, sum up the tags belonging both to the category and the description dictionary (which we created in the earlier PHP function).
Sort them from greatest to least
Pull the top 1 or 3 or however many suggested categories you'd like!
This will get you a nice list of categories that have the highest matching count of tags. How you want to craft the MySQL query is up to you.
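Since the query is left open above, here is one possible version of those three steps, sketched in Python with SQLite as a stand-in for PHP and MySQL. It assumes the links table stores the tag and category text directly, as in the example tables, and `suggest_categories` is a hypothetical helper that takes the word list produced by the PHP function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Links (tag TEXT NOT NULL, category TEXT NOT NULL);
    INSERT INTO Links VALUES
        ('eat', 'Food'), ('hamburger', 'Food'), ('meat', 'Food'), ('leaf', 'Food'),
        ('leaf', 'Plants'), ('stem', 'Plants'), ('fur', 'Animals');
""")


def suggest_categories(words, limit=3):
    """Count, per category, how many linked tags appear in the description words."""
    words = sorted(set(words))
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    return conn.execute(
        f"""
        SELECT category, COUNT(*) AS matches
        FROM Links
        WHERE tag IN ({placeholders})
        GROUP BY category
        ORDER BY matches DESC, category
        LIMIT ?
        """,
        (*words, limit),
    ).fetchall()


description_words = ["my", "plant", "has", "a", "broken", "stem", "and", "leaf"]
print(suggest_categories(description_words))
```

Because the query counts link rows, duplicate links between a tag and a category would count more than once, which is one way to get a weighted tag system.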
While this seems like a lot of setup, it really isn't. You have 3 tables at most, one or two PHP functions and a few MySQL queries. The database will only be as big as the categories, the tags and the references to both (in the links table; references don't take up much space!)
To update the database, simply put in tags that don't exist to the tags database and link them to the category you decided to assign to the description. This will broaden your database's range of tags and will, over time, get your database more tuned to your descriptions (i.e. more accurate).
If you wanted to get really detailed, you'd insert duplicate links between categories and tags to create a sort of weighted tag system, which would make your system even more accurate.
