Count line breaks in a field and order by


I have a field in the table recipes whose values were inserted using mysql_real_escape_string. I want to count the number of line breaks in that field and order the records by that number.
p.s. the field is called Ingredients.
Thanks everyone

This would do it:
SELECT *, LENGTH(Ingredients) - LENGTH(REPLACE(Ingredients, '\n', '')) as Count
FROM Recipes
ORDER BY Count DESC
The way I am getting the number of line breaks is a bit of a hack, however, and I don't think there's a better way. I would recommend keeping a column that holds the number of line breaks if performance is a huge issue. For medium-sized data sets, though, I think the above should be fine.
If you wanted to have a cache column as described above, you would do:
UPDATE
Recipes
SET
IngredientAmount = LENGTH(Ingredients) - LENGTH(REPLACE(Ingredients, '\n', ''))
After that, whenever you are updating or inserting a row, you could calculate the number (probably with PHP) and fill in this column beforehand. Or, if you're into that sort of thing, try out triggers.
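For the PHP side, a minimal sketch of computing the count before the insert might look like this. The function name count_line_breaks is hypothetical. It counts "\n" characters, which matches the SQL expression above for both Unix (\n) and Windows (\r\n) line endings.

```php
<?php
// Hypothetical helper: counts line breaks the same way the SQL expression does,
// by counting "\n" characters. Each "\r\n" pair contains one "\n", so it counts once.
function count_line_breaks(string $text): int
{
    return substr_count($text, "\n");
}

$ingredients = "2 eggs\r\n1 cup flour\r\n1 pinch salt";
$count = count_line_breaks($ingredients); // 2 line breaks (3 lines)
```

Keep in mind that counting lines with explode() gives one more than this, since three lines are separated by only two breaks.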

I'm assuming a lot here, but from what I'm reading in your post, you could change your database structure a little bit, and both solve this problem and open your dataset up to more interesting uses.
If you separate ingredients into their own table, and use a linking table to index which ingredients occur in which recipes, it'll be much easier to be creative with data manipulation. It becomes easier to count ingredients per recipe, to find similarities in recipes, to search for recipes containing sets of ingredients, etc. Your data would also be more normalized and smaller (storing one global list of all ingredients vs. storing a set for each recipe).
If you're using a single text entry field to enter ingredients for a recipe now, you could break up that input by lines and use each line as an ingredient when saving to the database. You can use something like PHP's built-in levenshtein() or similar_text() functions to deal with misspelled ingredient names and keep the data as normalized as possible without having to hand-groom your [users'] data entry too much.
This is just a suggestion, take it as you like.
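A rough sketch of that idea, assuming a small in-memory list of known ingredient names (in practice this would come from the ingredients table). The helper name normalize_ingredient and the distance threshold of 2 are assumptions for illustration, not tuned values.

```php
<?php
// Hypothetical helper: maps a raw ingredient line to the closest known name,
// or returns the cleaned-up input unchanged if nothing is close enough.
function normalize_ingredient(string $raw, array $known, int $maxDistance = 2): string
{
    $name = strtolower(trim($raw));
    $best = $name;
    $bestDistance = $maxDistance + 1;
    foreach ($known as $candidate) {
        $distance = levenshtein($name, $candidate);
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }
    return $best;
}

$known = ['flour', 'sugar', 'butter'];
$input = "Flour\r\nsuger\r\nsaffron";

// Split on any line ending and drop empty lines.
$lines = preg_split('/\r\n|\r|\n/', $input, -1, PREG_SPLIT_NO_EMPTY);
$ingredients = array_map(fn($line) => normalize_ingredient($line, $known), $lines);
// $ingredients is ['flour', 'sugar', 'saffron']
```

The misspelled "suger" is mapped to "sugar", while "saffron" is too far from every known name and is kept as a new ingredient.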

You're going a bit beyond the capabilities and intent of SQL here. You could write a stored procedure to scan the string and return the number and then use this in your query.
However, I think you should revisit the design of whatever is inserting the Ingredients so that you avoid searching the strings of every row whenever you run this query. Add a 'num_linebreaks' column, calculate the number of line breaks, and set this column when you're adding the Ingredients.
If you've no control over the app that's doing the insertion, then you could use a stored procedure to update num_linebreaks based on a trigger.

Got it thanks, the php code looks like:
$check = explode("\r\n", $_POST['ingredients']);
$lines = count($check);
So how could I update Ingred_count for all the existing records in the table, based on the field Ingredients, in one fell swoop?

Related

Speed of SELECT Distinct vs array unique

I am using WordPress with some custom post types (just to give a description of my DB structure - its WP's).
Each post has custom meta, which is stored in a separate table (postmeta table). In my case, I am storing city and state.
I've added some actions to WP's save_post/trash_post hooks so that the city and state are also stored in a separate table (cities) like so:
ID    postID  city     state
auto  int     varchar  varchar
I did this because I assumed that this table would be faster than querying the rather large postmeta table for a list of available cities and states.
My logic also forced me to add/update cities and states for every post, even though this will cause duplicates (in the city/state fields). This must be so because I must keep track of which states/cities exist (actually have a post associated with them). When a post is added or deleted, it takes its record to or from the cities table with it.
This brings me to my question(s).
Does this logic make sense or do I suck at DB design?
If it does make sense, my real question is this: **would it be faster to use MySQL's "SELECT DISTINCT" or just "SELECT *" and then use PHP's array_unique on the results?**
Edits for comments/answers thus far:
The structure of the table is exactly how I typed it out above. There is an index on ID, but the point of this table isn't to retrieve an indexed list, but to retrieve ALL results (that are unique) for a list of ALL available city/state combos.
I think I may go with (I don't know why I didn't think of this before) just adding a serialized list of city/state combos in ONE record in the wp_options table. Then I can just get that record, and filter out the unique records I need.
Can I get some feedback on this? I would imagine that retrieving and filtering a serialized array would be faster than storing the data in a separate table for retrieval.

To answer your question about using SELECT distinct vs. array_unique, I would say that I would almost always prefer to limit the result set in the database assuming of course that you have an appropriate index on the field for which you are trying to get distinct values. This saves you time in transmitting extra data from DB to application and for the application reading that data into memory where you can work with it.
As far as your separate table design goes, it is hard to say whether this is a good approach or not. It would largely depend on how you are actually performing your query (i.e. are you doing two separate queries, one for post info and one for city/state info, or querying across a join?).
There is really only one definitive way to determine which approach is fastest: test both in your environment.
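To make the comparison concrete, here is a sketch of what the application-side deduplication looks like. The sample rows are made up and stand in for the result of a plain SELECT on the cities table. The database-side alternative would be a single SELECT DISTINCT city, state query.

```php
<?php
// Sample rows as they might come back from "SELECT city, state FROM cities".
$rows = [
    ['city' => 'Austin', 'state' => 'TX'],
    ['city' => 'Dallas', 'state' => 'TX'],
    ['city' => 'Austin', 'state' => 'TX'],
];

// Key each row by its city/state pair so duplicates overwrite each other.
$unique = [];
foreach ($rows as $row) {
    $unique[$row['city'] . '|' . $row['state']] = $row;
}
$unique = array_values($unique);
// count($unique) is 2, but all 3 rows still had to be sent to PHP first
```

Every duplicate row still has to travel from the database to PHP before it is thrown away, which is the cost the answer above refers to.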

1) A fully normalized table (one that holds only integer values, with the other tables holding just an int plus a varchar) has an advantage when you rarely do full table joins and do a lot of searching on the normalized fields. The downside is that it requires large join/sort buffers and results in more complex queries, which makes it much less likely that MySQL will optimize them automatically. So you have to optimize your queries yourself.
2) SELECT DISTINCT will be faster in almost all cases. The only case where it will be slower is when you have a small sort buffer configured in /etc/my.cnf and a much larger memory limit for PHP.
A SELECT DISTINCT can use indexes, while your code can't.
Also, sending a large amount of data to your app costs a lot of MySQL CPU time and wall-clock time.

How to store searchable arrays in MySQL

So I've got this form with an array of checkboxes to search for an event. When you create an event, you choose one or more of the checkboxes and then the event gets created with these "attributes". What is the best way to store it in a MySQL database if I want to filter results when searching for these events? Would creating several columns with boolean values be the best way? Or possibly a new table with the checkbox values only?
I'm pretty sure serializing is out of the question because I wouldn't be able to query the serialized string for whether the checkbox was ticked or not, right?
Thanks

You can use the set datatype or a separate table that you join. Either will work.
I would not do a bunch of columns though.
You can search the set easily using FIND_IN_SET(), but it's not indexed, so it depends on how many rows you expect (up to a few thousand is probably OK - it's a very fast search).
The normal solution is a separate table with one column being the ID of the event, and the second column being the attribute using the enum datatype (don't use text, it's slower).

Create separate columns, or store them all in one column using a bitmask.

One way would be to create a new table with a column for each checkbox, as already described by others. I'll not add to that.
However, another way is to use a bitmask. You have just one column, myCheckboxes, and store the values as an int. Then in the code you have constants, or another appropriate way, to store the correlation between each checkbox and its bit. For example:
CHECKBOX_ONE 1
CHECKBOX_TWO 2
CHECKBOX_THREE 4
CHECKBOX_FOUR 8
...
CHECKBOX_NINE 256
Remember to always use the next power of two for new values, otherwise you'll get values that overlap.
So, if the first two checkboxes have been checked you should have 3 as the value of myCheckboxes for that row. If you have ONE and FOUR checked you'd have 9 as the values of myCheckboxes, etc. When you want to see which rows have say checkboxes ONE, THREE and NINE checked your query would be like:
SELECT * FROM myTable where myCheckboxes & 1 AND myCheckboxes & 4 AND myCheckboxes & 256;
This query will return only the rows that have all of these checkboxes checked.
You should also use bitwise operations when storing and reading the data.
This is a very efficient way when it comes to speed. You have just a single column, probably just a smallint, and your searches are pretty fast. This can make a big difference if you have several different collections of checkboxes that you want to store and search through. However, this makes the values harder to understand. If you see the value 261 in the DB, it will not be easy for a human to immediately see that this means checkboxes ONE, THREE and NINE have been checked, whereas it is much easier with separate columns for each checkbox. This normally is not an issue, because humans don't need to manually poke the database, but it's worth mentioning.
From the coding perspective it's not much of a difference, but you'll have to be careful not to corrupt the values, because it is far easier to mess up a single int than data stored in separate columns. So test carefully when adding new stuff. All that said, the speed and low memory benefits can be very big if you have a ton of different collections.
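On the PHP side, the bitwise operations for storing and reading could be sketched like this. The constant names follow the list above; the has_flag helper is hypothetical.

```php
<?php
// Flag constants; each one is a distinct power of two.
const CHECKBOX_ONE   = 1;
const CHECKBOX_TWO   = 2;
const CHECKBOX_THREE = 4;
const CHECKBOX_NINE  = 256;

// Combine flags with bitwise OR before storing the value.
$stored = CHECKBOX_ONE | CHECKBOX_THREE | CHECKBOX_NINE; // 261

// Hypothetical helper: tests a single flag with bitwise AND when reading.
function has_flag(int $mask, int $flag): bool
{
    return ($mask & $flag) === $flag;
}

// Clear a flag with AND NOT.
$withoutThree = $stored & ~CHECKBOX_THREE; // 257
```

Using OR and AND NOT instead of adding and subtracting avoids corrupting the value when a flag is set twice or cleared when it was never set.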

Counting items or incrementing a number?

From someone with more experience than myself, would it be a better idea to simply count the number of items in a table (such as counting the number of topics in a category) or to keep a variable that holds that value and just increment and call it (an extra field in the category table)?
Is there a significant difference between the two or is it just very slight, and even if it is slight, would one method still be better than the other? It's not for any one particular project, so please answer generally (if that makes sense) rather than based on something like the number of users.
Thank you.

To get the number of items (rows in a table), you'd use standard SQL and do it on demand
SELECT COUNT(*) FROM MyTable
Note, in case I've missed something, each item (row) in the table has some unique identifier, whether it's a part number, some code, or an auto-increment. So adding a new row could trigger the "auto-increment" of a column.
This is unrelated to "counting rows". Because of DELETEs or ROLLBACK, numbers may not be contiguous.
Trying to maintain row counts separately will end in tears and/or disaster. Trying to use COUNT(*)+1 or MAX(id)+1 to generate a new row identifier is even worse.

I think there is some confusion about your question. My interpretation is that you are asking whether to do a select count(*) or to keep a column where you track the actual count.
I would not add such a column if you don't have a reason to do so. This is premature optimization, and it complicates your software design.
Also, you want to avoid having the same information stored in different places. Counting is a trivial task, so you would just be duplicating information, which is a bad idea.

I'd go with just counting. If you notice a performance issue, you can consider other options, but as soon as you keep a value that's separate, you have to do some work to make sure it's always correct. Using COUNT() you always get the actual number "straight from the horse's mouth" so to speak.
Basically, don't start optimizing until you have to. If everything works fine and fast using COUNT(), then do that. Otherwise, store the count somewhere, but rather than adding to or subtracting from the stored value, run COUNT() when needed to get the new number of items.

In my forum I count the sub-threads in a forum like this:
SELECT COUNT(forumid) AS count FROM forumtable
As long as you're using an identifier that is the same to specify what forum and/or sub-section, and the column has an index key, it's very fast. So there's no reason to add more columns than you need to.

ORDER BY price with $ sign and commas

I have a field in my database named "price" and it is set up as a varchar. It contains a dollar sign as well as commas.
The values in my database are like this:
$100,000
$625,005
$115,990
$2,450,000
$137,005
and I would like it to order it like this:
$100,000
$115,990
$137,005
$625,005
$2,450,000
I tried ORDER BY 0+price and ORDER BY ABS(price), but they just output the rows in the order they were in the database. Is there any way to order this while keeping the field a varchar?

If at all possible, change your database to hold those values in an int, float or decimal field, depending on how much precision you need. Add the $ and all other formatting when outputting the values.
Everything else is just duct-taping around a bad database structure. It's not impossible, but it should be the very last resort when there is absolutely no way to change the database.
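If you do migrate, the existing strings have to be converted once. A minimal sketch, assuming whole-dollar prices with no cents; the helper name price_to_int is hypothetical.

```php
<?php
// Hypothetical helper: strips everything except digits from a whole-dollar
// price string such as '$2,450,000'. Prices with cents would need more care.
function price_to_int(string $price): int
{
    return (int) preg_replace('/[^0-9]/', '', $price);
}

$prices = ['$100,000', '$625,005', '$115,990', '$2,450,000', '$137,005'];

// Sort by numeric value rather than by the formatted string.
usort($prices, fn($a, $b) => price_to_int($a) <=> price_to_int($b));
// $prices is now ['$100,000', '$115,990', '$137,005', '$625,005', '$2,450,000']
```

The same helper could feed a one-time UPDATE that fills a new numeric column.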

Fully agreeing with the above posts regarding the database design, there is a bad way anyways:
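SELECT CAST(REPLACE(REPLACE(price,',',''),'$','') AS UNSIGNED) as cleanPrice FROM Table ORDER BY cleanPrice
The CAST is needed because REPLACE returns a string, and strings would be sorted character by character rather than by value. The query has to do the replacements on every single row and therefore might become very slow.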

Honestly, you should really store these as ints and format them in PHP when you bring them out of the database. That way it's easier to work with the data and you can manipulate them as ints.
When you pull them out you can use number_format to add the commas automatically and then just put a $ in front of the price.
If you are storing multiple types of prices you can store a currency type in the DB too, which in this case would be USD.
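A short sketch of the output side, assuming prices are stored as whole-dollar integers. The helper name format_price is hypothetical.

```php
<?php
// Hypothetical helper: formats a whole-dollar integer price for display.
// number_format() with one argument adds thousands separators and no decimals.
function format_price(int $amount): string
{
    return '$' . number_format($amount);
}

$label = format_price(2450000); // '$2,450,000'
```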

I did this in Oracle quickly, so it might not work in MySQL...if not, I'm sorry. I'm only selecting the prices
select price
from table
order by cast(substr(replace(price, ',', ''), 2) as int)

If you don't have any way of changing the DB structure you can add a computed column to the table for sorting purposes.
Unfortunately, MySQL versions before 5.7 do not support computed columns, so on those versions you would need to add triggers (on UPDATEs and INSERTs) after creating the new column, to compute the value for every inserted or changed row in the table.
In your trigger you would use REPLACE(REPLACE(price,'$',''),',','') for the value of the new column.
You could also create a view that has this logic in it and select from that.
See the MySQL documentation on CREATE TRIGGER and CREATE VIEW for details.

mysql insert multiple data into a single column or multiple row

just want to ask for an opinion regarding mysql.
which one is the better solution?
case1:
store in 1 row:-
product_id:1
attribute_id:1,2,3
when I retrieve the data, I split the string by ','
I have seen some databases that store the data this way: each record is a product, and one column stores the product attributes:
a:3:{s:4:"spec";a:2:{i:1;s:6:"black";i:3;s:2:"37";}s:21:"spec_private_value_id";a:2:{i:1;s:11:"12367591683";i:3;s:11:"12367591764";}s:13:"spec_value_id";a:2:{i:1;s:1:"5";i:3;s:2:"29";}}
or
case2:
store in 3 row:-
product_id:1
attribute_id:1
product_id:1
attribute_id:2
product_id:1
attribute_id:3
This is what I normally do: store 3 rows for the attributes of a record.
In terms of performance and space, can anyone tell me which one is better? From what I see, case1 saves space, but the data needs to be processed in PHP (or another server-side scripting language).
case2 is more straightforward, but uses more space.

Save space? Seriously? You're talking about saving bytes when a one terabyte disk goes for 70 dollars?
And maybe you're not even saving bytes. If you store attributes as "12234,23342,243234", that's 18 bytes for 3 attributes. If you'd store them as smallint, they'd take up 6 bytes.

Depends on whether the attributes are important for searching later, for example.
It may be good if you keep attributes as serialized array in just one field in case you actually don't care about them and in case that you, for example, won't need to run a query to show all products that have one attribute.
However, finding all products that have one attribute would be at least "lousy" in case you have attributes as comma-separated (you need to use LIKE), and in case you store attributes as serialized arrays they are completely unusable for any kind of sorting or grouping using sql queries.
Using a separate table for the relations between products and attributes is far better if they are of any importance for selecting, grouping or sorting other data.
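As a sketch of moving from case 1 to case 2, the comma-separated string can be split into one row per attribute before inserting. The helper name attribute_rows is hypothetical, and the last line shows why a plain substring search on the packed string is unreliable.

```php
<?php
// Hypothetical helper: turns a comma-separated attribute string (case 1)
// into one product_id/attribute_id pair per attribute (case 2).
function attribute_rows(int $productId, string $csv): array
{
    $rows = [];
    foreach (explode(',', $csv) as $id) {
        $id = trim($id);
        if ($id !== '') {
            $rows[] = ['product_id' => $productId, 'attribute_id' => (int) $id];
        }
    }
    return $rows;
}

$rows = attribute_rows(1, '1,2,3'); // three rows, one per attribute

// A naive substring match gives a false positive: "1" appears inside "11,12".
$falsePositive = strpos('11,12', '1') !== false; // true, although attribute 1 is absent
```

With one row per attribute, the same lookup becomes an exact comparison on an indexed integer column.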

In case 1, although you save space, there's time spent on splitting the string.
You also must take care of the size of your field: If you have 50 products with 2 attributes and one with 100 attributes, you must make the field ~ varchar(200)... You will not save space at all.
I think case 2 is the best and recommended solution.

You need to consider the SELECT statements that would be using these values. If you wish to search for records that have certain attributes, it is much more efficient to store them in separate columns and index them. Otherwise, you are doing "LIKE" statements which take much longer to process.
