I am using the SUBSTRING function to retrieve an "excerpt" of a message body:
SELECT m.id, m.thread_id, m.user_id, SUBSTRING(m.body, 1, 100) AS body, m.sent_at
FROM message m;
What I would like to do is add 3 dots to the end of the substring, but only if the source string was longer than my upper limit (100 characters), i.e. if SUBSTRING had to cut off the string. If the source string is 100 characters or fewer, there is no need to add any dots.
I am using PHP as my scripting language.
That can be done in the query, rather than PHP, using:
SELECT m.id, m.thread_id, m.user_id,
CASE
WHEN CHAR_LENGTH(m.body) > 100 THEN CONCAT(SUBSTRING(m.body, 1, 100), '...')
ELSE m.body
END AS body,
m.sent_at
FROM message m;
The term for the three trailing dots is "ellipsis".
Ask for 101 characters. If you receive 101 characters, the source string is definitely longer than 100 characters. In that case, remove the last character in your scripting language of choice and append "...". This takes a little work off your DB.
Personally, I would advise you to leave some slack, though: e.g. cut off at 90 characters if and only if the text exceeds 110 characters (by requesting 110 + 1 characters, of course). Otherwise you get the effect I sometimes notice on Slashdot: you click a "Read the rest of this comment" link, only to receive the final word of the story.
In short, users are annoyed when the link for retrieving the rest of the story takes up more space than the missing text itself.
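A minimal Python sketch of the fetch-one-extra-character idea, assuming the query returns `SUBSTRING(m.body, 1, threshold + 1)` and the script receives that as `prefix`. The function name `make_excerpt` and its parameters are illustrative, not from the answer. With `cut=100, threshold=100` it behaves like the plain 101-character version; the defaults implement the 90/110 slack variant.

```python
def make_excerpt(prefix: str, cut: int = 90, threshold: int = 110) -> str:
    """Return an excerpt of a message body.

    `prefix` is assumed to be the first `threshold + 1` characters of the body,
    e.g. fetched with SUBSTRING(m.body, 1, 111) for the default threshold.
    """
    if len(prefix) > threshold:
        # More than `threshold` characters came back, so the body is too long:
        # keep only `cut` characters and mark the truncation with an ellipsis.
        return prefix[:cut] + "..."
    # The whole body fits within the threshold; show it unchanged.
    return prefix


body = "word " * 30  # a 150-character sample body
print(make_excerpt(body[:111]))                          # 90 characters + "..."
print(make_excerpt(body[:101], cut=100, threshold=100))  # 100 characters + "..."
```

Note that a body of, say, 105 characters is shown in full with the defaults, which is exactly the slack the answer recommends.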
I am new to regex. I know the basics of how to pull one substring out of a given string, but I am struggling to get out the multiple parts that I need. Could someone help me with this simple example so that I can work my way from there? Take this string:
LMJ won Neu. Zone - KEN #55 LEIGH vs LMJ #63 ONEIL
Some parts of this string change from record to record, while the rest (such as won, Zone - and vs) stays the same in every string. The parts I need out are:
1. The first team id, which in this case is LMJ. This will always start the string and be 3 uppercase letters: ^[A-Z]{3}?
2. The Neu part, which can be one of 3 strings, Neu, Off or Def: [Neu|Off|Def]?
3. The second team part, which will always come after the words Zone -: [A-Z]{3}?
4. The numeric part of the string after the first #. This can be 1 or 2 digits: [0-9]{1,2}?
5. The third team part, same as 3 except that it appears after vs: [A-Z]{3}?
6. Same as 4: the numeric part after the 2nd #: [0-9]{1,2}?
I would like to put all of that together into one regex. Is that possible?
Everything inside square brackets is a so-called character class: it matches only a single character. So [Neu|Off|Def] means: exactly one of the characters N, e, u, |, O, f or D (repetitions are ignored).
What you want is a capture group: (Neu|Off|Def)
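A small Python demonstration of the difference; the sample strings are made up for illustration.

```python
import re

# A character class matches exactly ONE character from the set {N, e, u, |, O, f, D}.
class_hits = re.findall(r"[Neu|Off|Def]", "Off")      # ['O', 'f', 'f']
class_whole = re.fullmatch(r"[Neu|Off|Def]", "Neu")   # None: one class = one character

# A capture group with alternation matches one of the whole words instead.
group_hits = re.findall(r"(Neu|Off|Def)", "Neu. Off, Def")  # ['Neu', 'Off', 'Def']

print(class_hits, class_whole, group_hits)
```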
Putting it together:
^([A-Z]{3}) won (Neu|Off|Def)\. Zone - ([A-Z]{3}) #([0-9]{1,2}) [A-Z]+ vs ([A-Z]{3}) #([0-9]{1,2}) [A-Z]+$
(This assumes you're not interested in the "LEIGH" and "ONEIL" parts, and that these are always in uppercase letters.)
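A quick check of that pattern with Python's `re` module, which accepts it unchanged; `parse_play` is only an illustrative name.

```python
import re
from typing import Optional, Tuple

# The pattern from the answer, split over two raw strings for readability.
PLAY = re.compile(
    r"^([A-Z]{3}) won (Neu|Off|Def)\. Zone - ([A-Z]{3}) #([0-9]{1,2}) [A-Z]+"
    r" vs ([A-Z]{3}) #([0-9]{1,2}) [A-Z]+$"
)


def parse_play(line: str) -> Optional[Tuple[str, ...]]:
    """Return (team1, zone, team2, number1, team3, number2), or None if no match."""
    m = PLAY.match(line)
    return m.groups() if m else None


print(parse_play("LMJ won Neu. Zone - KEN #55 LEIGH vs LMJ #63 ONEIL"))
# ('LMJ', 'Neu', 'KEN', '55', 'LMJ', '63')
```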
The regex should be something like:
'/([A-Z]{3})\ won\ (Neu|Off|Def)\.\ Zone\ -\ ([A-Z]{3})\ \#([0-9]{1,2})\ \w+\ vs\ ([A-Z]{3})\ \#([0-9]{1,2})\ \w+/'
() are used to capture the different parts.
This has not been tested thoroughly.
Let's say I need to get a string shorter than 150 characters from a MySQL database, BUT I do not want to cut off the last word; instead, I need it up to the last space, still under 150 characters.
For example:
I want:
Derrick Rose and the Chicago Bulls.
I don't want:
Derrick Rose and the Chica.
Is there a way to do this in MySQL, PHP or a combination of both?
You can do this with the built-in string functions:
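1. Reverse the first 150 characters of the string.
2. Find the first space in the reversed string; this is the last space within the first 150 characters.
3. Cut the string just before that space.
The SQL looks something like this (the CASE returns strings of 150 characters or fewer unchanged, so their last word is not cut needlessly):
select case when char_length(str) <= 150 then str else left(left(str, 150), 150 - locate(' ', reverse(left(str, 150)))) end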
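To check the arithmetic, here is a Python port of the LEFT/LOCATE/REVERSE expression (an illustration only; `cut_at_last_space` is a hypothetical name). It returns strings that already fit within the limit unchanged, and `limit` is a parameter so the example from the question can be tried with a small value.

```python
def cut_at_last_space(s: str, limit: int = 150) -> str:
    """Mirror of LEFT(LEFT(s, limit), limit - LOCATE(' ', REVERSE(LEFT(s, limit))))."""
    if len(s) <= limit:
        return s                       # already short enough: nothing to cut
    head = s[:limit]                   # LEFT(s, limit)
    # LOCATE is 1-based and returns 0 when the space is missing; str.find is
    # 0-based and returns -1, so adding 1 yields the same numbers.
    p = head[::-1].find(" ") + 1       # LOCATE(' ', REVERSE(head))
    return head[:limit - p]            # LEFT(head, limit - p)


s = "Derrick Rose and the Chicago Bulls."
print(cut_at_last_space(s, 26))  # Derrick Rose and the
```

One quirk: when the character just past the limit is a space, the expression still drops the last whole word. With `limit=28` it returns "Derrick Rose and the", although "Derrick Rose and the Chicago" would fit.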
Write a loop in PHP that starts at position 150 and works backwards until it encounters either a space character or the start of the string.
If it encounters a space character, take all characters from the start of the string up to the position you just found. Otherwise, use the first 150 characters (the edge case where there are no spaces in the first 150 characters).
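A Python version of that loop, for illustration (the function name is hypothetical). It reads "position 150" as the 0-based index 150, i.e. the first character that would be cut off; that is an interpretation of the answer. If that character is a space, the cut at 150 is already clean and the last word survives.

```python
def cut_at_last_space_loop(s: str, limit: int = 150) -> str:
    if len(s) <= limit:
        return s                      # nothing needs to be cut
    # Start at index `limit` (the first character that would be cut off)
    # and walk backwards towards the start of the string.
    for i in range(limit, 0, -1):
        if s[i] == " ":
            return s[:i]              # everything before the space
    # Edge case: no space found, so fall back to a hard cut.
    return s[:limit]


s = "Derrick Rose and the Chicago Bulls."
print(cut_at_last_space_loop(s, 26))  # Derrick Rose and the
print(cut_at_last_space_loop(s, 28))  # Derrick Rose and the Chicago
```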
I am trying to remove any 'groups of characters' with 3 or fewer characters.
This is the source:
1.29 Cancels part plan C/5879 2030. in i i.r e9g6Pop Iatian Area ProcH 22.4.93 Suburban Lands n f 53dv 3 N014 3.5.98. PLAN or any from 01 53 under M R.5I B.L.1laY98 E35. P0 RT I 0 N S At Maroubrajuncti p /I .z. .0 / .L .I. .I
Setting bounds for word characters with a repetition between 1 and 3, e.g. /\b\w{1,3}\b/, does not work, because "C/5879" would be broken up (the "C" counts as a word of its own).
The desired output would be as follows:
1.29 Cancels part plan C/5879 2030. e9g6Pop Iatian Area ProcH 22.4.93 Suburban Lands 53dv N014 3.5.98. PLAN from under R.5I B.L.1laY98 E35. Maroubrajuncti
An alternative that could also work would be to create larger 'groups of characters' by joining whitespace-delimited 'groups of characters' of 2 or fewer characters.
For example:
1.29 Cancels part plan C/5879 2030. inii.r e9g6Pop Iatian Area ProcH 22.4.93 Suburban Lands nf 53dv 3N014 3.5.98. PLAN orany from 0153 under MR.5I B.L.1laY98 E35. P0RTI0NS AtMaroubrajuncti p/I.z. .0/.L.I..I
I would be open to either solution to rescue me from Regex Hell.
Your definition of "words" is "whitespace-delimited", which differs from regex's definition of a word boundary (a change between word and non-word characters), so use lookarounds:
\s+\S{1,3}(?=\s)
Note that the expression includes (consumes) the leading whitespace, so removing the matches will not leave double spaces in the result.
Tested on regextester, the result is as follows (the final ".I" survives because no whitespace follows it):
1.29 Cancels part plan C/5879 2030. e9g6Pop Iatian Area ProcH 22.4.93 Suburban Lands 53dv N014 3.5.98. PLAN from under R.5I B.L.1laY98 E35. Maroubrajuncti .I
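The Python snippet below reproduces that result with `re.sub`. The second pattern is a small variant that is not part of the answer (an assumption): it also accepts the end of the string in the lookahead, which removes the trailing ".I" and yields exactly the desired output from the question.

```python
import re

source = ("1.29 Cancels part plan C/5879 2030. in i i.r e9g6Pop Iatian Area ProcH "
          "22.4.93 Suburban Lands n f 53dv 3 N014 3.5.98. PLAN or any from 01 53 under "
          "M R.5I B.L.1laY98 E35. P0 RT I 0 N S At Maroubrajuncti p /I .z. .0 / .L .I. .I")

# The answer's pattern: whitespace, then 1-3 non-whitespace characters, then a
# lookahead (not consumed) for more whitespace.
answer = re.sub(r"\s+\S{1,3}(?=\s)", "", source)

# Variant (assumption, not from the answer): the lookahead also accepts the end
# of the string, so a short group in last position is removed as well.
variant = re.sub(r"\s+\S{1,3}(?=\s|$)", "", source)

print(answer)
print(variant)
```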
Sorry for my bad English. I have some product parameters in an e-shop, like these:
Freezer
* Number of freezer drawers 3
* XXL drawer
* Freezing capacity 4.5 kg/24 h
Package dimensions:
Weight (kg): 61.000
Height (cm): 182.00
Width (cm): 64.00
Depth (cm): 71.00
Type: freestanding
Refrigerator design: combined
Freezer compartment position: freezer at the bottom
Reversible door opening: YES
Controls: mechanical-knob
Display: no display
Energy class: A++
There are three kinds of block, and I need to determine which kind each one is.
Conditions for types:
1) A text block whose first line begins with a letter (NOT with *) and does NOT end with :; this line must be followed by one or more lines beginning with *.
2) A text block whose first line begins with a letter (not with *) and ends with :; this line must be followed by one or more lines NOT beginning with *.
3) One or more lines, each beginning with a word (or words), followed by the character ":" and then any other word (or words).
Can you help me identify the type of each text block? I need to check each text block separately; parsing the long text into blocks is already done and works fine.
Thanks.
Here is a possible solution for the 3 cases, each with a link to an online regex tester.
Each of these regexes matches only one of the block types.
As a precondition I assumed that the blocks are always separated by empty lines.
Edit
Minor update to the regexes, inspired by a comment (posted as a separate answer): cases 2 and 3 can overlap, so the regexes now require an empty line before each block.
1) http://www.myregextester.com/?r=df2be635
^[\r\n]{1,2}(?:[^*].+[^:][\r\n]{1,2})(?:\*.+[\r\n]{1,2})+$
2) http://www.myregextester.com/?r=f903ae6d
^[\r\n]{1,2}(?:[^*].+:[\r\n]{1,2})(?:[^*\s].+[\r\n]{1,2})+$
3) http://www.myregextester.com/?r=17ed0af8
^[\r\n]{1,2}(?:[^*].+:.+[^:][\r\n]{1,2})(?:[^*\s].+[\r\n]{1,2})+$
In all three cases, the whole match is in matcher group [0]. Each regex is composed of two non-capturing groups: one for the first line and one for the repeated list of following lines.
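The same classification can also be done with plain line-by-line checks instead of regexes. The following Python sketch is a different technique from the answer, written directly from the three conditions in the question; `classify_block` and `is_key_value` are hypothetical names, and it assumes each block arrives as its own string, as the question says.

```python
from typing import Optional


def is_key_value(line: str) -> bool:
    """True for lines like "Weight (kg): 61.000": text, a colon, then more text."""
    key, sep, value = line.partition(":")
    return bool(sep) and line[0].isalpha() and bool(key.strip()) and bool(value.strip())


def classify_block(block: str) -> Optional[int]:
    """Return 1, 2 or 3 for the block types in the question, or None if none fits."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    # Every type has to start with a letter (and therefore never with '*').
    if not lines or not lines[0][0].isalpha():
        return None
    head, rest = lines[0], lines[1:]

    # Type 1: first line does not end with ':' and is followed by '*' lines.
    if not head.endswith(":") and rest and all(ln.startswith("*") for ln in rest):
        return 1
    # Type 2: first line ends with ':' and is followed by lines NOT starting with '*'.
    if head.endswith(":") and rest and not any(ln.startswith("*") for ln in rest):
        return 2
    # Type 3: every line has the form "word(s): word(s)".
    if all(is_key_value(ln) for ln in lines):
        return 3
    return None


print(classify_block("Freezer\n* Number of freezer drawers 3\n* XXL drawer"))  # 1
```

Checking type 2 before type 3 matters: the lines under "Package dimensions:" look like type 3 lines, which is the same overlap the edited regexes deal with.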
I have a problem using preg_match_all() to match something of variable length.
What I am trying to match is the traffic condition after the word 'Congestion'. What I came up with is this regex pattern:
Congestion\s*:\s*(?P<congestion>.*)
However, it extracts everything from the first instance all the way to the end of the entire subject, since .* matches everything. That's not what I want: I would like it to match the 3 instances separately.
Since the text after Congestion can vary in length, I can't really predict how many words and spaces there will be, so I can't come up with a stricter \w*\s*\w* match etc.
Any clues on how I can proceed from here?
Highway : Highway 26
Datetime : 18-Oct-2010 05:18 PM
Congestion : Traffic is slow from Smith St to Alice Springs St
Highway : Princes Highway
Datetime : 18-Oct-2010 05:18 PM
Congestion : Traffic is slow at the Flinders St / Elizabeth St intersection
Highway : Eastern Freeway
Datetime : 18-Oct-2010 05:19 PM
Congestion : Traffic is slow from Prince St to Queen St
EDIT FOR CLARITY
These nicely formatted texts are actually received in a very poorly formatted HTML email. It contains random line breaks here and there, e.g. "Congestion : Traffic\n is slow from Prince\nSt to Queen St".
So, while processing the emails, I strip off all the HTML and the random line breaks and json_encode() the result into one very long single-line string with no line breaks.
By default, ^ and $ match only at the start and end of the whole subject. The m (PCRE_MULTILINE) modifier changes that behaviour so that they also match at every line break. Then you can tell PHP to match only to the end of the line:
preg_match_all('/^Congestion\s*:\s*(?P<congestion>.*)$/m', $subject, $matches);
There are two things to notice: first, the pattern was modified to include line-begin (^) and line-end ($) markers. Second, the pattern now carries the m modifier.
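A Python sketch of the same idea, for illustration: Python's `re.M` flag plays the role of PCRE's m modifier, and the subject is the multi-line sample from the question. Note that `.` does not match a newline in either PCRE or Python unless the s flag is used, so this only helps while the line breaks are still in the subject; on the single-line string described in the edit, `$` can only anchor at the very end.

```python
import re

# The sample from the question, with its line breaks intact.
subject = """Highway : Highway 26
Datetime : 18-Oct-2010 05:18 PM
Congestion : Traffic is slow from Smith St to Alice Springs St
Highway : Princes Highway
Datetime : 18-Oct-2010 05:18 PM
Congestion : Traffic is slow at the Flinders St / Elizabeth St intersection
Highway : Eastern Freeway
Datetime : 18-Oct-2010 05:19 PM
Congestion : Traffic is slow from Prince St to Queen St"""

# re.M makes ^ and $ match at every line, like the m modifier in PCRE.
pattern = re.compile(r"^Congestion\s*:\s*(?P<congestion>.*)$", re.M)
conditions = [m.group("congestion") for m in pattern.finditer(subject)]
for condition in conditions:
    print(condition)
```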
You can try a minimal (lazy) match:
Congestion\s*:\s*(?P<congestion>.*?)
On its own, this returns zero characters in the named group 'congestion', because nothing after the group forces the lazy .*? to expand.
So, this could be fixed if "Highway" always starts the traffic condition records:
Congestion\s*:\s*(?P<congestion>.*?)Highway\s*:
If this works (I have not checked it), all records except the last one are matched, because the last record is not followed by "Highway". This is easily fixed by appending the text 'Highway :' to the end of the input string.
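A Python sketch of the sentinel trick on a single-line subject like the one described in the edit. The `subject` string below is an assumed reconstruction (the three records joined with spaces), and `.strip()` is added to drop the space captured before each "Highway".

```python
import re

# Assumed single-line input, mimicking the json_encode()d string from the edit.
subject = ("Highway : Highway 26 Datetime : 18-Oct-2010 05:18 PM "
           "Congestion : Traffic is slow from Smith St to Alice Springs St "
           "Highway : Princes Highway Datetime : 18-Oct-2010 05:18 PM "
           "Congestion : Traffic is slow at the Flinders St / Elizabeth St intersection "
           "Highway : Eastern Freeway Datetime : 18-Oct-2010 05:19 PM "
           "Congestion : Traffic is slow from Prince St to Queen St")

# The lazy group stops at the next "Highway :", which starts the next record.
pattern = re.compile(r"Congestion\s*:\s*(?P<congestion>.*?)Highway\s*:")

# Append the sentinel so that the last record is followed by "Highway :" too.
padded = subject + " Highway :"
conditions = [m.group("congestion").strip() for m in pattern.finditer(padded)]
for condition in conditions:
    print(condition)
```

Without the sentinel, `pattern.findall(subject)` finds only the first two records.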
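If every condition has the form "Traffic is ... from ... to ...", you can also capture the pieces separately (with the m modifier, so that $ matches at the end of each line):
Congestion\s*:\s*Traffic is\s*(?P<c1>[^\n]*)\s*from\s*(?P<c2>[^\n]*)\s*to\s*(?P<c3>[^\n]*)$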
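Checking that pattern in Python with `re.M` (standing in for PHP's m modifier) on the multi-line sample shows its limits. The second record, "Traffic is slow at the ...", contains no "from", so it is skipped. Without the flag, `$` only matches at the end of the subject, so only the last record is found. The captured pieces keep a trailing space, hence the `.strip()`.

```python
import re

# The multi-line sample from the question.
subject = """Highway : Highway 26
Datetime : 18-Oct-2010 05:18 PM
Congestion : Traffic is slow from Smith St to Alice Springs St
Highway : Princes Highway
Datetime : 18-Oct-2010 05:18 PM
Congestion : Traffic is slow at the Flinders St / Elizabeth St intersection
Highway : Eastern Freeway
Datetime : 18-Oct-2010 05:19 PM
Congestion : Traffic is slow from Prince St to Queen St"""

pattern = re.compile(
    r"Congestion\s*:\s*Traffic is\s*(?P<c1>[^\n]*)\s*from\s*(?P<c2>[^\n]*)"
    r"\s*to\s*(?P<c3>[^\n]*)$",
    re.M,  # counterpart of PHP's m modifier: $ matches at each line end
)
parts = [tuple(g.strip() for g in m.groups()) for m in pattern.finditer(subject)]
for p in parts:
    print(p)
```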