Text Processing with PHP - php

Stackoverflow: I need your help!
I've been tasked with turning some (fairly) complex work diagrams for railway staff extracted from a Word document into something more usable for further processing, such as into a PHP array.
Here is a sample of one of the work diagrams:
LTP BH 4000
( Link 5)
DVR Su
On 00.22 PASS Barnham 00+34 5H97
Off 08.03 Lham 00+42
Hrs 7:41 PPTC Lham (06+24) 5N08
Traction for the above Service is
Days Su class 377
From 18/05/2014 377 PC Lham 01+46 5S62 DOO
To 24/08/2014 (Via CET)
TC Lham O Sh 01+50
PNB
377 PC Lham O Sh 03+10 5W62 DOO
(Via CWM)
DTCS Lham 03+32
377 PP Lham Shed 04+10 5W00 DOO
(Via CWM)
DTCS Lham Shed 04+24
PPTC Lham Shed (07+39) 5E24
Traction for the above Service is
class 377
PPTC Lham (06+37) 5H92
Traction for the above service is
class 377
377 PP Lham Shed 05+45 5W01 DOO
(Via CET)
377 Lham O Sh 05+57 06+28 5W01 DOO
(Via CWM)
TC Lham Shed 06+42
PPTC Lham Shed (09+58) 5H67
Traction for the above Service is
class 377
PPTC Lham Shed (07+41) 5P29 RP MO
Traction for the above Service is
class 377
(Unit forms part of 22+17
attachment)
PASS Lham 07.54 2P31
(To Bognor Regis)
Barnham 08.02
Routes 919
I've managed to process some of the data using simple regular expressions, but where I am struggling is the "middle" data, which actually shows the work to be done. The problem is that there is no real structure that defines what each line should look like: you will notice that many lines differ from one another, and some even include free-text notes.
What I am looking to accomplish is to turn each row into an array that looks like the following:
$row = array("stock", "activity", "location", "departure_time", "arrival_time", "train_id", "notes");
The difficulty comes as not every line fits into this format - some lines have every "column", whereas others have one or more columns missing and other lines consist of free text.
I am by no means a text processing expert, but I cannot seem to find a solution to this problem. I'm not after a complete solution, just some pointers would be gratefully received!
Update: Just for clarification, I'm not interested in the free-text rows. The data they contain is not important for what I am trying to accomplish.

I'll refine this answer more as soon as more data comes in, but in the meantime I'd go with what amounts to a state machine.
You read the text one line after the other. Initially you are in the "WAITING FOR DIAGRAM" state:
define('ALERT', 100); // assumed threshold: lines spent in one state before we complain
$status = array(
    'fp'      => $fp, // the state functions read ahead through $status['fp']
    'manager' => 'waitForDiagram',
);
$chunk  = 0;
$lineno = 0;
$manage = $status['manager'];
while (($line = fgets($fp, 1024)) !== false) { // is 1 Kb enough? Maybe not.
    $lineno++;
    $manage($status, $line);
    if ($status['manager'] != $manage) {
        // The state function asked for a transition.
        $chunk = 0;
        if (!function_exists($status['manager'])) {
            trigger_error("{$manage}({$line}) -> {$status['manager']}: no such state");
        }
        $manage = $status['manager'];
    }
    if (++$chunk > ALERT) {
        trigger_error("Stuck in state {$manage} since {$chunk} lines!", E_USER_ERROR);
    }
}
Then you define a function for each state, beginning with the first:
function waitForDiagram(&$status, $line) {
    // Part common to most such state functions:
    $tokens = tokenise($line);
    // Quickly check whether anything needs doing.
    if (!in_array($tokens[0], [ "LTP" ])) {
        // if not, return.
        return;
    }
    $status['diagram'] = array(
        'title'    => $tokens[0],
        'whatever' => $tokens[1],
        'comments' => '',
    );
    ...
    // In this case, all information is only in one line, so we can
    // continue to the next state, which in this case is always waitForOnAndGetComments.
    $status['manager'] = 'waitForOnAndGetComments';
}
function waitForOnAndGetComments(&$status, $line) {
    $tokens = tokenise($line);
    // If we get "On" it's the line, otherwise it is still the comment
    if (!in_array($tokens[0], [ "On" ])) {
        $status['diagram']['comments'] .= $line;
        return;
    }
    // Otherwise we have On 00.22 PASS Barnham 00+34
    // and always a next line.
    $offTok = tokenise(fgets($status['fp'], 1024));
    if ($offTok[0] != "Off") {
        trigger_error("Found ON, but next row is not OFF, what gives?", E_USER_ERROR);
    }
    $status['diagram']['on'] = array(
        'time' => $tokens[1],
        ...
    );
    ...
    $status['diagram']['off'] = array(
        'time' => $offTok[1],
        'line' => $offTok[2],
        ...
    );
    $status['manager'] = 'waitForSomethingElse';
}
...and so on...
One important thing is how you tokenise the lines. If you have a clear delimiter (such as a tab) and can use explode, all well and good. Otherwise you can try preg_split('#\s{2,}#', $line), which treats any run of two or more whitespace characters as the separator between "cells" in each "row".
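The tokenising advice above can be sketched as follows. This is a minimal illustration in Python rather than PHP, and the sample lines and their column layout are made up for the example, not taken from the real antiword output. The point it tries to show is that splitting on a single tab keeps empty cells, so a missing column still occupies its position, whereas the "two or more whitespace characters" fallback cannot tell you which column is missing.

```python
import re


def tokenise(line):
    """Split one diagram line into cells.

    Assumption: when the line contains tabs, every tab is a column boundary.
    Otherwise fall back to treating runs of 2+ whitespace characters as the
    boundary, as suggested above.
    """
    line = line.rstrip("\r\n")
    if "\t" in line:
        # Keep empty cells so that column positions survive.
        return [cell.strip() for cell in line.split("\t")]
    return re.split(r"\s{2,}", line.strip())


# Hypothetical tab-delimited row with an empty "arrival" cell.
print(tokenise("377\tPC\tLham\t01+46\t\t5S62\tDOO\n"))
# Hypothetical space-padded row.
print(tokenise("On 00.22   PASS  Barnham    00+34  5H97"))
```

With the tab-delimited form, each cell index can be mapped directly onto the desired row keys (stock, activity, location and so on).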

I found what was causing me grief in solving this. I'm loading the Word document using a tool called "antiword". Antiword seems to strip special characters such as tabs. However, I found that by passing the "-w 0" switch, these characters are preserved, and parsing the diagrams using simple regular expressions became trivial. Many thanks to @Iserni for taking the time to help me, nonetheless.

Related

PHP Soap Client request answers with a single String

I'm a bit of a noob when it comes to SOAP client requests.
I'm hoping someone can help. I'm trying to make a SOAP client request to a website. I can make the request, but the returned XML (which I'm turning into an array) seems to come back as a single string rather than being separated into its XML elements.
The XML:
<ProductList>
<Product>
<ProductCode>00380</ProductCode>
<ProductName>Droopy Eye Specs</ProductName>
<BrochureDescription>Droopy Eye Specs, Black, with Metal Spring, on Display Card</BrochureDescription>
<WebDescription>Droopy Eye Specs, Black, with Metal Spring</WebDescription>
<WashingInstructions>Not Applicable</WashingInstructions>
<RRP>1.8900</RRP>
<StockQuantity>943</StockQuantity>
<VatRate>20.00</VatRate>
<Gender>UNISEX</Gender>
<PackType>on Display Card</PackType>
<PackQty>1</PackQty>
<Audience>ADULT</Audience>
<Colour>BLACK</Colour>
<ETA>2019-07-04 00:00:00.</ETA>
<CataloguePage>641</CataloguePage>
<BarCode>5020570003800</BarCode>
<Price1>0.91</Price1>
<Price2>0.00</Price2>
<Price3>0.00</Price3>
<Break1>1.00</Break1>
<Break2>0.00</Break2>
<Break3>0.00</Break3>
<unit_size>1</unit_size>
<warnings/>
<carton>120</carton>
<stdPrice1>0.91</stdPrice1>
<stdPrice2>0.00</stdPrice2>
<stdPrice3>0.00</stdPrice3>
<stdBreak1>12.00</stdBreak1>
<stdBreak2>0.00</stdBreak2>
<stdBreak3>0.00</stdBreak3>
<Photo>1</Photo>
<CatalogueCode>JN-01</CatalogueCode>
<CatalogueName>Jokes &amp; Novelties_Assorted</CatalogueName>
<Catalogue/>
<acc_code1>32928</acc_code1>
<acc_code2>32929</acc_code2>
<alt_code1>20073</alt_code1>
<alt_code2>25202</alt_code2>
<alt_code3>29111</alt_code3>
<alt_code4>6155</alt_code4>
<alt_code5>98413</alt_code5>
<new_code/>
<art_cat/>
<ImageAvailability>No</ImageAvailability>
<Seasonal>No</Seasonal>
<p_list2/>
<Licence_Territory/>
<ThemeName>Funnyside Fancy Dress</ThemeName>
<GroupID>3</GroupID>
<GroupName>Adult Fancy Dress Costumes</GroupName>
<GroupID1>0</GroupID1>
<ThemeGroup1>Uncategorized</ThemeGroup1>
<GroupID2>0</GroupID2>
<ThemeGroup2>Uncategorized</ThemeGroup2>
<GroupID3>0</GroupID3>
<ThemeGroup3>Uncategorized</ThemeGroup3>
<EFPrice>0.9100</EFPrice>
<EFQty>1</EFQty>
<size>Not Applicable</size>
<Ext_Size>NOT APPLICABLE</Ext_Size>
<GenericCode>00380</GenericCode>
<HasImageRights>No</HasImageRights>
<Safety>Warning! Not suitable for children under 3 years due to small parts. Choking Hazard.</Safety>
<Composition/>
</Product>
<Product>
<ProductCode>00429</ProductCode>
<ProductName>Metal Handcuffs</ProductName>
<BrochureDescription>Metal Handcuffs, Silver, with Key, on Display Card</BrochureDescription>
<WebDescription>Metal Handcuffs, Silver, with Key</WebDescription>
<WashingInstructions>Not Applicable</WashingInstructions>
<RRP>3.0900</RRP>
<StockQuantity>4926</StockQuantity>
<VatRate>20.00</VatRate>
<Gender>UNISEX</Gender>
<PackType>on Display Card</PackType>
<PackQty>1</PackQty>
<Audience>ADULT</Audience>
<Colour>SILVER</Colour>
<ETA>2019-02-10 00:00:00.</ETA>
<CataloguePage>424</CataloguePage>
<BarCode>5020570004296</BarCode>
<Price1>1.50</Price1>
<Price2>0.00</Price2>
<Price3>0.00</Price3>
<Break1>1.00</Break1>
<Break2>0.00</Break2>
<Break3>0.00</Break3>
<unit_size>1</unit_size>
<warnings>FREIG, FREIG,</warnings>
<carton>96</carton>
<stdPrice1>1.50</stdPrice1>
<stdPrice2>0.00</stdPrice2>
<stdPrice3>0.00</stdPrice3>
<stdBreak1>3.00</stdBreak1>
<stdBreak2>0.00</stdBreak2>
<stdBreak3>0.00</stdBreak3>
<Photo>1</Photo>
<CatalogueCode>AC-30</CatalogueCode>
<CatalogueName>Accessories_Truncheons &amp; Handcuffs</CatalogueName>
<Catalogue/>
<acc_code1>29535</acc_code1>
<acc_code2>33723</acc_code2>
<acc_code3>96318</acc_code3>
<alt_code1>23076</alt_code1>
<alt_code2>23918</alt_code2>
<alt_code3>30652</alt_code3>
<alt_code4>34757</alt_code4>
<alt_code5>374</alt_code5>
<new_code/>
<art_cat/>
<ImageAvailability>No</ImageAvailability>
<Seasonal>No</Seasonal>
<p_list2/>
<Licence_Territory/>
<ThemeName>Cops &amp; Robbers Fancy Dress</ThemeName>
<GroupID>3</GroupID>
<GroupName>Adult Fancy Dress Costumes</GroupName>
<GroupID1>0</GroupID1>
<ThemeGroup1>Uncategorized</ThemeGroup1>
<GroupID2>0</GroupID2>
<ThemeGroup2>Uncategorized</ThemeGroup2>
<GroupID3>0</GroupID3>
<ThemeGroup3>Uncategorized</ThemeGroup3>
<EFPrice>1.5000</EFPrice>
<EFQty>1</EFQty>
<size>Not Applicable</size>
<Ext_Size>NOT APPLICABLE</Ext_Size>
<GenericCode>00429</GenericCode>
<HasImageRights>No</HasImageRights>
<Safety>Warning! Not suitable for children under 3 years due to small parts - Choking Hazard. Keep these details for reference. Warning! Do not over tighten as this may cause the safety catch to jam. INSTRUCTIONS: 1. LOCK Move stop bar to upper position, press cuff down on wrist and rotate the jaw until it engages ratchet. Jaw may be tightened as required. Do not over tighten. Move stop bar to down position, jaw is thus locked against travel in either direction. 2. UNLOCK Move stop bar to open</Safety>
<Composition/>
</Product>
My PHP:
$apiKey = '00000';
$clientID = 'MyID';
$LanguageCode = 'EN';
$wdsl = 'http://webservices.website.com/services/products.asmx?WSDL';
$params = array('apiKey' => $apiKey, 'clientID' => $clientID);
$soapclient = new SoapClient($wdsl);
$response = $soapclient->GetFullDataSet($params);
$array = json_decode(json_encode($response), true);
print_r ($array);
The returned array:
Array ( [GetFullDataSetResult] => Array ( [any] => 00380Droopy Eye SpecsDroopy Eye Specs, Black, with Metal Spring, on Display CardDroopy Eye Specs, Black, with Metal SpringNot Applicable1.890094320.00UNISEXon Display Card1ADULTBLACK2019-07-04 00:00:00.64150205700038000.910.000.001.000.000.0011200.910.000.0012.000.000.001JN-01Jokes & Novelties_Assorted3292832929200732520229111615598413NoNoFunnyside Fancy Dress3Adult Fancy Dress Costumes0Uncategorized0Uncategorized0Uncategorized0.91001Not ApplicableNOT APPLICABLE00380NoWarning! Not suitable for children under 3 years due to small parts. Choking Hazard.00429Metal HandcuffsMetal Handcuffs, Silver, with Key, on Display CardMetal Handcuffs, Silver, with KeyNot Applicable3.0900492620.00UNISEXon Display Card1ADULTSILVER2019-02-10 00:00:00.42450205700042961.500.000.001.000.000.001FREIG, FREIG, 961.500.000.003.000.000.001AC-30Accessories_Truncheons & Handcuffs29535337239631823076239183065234757374NoNoCops & Robbers Fancy Dress3Adult Fancy Dress Costumes0Uncategorized0Uncategorized0Uncategorized1.50001Not ApplicableNOT APPLICABLE00429NoWarning! Not suitable for children under 3 years due to small parts - Choking Hazard. Keep these details for reference. Warning! Do not over tighten as this may cause the safety catch to jam. INSTRUCTIONS: 1. LOCK Move stop bar to upper position, press cuff down on wrist and rotate the jaw until it engages ratchet. Jaw may be tightened as required. Do not over tighten. Move stop bar to down position, jaw is thus locked against travel in either direction. 2. UNLOCK Move stop bar to open
How do I get each XML element into the array as its own entry?
E.g. Product > ProductCode > ProductName > BrochureDescription > etc.
Let me know if there is any info I've missed out. Any help would be appreciated.
Many thanks.

Random Content Array seems stuck

I have a random content script that has worked perfectly but now seems to have a glitch.
It's the "Spotlight On:" story on the upper lefthand corner at http://fiction.deslea.com/index2.php and the code is as follows:
$storyspotlights = array("bluevial", "biophilia", "real", "edgeofreality",
"limitsofperception", "markofcain", "spokenfor", "closer",
"feildelm", "purgatory", "elemental");
$randomstoryID = array_rand($storyspotlights);
$randomstory = $storyspotlights[$randomstoryID];
switch ($randomstory) {
case ($randomstory == 'closer'):
$storyspotlightheader = "<div class='storyspotlightheader'>Closer</div>";
$storyspotlighttext = "snip";
//some stories snipped
case ($randomstory == 'bluevial'):
$storyspotlightheader = "<div class='storyspotlightheader'>The Blue
Vial</div>";
$storyspotlighttext = "snip";
break;
//more stories snipped
}
print($storyspotlightheader);
print($storyspotlighttext);
My problem is - all the stories from Blue Vial to Spoken For appear when you refresh the page, in random order (although Blue Vial seems to stick a fair bit). These were the stories in the script originally.
Since then I have added the last four to the array and the content generation switch case fragment, but these last four stories never, ever appear in the randomiser. I've literally sat and refreshed for hours. I've confirmed over and over that the updated script is on the server, and even deleted and re-uploaded it.
I did try unset and also $storyspotlights = array() at the beginning of the script at various stages of troubleshooting, but to no avail. I also tried moving the new stories to the start of the array - no change there either.
What am I missing?
It's surprising this works at all. That's not how you use switch..case.
switch (<value to compare>) {
case <value to compare against>:
...
}
That means you write this:
switch ($randomstory) {
case 'closer':
...
}
With what you've written it's actually executing like:
if ($randomstory == ($randomstory == 'closer')) ...
Also make sure you have not actually forgotten some break statements, which would make the code fall through to the next case and indeed make certain cases "more sticky" than others.
Also, I'd simplify the whole thing to this:
$stories = array(
array('header' => '...', 'text' => '...'),
array('header' => '...', 'text' => '...'),
...
);
$story = $stories[array_rand($stories)];
echo $story['header'];
echo $story['text'];

Evaluate logic expression given as a string in PHP

I have an object which has a state property, for example state = 'state4' or state = 'state2'.
Now I also have an array of all available states that the state property can get, state1 to state8 (note: the states are not named stateN. They have eight different names, like payment or canceled. I just put stateN to describe the problem).
In addition to that, I have a logical expression like $expression = !state1||state4&&(!state2||state5) for example. This is the code for the above description:
$state = 'state4';
$expression = '!state1||state4&&(!state2||state5)';
Now I want to check if the logical expression is true or false. In the above case, it's true. In the following case it would be false:
$state = 'state1';
$expression = 'state4&&!state2||(!state1||state7)';
How could this be solved in an elegant way?
//Initialize
$state = 'state4';
$expression = '!state1||state4&&(!state2||state5)';
//Adapt to your needs
$pattern='/state\d/';
//Replace
$e=str_replace($state,'true',$expression);
while (preg_match_all($pattern,$e,$matches))
    $e=str_replace($matches[0],'false',$e);
//Eval
eval("\$result=$e;");
echo $result;
Edit:
Your update to the OQ necessitates some minor work:
//Initialize
$state = 'payed';
$expression = '!payed||cancelled&&(!whatever||shipped)';
//Adapt to your needs
$possiblestates=array(
'payed',
'cancelled',
'shipped',
'whatever'
);
//Replace
$e=str_replace($state,'true',$expression);
$e=str_replace($possiblestates,'false',$e);
//Eval
eval("\$result=$e;");
echo $result;
Edit 2
There has been concern about eval and PHP injection in the comments: The expression and the replacements are completely controlled by the application, no user input involved. As long as this holds, eval is safe.
I am using ExpressionLanguage, but there are a few different solutions:
ExpressionLanguage Symfony Component - https://symfony.com/doc/current/components/expression_language.html
cons - weird array syntax - array['key']. array.key works only for objects
cons - generates a notice for array['key'] when the key is not defined
pros - stable and well maintained
https://github.com/mossadal/math-parser
https://github.com/optimistex/math-expression
Please remember that eval is NOT an option, under NO circumstances. We don't live in a static world. Any software always grows and evolves. What was once considered a safe input at one point may turn completely unsafe and uncontrolled.
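For an expression language this small, the check can also be done without eval or an external library. The following is a minimal sketch of a recursive-descent evaluator, written in Python for illustration. The function names are mine, and it assumes that state names consist of word characters only and that the only operators are !, && and ||. It follows PHP's precedence, where ! binds tighter than &&, which in turn binds tighter than ||.

```python
import re

# One token: an operator, a parenthesis, or a state name.
TOKEN_RE = re.compile(r"\s*(\|\||&&|!|\(|\)|\w+)")


def tokenize(expression):
    expression = expression.strip()
    tokens, pos = [], 0
    while pos < len(expression):
        match = TOKEN_RE.match(expression, pos)
        if not match:
            raise ValueError(f"Unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expression, current_state):
    """Return True/False for the expression; a name is true iff it is current_state."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():  # lowest precedence
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse the right side so its tokens are consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # highest precedence
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("Missing closing parenthesis")
            return value
        if token is None or token in ("||", "&&", "!", ")"):
            raise ValueError(f"Unexpected token: {token!r}")
        return token == current_state

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"Unexpected trailing token: {peek()!r}")
    return result


print(evaluate("!state1||state4&&(!state2||state5)", "state4"))  # True
print(evaluate("state4&&!state2||(!state1||state7)", "state1"))  # False
```

The two calls reproduce the true and false cases from the question. Because nothing is ever handed to the interpreter, an unexpected character in the expression raises an error instead of being executed.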
I think you have a case which can be solved if you model each of your expressions as a rooted directed acyclic graph (DAG).
I assumed acyclic since your ultimate aim is to find the result of boolean algebra operations (if a cycle occurred anywhere in the graph, the expression would be nonsense, I think).
However, even if your graph structure can meaningfully be cyclic, your target search term will be a cyclic graph, and it should still have a solution.
$expression = '!state1||state4&&(!state2||state5)';
And you have one root with two sub_DAGs in your example.
EXPRESSION as a Rooted DAG:
         EXPRESSION
             |
            AND
        ___/   \___
       OR         OR
      /  \       /  \
  ! S_1  S_4  ! S_2  S_5
Your adjacency list is:
expression_adj_list = [
    expression => [ subExp_1, subExp_2 ],
    subExp_1 => [ ! S_1, S_4 ],
    subExp_2 => [ ! S_2, S_5 ]
]
Now you can walk through this graph by BFS (breadth-first search algorithm) or DFS (depth-first search algorithm) or your custom, adjusted algorithm.
Of course you can just visit the adjacency list with keys and values as many times as you need if this suits and is easier for you.
You'll need a lookup table that tells your algorithm what each combination of states evaluates to. For example,
S2 && S5 = 1,
S1 or S4 = 0,
S3 && S7 = -1 (to throw an exception maybe)
With that in place, the algorithm below can compute your expression's result.
// convert_expression_to_adj_list(), is_calculateable(), calc() and
// is_subexpression() are helpers you have to write yourself.
$adj_list = convert_expression_to_adj_list();
// can also be assigned by a function.
// root is the only node which has no incoming-edge in $adj_list.
$root = 'EXPRESSION';
$q = [$root]; // queue to hold the expression and its subexpressions
$results = []; // keyed by (sub)expression name
while ( ! empty($q)) {
    $current = array_shift($q);
    if ( ! array_key_exists($current, $results)) {
        if (isset($adj_list[$current])) { // if has children (sub/expression)
            $children = $adj_list[$current];
            // true if all children are states. false if any child is subexpression.
            $bool = is_calculateable($children);
            if ($bool) {
                $results[$current] = calc($children);
            }
            else {
                array_unshift($q, $current);
            }
            foreach ($children as $child) {
                if (is_subexpression($child) && ! array_key_exists($child, $results)) {
                    array_unshift($q, $child);
                }
            }
        }
    }
}
return $results[$root];
This approach has a great advantage also: if you save the results of the expressions in your database, if an expression is a child of the root expression then you won't need to recalculate it, just use the result from the database for the child subexpressions. In this way, you always have a two-level depth DAG (root and its children).

Synonym finder algorithm

I think an example will be much better than a loooong description :)
Let's assume we have an array of arrays:
("Server1", "Server_1", "Main Server", "192.168.0.3")
("Server_1", "VIP Server", "Main Server")
("Server_2", "192.168.0.4")
("192.168.0.3", "192.168.0.5")
("Server_2", "Backup")
Each line contains strings which are synonyms. As a result of processing this array, I want to get this:
("Server1", "Server_1", "Main Server", "192.168.0.3", "VIP Server", "192.168.0.5")
("Server_2", "192.168.0.4", "Backup")
So I think I need some kind of recursive algorithm. The programming language doesn't really matter; I only need a little help with the general idea. I'm going to use PHP or Python.
Thank you!
This problem can be reduced to a problem in graph theory where you find all groups of connected nodes in a graph.
An efficient way to solve this problem is a "flood fill" algorithm, which is essentially a recursive breadth-first search. This Wikipedia entry describes the flood fill algorithm and how it applies to solving the problem of finding connected regions of a graph.
To see how the original question can be made into a question on graphs: make each entry (e.g. "Server1", "Server_1", etc.) a node on a graph. Connect nodes with edges if and only if they are synonyms. A matrix data structure is particularly appropriate for keeping track of the edges, provided you have enough memory. Otherwise a sparse data structure like a map will work, especially since the number of synonyms will likely be limited.
Server1 is Node #0
Server_1 is Node #1
Server_2 is Node #2
Then edge[0][1] = edge[1][0] = 1, indicating that there is an edge between nodes #0 and #1 (which means that they are synonyms), while edge[0][2] = edge[2][0] = 0, indicating that Server1 and Server_2 are not synonyms.
Complexity Analysis
Creating this data structure is pretty efficient because a single linear pass with a lookup of the mapping of strings to node numbers is enough to create it. If you store the mapping of strings to node numbers in a dictionary then this would be an O(n log n) step.
Doing the flood fill is O(n), you only visit each node in the graph once. So, the algorithm in all is O(n log n).
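The graph approach described above might be sketched like this in Python (the question allows either PHP or Python). It is an illustration under a few assumptions: the function name is mine, the edges are stored in a dictionary of sets instead of a matrix, and each row only links its first word to the others, which is enough to keep the whole row connected.

```python
from collections import defaultdict, deque


def synonym_groups(rows):
    """Group words into connected components with a breadth-first flood fill."""
    neighbours = defaultdict(set)
    for row in rows:
        if not row:
            continue
        first = row[0]
        neighbours[first]  # make sure single-word rows get a node too
        for word in row[1:]:
            neighbours[first].add(word)
            neighbours[word].add(first)

    seen = set()
    groups = []
    for start in list(neighbours):
        if start in seen:
            continue
        # Flood fill from this unvisited node.
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            word = queue.popleft()
            group.append(word)
            for other in neighbours[word]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        groups.append(group)
    return groups


data = [
    ("Server1", "Server_1", "Main Server", "192.168.0.3"),
    ("Server_1", "VIP Server", "Main Server"),
    ("Server_2", "192.168.0.4"),
    ("192.168.0.3", "192.168.0.5"),
    ("Server_2", "Backup"),
]
for group in synonym_groups(data):
    print(group)
```

Each word is enqueued at most once, so the fill itself is linear in the number of words plus edges. The order of words inside a group depends on set iteration order and may differ between runs.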
Introduce integer markings which indicate synonym groups. At the start, mark all words with different marks from 1 to N.
Then search through your collection, and if you find that two words with marks i and j are synonyms, re-mark all words carrying mark i or j with the lesser of the two numbers. After N iterations you get all groups of synonyms.
It is a somewhat dirty and not thoroughly efficient solution; I believe one can get more performance with union-find structures.
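The union-find idea mentioned above could look roughly like the following. This is a sketch in Python with names of my own choosing. It uses path compression but no union by rank, which keeps it short at the cost of the tightest theoretical bound.

```python
def find(parent, word):
    """Return the representative of word's group, compressing the path on the way."""
    root = word
    while parent[root] != root:
        root = parent[root]
    while parent[word] != root:
        # Point every node on the path directly at the root.
        parent[word], word = root, parent[word]
    return root


def merge_synonyms(rows):
    parent = {}
    for row in rows:
        for word in row:
            parent.setdefault(word, word)  # each new word starts as its own group
        for word in row[1:]:
            a, b = find(parent, row[0]), find(parent, word)
            if a != b:
                parent[b] = a  # union: attach the second group to the first

    groups = {}
    for word in parent:
        groups.setdefault(find(parent, word), []).append(word)
    return list(groups.values())


data = [
    ("Server1", "Server_1", "Main Server", "192.168.0.3"),
    ("Server_1", "VIP Server", "Main Server"),
    ("Server_2", "192.168.0.4"),
    ("192.168.0.3", "192.168.0.5"),
    ("Server_2", "Backup"),
]
for group in merge_synonyms(data):
    print(group)
```

Unlike the re-marking scheme, a merge here touches only the two group representatives instead of every word carrying the old mark.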
Edit: This probably is NOT the most efficient way of solving your problem. If you are interested in max performance (e.g., if you have millions of values), you might be interested in writing more complex algorithm.
PHP, seems to be working (at least with data from given example):
$data = array(
array("Server1", "Server_1", "Main Server", "192.168.0.3"),
array("Server_1", "VIP Server", "Main Server"),
array("Server_2", "192.168.0.4"),
array("192.168.0.3", "192.168.0.5"),
array("Server_2", "Backup"),
);
do {
$foundSynonyms = false;
foreach ( $data as $firstKey => $firstValue ) {
foreach ( $data as $secondKey => $secondValue ) {
if ( $firstKey === $secondKey ) {
continue;
}
if ( array_intersect($firstValue, $secondValue) ) {
$data[$firstKey] = array_unique(array_merge($firstValue, $secondValue));
unset($data[$secondKey]);
$foundSynonyms = true;
break 2; // outer foreach
}
}
}
} while ( $foundSynonyms );
print_r($data);
Output:
Array
(
[0] => Array
(
[0] => Server1
[1] => Server_1
[2] => Main Server
[3] => 192.168.0.3
[4] => VIP Server
[6] => 192.168.0.5
)
[2] => Array
(
[0] => Server_2
[1] => 192.168.0.4
[3] => Backup
)
)
This would yield lower complexity than the PHP example (Python 3):
a = [set(("Server1", "Server_1", "Main Server", "192.168.0.3")),
     set(("Server_1", "VIP Server", "Main Server")),
     set(("Server_2", "192.168.0.4")),
     set(("192.168.0.3", "192.168.0.5")),
     set(("Server_2", "Backup"))]
b = {}     # word -> the full synonym set it currently belongs to
c = set()  # every full set built so far (as frozensets)
for s in a:
    full_s = s.copy()
    for d in s:
        if b.get(d):
            full_s.update(b[d])
    for d in full_s:
        b[d] = full_s
    c.add(frozenset(full_s))
# Print each final group once.
for k, v in b.items():
    fsv = frozenset(v)
    if fsv in c:
        print(list(fsv))
        c.remove(fsv)
I was looking for a solution in Python, so I came up with this one. If you are willing to use Python data structures like sets, you can use it too. "It's so simple a cave man can use it."
Simply put, this is the logic behind it:
foreach set_of_values in value_collection:
    alreadyInSynonymSet = false
    foreach synonym_set in synonym_collection:
        if set_of_values in synonym_set:
            alreadyInSynonymSet = true
            synonym_set = synonym_set.union(set_of_values)
    if not alreadyInSynonymSet:
        synonym_collection.append(set(set_of_values))
vals = (
    ("Server1", "Server_1", "Main Server", "192.168.0.3"),
    ("Server_1", "VIP Server", "Main Server"),
    ("Server_2", "192.168.0.4"),
    ("192.168.0.3", "192.168.0.5"),
    ("Server_2", "Backup"),
)
value_sets = (set(value_tup) for value_tup in vals)
synonym_collection = []
for value_set in value_sets:
    isConnected = False  # If connected to a term in the graph
    print(f'\nCurrent Value Set: {value_set}')
    for synonyms in synonym_collection:
        # IF two sets are disjoint, they don't have common elements
        if not set(synonyms).isdisjoint(value_set):
            isConnected = True
            synonyms |= value_set  # Appending elements of new value_set to synonymous set
            break
    # If it's not related to any other term, create a new set
    if not isConnected:
        print('Value set not in graph, adding to graph...')
        synonym_collection.append(value_set)
print('\nDone, Completed Graphing Synonyms')
print(synonym_collection)
This will have a result of
Current Value Set: {'Server1', 'Main Server', '192.168.0.3', 'Server_1'}
Value set not in graph, adding to graph...
Current Value Set: {'VIP Server', 'Main Server', 'Server_1'}
Current Value Set: {'192.168.0.4', 'Server_2'}
Value set not in graph, adding to graph...
Current Value Set: {'192.168.0.3', '192.168.0.5'}
Current Value Set: {'Server_2', 'Backup'}
Done, Completed Graphing Synonyms
[{'VIP Server', 'Main Server', '192.168.0.3', '192.168.0.5', 'Server1', 'Server_1'}, {'192.168.0.4', 'Server_2', 'Backup'}]

php function preg_replace regex not working, a syntax question

I'm trying to remove unnecessary comments with preg_replace in controlled script situations, but my regex is incorrect. Does anyone have any idea what's wrong with my regex? (I have Apache/2.0.54 & PHP/5.2.9.)
BEFORE:
// Bla Bli Blue Blow Bell Billy Bow Bye
script var etc (); // cangaroo cognac codified cilly celine cocktail couplet
script http://blaa.org // you get the idea!
AFTER:
script var etc ();
script http://blaa.org
PROBLEM: what regex to use?
# when comment starts on a new line, delete this entire line
# find [a new line] [//] [space or no space] [comment]
$buffer = preg_replace('??', '??', $buffer);
# when comment is halfway in script ( // comment)
# find [not beginning of a line] [1 TAB] [//] [1 space again] [comment]
$buffer = preg_replace('??', '??', $buffer);
Any and all suggestions will be valued +1 by me, because I'm so darn close to solving this riddle!
Try this regex:
/(?<!http:)\/\/[^\r\n]*/
Be cautious though, consider strings like:
<!--
// not a comment -->
or
/*
// not a comment */
and
var s = "also // not // a // comment";
And you might want to work around https://... and ftp://... etc.
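To see how the suggested pattern behaves, here is a small sketch that applies it to the sample from the question. It is written in Python for illustration; the pattern itself is unchanged apart from not needing escaped slashes, and the negative lookbehind works the same way in PCRE. The function name and the decision to drop lines that end up empty are my own assumptions.

```python
import re

# "//" up to the end of the line, unless the "//" directly follows "http:".
COMMENT_RE = re.compile(r"(?<!http:)//[^\r\n]*")


def strip_line_comments(source):
    stripped = COMMENT_RE.sub("", source)
    lines = [line.rstrip() for line in stripped.splitlines()]
    # Drop lines that are empty after stripping (this also drops
    # lines that were blank to begin with).
    return "\n".join(line for line in lines if line)


before = (
    "// Bla Bli Blue Blow Bell Billy Bow Bye\n"
    "script var etc (); // cangaroo cognac codified cilly celine cocktail couplet\n"
    "script http://blaa.org // you get the idea!\n"
)
print(strip_line_comments(before))

# The caveat about string literals: everything after the first "//" is lost.
print(strip_line_comments('var s = "also // not // a // comment";'))
```

The second call shows why the warning above matters: a regex alone cannot tell a comment from a "//" inside a string literal, so this is only safe when the input is known not to contain such cases.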
