Cypher and the outer join

I have been experimenting, on and off, with Neo4j, a graph database. One way to get data out of the DB is to use Cypher, "a declarative graph query language".

This post is about one Cypher query ...

I want to return a list of people 'known' by a person, however, in the returned data I want to indicate who, in the list, is also known by another person.

The data looks something like this:
personA -[:KNOWS]-> personX
personA -[:KNOWS]-> personY
personA -[:KNOWS]-> personZ

personB -[:KNOWS]-> personX
personB -[:KNOWS]-> personW

This pseudo-Cypher gives me who personA AND personB know:

view plain print about
1START         primary = node(personA), secondary = node(personB)
2MATCH        primary-[:KNOWS]->
known<-[:KNOWS]-secondary
3RETURN        known

I'd get 'personX'.

But, and it's a big butt, I want to get all the people personA knows and some sort of indication that personB also knows personX which, in SQL terms, is an outer join. I see in the docs - http://docs.neo4j.org/chunked/milestone/query-sql-match.html.

So a '?' signifies an outer join. So I have to put the '?' on the relationship between the 'secondary' node and the 'known' node. Where there is no relationship it will return a NULL. I use the count() function since I don't really want to deal with nulls.

This is what I ended up with:

view plain print about
1START         primary = node(personA), secondary = node(personB)
2MATCH        primary-[:KNOWS]->
known<-[r?:KNOWS]-secondary
3RETURN        known.fullname, count(r) AS incommon

Roman numeral converter in ColdFusion

So apparently this is one of those things they ask in interviews. Convert a roman numeral into an integer (or apparently, 'natural number' is more precise).

And here is my effort (admittedly it'll fall over on non-roman numerals, but you get the picture) ...

NOTE: be careful, on line 22 it seems the code formatter has added an extra 'n' to len, I recommend using the 'view plain'.

view plain print about
1<cfset rn = "MMXII">
2
3<cfoutput>
4    #roman2Decimal(rn)#<br />
5</cfoutput>
6
7<cffunction name="roman2Decimal" hint="Convert a roman number to its natural number equivalent">
8    <cfargument name="m" hint="The roman number to convert.">    
9    
10    <!--- Set up a struct on which Roman numerals are based --->
11    <cfset var r = {"I"=1, "V"=5, "X"=10, "L"=50, "C"=100, "D"=500, "M"=1000}>
12    
13    <!--- This is where the magic happens.
14            Take the Roman numeral and figure out what to do with it add or subtract one from the other.
15            Recurse through the 'string'. --->

16    <cfset number = len(arguments.m) eq 0
17                            ? 0
18                            : len(arguments.m) eq 1
19                                        ? r[arguments.m]
20                                        : (r[mid(arguments.m,1,1)] lt r[mid(arguments.m,2,1)])
21                                                        ? (r[mid(arguments.m,2,1)] - r[mid(arguments.m,1,1)]) + roman2Decimal( len(arguments.m) gt 2
22                                                                                                    ? right(arguments.m, lenn(arguments.m)-2))
23                                                                                                    : "" )
24                                                        : r[mid(arguments.m,1,1)] + roman2Decimal(mid(arguments.m,2,len(arguments.m)))
>

25    <cfreturn number>
26</cffunction>

Alternative to using a series of StructKeyExists

So, occasionally I find myself writing cfif statements with a series of StructKeyExists when I have nested Structs. Something like:

view plain print about
1<cfif StructKeyExists(str, "bob")
2        AND StructKeyExists(str.bob, "fred")
3            AND StructKeyExists(str.bob.fred, "jane")>

4    Now do something.
5</cfif>

So I thought about writing a function that takes the struct and the path to the key that I wanted to see exists and this is what I came up with:

view plain print about
1<!--- Set up a Struct of structs of structs of structs ... you get the picture. --->
2<cfset str = {}>
3<cfset str["bob"]["fred"]["jane"]["bob"]["fred"]["jane"]["bob"]["fred"]["jane"]["bob"]["fred"]["jane"]["bob"]["fred"]["jane"]["bob"]["fred"]["jane"]["bob"]["fred"]["jane"]["bob"]["fred"]["jane"] = {}>
4
5<cfset tick = getTickCount()>
6<cfoutput>
7    #StructDeepExists(str, "bob.fred.jane.bob.fred.jane.bob.fred.jane.bob.fred.jane.bob.fred.jane.bob.fred.jane.bob.fred.jane.bob.fred.jane")#<br />
8    #getTickCount() - tick#
9</cfoutput>
10
11<!--- Now a UDF that will recurse through the mega deep structure and determine if the key you're after does indeed exist --->
12<cffunction name="StructDeepExists" hint="Taking a deep struct and determine if a key exists.">
13    <cfargument name="structure" type="struct">
14    <cfargument name="structurePath">
15    
16    <!--- Now do the StructKeyExists through the list of struct keys --->
17    <cfset var exists = StructKeyExists(structure, ListFirst(structurePath, "."))
18                                    ? ListLen(structurePath, '.') gt 1
19                                        ? StructDeepExists(structure[ListFirst(structurePath, ".")], ListRest(structurePath, '.'))
20                                             : true
21                                     : false
>

22    <cfreturn exists>
23</cffunction>

ColdFusion; extracting substrings and multibyte chars

We have a service that returns xml. The xml is not how you'd expect i.e. where a parent node represents a record and the child nodes represent fields relevant to the current record. So to be clear, it isn't like this:

view plain print about
1<links>
2    <linkData>
3        <link>mylinkhere</link>
4        <linkText>mylink text here</linkText>
5    </linkData>
6    <linkData>
7        <link>myotherlinkhere</link>
8        <linkText>myotherlink text here</linkText>
9    </linkData>
10...etc
11</links>

Instead it concatenates the contents of fields so it would concatenate the two "link" fields in one node and concatentate the two "linkText" nodes in another node. Then it would add a "length" node to show how long each entry would be - it ends up looking looking like this:

view plain print about
1<linkdata>
2    <link>
3        <length>9,15</length>
4        <text>mylinkheremyotherlinkhere</text>
5    </link>
6    <linkText>
7        <length>16,21</length>
8        <text>mylink text heremyotherlink text here</text>
9    </linkText>
10</linkdata>

I assumed the the length represented the character count and based on that assumption I had used the ColdFusion Mid() function.

It turns out the length actually represented the byte length, so if there were any multibyte characters then my mid function would extract too much.

Here is a code sample using the first technique (I have represented the xml as a struct):

view plain print about
1<cfprocessingdirective pageencoding="utf-8">
2
3<cfset str = {}>
4<cfset str["links"] = {lengths="53,28,25",text="http://my.url.com/with multibyte chars _???.pdfhttp://my.url.com/normal.pdfhttp://my.url.com/?.pdf",colour="red"}>
5<cfset str["linkText"] = {lengths="54,22,22",text="A link to my first multibyte url look it has ???A link to a normal urlOne more link with ?",colour="blue"}>
6
7<cfdump var="#str#">
8
9<cfset numberOfRecords = 3>
10
11<!--- Show the lengths of the urls and linkText strings. --->
12<cfoutput>
13    <p><strong>Extract the url strings and the link text strings assuming each character is a single byte character</strong></p>
14    <p><cfloop collection="#str#" item="elem">
15        <cfset positionStart = 1>
16        <strong>#elem#</strong><br />
17        (length of string = #len(str[elem].text)#)<br />
18        (sum of reported lengths = #ArraySum(ListToArray(str[elem].lengths))#)<br />
19        <cfloop from="1" to="3" index="i">
20            <cfset u = mid(str[elem].text, positionStart, ListGetAt(str[elem].lengths,i))>
21            <cfset positionStart = positionStart + ListGetAt(str[elem].lengths,i)>
22            <span style="color:#str[elem].colour#;">#u#</span><br />
23        </cfloop>
24    </cfloop></p>
25</cfoutput>

The question marks '?' are supposed to represent multibyte chars, unfortunately something on this environment means it is not retaining these chars.

Anyway I found that converting the string to bytes using getBytes("UTF-8") then using Java's "java.util.Arrays" and the "copyOfRange()" method meant I could extract the bytes I required and the using CF's CharsetEncode("UTF-8") convert the bytes back into a string. See below for the revised version.

view plain print about
1<!--- Show the lengths of the b_urls and b_linkText arrays. --->
2<cfoutput>
3    <p><strong>Extract the url strings and the link text strings</strong></p>
4    <cfset arr = CreateObject("java","java.util.Arrays")>
5    <p><cfloop collection="#str#" item="elem">
6        <cfset positionStart = 0>
7        <cfset positionSum = 0>
8        <strong>#elem#</strong><br />
9        <cfset b = str[elem].text.getBytes("utf-8")>
10        (length of byte array = #len(b)#)<br />
11        (sum of reported lengths = #ArraySum(ListToArray(str[elem].lengths))#)<br />
12        <cfloop from="1" to="3" index="i">
13            <cfset positionStart = positionSum>
14            <cfset positionSum = positionSum + ListGetAt(str[elem].lengths,i)>
15            <cfset ua = arr.copyOfRange(b, positionStart, positionSum)>
16            <cfset u = CharsetEncode(ua,'utf-8')>
17        <span style="color:#str[elem].colour#;">#u#</span><br />
18        </cfloop>
19    </cfloop></p>
20</cfoutput>

Here is a link to the sample.

ColdFusion, .Net (dotnet) and byte arrays

Problem: Using ColdFusion's integration with .Net you can use 'System.IO.File' to read in a file and get back the byte array of the file, however it is not a a byte array that you'd expect i.e. a binary object. It is actually translated to an CF Array. This makes it difficult to use CF to deal with .Net byte arrays.

Solution: However you can use .Net's 'System.Convert' to convert it to a Base64 string, something CF can deal with.

See code below:

view plain print about
1<cfsetting showdebugoutput="false">
2
3<!--- The file to read --->
4<cfset FilePath = "C:\Projects\...\AnImage.jpg">
5
6<!--- Read the file using .Net, returns a byte array. --->
7<cfset objFile = CreateObject("dotnet","System.IO.File")>
8<cfset binaryFileDN = objFile.ReadAllBytes(FilePath)>
9
10<!--- Read[binary] the same file using cffile. --->
11<cffile action="readbinary" file="#FilePath#" variable="binaryFileCF">
12
13<!--- Show the 'lengths' of both objects. --->
14<cfoutput>
15CF binary length: #ArrayLen(binaryFileCF)#<br />
16.Net binary length: #ArrayLen(binaryFileDN)#<br /><br />
17</cfoutput>
18
19<!--- Show the dumps of each object. --->
20<cfdump var="#binaryFileCF#" label="CF"><br />
21<cfdump var="#binaryFileDN#" top="10" label=".Net"><br />
22
23<!--- Take the .Net byte array and convert it to a Base64 string. --->
24<cfset Convert = CreateObject("dotnet","System.Convert")>
25<cfset BodyBase64 = Convert.ToBase64String(binaryFileDN)>
26<cfset newBinaryFileDN = BinaryDecode(BodyBase64, "base64")>
27
28<cfdump var="#newBinaryFileDN#" label=".Net">

The result looks something like this:

Of course this is not limited to just reading binary files, anything that .Net returns as a byte array (e.g. 'System.Net.WebClient' DownloadData()) and consequently gets translated by Java/ColdFusion as an 'Array' can be treated the same way.

ColdFusion, .Net (dotnet) and NTLM

Problem: Trying to make an HTTP request against a web site that has Intergated Windows Authentication (IWA) aka NTLM - sometimes CFHTTP just doesn't cut it.

[More]

Convert epoch date to human readable date

I have a function that takes two arguments an epoch date and a date mask and it returns the formatted date.

The line that does all the work is:

view plain print about
1<cfset sFormattedDate = DateFormat( DateAdd("s",left(arguments.epoch,10),DateConvert("utc2Local", "January 1 1970 00:00")), arguments.mask ) >

This works ... for epoch dates less than 19 January 2038.

My application recently started throwing an error when trying to convert 2177456400 to a date. So I needed a more robust way of converting an epoch date to an actual date.

view plain print about
1<cfset dateFormatter = CreateObject("java","java.text.SimpleDateFormat").init(arguments.mask)>
2<cfset date = CreateObject( "java", "java.util.Date" ).init(arguments.epoc*1000)>
3<cfset sFormattedDate = dateFormatter.format(date)>

2177456400 is 01 Jan 2039.

Google getting into the wedding spirit

Just seen that Google have added Mr and Mrs Pegman to Google StreetView for Wills and Kate.

SQL to remove duplicates, keep most recent record

First off, I am not a DBA, although as a developer I have to write a lot of SQL. I have found, when writing SQL, especially when I am after a complicated dataset/recordset, that I have to get into the right mind set.

Anyway I was faced with a problem where an existing database table had duplicate records. I wanted to delete the duplicates but keep the most recent record.

[More]

Recurse through folder structure to a specific depth

Problem: A user at work wanted to copy an existing folder structure 3 levels deep and exclude all files. It would have been time consuming to create the folders manually and we couldn't copy 'n paste the folder structure because there was about 120GB of data and about 63,000 folders!

[More]

More Entries