Calculating distance between word/document vectors from a nested dictionary

The first bit is easy enough. You want to build a dictionary that maps each file number to the sum of the squares of the word counts for that file. Something like this (untested) should do it:

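fileVectors = {}

# each value in myDict is a dict of {fileNumber: wordCount} for one word
for wordDict in myDict.values():
    for fileNumber, wordCount in wordDict.items():
        # add this word's squared count to the running total for the file
        fileVectors[fileNumber] = fileVectors.get(fileNumber, 0) + (wordCount ** 2)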
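To make the loop above concrete, here is a small self-contained sketch. The sample `myDict` is made up for illustration, and it assumes the nested layout the loop implies (word -> {file number: count}). The helper name `sum_of_squares_by_file` and the final square-root step are also assumptions: the square root only applies if what you are after is the Euclidean length of each file's vector.

```python
import math

# Hypothetical sample data: word -> {file number: count of that word in the file}
myDict = {
    "apple": {1: 3, 2: 1},
    "pear": {1: 4},
    "plum": {2: 2, 3: 5},
}


def sum_of_squares_by_file(nested):
    """Return {fileNumber: sum of squared word counts} (hypothetical helper)."""
    totals = {}
    for wordDict in nested.values():
        for fileNumber, wordCount in wordDict.items():
            totals[fileNumber] = totals.get(fileNumber, 0) + wordCount ** 2
    return totals


fileVectors = sum_of_squares_by_file(myDict)

# Assumption: the sums feed a vector-length calculation, so take square roots.
magnitudes = {f: math.sqrt(s) for f, s in fileVectors.items()}

print(fileVectors)
print(magnitudes)
```

Using `.values()` and `.items()` rather than `.itervalues()` and `.iteritems()` keeps the sketch runnable on both Python 2 and Python 3, at the cost of building intermediate lists on Python 2.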

