[ UPDATE: Our NSRL lookup service is no longer operational ]
I have updated the Kyrus NSRLookup server to use the current version of the NSRL hashes, version 2.35. (Until now we had been using a hash set that was slightly out of date and contained fewer hashes.) You should notice more files matching now, especially those related to Windows 7. On a related note, *nix users should update their copy of nsrllookup: a bug was found in version 1.1, and the fix is in version 1.1-2, available at http://nsrlquery.sourceforge.net/.
To test the new hash set, I created a Windows 7 virtual machine and hashed all of the files on it. I then submitted all of those hashes to the Kyrus NSRLookup server, with the following results:

    $ nsrllookup -K known.txt -U unknown.txt -s nsrl.kyr.us < all.txt && wc -l *own.txt
      40650 known.txt
       7567 unknown.txt
      48217 total

Of roughly 48,000 input files, about 7,600 were unknown. That's a lot of unknowns! So I made a custom version of md5deep (to be published soon) that hashes only Windows executables. I then hashed just the Windows executables on my VM and submitted them to the server:

    $ nsrllookup -K known.txt -U unknown.txt -s nsrl.kyr.us < exe.txt && wc -l *own.txt
      15937 known.txt
        291 unknown.txt
      16228 total

That's a LOT fewer files overall! We went from about 48,000 files to about 16,000, roughly a two-thirds reduction, simply by ignoring non-executable files.

Why just executables? Depending on the case type, you may only be looking for executables. In eDiscovery or illicit imagery cases, where the focus is on documents or images, you are probably better off searching for those file types directly than trying to eliminate everything else. When doing executable analysis, look for executables!
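The custom md5deep mentioned here hadn't been published at the time of writing, so the following is not that tool. It is a minimal Python sketch that assumes "Windows executable" means a file with a DOS "MZ" header whose e_lfanew field (the little-endian 32-bit offset at 0x3C) points to a "PE" signature followed by two NUL bytes. It prints each match as "hash, two spaces, path", which is the default md5deep layout, so its output could presumably be fed to nsrllookup just like exe.txt. The function names are hypothetical.

```python
import hashlib
import os
import struct
import sys


def is_pe_executable(path):
    """Return True if the file looks like a Windows PE image.

    Checks for the DOS "MZ" magic, then follows e_lfanew (a little-endian
    32-bit offset stored at 0x3C) to the PE signature ("PE" plus two NUL
    bytes). Unreadable or truncated files are treated as non-executables.
    """
    try:
        with open(path, "rb") as f:
            header = f.read(64)
            if len(header) < 64 or header[:2] != b"MZ":
                return False
            (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
            f.seek(pe_offset)
            return f.read(4) == b"PE\x00\x00"
    except OSError:
        return False


def md5_file(path, chunk_size=1 << 20):
    """MD5 a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_executables(root):
    """Yield (md5, path) for every PE file under root, skipping unreadable ones."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path) or not is_pe_executable(path):
                continue
            try:
                md5 = md5_file(path)
            except OSError:
                continue
            yield md5, path


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print md5deep-style "hash  path" lines for each directory given.
    for root in sys.argv[1:]:
        for md5, path in hash_executables(root):
            print(f"{md5}  {path}")
```

Checking for the PE signature, rather than only for "MZ", skips plain DOS programs and other files that merely start with those two bytes.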
But the real payoff from this process is that there are only 291 unknown executables! That's actually manageable. Comparing the hashes of 291 executables against the Malware Hash List is entirely doable, as is sending them to a service like VirusTotal. (Truth be told, I looked at the filenames of those 291 files, and most of them were part of VMware.)
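Checking a short list of unknowns against a downloaded hash list is a simple set lookup. The following is a minimal local sketch, not a description of how nsrllookup or any malware hash service works. It assumes the downloaded list is a text file in which each useful line begins with an MD5, and that unknown.txt holds md5deep-style "hash  path" lines. The function names and the file name in the usage note are hypothetical.

```python
import re

# A line is taken to start with an MD5 if it begins with exactly 32 hex digits.
MD5_AT_START = re.compile(r"^([0-9A-Fa-f]{32})\b")


def load_hash_set(lines):
    """Collect lowercase MD5s from lines that begin with one; other lines are ignored."""
    hashes = set()
    for line in lines:
        match = MD5_AT_START.match(line.strip())
        if match:
            hashes.add(match.group(1).lower())
    return hashes


def partition(lines, hash_set):
    """Split md5deep-style lines into (matched, unmatched) by their leading MD5."""
    matched, unmatched = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        match = MD5_AT_START.match(line.strip())
        if not match:
            continue  # skip blank lines and anything without a leading hash
        if match.group(1).lower() in hash_set:
            matched.append(line)
        else:
            unmatched.append(line)
    return matched, unmatched
```

For example, you could build the set from a file such as malware_hashes.txt (a hypothetical name) with load_hash_set, then pass the lines of unknown.txt to partition. Any lines in the first returned list would deserve a closer look.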
If you're champing at the bit to try searching for executables, you can do so now with another tool I wrote, Miss Identify (http://missidentify.sf.net/). As a bonus, it can generate warnings when it finds executables that don't have an executable extension.
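The extension warning described here reduces to comparing a file's content type with its name. The following is a minimal Python sketch of that idea, not Miss Identify's implementation. It uses a deliberately crude content test (the file starts with the two-byte DOS "MZ" magic), which will also match plain DOS programs. The extension list and the function names are assumptions for illustration.

```python
import os
import sys

# Assumed set of extensions that count as "executable"; Miss Identify's own
# list may differ.
EXECUTABLE_EXTENSIONS = {".exe", ".dll", ".sys", ".scr", ".cpl", ".ocx", ".com", ".drv"}


def starts_with_mz(path):
    """Crude executable test: does the file begin with the DOS "MZ" magic?"""
    try:
        with open(path, "rb") as f:
            return f.read(2) == b"MZ"
    except OSError:
        return False


def misnamed_executables(root):
    """Yield paths under root that start with "MZ" but lack an executable extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            extension = os.path.splitext(name)[1].lower()  # case-insensitive match
            if extension not in EXECUTABLE_EXTENSIONS and starts_with_mz(path):
                yield path


if __name__ == "__main__":
    for root in sys.argv[1:]:
        for path in misnamed_executables(root):
            print(f"Warning: {path} is an executable without an executable extension")
```

The extension is checked before the file is opened, so files with an executable extension are never read.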
Finally, like you, I have heard the complaints about the NSRL, such as the post at http://ballinyourcourt.wordpress.com/2011/08/31/de-nisting-defective/. I asked the NSRL folks for comment, and they told me the following:
In December 2011, NIST identified a type of container file that it was not recursing into when generating the hash sets. Failing to process these container files caused many hashes to be omitted from the data set. The latest hash set, produced in January 2012, contains many more files, especially for Windows 7. The next hash set, version 2.36, scheduled for release in March 2012, will have even more.
They added, "NSRL appreciates ALL feedback from the community. We endeavor to respond in a timely manner, and we encourage you to contact us directly at email@example.com to enable NSRL to turn a solution around within a publication cycle."