Sunday, October 10, 2010

File handling trick

Here is a file handling trick I had to find for myself. It works in OS X; I haven't tried it on Windows.

Say you have folders A and B. All the files in folder A are also in folder B, but not vice versa. And you need to isolate the files in folder B which are *not* in folder A. How to do it? Doing it manually is tedious and error-prone.

1: Make sure you have backups of both folders.
2: Select all the files in folder A, and drag them to folder B.
3: When Finder asks you if you want to replace the files, select "apply to all" and click "replace".
4: Use Edit--->undo.

This last step is the smart part. Undo puts the files back in folder A, but it does *not* restore the files that were overwritten in folder B! (I think we can call that an error in the OS.) So folder B now holds exactly what you wanted: all the files which were in folder B but not in A.
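The same result can be had without moving or overwriting anything. Here is a minimal sketch, assuming the comparison is by file name only and looks at the top level of each folder (no subfolders). The function name `only_in_b` is made up for illustration, and names containing line breaks are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print the names that are in folder B but not in folder A.
# Compares by name only, top level only, and never modifies either folder.
only_in_b() {
    a_list=$(mktemp) || return 1
    b_list=$(mktemp) || return 1
    # List the entry names in each folder, one per line, sorted for comm.
    ls -1 "$1" | sort > "$a_list"
    ls -1 "$2" | sort > "$b_list"
    # comm -13 hides names found only in the first list and names found in both,
    # leaving the names found only in the second list (folder B).
    comm -13 "$a_list" "$b_list"
    rm -f "$a_list" "$b_list"
}

# Example call (folder names are placeholders):
#   only_in_b A B
```

Since it only reads the two folders, it can be run as often as needed, and the backups in step 1 stay untouched.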


Philocalist said...

A bit of a hitch with Windows: when you do the first movement, it leaves behind a copy of each folder you have moved (though they are empty).
When you then 'undo', it DOES undo, but again leaves behind empty copies of the folders (in B), so what you have in B is the same number of folders as originally, just with some of them empty ... and Windows has no built-in way to display folder sizes to help you find and delete the empty ones (though after much hunting about, you can find a utility that will allow this).

Just a thought ... are you certain that it really does work as you suggest? What happens if there are folders named the same in A & B, but they do not contain the same files, or numbers of files ... or the files ARE named the same but are of different sizes, dimensions, or even damaged?
Bit of a minefield huh? ... but I'm sat with a wry smile, as I think I know what you're up to, and I've been there before :-)
There is an excellent utility to do this in Windows, specifically for image files.
Basically it's a dupe finder that will work within a folder, or comparing two folders, and can be set to work recursively.
Files can be deleted (or simply moved) according to criteria you set, so that in your example it would scan A and B, then delete anything found in B that was also in A, giving you what you want.
It can also take into account colour differences at a tolerance you can set, and also image size likewise ... but it don't work on a Mac.
I wouldn't say it's 100% accurate (what is?), but it comes damn close: images can be VERY similar, but still slightly different, no? :-)
It's freeware and runs on Vista (though it was built in the XP era; dunno about '7') ... and I've NEVER found an alternative, despite MUCH searching ... I depend on this so much that it may even stop me moving to '7' if I found it incompatible (though I suspect / hope it will work in a compatibility mode within '7') ... and it's already virtually impossible to find / download!
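For the case raised above, where files share a name but differ in content, comparing names is not enough. A rough sketch, assuming a comparison by checksum and byte count using the standard `cksum` command, might look like the following. The function name `only_in_b_by_content` is hypothetical, `cksum` is a simple CRC (so it is a sanity check rather than proof of identical content), and it knows nothing about image similarity or tolerances.

```shell
#!/bin/sh
# Hypothetical helper: print the files under folder B whose content matches
# no file under folder A. Reads both folders (recursively); changes nothing.
only_in_b_by_content() {
    a_sums=$(mktemp) || return 1
    # Record the "checksum size" pair of every file in A.
    find "$1" -type f -exec cksum {} + | awk '{print $1, $2}' | sort -u > "$a_sums"
    # Print each file in B whose "checksum size" pair is not in that list.
    find "$2" -type f -exec cksum {} + | while read -r sum size path; do
        grep -qxF "$sum $size" "$a_sums" || printf '%s\n' "$path"
    done
    rm -f "$a_sums"
}

# Example call (folder names are placeholders):
#   only_in_b_by_content A B
```

A file in B is listed if its content differs from everything in A, whatever it is called, so a same-named but damaged or edited copy shows up in the output.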

colobr said...

"(I think we can call that an error in the OS.)

You don't seem to understand what you are doing--the commands you recommend OVERWRITE existing files. That means they have been destroyed. OS X is a wonderful operating system but it has no easily accessible resurrection from the dead commands.

What you call OS error I prefer to call operator incompetence by blindly overwriting files without any oversight or control--if your file management and housekeeping is so bad that you have duplicate files (other than managed backups) scattered around your hard disks then a high risk maneuver like the one you describe is certainly not going to improve such circumstances.

My suggestion is that before you peddle to others dangerous advice like this that you get some proper training for yourself. Learn how to use aliases and symbolic links, and get yourself some reliable software tools to perform duplicate identification and management. Above all, learn how to manage files and folders proficiently before you get in a mess.

Harsh and judgmental? Not at all, just telling it how it is. If your post was just a record of finding for yourself commands that are taught at novice level (albeit with proper warnings and explanations) then who am I to piss on your fireworks?

ttl said...

Is this a contest on how cumbersome and unreliable you can make it? ;-)

What's wrong with the good old:

find a b | sed 's/.*\///' | sort | uniq

1. Does not write anything to disk
2. Works on any number of files (10 or 10 million)
3. Scriptable for repeated use, if needed

Of course, we've only been doing it this way for about half a century; maybe you want to wait for the idea to mature a little before fully trusting it.
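One caveat on the pipeline as posted: plain `uniq` prints every distinct name once, including the names of the folders themselves, so its output is the combined list rather than the difference. A hedged variant, assuming the goal is the names that occur in only one of the two folders, adds `-type f` and `uniq -u`. The function name `names_seen_once` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the file names that occur exactly once
# across the two folder trees.
names_seen_once() {
    # -type f skips the folder entries themselves;
    # sed strips everything up to the last slash, leaving the bare file name;
    # uniq -u keeps only the names that were not repeated.
    find "$1" "$2" -type f | sed 's/.*\///' | sort | uniq -u
}

# Example call (folder names are placeholders):
#   names_seen_once a b
```

When everything in a is also in b, the names seen once are exactly the files that are only in b. A name that appears twice inside the same tree (in different subfolders) would be dropped, so this is only a sketch for flat or uniquely named collections.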

karrde said...

I was gonna mutter something about a cmd-line option in Linux...(the format would port to OS-X, if I wrote it right...)

And ttl beat me to it.

The method you propose depends heavily on things that the OS assumes when you do COPY, PASTE, and UNDO actions. This method can be more precise, but you have to learn to think like a programmer to do it.

Ray said...

For Windows, there's also a free Microsoft program called 'SyncToy'

Here's the link:

Eolake Stobblehouse said...

Yes, I don't know how I could have forgotten about
find a b | sed 's/.*\///' | sort | uniq

it works perfectly in Mac. I have not used it on nested folders.

My files are fine, thank you. Sometimes I've used this to find pictures I'd not yet used. And yesterday I used it to find images I had sorted out of a folder, because (first time ever) GraphicConverter had deleted them instead of moving them, and I couldn't find them anywhere. So I restored the original folder from backup, subtracted the files I had *not* sorted out, and then didn't have to do the work twice.

BTW, I still think that an OS, when overwriting a file, should move the deleted file to trash or somewhere it could be recovered from. There really should not be actions you simply cannot undo in any way.

Eolake Stobblehouse said...

I struggle for real with remembering my own phone number, I don't think there is much chance that I could learn commands like that.

ttl said...

"I struggle for real with remembering my own phone number,"

So do I.

"I don't think there is much chance that I could learn commands like that."

I do remember your aversion to the Unix shell. I was poking fun not at you specifically, but at everyone who argues that WIMP is an easy user interface and doesn't realize that they have painted themselves into a corner where they can only perform a handful of pre-granted operations on their computer, and everything else is either impossible or terribly cumbersome, as you have just proven.

I didn't know whether to laugh or cry when I saw a (popular, it seems) Mac product being advertised whose sole function was to rename files en masse. That's what you get when you raise a whole generation of computer-illiterate people.

By the way, language constructs like the above (called pipelines) are not idioms that we have to memorize. Rather, we put them together on the fly by remembering 20 or so primitives by name. The details of each primitive are quickly available behind a few keystrokes.

Eolake Stobblehouse said...

Yeah, I dunno. Never touched it.