Tuesday, 15 November 2011

The comm

The comm command in the Unix family of computer operating systems is a utility that compares two files for common and distinct lines. comm is specified in the POSIX standard. It has been widely available on Unix-like operating systems since the mid to late 1980s.

Usage

comm reads two files as input, treating each as lines of text. comm outputs one file, which contains three columns. The first two columns contain lines unique to the first and second file, respectively. The last column contains lines common to both. This functionality is similar to that of diff.

Columns are typically distinguished with the tab character. If the input files contain lines beginning with the separator character, the output columns can become ambiguous.
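The ambiguity can be seen in a small sketch. It assumes a POSIX-style shell with mktemp, comm and tr available; the file names and contents are made up for illustration, and tabs are rewritten as '>' only to make them visible.

```shell
# Hypothetical scratch files; names and contents are illustrative only.
dir=$(mktemp -d)
# First file: a single line that itself begins with a tab.
printf '\tx\n' > "$dir/first"
# Second file: a single ordinary line.
printf 'y\n' > "$dir/second"

# Show each tab as '>' so the leading separators can be counted.
visible=$(LC_ALL=C comm "$dir/first" "$dir/second" | tr '\t' '>')
printf '%s\n' "$visible"
rm -r "$dir"
```

Both output lines begin with exactly one tab, so the line unique to the first file cannot be told apart from a line in the second column.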

For efficiency, standard implementations of comm expect both input files to be sequenced in the same line collation order, sorted lexically. The sort (Unix) command can be used for this purpose.
The comm algorithm makes use of the collating sequence of the current locale. If the lines in the files are not both collated in accordance with the current locale, the result is undefined.
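A minimal sketch of preparing unsorted input, assuming a POSIX-style shell with mktemp, sort and comm available. The file names and contents are hypothetical. LC_ALL=C is set on every command so that sort and comm agree on one collation order.

```shell
# Hypothetical scratch files; names and contents are illustrative only.
dir=$(mktemp -d)
printf '%s\n' pear apple fig > "$dir/a.txt"
printf '%s\n' fig cherry apple > "$dir/b.txt"

# Sort both inputs under the same locale that comm will run in.
LC_ALL=C sort "$dir/a.txt" > "$dir/a.sorted"
LC_ALL=C sort "$dir/b.txt" > "$dir/b.sorted"

# Column 1: only in a, column 2: only in b, column 3: in both.
result=$(LC_ALL=C comm "$dir/a.sorted" "$dir/b.sorted")
printf '%s\n' "$result"
rm -r "$dir"
```

Running sort and comm under different locales is a common source of confusing results, which is why the locale is pinned here rather than inherited from the environment.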

Return code

Unlike diff, the return code from comm has no logical significance concerning the relationship of the two files. A return code of 0 indicates success; a return code >0 indicates that an error occurred during processing.
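The contrast with diff can be sketched as follows, assuming a POSIX-style shell with mktemp, comm and diff available. The file names and variable names are made up for illustration.

```shell
# Hypothetical scratch files; names and contents are illustrative only.
dir=$(mktemp -d)
printf '%s\n' apple banana > "$dir/one"
printf '%s\n' cherry > "$dir/two"

# The files share no lines, yet comm still reports success.
LC_ALL=C comm "$dir/one" "$dir/two" > /dev/null
status_disjoint=$?

# diff, by contrast, signals "files differ" through its exit status.
status_diff=0
diff "$dir/one" "$dir/two" > /dev/null || status_diff=$?

# A missing input file is a processing error, so comm exits non-zero.
status_error=0
LC_ALL=C comm "$dir/one" "$dir/does-not-exist" > /dev/null 2>&1 || status_error=$?

echo "comm, disjoint files: $status_disjoint"
echo "diff, disjoint files: $status_diff"
echo "comm, missing file:   $status_error"
rm -r "$dir"
```

A script therefore has to inspect the output of comm, not its exit status, to learn whether the files have lines in common.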

Example

File foo

apple

banana

eggplant

File bar

apple

banana

banana

zucchini

comm foo bar

		apple

		banana

	banana

eggplant

	zucchini

This shows that both files have one banana, but only bar has a second banana.

In more detail, the output file has the appearance that follows. Note that the column is determined by the number of leading tab characters. \t represents a tab character and \n represents a newline (C language notation). The spaces shown are not part of the output file.

\t \t a p p l e \n

\t \t b a n a n a \n

\t b a n a n a \n

e g g p l a n t \n

\t z u c c h i n i \n
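The example can be reproduced with a short sketch, assuming a POSIX-style shell with mktemp, comm and tr available. Each tab is rewritten as '>' purely so that the leading tabs can be counted by eye; that substitution is not part of what comm itself produces.

```shell
# Recreate the two example files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' apple banana eggplant > "$dir/foo"
printf '%s\n' apple banana banana zucchini > "$dir/bar"

# Replace each tab with '>' so the column of every line is visible.
visible=$(LC_ALL=C comm "$dir/foo" "$dir/bar" | tr '\t' '>')
printf '%s\n' "$visible"
rm -r "$dir"
```

Lines with no marker belong to the first column, one marker to the second, and two markers to the third.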

Comparison to diff

In general terms, diff is a more powerful utility than comm. The simpler comm is best suited for use in scripts.

The primary distinction between comm and diff is that comm discards information about the order of the lines prior to sorting.

A minor difference between comm and diff is that comm will not try to indicate that a line has "changed" between the two files; each line is shown in the "from file #1", "from file #2", or "in both" column. This can be useful if one wishes two lines to be considered different even if they have only subtle differences.

Other options

comm has command-line options to suppress any of the three columns. This is useful for scripting.
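A brief sketch of the column-suppression options -1, -2 and -3 as specified by POSIX, assuming a POSIX-style shell with mktemp and comm available. The files reuse the foo and bar contents from the example; the variable names are made up for illustration.

```shell
# Recreate the two example files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' apple banana eggplant > "$dir/foo"
printf '%s\n' apple banana banana zucchini > "$dir/bar"

# -12 suppresses columns 1 and 2: only lines common to both files remain.
common=$(LC_ALL=C comm -12 "$dir/foo" "$dir/bar")
# -23 suppresses columns 2 and 3: only lines unique to foo remain.
only_foo=$(LC_ALL=C comm -23 "$dir/foo" "$dir/bar")
# -13 suppresses columns 1 and 3: only lines unique to bar remain.
only_bar=$(LC_ALL=C comm -13 "$dir/foo" "$dir/bar")

printf 'common:\n%s\n' "$common"
printf 'only in foo:\n%s\n' "$only_foo"
printf 'only in bar:\n%s\n' "$only_bar"
rm -r "$dir"
```

When only one column is left, its lines carry no leading tabs, so the result can be passed directly to other tools.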

There is also an option to read one file (but not both) from standard input.
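In POSIX, standard input is selected by giving "-" in place of a file name. A minimal sketch, assuming a POSIX-style shell with mktemp, sort and comm available; the file names and contents are hypothetical.

```shell
# Hypothetical scratch files; names and contents are illustrative only.
dir=$(mktemp -d)
printf '%s\n' zucchini apple banana > "$dir/unsorted"
printf '%s\n' apple banana eggplant > "$dir/foo"

# Sort the unsorted file on the fly and feed it to comm through "-".
shared=$(LC_ALL=C sort "$dir/unsorted" | LC_ALL=C comm -12 "$dir/foo" -)
printf '%s\n' "$shared"
rm -r "$dir"
```

This avoids writing a sorted temporary copy when only one of the inputs needs sorting.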

Limits

Up to a full line must be buffered from each input file during line comparison, before the next output line is written.

Some implementations read lines with the function readlinebuffer(), which does not impose any line length limit if system memory suffices.

Other implementations read lines with the function fgets(). This function requires a fixed buffer. For these implementations, the buffer is typically sized according to the POSIX macro LINE_MAX.
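The LINE_MAX value configured on a system can usually be queried with getconf. The sketch below assumes getconf is installed and falls back to the word "unknown" otherwise. Whether a particular comm actually sizes its buffer from this value depends on the implementation.

```shell
# Query the POSIX line-length limit; fall back if getconf is unavailable.
line_max=$(getconf LINE_MAX 2>/dev/null || echo unknown)
echo "LINE_MAX on this system: $line_max"
```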