This paper examines listeners' memory of the speaker individuality conveyed by an utterance, as a first step toward the realization of a voice montage system. The methodology used to extract Japanese expressions that represent the voice qualities relevant to the speaker individuality of an utterance is first described briefly, followed by a presentation of the expressions derived. Using these expressions, the degradation of listeners' memory of speaker individuality was evaluated as a function of time. Statistical analyses revealed that the within-listener variation of the memory was quite small, whereas the between-listener variation could not always be disregarded, depending on the speaker characteristics of the utterance: a distinctive utterance tended to be preserved in memory for a long time, while an "average" utterance was retained with less confidence. This utterance-dependent characteristic of the memory was also supported by speaker identification experiments performed under different schemes.