News Center|2025/11/08

Immersive/object-based audio recording techniques 2

The cardioid-based surround array

The five-cardioid (directional) microphone array offers higher channel separation than the omni-based array. To provide the correct coverage in a spaced array, the microphones can be placed closer to each other, creating a smaller array. Taken to the extreme, the microphones can even be arranged in a coincident configuration.

Example: A cardioid-based, 5-channel setup, providing equal coverage of all segments on the circle.

The Wide Cardioid Surround Array

The Wide Cardioid Surround Array (WCSA), introduced by Mikkel Nymand, provides equal timbral qualities, a high degree of envelopment and good low-frequency properties.

To obtain the desired sound character (and to enlarge the listening position from a sweet spot to a sweet area), the five signals should be decorrelated. This means the microphones must be placed at an adequate distance from each other. On the other hand, the signals should not be too different from each other (the microphones too distant); if they are, the resulting sound will not be coherent.

Omnidirectional microphones are often preferred for spaced arrays because of their natural sound color and their ability to blend direct signals with room timbre. Wide cardioids (also called sub-cardioids) are slightly more directional, which gives more ambience control and improved front imaging and localization accuracy.

The surround array initiated by Geoff Martin and Jason Corey uses an omnidirectional and a cardioid microphone to create wide cardioid characteristics. With a focus on preventing inter-channel interference, the microphone pairs were spaced L-C 60 cm (24 in), R-C 60 cm (24 in), Front-Rear 60 cm (24 in) and LS-RS 30 cm (12 in). The rear microphones were upward-aiming cardioids, used to capture height information.
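To get a feel for what these spacings mean in time, the short Python sketch below converts each pair spacing quoted for the Martin/Corey array into the largest possible arrival-time difference between the two microphones, which occurs when sound travels along the line joining them. The speed of sound of 343 m/s and the names SPACINGS_M and max_delay_ms are assumptions made for this illustration; they are not part of the original array description.

```python
# Assumed speed of sound in dry air at roughly 20 °C (not stated in the article).
SPEED_OF_SOUND_M_S = 343.0

# Pair spacings quoted for the Martin/Corey array, converted to metres.
SPACINGS_M = {"L-C": 0.60, "R-C": 0.60, "Front-Rear": 0.60, "LS-RS": 0.30}


def max_delay_ms(spacing_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Largest arrival-time difference in milliseconds for a spaced pair.

    The maximum occurs when the sound travels along the pair axis,
    so the path difference equals the spacing itself.
    """
    return 1000.0 * spacing_m / speed_m_s


if __name__ == "__main__":
    for pair, spacing in SPACINGS_M.items():
        print(f"{pair}: {spacing * 100:.0f} cm -> up to {max_delay_ms(spacing):.2f} ms")
```

Under these assumptions, 60 cm corresponds to a time difference of up to about 1.75 ms and 30 cm to about 0.87 ms. Larger spacings give larger time differences and therefore less correlated signals.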
DPA Microphones adapted this array to use five identical wide cardioid microphones (matched within a very narrow tolerance of ±1 dB on frequency response and sensitivity). Choosing five identical microphones, rather than merely five microphones of the same type, keeps the blend natural and leads to a more authentic and uniform reproduction of all channels. After intense listening sessions and numerous practical tryouts in different recording applications (symphonic music, modern jazz, PA/live, pop concerts and ambience recording), it has been found that this adaptation tends to work best with a larger spacing, especially of the rear channels. This array creates an intense, dynamic and enveloping sound character.

The recommended distances are:

L-C: 60-75 cm (24-30 in)
R-C: 60-75 cm (24-30 in)
C-LR: 20 cm (8 in)
Front-Rear: 150-200 cm (59-79 in)
LS-RS: 120-150 cm (47-59 in)
Angling L/R: ±15°
Angling LS/RS: ±165°

For wide ensembles (or large array-to-source distances), try expanding this array with two left/right omnidirectional outriggers to benefit from the pressure transducers' low-frequency pickup. These microphones are blended with L/R from the array at an appropriate level, offering a beautifully coherent, precise and rich surround sound image.

Soundfield / Ambisonics

In the early 1970s, the British engineers Peter Fellgett and Michael Gerzon invented the soundfield principle later known as Ambisonics (today known as "First Order Ambisonics"). The format is based on a coincident array of microphones. The aim is to allow the microphone to be oriented arbitrarily in any direction: left/right, front/back, up/down. Basically, the soundfield principle works like MS, by addition and subtraction of the available signals.

Two configurations are associated with Ambisonics: A-format and B-format. The A-format is the physical arrangement of four cardioid microphone capsules and their output: FU (front upper), RU (rear upper), LD (left lower) and RD (right lower).
The angles between the capsules are congruent with a tetrahedron, a triangular pyramid. The B-format is a converted version of the A-format, resulting in a virtual format consisting of three orthogonally oriented figure-of-eight "capsules", X (front/back), Y (side) and Z (up/down), and one omni (W). By addition and subtraction, the individual signals can be converted to a directional microphone pointing in any direction. For instance, one omni (W) and one figure-of-eight (X) create a cardioid pointing in the X-direction. DPA Microphones formerly produced microphones for the format but does not at present.

Example: B-format components

Optimized Cardioid Triangle (OCT)

OCT is an array designed for the three front channels only. The system offers high separation between left-center and right-center. An additional configuration for the surround channels should be chosen carefully.

A cardioid microphone is used for the center channel, placed only 8 cm (3.1 in) in front of two higher-order directional cardioids for the left and right channels, pointing outwards. The spacing between the left and right microphones is the key to the desired recording angle. Distances between 40 cm (15.7 in) and 90 cm (35.4 in) are recommended by the designers, resulting in recording angles from 160° to 90°. One or more pressure (omnidirectional) microphones can be added to the system to compensate for the low frequencies missing from the pressure gradient capsules of the cardioids.

The OCT2 variation suggests that the center microphone should be placed 40 cm (15.7 in) in front of the left/right microphone base line, giving larger time differences and a spaciousness more like that of the Decca Tree.

Double MS

The Double MS setup is a time-coincident, compact and adjustable configuration for surround/immersive sound. Two cardioid microphones and one figure-of-eight microphone are used.
Alternatively, the setup can be created from four cardioid microphones. The principle of the Double MS technique is a forward-pointing and a backward-pointing MS set sharing the same side microphone. As in a standard MS setup, the side microphone is positioned with its in-phase side pointing left, so only three microphones are needed. In this setup, processing/mixing is necessary to create the final format.

As always with MS setups, two different transducer types are applied to provide the mid information (cardioid microphones) and the side information (bi-directional microphone). This carries the risk of differing frequency and phase responses for sound arriving from the sides and from the front.

This is how the channels are obtained:

Center = Mfront
Left = Mfront + S
Right = Mfront – S
Left surround = Mrear + S
Right surround = Mrear – S

The amount of each signal is adjusted for correct spatial distribution, especially regarding the frontal image. Typically, the L/R width is made a little wider than in standard MS for two-channel stereo.

The Double MS technique can also be realized with four identical, evenly matched 4011A or 4011C Cardioid Microphones angled on the horizontal plane at 0°, 90°, 180° and 270° respectively. The membranes should be arranged one above the other for best time alignment in the horizontal plane.

Mfront = Cardioid front
S = S’ (Cardioid left) – S’’ (Cardioid right) *)
Mrear = Cardioid rear

*) In practical recording using a mixer, just pan "cardioid left" to the left, pan "cardioid right" to the right and invert the polarity of the latter (swap pins 2 and 3). The "dirty" way to do this is to use a Y-summing cable and invert the XLR connector for the cardioid right.

Fukada Tree

The Fukada Tree is based on the Decca Tree array, but uses five cardioid microphones and two additional omnidirectional microphones as outriggers to blend in between the front and rear channels. This setup was designed by Akira Fukada in 1997.
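Returning briefly to the Double MS matrix given earlier, the sums and differences can be sketched in a few lines of Python. This is only an illustration under stated assumptions: signals are plain lists of samples, and the function names and the side_gain parameter (standing in for the width adjustment mentioned in the text) are hypothetical, not part of any DPA tool.

```python
def side_from_cardioids(card_left, card_right):
    """Build the S signal from two opposed cardioids: S = S' (left) - S'' (right)."""
    return [l - r for l, r in zip(card_left, card_right)]


def double_ms_decode(m_front, m_rear, side, side_gain=1.0):
    """Derive five channels from front mid, rear mid and the shared side signal.

    side_gain is a hypothetical width control: values above 1.0 widen
    the image, values below 1.0 narrow it.
    """
    s = [side_gain * x for x in side]
    return {
        "C": list(m_front),                          # Center = Mfront
        "L": [m + x for m, x in zip(m_front, s)],    # Left = Mfront + S
        "R": [m - x for m, x in zip(m_front, s)],    # Right = Mfront - S
        "LS": [m + x for m, x in zip(m_rear, s)],    # Left surround = Mrear + S
        "RS": [m - x for m, x in zip(m_rear, s)],    # Right surround = Mrear - S
    }


if __name__ == "__main__":
    # Two samples per signal, values chosen only for illustration.
    channels = double_ms_decode([1.0, 0.5], [0.25, 0.0], [0.5, -0.25])
    for name, samples in channels.items():
        print(name, samples)
```

In a real session the same matrix is normally set up on a mixer or in a DAW; the sketch only makes the arithmetic explicit.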
The choice of cardioid microphones improves the channel separation, and the backward-oriented rear cardioids also minimize leakage of direct frontal sound to the rear speakers. Omnidirectional microphones are often preferred in Decca Tree configurations for music recordings due to their natural sound color and full frequency bandwidth. The two omni outriggers supply this very important component in the Fukada Tree array.

Since first announcing the Fukada Tree arrangement, Akira Fukada has designed a number of positioning modifications to improve front localization, but his choice of microphones remains constant and he continues to use DPA mics for their transparent feel.

Hamasaki Square

The Hamasaki Square consists of four bi-directional microphones arranged in a square.

The Hamasaki Square is designed for capturing the ambient/diffuse part of a surround sound recording. It is a four-mic square with 1.8-2 m (5.9-6.6 ft) between the figure-of-eight microphones, which are routed to left, right, left surround and right surround at an appropriate level compared to the front array. The figure-of-eight microphones are pointed with their in-phase sensitive directions toward the sides and with their nulls toward the direct sound. Compared to other systems for ambience recording, this system is the least sensitive to the distance between the main array and the ambience array. The setup was defined by the Japanese sound engineer Kimio Hamasaki.

Immersive audio with height

Setups developed for traditional surround recordings (like 5.1) have proven to work very well. However, adding height to these recordings is interesting as it may also add new dimensions to the perceived experience.

https://v.qq.com/x/page/y3178qyqnn8.html

The challenge, however, is how to add upward-directed sound images without changing the perceived localization of horizontally positioned sound sources, which means minimizing vertical inter-channel crosstalk.
This leads to considerations regarding vertical time and level differences. The spacing of vertical microphones needed for decorrelation must also be considered. Finally, how can we avoid comb filtering in the unavoidable downmix?

When height information is added in the right way, the perceived envelopment created by the sound is enhanced. More than that, good practice has demonstrated that the perceived precision when localizing the sound sources is enhanced, even in the horizontal plane!

A standard reproduction setup for immersive audio containing height information is 9.1, which is a standard 5.1 ITU-R BS.775 layout with additional upper-layer speakers above the left, right, left surround and right surround speakers. The height of the additional four speakers should provide a vertical listening angle of approximately 30°.

Dr. Hyunkook Lee of the University of Huddersfield (UK) and his research group have provided a lot of theoretical and practical information on the perceived sound imaging. One important finding is that the precedence effect (the effect that the first arriving sound determines the direction) does not work in the vertical plane. Hence, it is worth looking at level differences. When the same sound is played back in the lower and the upper loudspeaker, the presence of higher frequencies and transient signals pulls the localization towards the upper loudspeaker [2,3].

To keep the localization in the horizontal plane, it was found that the upper signal should be attenuated by at least 7 dB.

These findings have led to the microphone setup shown below. It consists of eight cardioid microphones and two supercardioid microphones. The microphones are oriented so that a minimum of frontal sound enters the upper layer of microphones. In general, any upper-layer microphone should pick up as little sound as possible from the primary horizontal sources and from sources below the horizontal plane.
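As a small worked example of the level difference discussed above, the Python sketch below converts an attenuation into a linear gain factor and applies it to an upper-layer signal. It assumes the attenuation figure of 7 quoted above is in decibels; the function names db_to_gain and attenuate are hypothetical and chosen only for this illustration.

```python
def db_to_gain(level_db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)


def attenuate(samples, attenuation_db):
    """Scale a list of samples down by attenuation_db (given as a positive number)."""
    gain = db_to_gain(-abs(attenuation_db))
    return [gain * x for x in samples]


if __name__ == "__main__":
    # Assumption: the upper layer is turned down by 7 dB relative to the main layer.
    print(f"7 dB attenuation -> linear gain {db_to_gain(-7.0):.3f}")
    print(attenuate([1.0, -0.5, 0.25], 7.0))
```

An attenuation of 7 dB corresponds to a linear amplitude factor of about 0.45, so the upper-layer signal is reproduced at a little under half the amplitude of the main layer.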
[1] Wallis, Rory, and Lee, Hyunkook: The Effect of Inter-channel Time Difference on Localization in Vertical Stereophony. Journal of the Audio Engineering Society, Vol. 63, No. 10, October 2015.
[2] Lee, Hyunkook, and Gribben, Christopher: Effect of Vertical Microphone Layer Spacing for a 3D Microphone Array. Journal of the Audio Engineering Society, Vol. 62, No. 12, December 2014.
[3] Lee, Hyunkook: Perceptual Band Allocation (PBA) for the Rendering of Vertical Image Spread with a Vertical 2D Loudspeaker Array. AES Convention 138, Warsaw 2015.
[4] Lee, Hyunkook: The Relationship between Interchannel Time and Level Differences in Vertical Sound Localisation and Masking. AES Convention 131, New York 2011.