# PLAS: A Compact, Self-Triggered, Dead time-less, High Channel Count Analog Memory ASIC for TRACE

R. J. Aliaga, V. Herrero-Bosch, S. Capra, J. A. Dueñas, A. Pullia, A. Gadea, and D. Mengoni

Abstract—The readout system for the new TRACE detector requires monitoring and sampling of all pulses in a large number of channels under very strict space and power consumption constraints on the front-end electronics and cabling. A triggerless solution has been chosen, involving front-end analog memories that sample a 1 µs window of the waveform of every valid pulse at 200 MHz while performing zero suppression and serialization. An analog memory ASIC named PLAS (for PipeLined Asymmetric SCA) is presented that allows pulse capture with no dead time in any channel while reducing die area requirements for high input channel counts. The circuit is based on a novel architecture in which the typical Switched Capacitor Array (SCA) structure is partitioned into two asymmetric stages and FIFO queue-like control circuitry is introduced for captured data. The ASIC has been designed in 0.18 µm CMOS technology with 32 independent, self-triggered input channels and a size of $3.5 \times 3.9\,\mathrm{mm^2}$. Simulations predict figures of around 100 MHz bandwidth, 12 ENOB and 10 mW per channel.

*Index Terms*—Analog memory, ASIC, dead time, detector readout, front-end electronics, Switched Capacitor Array (SCA), triggerless data acquisition, waveform sampling.

## I. INTRODUCTION

The TRacking Array for light Charged particle Ejectiles (TRACE) [1], [2] is a new telescope detector system for the discrimination of particles and light ions in fusion-evaporation and direct nuclear reactions, designed to work in combination with a large gamma tracking array such as AGATA [3]. Each detector cell is a $\Delta E$-$E$ telescope consisting of a double silicon layer with respective thicknesses of 200 µm and 1 mm, forming a $12 \times 5$ array of pads with a $4 \times 4\,\mathrm{mm^2}$ pitch. Identification of different ions and particles relies on both $\Delta E$-$E$ discrimination and pulse shape analysis (PSA) [4] based on the sampling of all detector pulses generated at the silicon pads. The first experimental tests have already taken place with temporary readout electronics using commercial DAQ modules and only a limited number of detector channels [5].

R. J. Aliaga and A. Gadea are with the Instituto de Física Corpuscular (CSIC-UV), C/ Catedrático José Beltrán 2, 46980 Paterna, Spain. (e-mail: raalva@ific.uv.es)

V. Herrero-Bosch is with the Instituto de Instrumentación para Imagen Molecular (I3M), Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain.

S. Capra and A. Pullia are with the Istituto Nazionale di Fisica Nucleare - Sez. di Milano, Via Celoria 16, 20133 Milano, Italy.

J. A. Dueñas is with the Departamento de Física Aplicada, FCCEE Universidad de Huelva, 21071 Huelva, Spain.

D. Mengoni is with the Istituto Nazionale di Fisica Nucleare - Sez. di Padova, Via Marzolo 8, 35131 Padova, Italy. He is also with the Dipartimento di Fisica e Astronomia, Università di Padova, Via Marzolo 8, 35131 Padova, Italy.

It has been established that PSA requires acquisition windows of $1\,\mu\mathrm{s}$ at a sampling frequency of 200 MHz or higher [6], a requirement imposed by the fast signals induced by light particles [7].

The final readout system for TRACE requires monitoring of all detector channels and sampling of all generated pulses, which may appear in any channel. Specifically, one front channel per silicon pad, providing hole signals, is required for particle discrimination, plus one back channel per layer, providing electron signals, which will be used mainly for spectroscopy. An event rate in the range of tens of kHz is expected, and an energy resolution below 1% at 5 MeV is to be obtained at room temperature. At the same time, the small dimensions of the reaction chamber impose strong restrictions on front-end circuit size, power dissipation, and connectivity. The complete front-end circuitry for each detector needs to fit on a $25 \times 50\,\mathrm{mm^2}$ circuit board, similar in size to the detector itself. A total of 122 channels (120 front and 2 back) need to be read out per detector, captured information should be transmitted serially in order to keep cabling to a minimum, and the estimated maximum power budget is around 20 mW per channel.
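The channel count and the total power envelope per detector follow directly from these figures. The short Python sketch below reproduces the arithmetic. It reads the 120 front channels as 60 pads in each of the two silicon layers. That is our interpretation of the $12 \times 5$ pad array, chosen because it is consistent with the stated count, and it is not taken from design documents.

```python
# Per-detector channel count and power envelope for one TRACE cell.
# Assumption: the 12 x 5 pad array is present in both silicon layers.
PADS_PER_LAYER = 12 * 5        # 4 x 4 mm^2 pads
LAYERS = 2                     # Delta-E (200 um) and E (1 mm) layers
BACK_PER_LAYER = 1             # one back (electron-signal) channel per layer
POWER_PER_CHANNEL_W = 20e-3    # estimated maximum budget per channel

front_channels = PADS_PER_LAYER * LAYERS
back_channels = BACK_PER_LAYER * LAYERS
total_channels = front_channels + back_channels
board_power_w = total_channels * POWER_PER_CHANNEL_W

print(front_channels, back_channels, total_channels)  # 120 2 122
print(f"{board_power_w:.2f} W per detector")          # 2.44 W per detector
```

At the quoted budget, the board must therefore dissipate roughly 2.4 W.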

A triggerless readout scheme, outlined in Fig. 1, has been proposed in [8]. In this scheme, sampling, zero suppression and serialization are performed at the front-end. A custom Switched Capacitor Array (SCA)-based analog memory ASIC named PLAS (for PipeLined Asymmetric SCA) has been designed to perform these tasks. The circuit receives detector signals processed by tunable charge preamplifiers [9], detects valid pulses on any channel, samples them using both edges of a 100 MHz clock, and stores the results together with their associated timestamps. These data are later read out through a single analog output at 50 MHz and digitized remotely. The ASIC is based on a novel analog memory architecture with the specific purpose of minimizing detector dead time and area occupation per channel. The final version is intended to host 64 or 128 input channels; this paper describes a first prototype with 32 input channels.

## II. PLAS ARCHITECTURE

The principle of operation of the analog memory ASIC is based on sequential stages of SCAs. Because they are the basic building blocks of the PLAS architecture, SCA channels are described first. These circuits are used for the fast analog sampling of transient signals, which are stored as charge in internal capacitors and can later be read out at a slower pace. They have been employed as low-power substitutes for flash ADCs for the past 25 years. Some of the most representative examples are described in [10]–[13].

Fig. 1. Outline of the TRACE readout scheme and the location of PLAS with respect to other system components.

Fig. 2. Selected implementation of a single SCA channel using a common operational amplifier. Odd and even capacitor switches are controlled by alternating clock edges to double the sampling rate, with two capacitors active at any given time.

Several circuit topologies are possible for SCA channels. Fig. 2 shows the one used in PLAS, which features a common operational amplifier to assist in writing and reading operations. The input signal is connected through a switch w to a common bus from which $L$ storage cells hang, each containing a capacitor and a pair of control switches. With switches w and f closed and r open, cell $C_i$ is written by closing switches $a_i$ and $b_i$. The bottom plate of $C_i$ is then biased at a reference voltage $V_{ref}$ while its top plate tracks the input signal. When switches $a_i$ and $b_i$ are opened, the capacitor holds the value of the analog signal at the opening time. Switches in consecutive cells are closed and opened sequentially at a write frequency $f_{\rm w}$, following a circular buffer logic. Very high write frequencies are possible if the tracking intervals of consecutive cells are allowed to overlap, because each capacitor then has longer to charge. At any given time, the cell capacitors in the array hold successive samples of the input signal corresponding to a time window of length $L/f_{\rm w}$.
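The following behavioral Python sketch makes the circular-buffer bookkeeping concrete. It models sample ordering only; charge sharing, noise and the overlapping tracking intervals are ignored. The class and method names are hypothetical and do not correspond to any PLAS control logic.

```python
class CircularSCA:
    """Behavioral model of one L-cell SCA channel used as a circular buffer.

    Only sample ordering is modeled; charge sharing, noise and the
    overlapping tracking intervals of neighbouring cells are ignored.
    """

    def __init__(self, length: int) -> None:
        self.length = length
        self.cells = [None] * length  # voltage held by each capacitor C_i
        self.write_ptr = 0            # index of the next cell to be written
        self.locked = False

    def write(self, sample: float) -> None:
        """Store one sample (one period of f_w) and advance the pointer."""
        if self.locked:
            raise RuntimeError("channel is write-locked until it is read out")
        self.cells[self.write_ptr] = sample
        self.write_ptr = (self.write_ptr + 1) % self.length

    def trigger(self) -> None:
        """Stop writing; the cells now hold the last L samples."""
        self.locked = True

    def read_out(self) -> list:
        """Return the samples oldest-first (as at f_r) and unlock the channel."""
        oldest = self.write_ptr  # after wrap-around, the oldest sample sits here
        data = [self.cells[(oldest + k) % self.length] for k in range(self.length)]
        self.locked = False
        return data


# Usage: write a ramp of 20 samples into an 8-cell channel, then trigger.
sca = CircularSCA(length=8)
for n in range(20):
    sca.write(float(n))
sca.trigger()
window = sca.read_out()
print(window)  # [12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
```

Once the buffer has wrapped, the write pointer always marks the oldest stored sample. This is why readout starts there.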

When a trigger condition is met for the channel, continuous writing is stopped by opening all switches. A readout process may then begin by closing switch r and then closing switch pairs $a_i$ and $b_i$ sequentially. This dumps the stored voltage values onto the bus and to the output at a slow read frequency $f_{\rm r}$. These values are then sampled using an external ADC. This SCA structure is replicated for each input channel. A problem arises in that the SCA cannot be rewritten until it has been read out, or else the capacitor contents will be lost. In most analog memory designs, this induces a dead time in the channel that may be very long compared to the pulse acquisition window. Reading out all $L$ cells takes $L/f_{\rm r}$, and $f_{\rm r}$ is typically much lower than $f_{\rm w}$. Existing solutions usually involve either partial readout [12], [14] or inefficient replication of resources [15], [16]. A different solution is proposed here that relies on splitting the SCA asymmetrically into two sequential SCA stages connected through a full-mesh switching matrix. The first stage consists of many short SCA channels, one for each input channel. The second stage contains a few long SCA channels that are shared among all inputs.
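A quick estimate shows how large this dead time is relative to the capture window. The Python sketch below assumes a hypothetical conventional full-length SCA with the same 224 cells and the 200 MHz write and 50 MHz read rates quoted later for PLAS. It models the resulting dead time as non-paralyzable. The 10 kHz per-channel rate is hypothetical and chosen only for illustration.

```python
def sca_timing(cells: int, f_write_hz: float, f_read_hz: float) -> tuple:
    """Capture window and readout-induced dead time of a conventional SCA."""
    window_s = cells / f_write_hz   # time covered by the stored samples
    dead_s = cells / f_read_hz      # channel cannot be rewritten meanwhile
    return window_s, dead_s


def lost_fraction(rate_hz: float, dead_s: float) -> float:
    """Fraction of pulses lost for a non-paralyzable dead time."""
    x = rate_hz * dead_s
    return x / (1.0 + x)


# Illustrative numbers: 224 cells, 200 MHz write, 50 MHz read.
window_s, dead_s = sca_timing(224, 200e6, 50e6)
print(f"window {window_s * 1e6:.2f} us, dead time {dead_s * 1e6:.2f} us")
# window 1.12 us, dead time 4.48 us
print(f"lost at a hypothetical 10 kHz per-channel rate: {lost_fraction(10e3, dead_s):.1%}")
```

With these numbers, the dead time is four times the capture window ($f_{\rm w}/f_{\rm r} = 4$).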

The general scheme is shown in Fig. 3. The circuit has been divided into a first stage with $N_1$ SCA channels of length $L_1$, intended for pre-trigger samples, and a second stage with $N_2$ slots. Each slot contains an $L_2$-cell SCA channel for post-trigger samples and an auxiliary $L_1$-cell SCA channel intended as a storage buffer for the samples in the first stage. Here, $N_1$ is the total number of input channels, $N_2 < N_1$, $L_2 > L_1$, and $L_1 + L_2$ is the total capture window length for one event.

Initially, the second SCA stage is idle, and each channel in the first stage continuously samples the associated input signal, so that it contains the last $L_1$ samples at any given moment. Whenever an input channel is triggered, the corresponding channel $i$ in the first stage is write-locked and its samples are held. A free slot $j$ in the second stage is then assigned, and both are connected together through the switching matrix. At this moment, the input signal is connected to the write bus of the $L_2$-cell SCA, so pulse capture continues there. At the same time, the contents of channel $i$ are sequentially read and copied to the $L_1$-cell buffer in slot $j$, fast enough for the transfer to finish before capture of the $L_2$ post-trigger samples ends. At this point, the input channel is immediately ready to start sampling again; therefore, no dead time is introduced. The whole captured pulse is stored in second-stage slot $j$, which remains locked until it is read out sequentially.
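A small event-driven model illustrates the slot bookkeeping. The Python sketch below is an approximation under stated assumptions, not the ASIC's control logic:

- The second stage is a pool of $N_2$ slots read out one at a time in trigger order.
- The post-trigger capture time ($L_2/f_{\rm w} = 0.96\,\mu$s) and the per-event readout time ($5.98\,\mu$s) are taken from Section III.
- Retriggers on a channel that is still capturing are ignored.
- A pulse counts as lost only when no slot is free.

The function name and the trigger-list format are hypothetical.

```python
import heapq
import random


def simulate_plas(triggers, n_slots=8, capture_s=0.96e-6, readout_s=5.98e-6):
    """Count accepted, lost (queue full) and ignored (channel busy) triggers.

    triggers: iterable of (time_s, channel) pairs sorted by time.
    """
    free_slots = n_slots
    release_times = []        # min-heap: times at which locked slots become free
    busy_until = {}           # channel -> end of its post-trigger capture
    output_free_at = 0.0      # the single serial output handles one event at a time
    accepted = lost = ignored = 0
    for t, ch in triggers:
        # Return slots whose events have been completely read out by time t.
        while release_times and release_times[0] <= t:
            heapq.heappop(release_times)
            free_slots += 1
        if busy_until.get(ch, 0.0) > t:
            ignored += 1      # channel still capturing its previous pulse
            continue
        if free_slots == 0:
            lost += 1         # queue full: the only source of dead time
            continue
        free_slots -= 1
        capture_end = t + capture_s
        busy_until[ch] = capture_end
        # FIFO readout: start after capture ends and after the previous event.
        start = max(capture_end, output_free_at)
        output_free_at = start + readout_s
        heapq.heappush(release_times, output_free_at)
        accepted += 1
    return accepted, lost, ignored


if __name__ == "__main__":
    # Hypothetical load: Poisson triggers at 50 kHz spread over 32 channels.
    rng = random.Random(1)
    t, trig = 0.0, []
    for _ in range(100_000):
        t += rng.expovariate(50e3)
        trig.append((t, rng.randrange(32)))
    print(simulate_plas(trig))
```

A burst of nine simultaneous triggers on distinct channels fills all eight slots and loses the ninth. Isolated triggers are always accepted.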

## III. ASIC DESCRIPTION

The block diagram of the ASIC prototype is outlined in Fig. 4, with simplified depictions of the input stage and SCA channels. The circuit is based on a Pipelined Asymmetric SCA with $N_1 = 32$ independent input channels and $N_2 = 8$ output slots, with capacity for $L_1 = 32$ pre-trigger and $L_2 = 192$ post-trigger samples for each captured pulse. A sampling frequency of $f_{\rm w} = 200\,\mathrm{MHz}$ is achieved by using both edges of an input 100 MHz clock to drive the even and odd write switches in each channel, respectively. The total pulse capture window is thus $(L_1 + L_2)/f_{\rm w} = 1.12\,\mu\mathrm{s}$. Storage capacitors of 270 fF have been chosen as a tradeoff between noise specifications, slew rate requirements and timing performance. Besides the SCA channels for sample storage, each slot in the output queue contains additional digital registers. These hold the pulse timestamp, which is latched immediately on trigger, and identify the corresponding input SCA channel and the sampling cell position at the time of trigger. The channel and cell position values are transferred serially from the pre-trigger channel after the captured samples.

Fig. 3. Pipelined SCA architecture with separate channels for pre-trigger (left) and post-trigger (right) memory connected through a switching matrix. An additional channel in the right section acts as a storage buffer for the pre-trigger memory.

Fig. 4. Simplified block diagram with the main ASIC components and interface signals.
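The timing figures above can be checked with a few lines of Python. The last quantity is our own derived lower bound, not a value stated for the design. The $L_1$ pre-trigger samples must be copied into the slot buffer while the $L_2$ post-trigger samples are captured, so the copy must proceed at no less than $L_1 f_{\rm w}/L_2$.

```python
F_CLK_HZ = 100e6         # input clock
F_W_HZ = 2 * F_CLK_HZ    # both clock edges -> 200 MHz sampling
L1, L2 = 32, 192         # pre- and post-trigger cells

window_s = (L1 + L2) / F_W_HZ            # total capture window
post_trigger_s = L2 / F_W_HZ             # time available for the buffer copy
min_copy_rate_hz = L1 / post_trigger_s   # = L1 * F_W_HZ / L2 (derived bound)

print(f"window       {window_s * 1e6:.2f} us")           # 1.12 us
print(f"post-trigger {post_trigger_s * 1e6:.2f} us")     # 0.96 us
print(f"copy rate >= {min_copy_rate_hz / 1e6:.1f} MHz")  # 33.3 MHz
```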

### A. Input stages

A schematic of the input stage for each ASIC channel is depicted in Fig. 5. It consists of an inverting amplifier with gain $-R_2/R_1$ that adapts the signal range between the preamplifier output and the SCA. $R_2$ is internal to the ASIC and fixed, whereas $R_1$ is external and can be used to adjust the gain. The inclusion of at least one external component is required for isolation, since preamplifier pulses exhibit a dynamic range of 2.6 V whereas the internal working range spans from 0.3 V to 1.5 V. Two global test inputs are included for calibration purposes; these require their own external resistors.
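As a rough sizing aid, the largest gain magnitude that maps the full 2.6 V preamplifier swing into the 1.2 V internal range is $1.2/2.6 \approx 0.46$. This sets a lower bound on $R_1$ for a given $R_2$. The Python sketch below assumes the whole internal range is usable, with no headroom margin. It uses a hypothetical $R_2 = 10\,\mathrm{k\Omega}$, since the actual internal value is not stated here.

```python
V_PREAMP_SPAN = 2.6               # V, preamplifier output dynamic range
V_SCA_LOW, V_SCA_HIGH = 0.3, 1.5  # V, internal working range
V_SCA_SPAN = V_SCA_HIGH - V_SCA_LOW

# Largest |R2/R1| that still fits the full input swing (no headroom margin).
max_gain = V_SCA_SPAN / V_PREAMP_SPAN


def min_r1(r2_ohm: float, input_span_v: float = V_PREAMP_SPAN) -> float:
    """Smallest external R1 such that (R2/R1) * input span <= internal span."""
    return r2_ohm * input_span_v / V_SCA_SPAN


print(f"max |gain| = {max_gain:.3f}")                     # max |gain| = 0.462
print(f"R1 >= {min_r1(10e3):.0f} ohm for R2 = 10 kohm")  # hypothetical R2
```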

The amplifier input is biased at a programmable reference voltage that provides the appropriate operating point for the input signal range. Internal DACs generate two reference voltages for every group of eight input channels, so that most combinations of input signal ranges and polarities are supported on the same device. In particular, the front and back channels of each TRACE detector cell, which provide pulses with opposite polarities, can be read out using the same ASIC.

Several trigger modes are available and can be configured individually for each channel. The standard trigger condition is leading edge discrimination, using a comparator with an individually programmable voltage threshold. Hysteresis is implemented by inhibiting further triggers until a second comparator detects the signal crossing a lower programmable threshold on its way down. This avoids false triggers due to noise on the falling edge. Other trigger conditions are provided by four global external trigger signals. The sensitivity of each channel to each of these signals can be programmed independently, in order to implement additional functionality such as synchronization, calibration and triggering from the preamplifiers' fast-reset logic. In addition, a dedicated external trigger is included for four test channels. A combined Trigger Request output is generated for each trigger in order to inform the Global Trigger and Synchronization (GTS) subsystem [17] of event acceptance.

Fig. 5. Diagram of the input stage for each analog memory channel.
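The effect of the hysteresis can be seen in a sampled-time Python model of the two comparators. This is a behavioral sketch, not the ASIC circuit. Thresholds are assumed to be measured relative to the baseline, and negative-polarity pulses are handled by inverting the signal. The function name is hypothetical.

```python
def leading_edge_triggers(samples, threshold, rearm_threshold, positive=True):
    """Return the sample indices at which the leading-edge trigger fires.

    After each trigger, further triggers are inhibited until the signal
    falls below rearm_threshold (the second, lower comparator).
    """
    if rearm_threshold >= threshold:
        raise ValueError("re-arm threshold must be below the trigger threshold")
    sign = 1.0 if positive else -1.0   # selectable trigger polarity
    armed = True
    fired = []
    for n, v in enumerate(samples):
        v *= sign
        if armed and v >= threshold:
            fired.append(n)            # leading edge crosses the main threshold
            armed = False
        elif not armed and v < rearm_threshold:
            armed = True               # pulse has returned close to baseline
    return fired


# A pulse with a noise bump on its falling edge, followed by a second pulse.
pulse = [0.0, 0.2, 1.0, 0.9, 0.45, 0.55, 0.3, 0.05, 0.0, 0.8]
print(leading_edge_triggers(pulse, threshold=0.5, rearm_threshold=0.1))  # [2, 9]
```

Without the re-arm condition, the bump at index 5 would cross the main threshold again and produce a spurious second trigger.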

Each input channel contains several local configuration registers, including reference voltage selection, sensitivity to triggers, leading edge trigger polarity, and DAC threshold values. In addition, the DACs that generate the reference voltages for each group of eight input channels can also be programmed. An I$^2$C interface is included in the ASIC for the configuration of all of these internal registers.

### B. Readout circuitry

A dedicated interface is used for the readout of captured events. Readout is timed by the input clock, at $f_{\rm r} = 50$ MHz. Event data are output as analog differential signals through an integrated differential amplifier and a line driver external to the ASIC. Data are digitized by an external ADC at the back-end. They include both analog waveform samples and digital data encoded as analog values; the latter must be decoded by the receiving FPGA after digitization. During idle mode, i.e. when no pulse information is being transmitted, an alternating sequence of zeros and ones is output continuously. This allows the receiver to tune the ADC sampling point as close as possible to the next edge for improved accuracy.
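One simple way for the receiving FPGA to recover the digital fields is to learn the two analog levels from the idle pattern and slice subsequent samples at their midpoint. The Python sketch below only illustrates that idea and is not the back-end firmware. It assumes the captured idle segment starts with a zero, and the ADC codes are hypothetical.

```python
from statistics import mean


def levels_from_idle(idle_codes):
    """Estimate the ADC codes of '0' and '1' from the alternating idle pattern.

    Assumes idle_codes[0] corresponds to a '0' (hypothetical alignment).
    """
    return mean(idle_codes[0::2]), mean(idle_codes[1::2])


def decode_bits(codes, level0, level1):
    """Slice digitized samples at the midpoint between the two levels."""
    threshold = (level0 + level1) / 2
    if level1 > level0:
        return [int(c > threshold) for c in codes]
    return [int(c < threshold) for c in codes]  # inverted differential polarity


idle = [100, 3900, 102, 3898, 98, 3902]       # hypothetical ADC codes
level0, level1 = levels_from_idle(idle)
print(decode_bits([120, 3850, 3890, 90], level0, level1))  # [0, 1, 1, 0]
```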

A 4-bit header indicates the start of a new event frame, with the format outlined in Fig. 6. The 64 bits of digital data include the timestamp and the identification of the queue slot, input channel and cell position where the trigger was issued. These data are enough to completely identify each event, including its source and full path through the ASIC, for calibration correction. In particular, the pulse timestamps must be used to determine whether different pulses belong to the same event, because the FIFO queue makes event reception latency non-deterministic. The timestamps are also used for complete event building including data from other detector arrays, e.g. AGATA. A 7-bit Hamming error-correcting code (ECC), computed by the readout controller and included in the frame, allows correction of single-bit errors and detection of double-bit errors. Finally, the 224 captured samples are transmitted serially, with 7 wait cycles in between due to the internal organization of the SCAs in 32-cell sections. Complete transmission of a single event takes $5.98\,\mu\mathrm{s}$.

## IV. DISCUSSION

The main advantage of the PLAS structure is the area reduction. The total number of memory cells falls from the $32 \times 224 = 7168$ required in a full SCA structure to $32 \times 32 + 8 \times 224 = 2816$. This is accomplished by sharing storage resources among all input channels; the reduction factor depends on the dimensioning and improves as the number of input channels grows. Another advantage is the absence of readout-related dead time for individual channels, which is a novel feature to the best of the authors' knowledge. The ASIC only exhibits dead time when the output queue is completely full, in which case all input channels are simultaneously locked. Both advantages exploit the fact that the intended application in the TRACE detector has relatively low event rates and multiplicities.
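These cell counts generalize to $N_1(L_1+L_2)$ cells for a full SCA structure and $N_1 L_1 + N_2(L_1+L_2)$ cells for PLAS. The Python sketch below evaluates both. Keeping $N_2 = 8$ for the 64- and 128-channel versions is a hypothetical choice, made only to show the trend, since the final slot count is not given here.

```python
def memory_cells(n1, n2, l1, l2):
    """Return (full SCA cells, PLAS cells) for the given dimensioning."""
    full = n1 * (l1 + l2)              # one full-length SCA per input
    plas = n1 * l1 + n2 * (l1 + l2)    # short first stage + shared slots
    return full, plas


print(memory_cells(32, 8, 32, 192))    # (7168, 2816)
for n1 in (32, 64, 128):
    full, plas = memory_cells(n1, 8, 32, 192)
    print(n1, f"reduction x{full / plas:.2f}")
```

As $N_1$ grows with $N_2$ fixed, the reduction factor approaches $(L_1+L_2)/L_1 = 7$ for this dimensioning.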

This architecture also presents some disadvantages compared to the use of full channels for every input. The main drawback is that pre- and post-trigger samples see a different circuit response, because they are processed along separate paths. This increases the complexity of the calibration and correction procedures. Additionally, pre-trigger samples undergo an extra copy operation when transferred from the first to the second stage, which introduces additional noise. However, pre-trigger samples will be used mainly to estimate constant voltage levels (either the baseline or the time-over-threshold (ToT) output from back channels). The impact of this SNR difference will therefore be largely diminished. A final disadvantage is a slight loss of flexibility: the number of pre-trigger samples is fixed, and the maximum number of pulses stored simultaneously is lower.

## V. SUMMARY AND OUTLOOK

A novel analog memory architecture has been proposed in which the typical SCA structure is split into two pipelined, asymmetric stages and captured data are stored in an analog FIFO queue. This dramatically reduces area requirements and removes readout-related dead time. A prototype ASIC design based on this architecture has also been described. It is intended for the front-end readout of the TRACE detector, but is generic enough to be used with other detectors or applications.

Fig. 6. Event frame format.

Fig. 7. Layout of the first PLAS prototype with 32 input channels. The first and second SCA stages correspond to the top and bottom halves, respectively.

The ASIC has been designed in 0.18 µm CMOS technology with a 1.8 V power supply for reduced area and power consumption. The final layout is shown in Fig. 7, with a die size of $3.5 \times 3.9\,\mathrm{mm^2}$. Only simulated performance is available at this point. It suggests an input signal bandwidth over 100 MHz and noise performance close to 12 ENOB in the worst case (i.e. pre-trigger samples), with power consumption slightly above 10 mW per channel. The circuit was sent to the foundry in November 2015, and samples for testing are expected before March 2016.

## REFERENCES

- [1] Tracking Array for Light Charged Particle Ejectiles. [Online]. Available: https://web.infn.it/spes/index.php/research-on-nuclear-physics/150-trace
- [2] A. Gadea *et al.*, "Conceptual design and infrastructure for the installation of the first AGATA sub-array at LNL," *Nucl. Instr. Meth. A*, vol. 654, pp. 88–96, 2011.
- [3] S. Akkoyun *et al.*, "AGATA Advanced GAmma Tracking Array," *Nucl. Instr. Meth. A*, vol. 668, pp. 26–58, 2012.
- [4] J. A. Dueñas *et al.*, "Identification of light particles by means of pulse shape analysis with silicon detector at low energy," *Nucl. Instr. Meth. A*, vol. 676, pp. 70–73, 2012.
- [5] D. Mengoni *et al.*, "Digital pulse-shape analysis with a TRACE early silicon prototype," *Nucl. Instr. Meth. A*, vol. 764, pp. 241–246, 2014.
- [6] J. A. Dueñas *et al.*, "Interstrip effects influence on the particle identification of highly segmented silicon strip detector in a nuclear reaction scenario," *Nucl. Instr. Meth. A*, vol. 743, pp. 44–50, 2014.
- [7] M. Assié *et al.*, "Characterization of light particles ( $Z \le 2$ ) discrimination performances by pulse shape analysis techniques with high-granularity silicon detector," *Eur. Phys. J. A*, vol. 51, pp. 1–11, Jan. 2015.
- [8] R. J. Aliaga *et al.*, "Conceptual design of the TRACE detector readout using a compact, dead time-less analog memory ASIC," *Nucl. Instr. Meth. A*, vol. 800, pp. 34–39, 2015.
- [9] S. Capra, D. Mengoni, and A. Pullia, "Experimental performance of the I2C integrated multichannel charge-sensitive preamplifier of TRACE," in *Proc. IEEE NSS/MIC*, San Diego, CA, Nov. 2015.
- [10] S. A. Kleinfelder, "A 4096 cell switched capacitor analog waveform storage integrated circuit," *IEEE Trans. Nucl. Sci.*, vol. 37, no. 3, pp. 1230–1236, Jun. 1990.
- [11] G. S. Varner *et al.*, "The large analog bandwidth recorder and digitizer with ordered readout (LABRADOR) ASIC," *Nucl. Instr. Meth. A*, vol. 583, pp. 447–460, 2007.
- [12] P. Baron *et al.*, "AFTER, an ASIC for the readout of the large T2K time projection chambers," *IEEE Trans. Nucl. Sci.*, vol. 55, no. 3, pp. 1744–1752, Jun. 2008.
- [13] S. Ritt, R. Dinapoli, and U. Hartmann, "Application of the DRS chip for fast waveform digitizing," *Nucl. Instr. Meth. A*, vol. 623, pp. 486–488, 2010.
- [14] C. L. Naumann *et al.*, "New electronics for the Cherenkov Telescope Array (NECTAr)," *Nucl. Instr. Meth. A*, vol. 695, pp. 44–51, 2012.
- [15] F. Druillole *et al.*, "The analog ring sampler: an ASIC for the front-end electronics of the ANTARES neutrino telescope," *IEEE Trans. Nucl. Sci.*, vol. 49, no. 3, pp. 1122–1129, Jun. 2002.
- [16] S. Anvar *et al.*, "AGET, the GET front-end ASIC for the readout of the time projection chambers used in nuclear physics experiments," in *Proc. IEEE NSS/MIC*, Valencia, Spain, Oct. 2011, pp. 745–749.
- [17] M. Bellato *et al.*, "Sub-nanosecond clock synchronization and trigger management in the nuclear physics experiment AGATA," *J. Instrum.*, vol. 8, p. P07003, Jul. 2013.