Department of Physics and Astronomy Heidelberg University

Bachelor Thesis in Physics submitted by

# Jan Küpperbusch

born in Oldenburg (Germany)

2023

# Module-Wide Synchronization Studies on the Mu3e Tile Detector

This Bachelor Thesis has been carried out by Jan Küpperbusch at the Kirchhoff Institute for Physics in Heidelberg under the supervision of Prof. Hans-Christian Schultz-Coulon

# Abstract

The Mu3e experiment will search for the lepton-flavour-violating decay  $\mu \rightarrow eee$  at the Paul-Scherrer Institute, Switzerland. It will probe the decay as predicted by the Standard Model of particle physics and its possible extensions to a sensitivity of 1 in 10<sup>16</sup> particle decays. This will be achieved with a high-rate muon beamline delivering 10<sup>8</sup> muon decays per second and an experiment comprising pixel, scintillating fibre and scintillating tile detectors. The tile detector is located upstream and downstream of the central detector and is subdivided into seven modules covering the full  $\varphi$  range with a high granularity of 5 mm × 5 mm. Because its most important task is to provide the best timing in the whole system in order to minimize combinatorial background, the tile detector requires a stable synchronization between all readout ASICs. This thesis summarizes signal timing measurements on the newly commissioned TMB and discusses the development and evaluation of a stable, module-specific synchronization method under both lab and beam conditions.

# Kurzfassung

The Mu3e experiment will search for the lepton-number-violating  $\mu \rightarrow eee$  decay at the Paul-Scherrer Institute in Switzerland. The experiment is intended to examine the decay in the Standard Model of particle physics more closely and to test possible extensions with a sensitivity of 1 in 10<sup>16</sup> particle decays. This is accomplished with a muon beam of 10<sup>8</sup> particles per second and an experiment consisting of pixel chips, scintillating fibres and a scintillating tile detector. The tile detector sits on either side of the central detector and covers the full  $\varphi$  range with seven modules and a granularity of 5 mm × 5 mm. Since its most important property is reaching the best time resolution in the whole system in order to minimize background processes, the tile detector requires a stable synchronization between all readout chips. This thesis describes timing measurements on the commissioned printed circuit board of the tile module as well as the development and evaluation of a method for the stable synchronization of the tile module under lab and beam conditions.

## Acronyms

- **ASIC** Application-specific integrated circuit
- **CLK** Clock signal on the Tile Module Board
- **DAB** Detector adapter board, specifically designed for all three Mu3e subdetectors
- **DAQ** Data acquisition
- **DCR** Dark count rate, cumulative description for non-physical hits of silicon photomultipliers
- **FEB** Front-End Board, the first FPGA layer of Mu3e's data acquisition chain
- **FPGA** Field-programmable gate array
- **FSM** Finite state machine
- **G-APD** Geiger-mode avalanche photodiode
- **HV-MAPS** High-voltage monolithic active pixel sensors
- **I<sup>2</sup>C** Command protocol for bus communication
- **IP** Intellectual property
- **INJ** Pulse injection line on the Tile Module Board
- **LFV** Lepton-flavour violation
- **LVDS** Low-voltage differential signaling, a signaling interface for high-frequency signals
- **MIDAS** Maximum integrated data acquisition system, the DAQ software package of Mu3e
- **MuPix** Mu3e HV-MAPS 180 nm technology pixel sensor
- **MuTRiG** Muon Timing Resolver including Gigabit-link, the tile detector readout chip
- **PLL** Phase-locked loop
- **PSI** Paul-Scherrer Institut, Villigen
- **RST** Reset signal line on the Tile Module Board
- **SciFi** Scintillating fibres
- **SciTile** Scintillating tiles
- **SiPM** Silicon photomultiplier
- **SMA** SubMiniature version A, a coaxial connection interface
- **SSW** Service support wheel, support structure at both ends of the Mu3e detector, hosting electronics and FEBs
- **SWB** Switching board, the second FPGA layer of the Mu3e DAQ
- **TDC** Time-to-digital converter
- **TMB** Tile Module Board, the tile detector PCB hosting 13 MuTRiGs
- **VHDL** Very high speed integrated circuit hardware description language, a description language supported by all common FPGAs

# Contents

- 1 Introduction
  - 1.1 The Mu3e Experiment
  - 1.2 The Mu3e Tile Detector
    - 1.2.1 Tile Sensor Matrix
    - 1.2.2 The MuTRiG chip
    - 1.2.3 Tile Module Board
  - 1.3 The Data Acquisition (DAQ) system
- 2 Measurements on the Tile Module Board
  - 2.1 Signal Propagation Timing
  - 2.2 CLK/RST skew
  - 2.3 CLK/INJ skew
- 3 Clock/Reset Alignment Firmware
  - 3.1 Motivation
  - 3.2 Intel Quartus IP Cores
  - 3.3 Finite State Machines
  - 3.4 Architecture
    - 3.4.1 Phase Shift Logic
    - 3.4.2 FSM Working Principle
- 4 Reset Shift Measurements
  - 4.1 Delay Chain Behaviour
  - 4.2 Tile Module Synchronization
    - 4.2.1 Pairwise Chip Comparison
    - 4.2.2 Module-Wide Synchronization
- 5 Summary
- Appendix A Hardware Photographs
- Appendix B Timestamp Analyzer Diagram
- Appendix C Complete synchronization analysis of MuTRiG pairs

## 1 Introduction

This thesis focuses on the timing and synchronization of the Mu3e tile detector. This chapter briefly introduces the Mu3e experiment, with a more detailed look at the tile detector and essential information about the *Data Acquisition* (DAQ).

Chapter two summarizes timing measurements on the newly commissioned *Tile Module Board* (TMB), verifying that the TMB meets the timing requirements of the tile detector. It also provides useful information on the feasibility of synchronizing the tile detector.

The third chapter returns to a methodical approach: it introduces the digital logic components needed for the synchronization of the tile detector and sketches the entity designed for the DAQ's front-end.

The behaviour and performance of this entity are evaluated in the fourth chapter, which presents the final results.

Unless stated otherwise, pictures and graphics were taken or compiled by the author without external sources.

### 1.1 The Mu3e Experiment

The Mu3e experiment [1] is a particle detection experiment designed and commissioned by several research groups in Great Britain, Switzerland and Germany. It will be conducted at the *Paul-Scherrer Institute* (PSI) in Villigen and will investigate the *lepton-flavour violating* particle decay  $\mu^+ \rightarrow e^+e^-e^+$  (see also fig. 1).



Figure 1: The  $\nu$ SM predicted muon decay into three electrons

Once neutrino oscillation is taken into account, the standard model [2] allows this decay with a *branching ratio* (BR) of

$$BR = \frac{p(\mu \to eee)}{p(\mu \to X)} \ll 10^{-50} \tag{1.1}$$

whereas certain extensions of the standard model predict a branching ratio several orders of magnitude higher (even up to  $10^{-14}$ ). The decay was previously examined by the SINDRUM [3] collaboration, whose experiment was operated at PSI from 1983 to 1986 and which set the upper limit on the branching ratio to  $BR_{SINDRUM}(\mu \rightarrow eee) < 1.0 \cdot 10^{-12}$ , while the Mu3e collaboration aims for a further improvement to  $BR_{Mu3e}(\mu \rightarrow eee) \simeq 1.0 \cdot 10^{-16}$ .

Research on *lepton-flavour violation* (LFV) is of great interest in particle physics, as its mechanism and parametrization are still unknown and it is most likely connected to known discrepancies between observation and the standard model, such as *neutrino oscillation* [4] or *CP-violation* [5], [6]. Other experiments on lepton-flavour-violating muon decays are *MEG* [7] and *Mu2e* [8].

The current beamline<sup>1</sup> produces muons at a rate of  $10^8$  decays per second in a double-cone-shaped Mylar target, which stops incoming pions until they decay via  $\pi^+ \rightarrow \mu^+ + \nu_{\mu}$ . While this rate is sufficient for the first experimental phase, an upgraded muon beamline<sup>2</sup> is planned to produce  $10^9$  muons/s with the same process in order to reach the full desired sensitivity. The Mu3e decay must be distinguished from the background decay  $\mu \rightarrow eee\nu\nu$  (internal conversion), as well as from combinatorial background from the leading-order Michel decays  $\mu \rightarrow e\nu\nu$  or Bhabha scattering. The former (BR  $\simeq 3.4 \cdot 10^{-5}$ ) requires a detector with a very good momentum resolution in order to detect the missing momentum carried by the neutrinos. Combinatorial background (i.e. two events with vertices extremely close together which look like the desired decay) must be suppressed by an excellent vertex resolution.
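To illustrate why the beam rate matters for the sensitivity goal, the following minimal sketch divides a number of muon decays by the decay rate to obtain the required running time. It assumes perfect detection efficiency and uninterrupted beam, both simplifications that are not taken from the text; the function name is illustrative only.

```python
SECONDS_PER_DAY = 86_400


def days_to_collect(n_decays: float, rate_per_s: float) -> float:
    """Days of continuous beam needed to observe n_decays at a constant rate."""
    return n_decays / rate_per_s / SECONDS_PER_DAY


# Probing a branching ratio of order 1e-16 needs of order 1e16 muon decays.
N_DECAYS = 1e16
for label, rate in (("current beamline", 1e8), ("upgraded beamline", 1e9)):
    print(f"{label}: {days_to_collect(N_DECAYS, rate):.0f} days")
```

Under these idealized assumptions, the tenfold higher rate of the upgraded beamline shortens the required beam time from roughly 1157 days to roughly 116 days.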

**Subdetectors** As sketched in fig. 2, in *Phase I* of the experiment, the vertex detector consists of two central layers of pixel detectors using the *HV-MAPS* architecture [9]. They are grouped into submodules, the so-called *ladders*, which host six pixel chips for the inner two layers and 17/18 for the outer layers. Four or five ladders are mounted on a rigid module structure, and these modules are then grouped into a pixel layer covering the full  $\varphi$  range.



Figure 2: Mu3e Phase I schematic. The central detector hosts the inner vertex pixel detector, a scintillating fibre layer and two outer layers of pixel ladders. The upstream and downstream recurl stations contain another two pixel ladders and the scintillating Tile detector.

All chips have an active area of  $20.48 \,\mathrm{mm} \times 20.00 \,\mathrm{mm}$  with a pixel size of  $80 \,\mu\mathrm{m} \times 80 \,\mu\mathrm{m}$ , narrowing down the spatial resolution to  $30 \,\mu\text{m}$ . They are planned to achieve a thickness of  $\leq 50 \,\mu\text{m}$ , corresponding to an approximated fraction of  $X/X_0 = 0.115 \,\%$ , as well as a timing resolution of  $\leq 20 \,\text{ns}$ .

<sup>1</sup> Compact Muon Beam Line (CMBL)

<sup>2</sup> High-Intensity Muon Beamline (HiMB)

The Mu3e detector also comprises a layer of scintillating fibres (SciFis) with a combination of good spatial and timing resolution (100  $\mu$ m, < 1 ns). The 250  $\mu$ m thick and 300 mm long fibres are arranged in three layers, slightly offset with respect to each other in  $\varphi$  direction and running along the z-axis. The light created in the fibres is read out by Silicon Photomultipliers<sup>3</sup> placed at both fibre ends, thus forming two arrays with 128 channels. The decay products pass these parts of the detector and recurl in a 1 T magnetic field to reach the outermost part of the detector, the *Recurl Station*. The arrangement of a recurl station minimizes the effect of multiple scattering for geometrical reasons: recurling by roughly  $\pi$  cancels the multiple scattering angle to first order, and an additional pixel ladder on the outside of each recurl station enables the detector to resolve the momentum much better. After the particles have traversed the last pixel layer, they reach the *Tile detector*.
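To give a feeling for how strongly the tracks curl in the 1 T field, the following sketch evaluates the standard relation $p_T\,[\mathrm{GeV}/c] \approx 0.3 \cdot B\,[\mathrm{T}] \cdot r\,[\mathrm{m}]$ for a singly charged particle. The muon mass and the resulting kinematic limit of about half the muon mass for a decay electron are textbook values rather than numbers from this thesis, and the function name is illustrative.

```python
def bending_radius_m(p_t_mev: float, b_tesla: float) -> float:
    """Radius of curvature of a unit-charge particle: p_T [GeV/c] = 0.3 * B [T] * r [m]."""
    return (p_t_mev / 1000.0) / (0.3 * b_tesla)


MUON_MASS_MEV = 105.66         # assumption: textbook value, not quoted in the text
P_MAX_MEV = MUON_MASS_MEV / 2  # kinematic limit for a decay electron (electron mass neglected)

for p in (10.0, 30.0, P_MAX_MEV):
    radius_cm = 100 * bending_radius_m(p, 1.0)
    print(f"p_T = {p:5.1f} MeV/c -> r = {radius_cm:.1f} cm")
```

Even at the kinematic limit the bending radius stays below 20 cm, which is why the decay products curl back towards the beam axis inside the magnet.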



Figure 3: Tile detector station CAD rendering. Consisting of seven modules and hosting readout chips on the TMB, it is able to cover the total  $2\pi$  angle in the  $\varphi$ -plane. [10]

The tile detector has an excellent timing resolution. It consists of a layer of organic scintillating tiles of size  $5 \text{ mm} \times 6.2 \text{ mm} \times 6.3 \text{ mm}$ , which are supplied by Eljen and cut by the Kirchhoff-Institut's internal workshop. They are read out with photon-counting SiPMs<sup>4</sup>, enabling the tile detector to achieve a timing resolution below 100 ps at a detection efficiency close to 100 %. As the main part of this thesis deals with the signal timing of this subdetector, the way this time resolution is achieved will be explained in more detail.

<sup>3</sup> Hamamatsu S13552-HRQ

<sup>4</sup> Hamamatsu S13360-2050 VE

All connection cables of the detector parts will be guided to so-called *Service Support Wheels*, which sit upstream and downstream of the detector in the magnet hull, centered on the beamline. They host central distribution parts like the main readout electronics, bias voltage generators and the cooling distribution to the tile and pixel modules. For the later *Phase II*, it is planned to extend the recurl stations on each side and to minimize dead, non-detecting area.

### 1.2 The Mu3e Tile Detector

In the following, the tile detector is explained in more detail, as this is needed for the measurements and methods concerning the timing. The quantity measured by this part of the detector is the hit timestamp, which should be recorded with a resolution well below 100 ps. The sensor elements of the tile detector consist of *organic scintillator tiles*. The geometry is cylindrical, with 56 tiles in  $\varphi$  and 52 tiles in z-direction. The tiles are organized into 7 submodules, each covering the whole z length and a  $\varphi$  range of roughly 51° (fig. 3). As the modules do not overlap, the edge tiles are inclined by 25.7° and thus fit directly next to each other.
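The geometry figures quoted in this chapter can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from the text (7 modules, 56 × 52 tiles, 4x4 matrices, 32 channels per MuTRiG); the observation that the quoted edge inclination coincides with half the module opening angle is an interpretation, not a statement from the source.

```python
N_MODULES = 7
TILES_PHI, TILES_Z = 56, 52   # tiles per station in phi and z
TILES_PER_MATRIX = 4 * 4      # one 4x4 sensor matrix
CHANNELS_PER_MUTRIG = 32

phi_per_module_deg = 360 / N_MODULES
tiles_per_module = (TILES_PHI // N_MODULES) * TILES_Z
matrices_per_module = tiles_per_module // TILES_PER_MATRIX
mutrigs_per_module = tiles_per_module // CHANNELS_PER_MUTRIG

print(f"phi coverage per module: {phi_per_module_deg:.1f} deg")
print(f"half of that angle:      {phi_per_module_deg / 2:.1f} deg")
print(f"tiles per module:        {tiles_per_module}")
print(f"matrices per module:     {matrices_per_module}")
print(f"MuTRiGs per module:      {mutrigs_per_module}")
```

The results (about 51.4°, 25.7°, 416 tiles, 26 matrices and 13 MuTRiGs per module) agree with the values given for the tile sensor matrix, the MuTRiG channel range and the TMB.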

#### 1.2.1 Tile Sensor Matrix

The tiles of one module are organized in 26 4x4 tile matrices, which are connected to the underlying PCB, the Tile Module Board, by a flexprint bent around the long edge of the module. The matrices are arranged in a grid of  $2 \times 13$  ( $\varphi$  vs. z direction). One matrix is built up of a square PCB and 16 SiPMs with the scintillators glued onto them (fig. 4). If a particle hits a scintillator, photons are emitted through excitation and de-excitation processes. The number of photons is proportional to the energy loss of the incoming particle, which makes some scintillators well suited for calorimetric applications. The light yield in this case is about  $10^4$  photons per MeV with rise and decay time constants of 0.5 ns and 1.4 ns, respectively.



Figure 4: Scintillator Matrix mounted on the TMB. As the board is flipped, the soldering points of the MuTRiGs are visible, along with clock distribution chips (marked by the identifiers U94 and U90).

Because this excitation and thus the signal creation is very fast, especially compared to inorganic scintillators, the EJ-228 polyvinyltoluene organic scintillator is an appropriate material to achieve the desired timing. In addition, the intra-matrix timing variation caused by different routing lengths of the signal paths has been evaluated and can be corrected for in software [11].

**SiPMs** The light produced is captured by a Hamamatsu Silicon Photomultiplier (SiPM) for each tile individually. These SiPMs consist of a matrix of 3584 photodiode pixels, each following the principle of a Geiger-mode avalanche photodiode (G-APD) [12]. As in a classical photodiode, the light creates an electron-hole pair in the space charge region. The difference is a highly doped  $p^+n^+$  area inside the normal pn-junction, which leads to a very high electric field. Inside this amplification region, the drifting electrons are accelerated to energies at which they lift further electrons into the conduction band, thus forming an avalanche on the way to the readout electrode. There is, however, a distinction between an avalanche photodiode (APD) operated in linear mode and one operated in Geiger mode: in a linear APD, the electron yield of the avalanche is still proportional to the primary ionization, whereas Geiger mode requires a bias voltage well above the breakdown voltage. The resulting avalanche generates a high discharge current that does not depend on the energy of the primary ionization; counting the total number of incoming photons therefore requires the array of single photo cells mentioned above. The discharge current has to be stopped quickly to make way for the next avalanche. This is often done with a quenching resistor, which limits the bias voltage when the current becomes too high and thereby stops the development of the avalanche. The advantages lie in the sensitivity to even single photons (which is very helpful for the fibre detector), as well as in the excellent timing resolution achieved due to faster avalanche formation.

An avalanche is typically triggered by an incoming photon from the scintillator; however, electrons lifted into the conduction band by thermal excitation or quantum tunneling also contribute to the signal. These contributions are typically subsumed under the term *Dark Noise*. To quantify the behaviour of this dark noise, a *Dark Count Rate* (DCR) can be measured, which depends exponentially on the temperature (thermal excitation) and the bias voltage (tunneling/recombination).

#### 1.2.2 The MuTRiG chip

The SiPMs convert the collected light into a charge pulse, which is routed through the flexprint to its readout unit. This signal is processed by an *Application-specific Integrated Circuit* (ASIC), the *Muon Timing Resolver including Gigabit-link* (MuTRiG) chip [13].

It consists of a parallelized comparator unit for reading out the energy and time information, using individually tunable thresholds for each. The digital outputs of both comparators are then combined into one single digital signal, which allows the time and energy information to be registered with the same *time-to-digital converter* (TDC). The resulting digital signal is shown together with the SiPM signal in fig. 5.



Figure 5: E- and T-trigger of the MuTRiG. The digital outputs of both triggers are combined with an XOR-logic, encoding both signals into one digital output. [13]

In the SciTile detector, dark counts can be filtered out because the light yield of the tiles is high enough to set the thresholds above the dark-count level of the SiPM, so that only physical tile hits trigger. SciFi has a light yield of only a few photons, which requires the threshold to be set within the dark-count level and significantly increases the rate.

One MuTRiG is connected to two 4x4 matrices and can thus read out 32 channels at a rate of 1 MHz each. Because it is used frequently throughout this work, it is important to distinguish between the MuTRiG-specific channel, ranging from 0 to 31, and the global channel of a module (0 to 415) or even of the whole system. The timing information is split into different granularities: a time frame header is sent by the MuTRiG every  $\sim 10 \,\mu$ s. Inside the time frame, the coarse counter timestamp (in units of 1.6 ns) is encoded in a 16-bit vector, which is driven by a 625 MHz clock input. The finest time unit is the 50 ps fine counter, which is transmitted as an additional 5-bit word in the MuTRiG's output protocol. Its effective 10 GHz (16 × 625 MHz) clock signal is created internally with a *Phase-locked Loop* (PLL). This PLL takes the 625 MHz input clock distributed from the TMB and outputs a clock according to a phase shift created in an internal voltage-controlled oscillator (VCO). The phases of the reference signal and the clock generated by the VCO are compared to lock the internal counters to the reference. As all timing information depends on the correct synchronization of the clock, a chip with an unlocked PLL is useless.
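The relation between the counters and channel indices described above can be sketched as follows. This is a minimal illustration that assumes the raw counter values are plain binary integers and that the module-wide channel index is ordered chip by chip; neither assumption is confirmed by the text, and both function names are hypothetical.

```python
COARSE_BIN_PS = 1600       # one period of the 625 MHz clock
FINE_BIN_PS = 50           # fine counter bin
FINE_BITS = 5              # 2**5 fine bins of 50 ps span one coarse bin
CHANNELS_PER_MUTRIG = 32
MUTRIGS_PER_MODULE = 13


def hit_time_ps(coarse: int, fine: int) -> int:
    """Combine coarse and fine counter values into a time in ps within a frame."""
    if not 0 <= fine < 2 ** FINE_BITS:
        raise ValueError("fine counter must fit into 5 bits")
    return coarse * COARSE_BIN_PS + fine * FINE_BIN_PS


def global_channel(mutrig: int, channel: int) -> int:
    """Module-wide channel index (assumed chip-major ordering)."""
    if not (0 <= mutrig < MUTRIGS_PER_MODULE and 0 <= channel < CHANNELS_PER_MUTRIG):
        raise ValueError("index out of range")
    return mutrig * CHANNELS_PER_MUTRIG + channel


print(hit_time_ps(coarse=3, fine=7))          # 3 * 1600 + 7 * 50 = 5150
print(global_channel(mutrig=12, channel=31))  # 12 * 32 + 31 = 415
```

With this ordering, the last channel of the last chip maps onto 415, the upper end of the module-wide range quoted above.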

#### 1.2.3 Tile Module Board

On the *Tile Module Board* (TMB), 13 MuTRiGs read out two tile matrices each and assemble hit frames reconstructed from the analog SiPM signal.

Three main cable types are connected to the TMB. The matrices are biased with a high voltage of 56 V through four mini-coaxial connectors, which are plugged into four positions on the TMB and provide the power for the SiPMs in four quadrants. The low voltage cables carrying the 2.1 V and 3.6 V supply voltages for the MuTRiGs are attached next to the first chip, as is a flat 40-pin micro-coaxial cable.



Figure 6: **TMB with one connected sensor matrix.** To the right, the DC voltage inputs (in colors blue, red and yellow), as well as the micro-coax are visible. The matrices can be connected to both long sides of the board and are mounted on the other side, while the flexprint is bent around the corners. [14]

This cable provides all control protocol and data lines as well as the clock and reset lines mentioned above, using an LVDS interface to prevent high-frequency noise interference. The TMB layout comprises one serial data line as well as clock, reset and injection test signal lines, which are distributed to every chip by serially connected distribution chips. In addition, control signals are sent by the DAQ slow control unit, which addresses parts of the board via the  $I^2C$  protocol over an additional line. It is used for selecting chips as well as for reading back relevant monitoring data from the TMB, including power and temperature monitoring.

### 1.3 The Data Acquisition (DAQ) system

Pushing the sensitivity to extremely low branching ratios requires large statistics and therefore high-rate experiments. In order to meet the detection and readout efficiency goals, triggered detector systems, which preselect incoming events live in the detector hardware and software, have become indispensable for more and more experiments. The Mu3e detector, however, cannot use triggers, as the low momentum of the decay products and the resulting strongly curved tracks require a complete event reconstruction using the whole detector. As the data rate, summed over all parts, rises up to 100 Gbit/s in phase I [15], this has to be handled by a special *Data Acquisition* (DAQ) system.

It benefits from the fact that there are no detectors at either end of the experiment, which leaves enough space for a large stack of readout electronics and data acquisition components that handle the high data rate with a relatively high number of FPGAs. It also helps that in the case of Mu3e, the signature of the sought event is quite unique and must only be distinguished from a few processes like the dominant Michel decay or internal conversion. The event reconstruction is performed by high-performance NVIDIA GPUs, which are capable of doing the required geometrical analysis tasks like vertex fitting and track reconstruction fast enough.

The digital base clock domain, running at 125 MHz, is created by a specially designed *Clock & Reset Box* (fig. 7). This clock signal is provided to all parts of the experiment and modified in order to drive each subdetector with the clock frequency it needs. The DAQ's front-end consists of 114 *Front-End Boards* (FEBs), which collect measurement data and process it on a *Field-programmable Gate Array* (FPGA), the *Intel Arria V*.

The FEBs have a specially designed cooling structure, including a heat-conducting pipe and a metal casing and will sit in a metal crate included in the Service Support Wheel.



Figure 7: **DAQ SciTile hardware.** The Service Support Wheel (SSW) hosts the direct readout component of the tile detector, the Front-End Board (FEB). It is connected with an adapter plate and receives the data of the tile readout chips, the MuTRiGs; these data frames are sent out via an optical fibre system to the Switching boards (SWBs), which process data packages of several FEBs and prepare them for event reconstruction in the Farm.

As the FEBs use an FPGA, they can be reprogrammed using a hardware description language like VHDL or Verilog. This changes the firmware, i.e. the logic physically embedded in the hardware of the FEB. The concept is advantageous over a dedicated ASIC design like the MuTRiG discussed above [13] or pure software data processing, as it combines the high speed of digital hardware logic with the possibility of an easy and (in terms of timing) well-monitored process for reprogramming and improving the system. The FEBs are connected via a passive backplane with a *Detector Adapter Board* (DAB), which maps the ports of the FEB onto the detector-specific output connector (in this case, the 40-pin micro-coax mentioned above). On the back-end side, the FEB is connected via a FireFly<sup>5</sup> optical transmission link able to transfer up to 6.25 Gbit/s per data fibre [15].

The FEBs send their data to a fanout rack, which bundles multiple fibres into one optical connection leading to one of four *PCIe40 Switching Boards*. Each switching board further processes all incoming signals and stores the data on the *GPU Farm PCs* at a speed of  $4 \times 10$  Gbit/s. The board was originally developed for *LHCb* readout purposes and comprises an *Intel Arria 10* FPGA, 48 optical inputs and three 8-lane PCIe interfaces. As some blocks of the tile detector firmware were implemented during this work, the focus will be put on the SciTile branch of the FEB firmware. It consists of a top entity, merging the tile\_path.vhd entity (representing the tile-specific firmware) with the common firmware part fe\_block.vhd, which comprises general-purpose entities needed for standardized communication with the switching boards.

<sup>5</sup> SamTec MTP connector



Figure 8: **DAQ SciTile FEB Firmware.** Signals used by the sketched entity (rst\_shift\_block) are provided by the clock and reset box. A register which is accessed by the FEB with specific address signals can be used for communication with individual firmware blocks. It comprises a read and write signal along with an "enable" signal for both directions.

The tile path block then consists of three major blocks: A slow control unit needed for the communication between MIDAS software commands and the slow control commands for the detectors (e.g., the I<sup>2</sup>C lines on the TMB), the register unit scitile\_reg\_mapping.vhd and the entity mutrig\_datapath.vhd, containing the main readout branch. Fig. 8 shows the implementation of such a firmware entity in the Front End Board firmware.

## 2 Measurements on the Tile Module Board

All chips on the Tile Module Board need to have access to a clock and reset signal, as well as a test pulse line. To ensure that each of the chips receives these digital signals with the same quality in terms of timing and amplitude, a distribution structure with clock buffer chips is needed. The chosen buffer<sup>6</sup> has two LVDS inputs (i.e. two pairs of positive and negative signal lines, whose differences yield the two original signals), each mapped onto four LVDS outputs.



Figure 9: Clock buffer distribution trees. The ideal tree structure a) minimizes the total output skew between different clock signals. As this was not feasible, a serial tree structure b) was implemented, which leads to a high accumulation of output skew to the end of the tree.

The buffer has a propagation delay between 300 ps and 575 ps and an output skew of 20 ps. This means that output signals belonging to the same input signal are delayed by at most 575 ps with respect to the input and differ among each other by up to  $t_{\rm SK,o} = 20$  ps. The part-to-part skew, i.e. the maximum time difference between signals coming out of two clock buffers operated under the same conditions, is specified by the vendor as  $t_{\rm SK,pp} = 250$  ps.



Figure 10: CLK/RST/INJ signal distribution. The lower clock buffer U27 distributes the incoming clock and reset signals to three chips, the upper chip U92 is used for the injection signal distribution. The path length differences between all three signals are compensated for by small meandering paths, so that the skew comes mainly from the buffer chips. [17]

Ideally, a tree-like structure is the best way to achieve a relatively homogeneous timing distribution of these signals. A central distribution chip sends the incoming signals to both ends of the TMB, where the first and last chip receive all signals at roughly the same time (fig. 9). Hence the maximal skew difference between signals would be reduced by about a factor of two, as no signal is propagated through more than two clock buffer chips.

<sup>6</sup> Texas Instruments LMK1D2104 [16]

Although this structure would be ideal, it was not feasible for geometrical reasons. Instead, the signals are propagated through a serial chain of four clock buffers: clock and reset are routed from the micro-coax input cable to the two inputs of one clock buffer, located on the bottom side of the TMB between MuTRiG #1 and #2. A second clock buffer located between the same chips receives the injection signal. All three signals are distributed to three chips by the two clock buffers via three of the LVDS outputs of each bank (the lower buffer distributes three clock signals with one bank and three reset signals with the other, see also fig. 10), while the fourth output routes each signal to the input of the next buffer chip. As the last buffer chip does not need to distribute the signals further, the fourth output of all its banks is routed to MuTRiG #12. The distribution hence creates a skew pattern in which the chips fall into groups of three or, for the last group, four chips whose skews are roughly the same, up to the different path lengths from chip to chip. For the detector to be synchronizable, the skew between the reset and clock lines of each chip on the TMB must not be of the order of the clock period (i.e. 1.6 ns), but significantly lower.
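The accumulation of delay along the serial chain can be illustrated with a small model. The sketch below only accounts for the buffer propagation delay range quoted above and ignores trace lengths; the assignment of consecutive chip indices 0 to 12 to the four buffers, in groups of 3, 3, 3 and 4, is an assumption made for illustration.

```python
T_PD_MIN_PS, T_PD_MAX_PS = 300, 575  # propagation delay range of one buffer
GROUP_SIZES = (3, 3, 3, 4)           # chips served by buffers 1 to 4 (assumed order)


def buffers_traversed(chip: int) -> int:
    """Number of serial buffers between the board input and MuTRiG `chip` (0..12)."""
    first = 0
    for stage, size in enumerate(GROUP_SIZES, start=1):
        if first <= chip < first + size:
            return stage
        first += size
    raise ValueError("chip index must be in 0..12")


def delay_window_ps(chip: int) -> tuple[int, int]:
    """Minimum and maximum accumulated buffer delay for one chip, traces ignored."""
    n = buffers_traversed(chip)
    return n * T_PD_MIN_PS, n * T_PD_MAX_PS


for chip in (0, 3, 6, 12):
    t_min, t_max = delay_window_ps(chip)
    print(f"MuTRiG #{chip}: {buffers_traversed(chip)} buffer(s), {t_min}-{t_max} ps")
```

In this simplified picture, each additional buffer stage shifts the delay window by 300 ps to 575 ps, which reproduces the expected grouping of the chips. The signal paths between the buffers add to these values.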

## 2.1 Signal Propagation Timing

As a first step in classifying and understanding the timing of signals distributed on the TMB, the reset, clock and pulse injection signal line delays were characterized.



Figure 11: **D13000PS probe soldered to a reset termination resistor.** The resistor next to the soldered one terminates the CLK line. As this photo represents an early measurement setup, the TMB was not equipped with all MuTRiGs.

Right before each MuTRiG, *termination resistors* (fig. 11) are placed for all three lines. These resistors offer the possibility to probe the distributed signal. The measurement procedure was the following: A 10 kHz clock signal from a clock generator<sup>7</sup> was routed to an oscilloscope<sup>8</sup> as a reference signal, and the same signal was distributed to the TMB.

<sup>7</sup>Silicon Labs Si5338

Figure 12:  $t_{RST}$ ,  $t_{CLK}$  and  $t_{INJ}$  of a TMB. All three signals show the desired grouping with a spacing of ~ 1 ns delay, which is combined from the signal path length between clock buffers and the internal propagation delay of the buffer chips.

For that purpose, a small adapter PCB was used to minimize errors and to avoid building up the complete DAQ chain all at once. It can not only be connected with SMA cables for the three mentioned input signals, but also comprises pins for an I<sup>2</sup>C interface, all of which are routed to the 40-pin micro-coax. With three types of probes, the signals were taken at the already mentioned termination resistors. First, the time differences between the rising edges of these probe signals  $t_{\text{RST, TMB}}$ ,  $t_{\text{CLK, TMB}}$  and the adjacent rising edges of the reference signal  $t_{\text{Ref}}$  were measured with a passive probe and compared for all chips,

$$t_{i} = t_{i,\text{TMB}} - t_{\text{Ref}} \text{ with } i \in \{\text{CLK, RST, INJ}\} \tag{2.1}$$

In a second run, the clock and reset signals of a selection of chips were taken with a highly sensitive differential probe, as the measurement was found to be strongly affected by the bandwidth of the previous probe. As seen in fig. 12, the measurements of the absolute timing delay are in good agreement for both probes; they also show the behaviour expected from the buffer chips. Each clock buffer group is separated by a delay of roughly a nanosecond, composed of the propagation delay of the clock buffer as well as the delay caused by the signal path between the clock buffers.

#### 2.2 CLK/RST skew

Now that the expected behaviour of the TMB has been verified, the measurement in fig. 12 allows the skew between CLK and RST to be evaluated for each chip,

$$t_{\rm CLK-RST} = t_{\rm CLK} - t_{\rm RST} \tag{2.2}$$

<sup>8</sup>LeCroy TeleDyne SDA 813ZI with the typical probe system LeCroy D13000PS

The maximum skew between these two signals is mainly composed of the timing error of the clock buffer chips, as the TMB's path lengths were designed to be exact and a manufacturing error can be neglected. The outer boundaries for the skew are determined in the following way:

As the cabling between the clock generator, the oscilloscope and the TMB leads to a systematic skew offset which cannot be quantified for sure, the relative skew of all chips with respect to MuTRiG #0 is taken. As this MuTRiG receives its signals from the same clock buffers as the following two MuTRiGs, their maximum skew interval is bounded by the output skew of the buffer and thus one would expect the skews of MuTRiG #1 and #2 to be inside this interval:

$$t_{\text{CLK-RST, 1-2}} - t_{\text{CLK-RST, 0}} \in [-t_{\text{SK,o}}, t_{\text{SK,o}}] = [-20 \text{ ps}, 20 \text{ ps}] \tag{2.3}$$

As the skew propagates when the signal is passed on to the next clock buffer chip, the maximum accumulated skew grows by a further 20 ps in total. One should note, however, that the output skew is now taken with respect to the input signal of the specific clock buffer chip and not with respect to a specific input as in the case before. The maximum skew between the outgoing clock and reset signals therefore increases by an additional 20 ps in one direction (higher or lower).



Figure 13:  $t_{CLK-RST}$  of a TMB. The big error bars of the PP008 measurement are probe-dominated due to the poor frequency bandwidth. The differential probes however reveal a more narrow distribution of errors and time differences, which fit inside the expected maximum skew determined by the clock buffer tree structure.

In order to symmetrize this, the propagation delay (which is not important for the parametrization anyway) can be shifted, which results in a widening of the maximum accumulated skew interval by 10 ps in each direction. For the next clock buffer chip, for example, the interval in which the skew should lie is:

$$t_{\text{CLK-RST, 3-5}} - t_{\text{CLK-RST, 0}} \in [-t_{\text{SK, o}} - 0.5t_{\text{SK, o}}, t_{\text{SK, o}} + 0.5t_{\text{SK, o}}] = [-30 \text{ ps}, 30 \text{ ps}] \tag{2.4}$$

The same is of course true for the last two clock buffer distributions. The probed data in fig. 13 is obtained by eq. (2.2), using the difference of the absolute signal delays of both signals presented in section 2.1, again plotted for both probes used. The error of the measured skew is obtained by Gaussian error propagation.
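The pattern of eqs. (2.3) and (2.4) can be written as a small helper. This is only a sketch: it assumes that every further buffer stage widens the symmetrized interval by half the output skew, which extrapolates the two intervals given explicitly above to the last two stages. The name `max_skew_interval` is hypothetical.

```python
T_SK_O_PS = 20.0  # output skew of one buffer chip in ps, as used in eq. (2.3)


def max_skew_interval(stage: int, t_sk: float = T_SK_O_PS) -> tuple[float, float]:
    """Symmetrized bound on the CLK-RST skew relative to MuTRiG #0.

    stage 0 is the buffer driving MuTRiG #0 to #2, stage 1 the next one, etc.
    Assumption: each stage adds 0.5 * t_sk on both sides of the interval.
    """
    half_width = t_sk * (1 + 0.5 * stage)
    return (-half_width, half_width)


for stage in range(4):
    low, high = max_skew_interval(stage)
    print(f"stage {stage}: [{low:.0f} ps, {high:.0f} ps]")
```

Under this assumption, the interval of the last stage (MuTRiG #9 to #12) would be ±50 ps, still far below the 1.6 ns clock period.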

For an exact parametrization of the resulting skew, the values measured with the D13000PS probes were used. The band in which the clock and reset skew is actually located for the measured case can be obtained from the highest and lowest datapoint. The error of the band width follows from adding the errors of both datapoints in quadrature, as again only the difference of two quantities with errors is taken.

$$t_{\text{CLK-RST, band}} = t_{\text{CLK-RST, 5}} - t_{\text{CLK-RST, 9}} = (41.41 \pm 7.69) \,\text{ps} \tag{2.5}$$
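As an illustration of how such a band and its error can be computed, the following sketch takes the highest and lowest skew value and adds their errors in quadrature. The input values are placeholders chosen for readability and are not the measured data of fig. 13; the function name `skew_band` is hypothetical.

```python
import math


def skew_band(points):
    """Return (band width, error) for a list of (skew_ps, error_ps) tuples."""
    highest = max(points, key=lambda p: p[0])
    lowest = min(points, key=lambda p: p[0])
    width = highest[0] - lowest[0]
    # Difference of two independent quantities: errors add in quadrature
    error = math.hypot(highest[1], lowest[1])
    return width, error


# Placeholder values only, NOT the measured skews
example = [(0.0, 3.0), (12.0, 4.0), (-9.0, 4.0)]
width, error = skew_band(example)
print(f"band = ({width:.2f} +- {error:.2f}) ps")
```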

The error of the measurements characterizes the combined jitter<sup>9</sup> of both signals. The relatively low value of around 7 to 8 ps was measured with a low frequency reference signal. This quantity therefore should rather be seen as a lower boundary for the jitter between both signals in the actual experiment, where the quality of the 625 MHz clock signal's edges is likely to be different.

#### 2.3 CLK/INJ skew

A common test procedure used for the tuning of detector readout electronics is to inject test signals which show a uniform and well-tuned behaviour. For the case of the Mu3e Tile Detector, a 10 kHz injection test pulse is created in the FEB and routed to the TMB. It is distributed by a second clock buffer of the same type as the one already discussed.

The aim of this subsection is to determine the skew of the injection signal with respect to the clock. For that purpose, a maximum skew error was calculated in the same fashion as for the reset signal. The propagation of the upper and lower boundaries is technically the same, but the value to plug in for the maximum skew is different, as the two signals are distributed by different chips operating under the same conditions. In that case, the part-to-part skew  $t_{SK,pp}$  of the buffer device needs to be included as well. Thus the maximum accumulated skew is bounded by  $[-1 \cdot 250 \text{ ps}, 1 \cdot 250 \text{ ps}]$  for the first three chips, while the next three chips should have a skew inside  $[-1.5 \cdot 250 \text{ ps}, 1.5 \cdot 250 \text{ ps}]$  etc.

The measured differences are taken from the absolute delay measurements in section 2.1, with a Gauss-propagated error.

As before, the maximum skew as evaluated on the board can be parametrized with a band ranging from the lowest to the highest skew value. Hence the error propagation of the last calculation is used.

<sup>9</sup>Standard deviation of a signal's rising edge time stamp; it is thus only useful information if the signal is measured periodically or, as in the case of the reset, multiple times.

$$t_{\text{CLK-INJ, band}} = t_{\text{CLK-INJ, 0}} - t_{\text{CLK-INJ, 12}} = (249.09 \pm 93.87) \,\text{ps} \tag{2.6}$$

It is worth mentioning that the injection measurements are not crucial to the synchronization of the detector, as it does not matter when the pulses arrive at the MuTRiG, as long as the synchronization with clock and reset works properly.



Figure 14:  $t_{CLK-INJ}$  of a TMB. The skew band is representing the time interval which is covered by the experimentally determined time differences. As expected from the fact that the part-to-part skew of the clock buffers is much higher than the internal output skew, this skew band exceeds the size of the CLK/RST skew band by a factor of 6 (~ 40 ps compared to ~ 250 ps). Nevertheless, the measured time differences are located inside the maximal skew.

As probes with a lower bandwidth of just 1 GHz were used, a reliable jitter estimation is not possible in this case. Still, it is verified that the injection lines are well inside the maximum accumulated skew obtained solely from the buffer chip parametrization, meaning that the timing accuracy at which the signals are distributed reaches the hardware limit.

## 3 Clock/Reset Alignment Firmware

#### 3.1 Motivation

In order to understand what makes up a synchronized device, a basic understanding of the timing properties of the detector is necessary. As stated previously, the clock signal defines a local timing reference for each MuTRiG, but still arrives with a certain (slightly deviating) skew compared to the reset signal. Due to different routing lengths, component delays, as well as differences in the fibre length between FEB and Switching Board, such a skew is mostly fixed by the hardware components.

The key function of the reset signal is to define the first timestamp. It is raised at run start by the *Clock & Reset Box* when it is in the sync state and switched from '1' to '0' when transitioning to the running state. This change of the reset is recognized by the MuTRiG at the next rising clock edge, and the corresponding local clock cycle of the chip is thus defined as the first timestamp. The MuTRiG is considered to be synchronized to the rest of the system if this process is executed multiple times and the first timestamp is always assigned to the same clock cycle with respect to the reset signal.

In order to determine such a timestamp, a reference timeframe is needed, which can be obtained by the first timestamp of any other MuTRiG. The goal of this thesis is to evaluate the synchronization behaviour of a whole module, thus timestamp assignments are tested for 12 pairs of MuTRiGs covering a total of 13 chips. With that in mind, first it should be clarified what exactly would lead to such a clock shift.

**Flip-Flops** For that, the introduction of a critical digital logic component is important: In all cases where a signal is to be synchronized to a clock, the use of so called *Flip-Flops* is needed.



Figure 15: Schematic and signal processing of a Flip-Flop. The two inputs, D and CLK, are combined to determine the output. Q is set to the input of D if and only if the Flip-Flop sees a rising edge of CLK.

It is a basic digital logic register comprising two inputs, D and CLK, as well as an output Q. In the special case that the reset signal from the FEB (which corresponds to D in fig. 15) changes close to the rising clock edge of a MuTRiG, a *Sample-Hold-Violation* can occur: D is changing either inside the sample time of the Flip-Flop (the time before a clock edge needed to safely determine D) or in the hold time (the time in which D would still influence the outcome of Q), leaving the Flip-Flop in a metastable state and thus leading to an unpredictable outcome at Q.

As the reset would then not be well-defined, it fails to be synchronized in the right clock cycle, and the chip would be mis-synchronized by  $\pm 1.6$  ns.

To make this work at all times, the reset for a chip (or even for all chips on one TMB) could be shifted towards a specific point in time within the 1.6 ns clock cycle. However, this would fail if the reset skew of the chips varied on the order of the clock period, because then a common delay of the reset received by all chips would not work. Fortunately, this is not the case, as shown for the chosen components by the measurements in the previous section. The ability to shift the reset signal to different phases with respect to the local clock, as well as to evaluate the synchronization behaviour, is therefore relevant.
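The timing argument can be made concrete with a simplified model. The sketch assumes ideal rising clock edges at integer multiples of the 1.6 ns period and uses placeholder sample and hold times of 100 ps each, which are not MuTRiG specifications; the function name `sample_reset` is hypothetical.

```python
import math

T_CLK_PS = 1600.0  # period of the 625 MHz clock in ps


def sample_reset(t_rst_ps, t_sample_ps=100.0, t_hold_ps=100.0):
    """Return (clock cycle registering the reset, True if the timing is safe).

    t_rst_ps is the time of the reset transition. The sample and hold times
    are placeholder values chosen for illustration only.
    """
    phase = t_rst_ps % T_CLK_PS                    # position inside the clock cycle
    cycle = math.floor(t_rst_ps / T_CLK_PS) + 1    # next rising edge after the change
    safe = t_hold_ps <= phase <= T_CLK_PS - t_sample_ps
    return cycle, safe


print(sample_reset(800.0))    # reset in the middle of the cycle
print(sample_reset(1590.0))   # too close to the next edge (sample time)
print(sample_reset(1650.0))   # too close to the previous edge (hold time)
```

In the two unsafe cases the returned cycle is only the nominal one; a real Flip-Flop may register the reset one clock cycle earlier or later, which corresponds to the ±1.6 ns mis-synchronization discussed above.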

#### 3.2 Intel Quartus IP Cores

The only way to prevent such a sample-hold violation is to shift the RST of all MuTRiGs to a point in the middle of the clock cycle. This relies first and foremost on a relatively homogeneous distribution of the CLK and RST signals relative to each other, ensuring that the timing window in which the skew varies is much smaller than a clock cycle.

For each module, the shift can then be solved by a unit delaying the global reset signal for each FEB/TMB combination to a point in the clock cycle where no MuTRiG undergoes sample/hold-violations at any point in time. Such a unit was implemented in the board design of the front-end FPGA, the Intel Arria V. Intel (formerly Altera) provides the possibility to access dedicated FPGA blocks as well as configuration interfaces controlling these blocks via so called IP (*Intellectual Property*) cores.

Using the Intel Quartus simulation and synthesis software, the FPGA firmware can be reprogrammed with custom blocks and IP cores based on a hardware description, which are connected in the FPGA fabric during the synthesis process. The latter can be designed via a GUI showing all options of the specific FPGA entity. For this use case, IP cores alone are not sufficient to achieve the desired reset shift, as they lack a way to communicate with the rest of the DAQ and especially with the register specifying the reset shift setting; the IP core itself needs the shift setting as a serial digital configuration input.

For the desired functionality, the built-in delay chain of the Arria V with a connected configuration IP block, as well as a double-data rate unit will be needed. As these entities must be connected to the rest of the FEB's firmware part and especially to the register specifying the reset shift setting, a mapping and data conversion unit is needed between the register input and the IP cores. The function and connection of all blocks will be specified in the following subsections.

#### 3.3 Finite State Machines

In order to transform the reset shift setting from a vector input coming from the backend to a serial output leading to the IP core, a *Finite State Machine* (FSM) was implemented in the SciTile/SciFi FEB firmware. In the following part, this finite state machine used to enable and configure the reset shift IP core will be explained. Finite state machines are widely used in technical applications, mostly when a specific task or process has to be executed sequentially or with the usage of few signals from outside, e.g. traffic lights or elevators. Its behaviour is defined by a list of states  $\Sigma = (S_1, \ldots, S_n)$  and an input alphabet  $\Delta = (D_1, \ldots, D_m)$ , between which it can change depending only on the input signal  $D \in \Delta$  and its current state  $S \in \Sigma$ . Thus, one can define a transition function  $T : \Sigma \times \Delta \to \Sigma$ , which takes a state and a symbol from the input alphabet and, along with the starting state  $S_0 \in \Sigma$  characterizes the FSM completely.

**Shift Registers** A basic example of a finite state machine that is widely used as a digital logic circuit is the *shift register*. It is the main ingredient used for the already mentioned conversion between a vector input and a serial output.



Figure 16: **Digital implementation of a shift register.** The D input of each Flip-Flop is passed to the next Flip-Flop's input at the CLK's rising edge. This means that the input of the shift register needs five clock cycles to propagate through.

Mathematically, it can be described as a function SR:  $\{0,1\}^n \to \{0,1\}^n$  mapping an n-bit word  $D \in \{0,1\}^n$  to a new n-bit word, using the following mapping<sup>10</sup>:

$$SR(D(i)) = \begin{cases} D(i+1) & i < n \\ D(i) & i = n \end{cases} \tag{3.1}$$

Applying this up to n times, the resulting D(0) for step  $i \in \{1, ..., n\}$  is given by

$$\mathrm{SR}^i(D(0)) = D(i) \tag{3.2}$$

which is exactly the desired serialized vector output. Practically, this can be realized with n flip flop registers serially connected (Q of one flip flop to D of the next) and driven by the same clock (fig. 16).
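A software analogue of this serialization is sketched below. It follows eq. (3.1) in that every bit takes the value of its neighbour while the last position keeps its value; the zero-based list indexing and the function name `serialize` are choices made for this illustration.

```python
def serialize(word):
    """Shift an n-bit word out bit by bit, one bit per clock cycle.

    `word` is a list of bits; element 0 is the position connected to the
    serial output (zero-based indexing is an assumption of this sketch).
    """
    register = list(word)
    out = []
    for _ in range(len(word)):
        out.append(register[0])                    # bit currently at the output
        register = register[1:] + register[-1:]    # eq. (3.1): last bit is held
    return out


print(serialize([1, 0, 1, 1, 0]))
```

After n clock cycles, every bit of the parallel input word has appeared once at the serial output, in the order of its position in the register.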

To interpret the shift register as a finite state machine, the register word D is defined as the state, making  $\{0,1\}^3$  the space of possible states, which explicitly consists of  $\Sigma \equiv \{0,1\}^3 = \{(000), (001), (010), (011), (100), (101), (110), (111)\}$ . The input alphabet is the Boolean domain  $\Delta \equiv \{0,1\}$ , from which the serial input bit S is taken. By defining the transition function such that

$$T: \{0,1\}^3 \times \{0,1\} \to \{0,1\}^3, \quad (D = (D_1 D_2 D_3), S) \mapsto (S D_1 D_2)$$

and assuming that the initial state is  $S_0 \equiv (000)$ , all possible words (and therefore states) of the shift register are covered by the description.

<sup>10</sup>It should be noted that this definition is not unique, as there are many different use cases and thus slight differences in the definition.



Figure 17: Mealy-Moore Diagram of a 3-bit Shift Register. All possible states (blue boxes) can be reached from the initial state (000) with the input of one or multiple boolean signal inputs (orange boxes/arrows).

Turning this into a so-called *Mealy-Moore Diagram* (fig. 17), the FSM can be described graphically, which gives a better overview of its working principle. For that, all states are drawn as boxes interconnected by arrows, each labelled with the signal needed for the transition to the state it points to.
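The statement that every state can be reached from (000) can be checked by applying the transition function repeatedly. The following sketch implements the mapping (D₁D₂D₃, S) ↦ (S D₁D₂) and explores all states; the variable names are chosen for this illustration.

```python
from itertools import product


def transition(state, bit):
    """Shift register transition: the input bit enters on the left."""
    d1, d2, _ = state
    return (bit, d1, d2)


# Explore all states reachable from the initial state (0, 0, 0)
initial = (0, 0, 0)
reachable = {initial}
todo = [initial]
edges = []                      # (state, input bit, next state), one per arrow
while todo:
    state = todo.pop()
    for bit in (0, 1):
        nxt = transition(state, bit)
        edges.append((state, bit, nxt))
        if nxt not in reachable:
            reachable.add(nxt)
            todo.append(nxt)

print(len(reachable), "states,", len(edges), "transitions")
```

Each entry of `edges` corresponds to one arrow of such a diagram: two outgoing arrows per state, one for each possible input bit.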

#### 3.4 Architecture

In order to properly describe the processes inside the reset shift block, a nomenclature for signals and states is introduced. As visible in fig. 18, all inputs are marked with an i\_, all outputs with an o\_, and all internal signals with an s\_ prefix. The finite states of the FSM are marked by the prefix fs\_. The main signal which will be shifted by the entity is i\_d, which consequently comes out as o\_d.



Figure 18: Reset shift block structure. The finite state machine configures the main reset shift entity; The double data unit, along with a Flip-Flop-like unit before, is used for a 180° phase shift. All other outputs of the FSM are used for configuration of the delay chain, sitting in the last position and delaying the signal with a higher time granularity.

It is first passed to a double data buffer unit altddio\_out, which sends the signal on to the delay chain sd1, a Quartus IP core. The delay chain is capable of shifting its input in fixed increments  $\delta T$  (later determined to be ~ 22 ps), which provides a very fine shifting granularity, even though it might not be needed. Both the double data buffer and the delay chain are direct shifters of the reset signal, which are configured with a specific combination of signals. This is mainly done by the FSM, which directly steers the altddio\_out and passes the configuration data for the delay chain on to the delay chain configuration IP core, ioconfiga.

#### 3.4.1 Phase Shift Logic

As the delay chain configuration uses a 5 bit word input, it can shift the reset in  $2^5 = 32$  steps of the increment  $\delta T$ , meaning that all delay chain settings together cover ~ 700 ps, which is almost half a clock cycle. In order to allow shifts over a whole clock period, the double data unit was implemented, as it enables the system to shift the reset by exactly half a clock cycle. For this, the leading bit of the FSM configuration input word is used (meaning that the half-cycle shifts correspond to the settings 32 to 63). As can be seen from the signal chart below, if the high input of the DDIO is delayed by a whole clock cycle (caused by the enabled flip flop), its output will be delayed by half a cycle as desired (see fig. 19, bottom case).

The configuration data for the FSM, which decides how far exactly the signal will be shifted, is a six-bit-wide logical vector i\_cdata(5 downto 0). The highest bit, i\_cdata(5), is not used by the FSM but assigned to one of its outputs, o\_datashift. It is used as a logical parameter for the double data unit input, as explained later. The finite state machine output, the converted i\_cdata(4 downto 0) configuration data, comes out serially (o\_cdata) as already stated. The other outputs of the FSM are necessary for a proper configuration of the delay chain, which will be shown later.
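The splitting of the six-bit setting can be summarized in a short sketch. The nominal shift model used here (half a clock period for the leading bit plus an integer number of delay chain increments of about 22 ps) is an idealization assumed for illustration; the function names are hypothetical.

```python
T_CLK_PS = 1600.0    # period of the 625 MHz clock in ps
DELTA_T_PS = 22.0    # approximate delay chain increment quoted in the text


def decode_setting(setting):
    """Split a reset shift setting (0..63) into its two components."""
    if not 0 <= setting <= 63:
        raise ValueError("i_cdata is six bits wide")
    half_cycle = (setting >> 5) & 1      # i_cdata(5), forwarded as o_datashift
    delay_steps = setting & 0b11111      # i_cdata(4 downto 0) for the delay chain
    return half_cycle, delay_steps


def nominal_shift_ps(setting):
    """Idealized shift: half clock period plus delay chain increments."""
    half_cycle, delay_steps = decode_setting(setting)
    return half_cycle * T_CLK_PS / 2 + delay_steps * DELTA_T_PS


for setting in (0, 31, 32, 63):
    print(setting, decode_setting(setting), nominal_shift_ps(setting), "ps")
```

In this idealized picture, setting 31 corresponds to a shift of 682 ps and setting 32 to 800 ps, so the step between the two halves is larger than a single delay chain increment.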



Figure 19: **DDIO half clock cycle shift.** Using the same input for the high and low data of the double data unit buffers the data half a clock cycle (upper diagram). If the high input is delayed one clock cycle, the output is delayed an additional half (lower diagram) by shifting an additional flip flop right before the DDR unit driven with the 625 MHz clock.

The entity is running on two clock domains, as the 625 MHz data clock would be too fast for the configuration units. Thus, a clock divider fed by an additional 156.25 MHz clock provides the FSM and ioconfiga with a clock of the frequency  $\frac{156.25 \text{ MHz}}{5} = 31.25 \text{ MHz}$ .

#### 3.4.2 FSM Working Principle

The finite state machine comprises the states {fs\_idle, fs\_rec, fs\_send, fs\_update}. After every configuration of the delay chain or triggering of a global reset signal, it returns back into the idle state. When it receives a starting signal in two adjacent clock cycles, it makes a transition into the receive state. The condition of two clock cycles improves the robustness against instabilities in the start signal.



Figure 20: Run procedure of the finite state machine. The data received in this case is the input "10110", which one can see in the configuration signal during the send state phase. The conditions for state transitions are usually a specific counter value or a start symbol.

The FSM will then assign the lowest 5 bits of the input i\_cdata(5 downto 0) to the internal configuration vector s\_cdata(4 downto 0).

As this takes only one clock cycle, the FSM will transition again and send the data out serially with the help of a shift register in the following four clock cycles. As stated in section 3.3, s\_cdata(0) can be assigned as the configuration data output o\_cdata. As the shift register terminates at n = 5, a counter s\_ccnt was implemented, which is set to zero when returning to the idle state and from which the FSM derives when to stop via the condition s\_ccnt = 5 (fig. 20).



Figure 21: Mealy-Moore Diagram of the Entity. All four states are marked by blue boxes. The transition function takes the reset and start signals, as well as specific internal counter values, as valid arguments.

As the delay chain configuration afterwards needs 10 clock cycles to update, the state machine changes into the update state and stays there for 11 clock cycles. Consequently, the counter used for the shift register is reset to 0 at the transition to the update state and the termination condition s\_ccnt = 11 is introduced. In the last clock cycle of this state, the signal s\_cupdateout is set to '1', which is again conditionally coupled to the counter. This signal indicates to the delay chain that the newly sent configuration is valid. While transferring back to the idle state, the counter and all output signals are reset to '0'.
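The sequence of states can be reproduced with a behavioural model. The following sketch is not the firmware implementation; it assumes that one bit is sent per clock cycle until the counter reaches 5, that the lowest bit is sent first (following the assignment of s\_cdata(0) to o\_cdata), and it omits the global reset input. The function name `run_fsm` is hypothetical.

```python
def run_fsm(start, cdata, n_bits=5, update_cycles=11):
    """Step the FSM once per entry of `start` (start signal, 0 or 1 per cycle).

    Returns one (state, o_cdata, update_out) tuple per clock cycle.
    """
    state, prev_start, count = "fs_idle", 0, 0
    s_cdata = [0] * n_bits
    trace = []
    for s in start:
        next_state, o_cdata, update_out = state, 0, 0
        if state == "fs_idle":
            # the start signal must be high in two adjacent clock cycles
            if s and prev_start:
                next_state = "fs_rec"
        elif state == "fs_rec":
            # latch the five configuration bits, index 0 = lowest bit
            s_cdata = [(cdata >> i) & 1 for i in range(n_bits)]
            count = 0
            next_state = "fs_send"
        elif state == "fs_send":
            o_cdata = s_cdata[0]                    # s_cdata(0) drives o_cdata
            s_cdata = s_cdata[1:] + s_cdata[-1:]    # shift register step
            count += 1
            if count == n_bits:
                count, next_state = 0, "fs_update"
        elif state == "fs_update":
            count += 1
            if count == update_cycles:
                update_out = 1                      # configuration is now valid
                count, next_state = 0, "fs_idle"
        trace.append((state, o_cdata, update_out))
        prev_start, state = s, next_state
    return trace


trace = run_fsm([1, 1] + [0] * 20, 0b10110)
sent = [bit for st, bit, _ in trace if st == "fs_send"]
print(sent)
```

In this model, a complete configuration takes one receive cycle, five send cycles and eleven update cycles before the machine is back in the idle state.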

## 4 Reset Shift Measurements

The following part will classify the behaviour of the reset shifting unit regarding the linearity of the delay chain, as well as the overall synchronization behaviour of the Tile detector.


#### 4.1 Delay Chain Behaviour

In a first reset measurement procedure, the time difference between the leading edges of CLK and RST of MuTRiG #2 was taken for 64 settings. These values were measured and tracked by the oscilloscope with the help of the 13 GHz active differential probe soldered to the termination resistors. It should be noted that not every reset is recorded by the trigger, leading to a small fluctuation of the number of data points per setting. On average, 460 points per setting were taken, resulting in a total of 29492 resets.



Figure 22: Reset shift setting applied with  $4\delta T$  difference. The photo was taken on the oscilloscope<sup>11</sup> with a persistence setting, which makes it possible to overlay multiple oscilloscope frames.

As the dataset tracked over the whole measuring process consisted only of the measurement number and the clock and reset time difference, it was saved by the oscilloscope without a mapping of datapoints to settings. The corresponding setting therefore had to be assigned by a ROOT analysis.



Figure 23: 64 Delay Chain Settings applied. The total measuring procedure has been tracked as single leading edge delays stored in a single diagram of the oscilloscope. With the help of a script, all settings have been identified with a filtering process. For that, a specific number of measurements was combined into a profile histogram (i.e. a collection of datapoints, consisting of the mean and standard deviation of all respective cumulated points).

From fig. 23, the linearity of the delay chain can already be seen by eye. A jump between settings 31 and 32 is also visible, indicating the 180° phase flip of the double data unit. The time increments were determined with least-squares fits for both halves of the clock cycle. They evaluate to  $\delta T(0 \text{ to } 31) = (22.17\pm0.03) \text{ ps}$  and  $\delta T(32 \text{ to } 63) = (22.42\pm0.04) \text{ ps}$ , with a difference between settings 31 and 32, i.e. the difference of the offset parameters of both fits, of  $(84.69 \pm 2.03) \text{ ps}$ . Compared to the total length of the clock cycle, a coverage of roughly 89% is reached by the 22 ps steps of the delay chain.
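The fit procedure can be illustrated with an unweighted least-squares fit on synthetic data. The delays below are generated from an idealized model (22 ps steps and a half-period shift of 800 ps for settings 32 to 63) and are not the measured values; the analysis of the actual data was done in ROOT, so this pure-Python version is only a sketch.

```python
def linear_fit(xs, ys):
    """Unweighted least-squares fit y = offset + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic delays from the idealized model, NOT measured data
settings = list(range(64))
delays = [(s >> 5) * 800.0 + (s & 31) * 22.0 for s in settings]

# Separate fits for both halves of the clock cycle
offset_lo, slope_lo = linear_fit(settings[:32], delays[:32])
offset_hi, slope_hi = linear_fit(settings[32:], delays[32:])

# Fraction of the 1600 ps clock period covered by the 2 x 32 delay steps
coverage = 32 * (slope_lo + slope_hi) / 1600.0
print(slope_lo, slope_hi, offset_hi - offset_lo, coverage)
```

Inserting the fitted increments quoted above into the same expression gives 32 · (22.17 ps + 22.42 ps) / 1600 ps ≈ 0.89, which appears consistent with the stated coverage.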

The error of each point again reflects the jitter already observed in the TMB signal discussion. It fluctuates between 6.41 ps and 8.54 ps and was determined to be  $t_{j,clk-rst,2} = (7.34 \pm 0.45)$  ps (mean and standard deviation). As this does not deviate much from the jitter measured before, it is concluded that a higher frequency clock does not build up significant additional jitter.

#### 4.2 Tile Module Synchronization

This part will classify the behaviour of the reset shifting unit. The measurement procedure was the following: The injected test pulse with a frequency of 100 kHz, which is created in the FEB, is routed to the TMB. As this pulse is shaped differently from the analog SiPM pulses normally processed in the final experiment, the readout ASICs are configured in a slightly different way [13]. Using an automation script of the MIDAS DAQ software, a data taking run is started with the injection pulse enabled. The script sends a fixed number of reset signals whenever the DAQ backend is able to do so (i.e. when a reset command of the software leads to a reset pulse in the FPGA which is sent to the MuTRiGs etc.). Once the number of triggered resets specified at the start of the script is reached, the reset shift of the delay chain is increased by  $2\delta T$ . Thus, a total of 32 settings (0 to 62 with a step size of 2) is scanned and, for each setting, the timestamp differences of incoming hits in specified channel pairs are read out. Triggering the reset a few thousand times ensures that the statistics of the received timestamps are dominated by the reset timing with respect to the clock cycle. With the oscilloscope, it was verified that this timing is almost constant (up to a jitter of  $\sim 8 \,\mathrm{ps}$ ), meaning that the distribution of the timestamp differences in fine counter bins should be a narrow peak, ideally centered at zero. The deviation from zero then represents the difference of the clock/reset skews of both chips.

As long as no different reset shift setting is applied and the reset is located in the middle of a clock cycle, there is no danger of violating the sample and hold time. If the reset is shifted, there is a chance that the reset of one of the MuTRiGs is transferred into the next clock cycle (and thus also the incoming data, compared to what other MuTRiGs see), meaning that a 1.6 ns jump of the timestamps can be registered in the data. For an evaluation of the synchronization of the TMB, two datasets were used:

The test pulse is sent by the Front-End Board with a repetition rate of 10 kHz while the MuTRiGs are reset constantly with the help of the DAQ software. The reset rate, i.e. the average number of resets per second, was of the order of 100 kHz or above. The resulting pair-wise timing distribution between chips gives information about the synchronization of both MuTRiGs with respect to each other.

In the final experiment though, the test pulse synchronization cannot be used during data taking, meaning that the synchronization monitoring of the TMBs is to be tested with physical particle hits. Thus, also physical data measured at the DESY II Testbeam facility was used to determine the synchronization behaviour. In order to have a comparison as close as possible to the first test method, the module was aligned in direction of a 2.2 GeV electron beam, such that the particles hit several tiles along the z-direction.

The beam measurement still has some systematic errors which have to be pointed out and explained, illustrating that an in-situ synchronization with the help of physical hits is much more complex than an injection measurement under "lab-like" conditions.

During two weeks of data taking, the analysis and data acquisition of the tile detector were improved significantly, but some bugs in hardware and software remain: It was discovered that the matrices read out by the first two MuTRiGs started to yield a lot more hits than the rest of the module, which appears to be a systematic bug in the data acquisition mapping data to the wrong chips. Pure hardware bugs can be identified from the hitmap (fig. 24) too: One chip, as well as one channel per chip, was not functional due to incorrect routing or ground connection.

MuTRiG #6 shows the expected jumping behaviour, but the information is overlaid with a lot of noise, most likely due to an unlocked PLL. This behaviour was already observed in a previous pulse injection measurement, so the data of this chip is not used to determine the final reset setting. Another frequently occurring DAQ bug, which can fortunately be filtered out or ignored, is the accumulation of timestamp differences that are exactly 0. Timestamps appear to be assigned incorrectly when the Front-End Board buffer overflows and can no longer process the incoming data.



Figure 24: **DESY Testbeam Hitmap.** The beam was directed orthogonally onto the module, right at the tiles corresponding to MuTRiG #9. The periodically occurring Channel 0 bug for all chips at  $\varphi = 3$  and the missing MuTRiG #11 are visible.

In addition, as the DESY II synchrotron injects electrons into the larger PETRA III ring every couple of minutes, some periods of the reset shift scan contain significantly less data. To simplify the analysis of the synchronization measurements, the timestamp difference between a channel of MuTRiG #0 and the same channel on every other chip was taken and scanned for the characteristic time jumps mentioned above. These channel pairs are specified in the Mu3e analyzer configuration dataset and can be changed for every new analysis, be it online or offline.

- The plot is filled hit by hit by the analyzer.
- To compare the timestamps of two hits, the attributes of the previous hit are saved into a variable for each new entry, which can be used in the function filling the histogram.
- This function fills in the timestamp difference if a) the current and the previous analyzer entry lie in the same  $\sim 10 \,\mu s$  time frame and b) the channels of the hit and the previous hit are the ones specified by the channel pair variable.
- The analyzer plot (fig. 30) then contains the timestamp differences with a granularity of 50 ps for all channel pairs specified within one run (which corresponds to one setting in this measurement).
- All plots are then converted into one histogram per channel pair, showing the timestamp difference distribution plotted against the skew setting. The conversion is done using a ROOT script, which is also capable of y-projecting the timestamp-vs-run output files and fitting the resulting distribution to determine the original and shifted time differences.
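The hit-by-hit filling described in the list above can be sketched in a few lines of Python. This is only an illustration and not the Mu3e analyzer code: the function name, the hit format `(channel, timestamp_ps)`, the exact frame length and the sign convention (first channel minus second channel) are assumptions. The optional `drop_zero` flag mimics the filtering of the timestamp differences of exactly 0 mentioned earlier.

```python
from collections import Counter

BIN_WIDTH_PS = 50               # histogram granularity quoted in the text
FRAME_LENGTH_PS = 10_000_000    # ~10 us frame; the exact length is an assumption


def fill_pair_histograms(hits, channel_pairs, drop_zero=True):
    """Fill one timestamp-difference histogram per channel pair.

    hits: iterable of (channel, timestamp_ps) in readout order (hypothetical format).
    channel_pairs: iterable of (first_channel, second_channel).
    Returns {pair: Counter mapping bin index -> number of entries};
    the difference is always t(first) - t(second).
    """
    histograms = {tuple(pair): Counter() for pair in channel_pairs}
    previous = None  # attributes of the previous hit
    for channel, timestamp in hits:
        if previous is not None:
            prev_channel, prev_timestamp = previous
            same_frame = (timestamp // FRAME_LENGTH_PS
                          == prev_timestamp // FRAME_LENGTH_PS)
            pair, diff = None, None
            if same_frame and (prev_channel, channel) in histograms:
                pair, diff = (prev_channel, channel), prev_timestamp - timestamp
            elif same_frame and (channel, prev_channel) in histograms:
                pair, diff = (channel, prev_channel), timestamp - prev_timestamp
            # Differences of exactly 0 are treated as a DAQ artefact.
            if pair is not None and not (drop_zero and diff == 0):
                histograms[pair][diff // BIN_WIDTH_PS] += 1
        previous = (channel, timestamp)
    return histograms
```

Running this once per run (i.e. per reset shift setting) and stacking the resulting histograms against the setting number gives the kind of two-dimensional distribution discussed in the following subsection.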

#### 4.2.1 Pairwise Chip Comparison

In the case of an injection signal (which is cleaner and thus a good testing opportunity under lab conditions), the timing distribution is expected to show two peaks with a spacing of 1.6 ns. The width of the peaks results mainly from the combined clock and reset jitter of the two chips compared.
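The origin of the 1.6 ns spacing can be illustrated with a toy model, given here as a sketch under simplifying assumptions: clock edges sit at integer multiples of the period of the 625 MHz clock, each chip samples the reset at the first edge after its arrival, and jitter and metastability are ignored. The arrival times and the step size used below are illustrative values and not measured ones.

```python
import math

CLOCK_FREQUENCY_HZ = 625e6
CLOCK_PERIOD_PS = 1e12 / CLOCK_FREQUENCY_HZ   # 1600 ps, i.e. 1.6 ns


def sampling_cycle(reset_arrival_ps):
    """Index of the first clock edge (edges at k * T) that sees the reset."""
    return math.ceil(reset_arrival_ps / CLOCK_PERIOD_PS)


def jump_settings(arrival_a_ps, arrival_b_ps, step_ps, settings):
    """Settings at which the two chips sample the reset in different cycles."""
    jumps = []
    for setting in settings:
        shift = setting * step_ps  # common reset shift applied to both chips
        if sampling_cycle(arrival_a_ps + shift) != sampling_cycle(arrival_b_ps + shift):
            jumps.append(setting)
    return jumps


# Illustrative numbers only: chip B sees the reset 100 ps later than chip A.
print(jump_settings(500.0, 600.0, 22.5, range(0, 64, 2)))
```

In this model the timestamp difference of the pair is offset by exactly one clock period for those settings where only one of the two resets has crossed the clock edge, and it returns to its original value once both have crossed.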

The identification of the settings at which a sample-hold violation occurs is illustrated by the following example, where fig. 25 characterizes the synchronization of chips #0 and #1. The relevant time parameters of the synchronization are calculated as follows:



Figure 25: Synchronization measurement of MuTRiGs #0 and #1. Synchronization settings 0 to 62 (with a step size of 2) are scanned along the x-axis, and the corresponding timestamp differences are filled along the y-axis.

The timestamp differences are centered around  $\Delta t_{\text{initial}} = (-206 \pm 22) \text{ ps}$ , a value obtained with a Gaussian fit to the left peak. This peak is identified as the main, synchronized relation between the two MuTRiGs. At setting 50, the timestamp difference jumps to  $\Delta t_{\text{jump}} = (1400 \pm 21) \text{ ps}$ . This jump results from the RST signal of MuTRiG #0 being associated with the next CLK cycle, while the RST of MuTRiG #1 is still within the current clock period. At setting 56, the RST of #1 is also shifted out of the sample/hold region into the next clock cycle, suggesting that the RST of #1 is delayed by roughly  $(56 - 50) \cdot \delta T = 0.18 \text{ ns}$  with respect to RST #0.

Moreover, the difference between both peaks evaluates to  $T_{CC} = \Delta t_{\text{initial}} - \Delta t_{\text{jump}} = (-1606 \pm 30)$  ps, whose magnitude agrees with the 1.6 ns clock period and confirms that the observed behaviour is indeed the CLK cycle change. It is concluded that all settings between 0 and 50, as well as from 58 to 64, shift the reset such that the phase between the local clock and the rising edge of the reset leads to a stable synchronization. Interestingly, there is no sign of metastable behaviour of the MuTRiG reset circuitry: it would lead to an ambiguity of the timestamp within one reset setting, so that both peaks would be visible within the same x bin. As only every second reset setting was measured, the sample-hold time interval must be smaller than the time covered by two reset settings, which is around 45 ps.
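The arithmetic behind the peak difference can be checked with a short script. It is a sketch that assumes the two fit uncertainties are uncorrelated, so that they add in quadrature; the helper name and the "pull" with respect to the nominal clock period are introduced here for illustration only.

```python
import math


def difference_with_uncertainty(a, sigma_a, b, sigma_b):
    """Return a - b and its uncertainty, assuming uncorrelated inputs."""
    return a - b, math.hypot(sigma_a, sigma_b)


dt_initial, sigma_initial = -206.0, 22.0   # ps, Gaussian fit to the left peak
dt_jump, sigma_jump = 1400.0, 21.0         # ps, peak after the jump

t_cc, sigma_cc = difference_with_uncertainty(dt_initial, sigma_initial,
                                             dt_jump, sigma_jump)
clock_period_ps = 1e12 / 625e6             # 1600 ps for the 625 MHz clock
pull = (abs(t_cc) - clock_period_ps) / sigma_cc

print(f"T_CC = ({t_cc:.0f} +/- {sigma_cc:.0f}) ps, pull = {pull:.2f}")
```

The quadrature sum of 22 ps and 21 ps is about 30.4 ps, and the magnitude of the difference deviates from the nominal 1600 ps by 6 ps, i.e. by roughly 0.2 standard deviations.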

For the beam method, an example channel pair was likewise picked in order to compare the quality of both datasets.



Figure 26: Beam Synchronization Measurement of MuTRiG #0 and MuTRiG #2. Again, the timestamp differences were filled against the synchronization setting. The right plot shows the projection onto the y-axis, which visualizes the two coarse counter bins as peaks with 1.6 ns difference.

It is visible from fig. 26 that the physical hits fluctuate more in terms of timing than the injection measurements. The uncertainty in the data is mainly due to the fact that the analog readout of the SiPMs is not timewalk-corrected, so the natural spread around the timestamp is quite dominant. As this is the case for the hits in both channels of the pair, the two peaks overlap in the projection. Still, an unexplained behaviour remains, which was observed for many MuTRiG pairs: almost every monitored chip pair runs into the same characteristic 1.6 ns jump at reset shift setting 42, switches back to the original time difference at the next setting and then jumps again, staying there until the reset is shifted far enough to be in synchronization again.

One possible explanation concerns signal integrity. As all signals are digital and therefore contain very high frequency components, a termination or filtering effect in the signal cables can strongly affect the rising edge of the reset and the clock. In that case, the flip-flop sampling the reset inside the MuTRiG may chatter back and forth between one clock cycle and the other. While this is possible, it is unlikely to happen for so many chips at the same setting, so another explanation has to be considered. If the configuration vector sent by the FPGA is corrupted by a disturbance at the moment it is sampled by the delay chain configuration entity, the synchronization entity shifts the reset incorrectly and the analysis shows an unexpected 1.6 ns jump. As soon as the next reset shift setting is applied, the regular order is restored and the time difference is likely to jump again.
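Irregular scan points of this kind can be flagged automatically before any interpretation is attempted. The following sketch assumes that a representative timestamp difference (for instance the median) is available per setting; the function names and the example values are hypothetical and chosen only to mimic the pattern described above. The code cannot decide which of the flagged settings is the actual glitch, it only marks candidates for manual inspection.

```python
def classify_settings(dt_by_setting_ps, initial_ps, jumped_ps):
    """Label each setting by the nearer of the two peak positions."""
    return {
        setting: "jumped" if abs(dt - jumped_ps) < abs(dt - initial_ps) else "initial"
        for setting, dt in dt_by_setting_ps.items()
    }


def isolated_flips(states):
    """Settings whose state differs from both of their scan neighbours."""
    settings = sorted(states)
    flips = []
    for prev, cur, nxt in zip(settings, settings[1:], settings[2:]):
        if states[prev] == states[nxt] and states[cur] != states[prev]:
            flips.append(cur)
    return flips


# Invented example values in ps, mimicking a jump, a return and a second jump.
example = {38: -200, 40: -210, 42: 1390, 44: -205, 46: 1400, 48: 1410, 50: 1395}
states = classify_settings(example, initial_ps=-206, jumped_ps=1400)
print(isolated_flips(states))
```

Settings flagged in many chip pairs at once would point towards a common cause such as a corrupted configuration vector rather than a chip-specific signal integrity problem.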

#### 4.2.2 Module-Wide Synchronization

So far, only the synchronization of two chips with respect to each other has been discussed. To ensure that a whole module is synchronized, a reset shift setting for the firmware entity is needed that places the reset safely inside the clock cycle for all chips, such that the first timestamp at run start is well-defined and does not fluctuate. This setting is obtained by overlaying the reset scans of all MuTRiG pairs: the safe settings for the whole TMB are the intersection of the safe settings obtained for the individual chip pairs with the method just discussed.



Figure 27: Synchronization Data for Injection and Beam method. The data basis for this plot can be found in appendix C and was compiled manually with a Python script.

The safe synchronization setting intervals can be obtained from the compiled data shown in fig. 27. The injection measurement yields safe settings ranging from 4 to 48, while the beam measurement yields a safe settings interval from 8 to 40. If the final reset shift setting is chosen as the middle of the interval, this leads to 26 for the injection and 24 for the beam measurement. While these values almost coincide, a comparison of the individual chip pairs shows quite different behaviour for the two methods. The beam method is more error-prone and therefore less reliable because of the extra cabling, the higher activity and the higher noise of the total system. Still, it mostly yields safe setting intervals that are contained within those obtained by the ex-situ method. In both measurements the resulting safe setting interval is sufficiently wide, meaning that a synchronization of full tile modules in a setup close to the final experiment can be achieved safely.
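The intersection and midpoint logic can be written down compactly. The sketch below is not the script used for fig. 27: the function names are hypothetical, and the per-pair safe settings are invented inputs chosen such that their intersection reproduces the module-wide injection interval from 4 to 48 quoted above.

```python
def module_safe_interval(safe_by_pair, step=2):
    """Intersect the per-pair safe settings and return the longest
    contiguous run (low, high) of the scan, or None if nothing is safe."""
    common = sorted(set.intersection(*(set(s) for s in safe_by_pair.values())))
    if not common:
        return None
    best = run = (common[0], common[0])
    for setting in common[1:]:
        # Extend the current run if the setting follows directly, else restart.
        run = (run[0], setting) if setting - run[1] == step else (setting, setting)
        if run[1] - run[0] > best[1] - best[0]:
            best = run
    return best


def midpoint(interval):
    """Middle of the interval, taken as the operating point."""
    low, high = interval
    return (low + high) // 2


# Invented per-pair inputs for illustration (scan step of 2).
safe_by_pair = {
    "pair A": set(range(0, 50, 2)) | {58, 60, 62},
    "pair B": set(range(4, 64, 2)),
}
interval = module_safe_interval(safe_by_pair)
print(interval, midpoint(interval))
```

Taking the longest contiguous run rather than any safe setting keeps the operating point as far as possible from both edges of the safe region, which is the motivation for choosing the middle of the interval.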

## 5 Summary

The Mu3e tile detector forms the outermost layer (upstream and downstream of the vertex detector) of the Mu3e experiment, which is designed to search for the LFV decay  $\mu^+ \rightarrow e^+e^-e^+$ . It will determine the timing of particles arriving at a high rate with a precision well below 100 ps, while streaming the data out continuously. As the finely granular tile matrices with a total hit rate of around 180 MHz need to be read out and transmitted to the backend, a sophisticated data acquisition system is used. It comprises two stages of FPGAs, which read out all subdetectors and transfer the data via optical fibres. Furthermore, the particle hit rate requires a high-frequency clock domain for the readout ASIC, the MuTRiG, which uses a 625 MHz clock for the coarse timestamp assignment. A crucial requirement for operating such high-frequency clock domains is a stable and precise synchronization.

In a first measurement, the Tile Module Board, which hosts the tile matrices and 13 MuTRiGs and distributes the signals, was characterized. The absolute propagation delay as well as the pair-wise skew between the clock, reset and pulse injection lines, which are all needed for the timing characterization of the module, were investigated. It was found that the absolute propagation delay of the signals matches the design-specific behaviour. The skew between each MuTRiG's local clock and reset (or injection line, respectively) is kept as low as the hardware allows. In particular, the clock and reset line skews lie in an excellent, very narrow timing band of  $(41.41 \pm 7.69)$  ps when distributed over a whole TMB. The band of the clock and injection line skew also lies well within the hardware limit obtained from a parametrization of the signal distribution tree: the signals deviate by at most  $(249.09 \pm 93.87)$  ps, where most of the uncertainty comes from the quality of the measurement probes used.

As all MuTRiGs in the detector need to be synchronized in the 625 MHz clock domain, a module-wide solution for a well-defined time alignment on the front-end FPGA layer was chosen. The synchronization is achieved by delaying the detector-wide reset signal, which determines the first timestamp at run start. This delay can be adjusted in small steps to find a stable operating point with the help of a monitoring measurement, which was shown with two measurement methods to be capable of determining the ideal reset shift setting. While a measurement using a digital pulse signal injected into every MuTRiG showed very reliable and plausible data, a comparison measurement at the DESY II testbeam revealed some differences from the first method. The final result still looks promising and is close to that of the ex-situ injection measurement: the beam measurement yielded a reset shift interval that is 32 settings wide, and the injection measurement one that is 44 settings wide, corresponding to  $\sim 700$  ps for the beam method and 970 ps for the injection. The beam interval is a proper subset of the injection interval, and the ideal reset settings obtained with the two methods differ by only 2 (26 for the injection, 24 for the beam). That this method already works while the development and debugging of software and hardware are still ongoing is an important step, as the beam measurement is closer to the monitoring mode of the final experiment. In the final data taking runs, there will be the additional challenge that hits on different tiles will rarely be coincident and that their geometry will be more random, as the particles are decay products and not part of a focussed beam.

A possible future method to select hits that carry useful synchronization information would be a live analysis using cluster identification [18]. With that method, particles that traverse two adjacent tiles belonging to two different MuTRiGs could be searched for. As their timestamp difference would quite reliably be of the order of 100 ps, a collection of such hits can be used to build up the statistics needed to obtain reliable information on the synchronization status.
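How such a selection might look is sketched below. This is a simplified stand-in and not the clustering algorithm of [18]: the hit format `(z_index, phi_index, chip, timestamp_ps)`, the adjacency criterion (direct neighbours in z or $\varphi$, without wrap-around) and the 5 ns coincidence window are assumptions made for illustration. The window is chosen larger than one clock period so that pairs shifted by 1.6 ns would still be collected.

```python
def cross_chip_neighbour_differences(hits, window_ps=5000):
    """Collect timestamp differences of hits on adjacent tiles that are
    read out by different chips.

    hits: iterable of (z_index, phi_index, chip, timestamp_ps) (hypothetical format).
    Returns {(chip_low, chip_high): [t(chip_low) - t(chip_high), ...]}.
    """
    ordered = sorted(hits, key=lambda hit: hit[3])
    differences = {}
    for i, (z1, phi1, chip1, t1) in enumerate(ordered):
        for z2, phi2, chip2, t2 in ordered[i + 1:]:
            if t2 - t1 > window_ps:
                break  # later hits are even further away in time
            adjacent = abs(z1 - z2) + abs(phi1 - phi2) == 1
            if adjacent and chip1 != chip2:
                key = (min(chip1, chip2), max(chip1, chip2))
                diff = t1 - t2 if chip1 < chip2 else t2 - t1
                differences.setdefault(key, []).append(diff)
    return differences
```

A chip pair whose collected differences cluster around a value close to 1.6 ns instead of around 100 ps or less would then indicate a lost synchronization.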

# Appendix A Hardware Photographs





Figure 28: FEB Crate with two FEBs and a connected DAB

Figure 29: Tile detector testbeam setup

# Appendix B Timestamp Analyzer Diagram



Figure 30: Analyzer timestamp differences plot

# Appendix C Complete synchronization analysis of MuTRiG pairs

Side-by-side comparison of the synchronization of the MuTRiG pairs for both methods. The channel numbering differs, as the same module was plugged into different FEB links at the DESY Testbeam.





[Plots not reproduced. Recoverable panel titles:  $\Delta$ (Ch421-Ch517) Beam,  $\Delta$ (Ch421-Ch613) Beam,  $\Delta$ (Ch421-Ch709) Beam,  $\Delta$ (Ch5-Ch389) Injection,  $\Delta$ (Ch421-Ch805) Beam.]



## References

- [1] A. Loreti *et al.*, "Technical design report for the Phase I Mu3e experiment," 02 2021.
- [2] G. Hernández-Tomé, G. López Castro, and P. Roig, "Flavor violating leptonic decays of τ and μ leptons in the Standard Model with massive neutrinos," *Eur. Phys. J. C*, vol. 79, no. 1, p. 84, 2019, [Erratum: Eur.Phys.J.C 80, 438 (2020)].
- [3] U. Bellgardt *et al.*, "Search for the Decay  $\mu^+ \rightarrow e^+ e^+ e^-$ ," *Nucl. Phys. B*, vol. 299, pp. 1–6, 1988.
- [4] Y. Fukuda *et al.*, "Evidence for oscillation of atmospheric neutrinos," *Physical Review Letters*, vol. 81, no. 8, pp. 1562–1567, aug 1998. [Online]. Available: https://doi.org/10.1103%2Fphysrevlett.81.1562
- [5] K. Kleinknecht, "CP violation and K decays," *Annual Review of Nuclear Science*, vol. 26, no. 1, pp. 1–50, 1976. [Online]. Available: https://doi.org/10.1146/annurev.ns.26.120176.000245
- [6] A. Blondel *et al.*, "Research Proposal for an Experiment to Search for the Decay  $\mu \rightarrow eee$ ," 1 2013.
- [7] A. Baldini *et al.*, "Search for the lepton flavour violating decay  $\mu^+ \rightarrow e^+ \gamma$  with the full dataset of the MEG experiment," 2016. [Online]. Available: https://arxiv.org/abs/1605.05081
- [8] M. Hedges *et al.*, "The Mu2e experiment: Searching for charged lepton flavor violation," *Nucl. Instrum. Meth. A*, vol. 1045, p. 167589, 2023.
- [9] I. Perić, "A novel monolithic pixel detector implemented in high-voltage CMOS technology," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 2, pp. 876–885, 12 2007.
- [10] H. Klingenmeyer, provided through internal communication, 12 2022.
- [11] M. W. J. Köper, "Time alignment of the Mu3e tile detector," Bachelorarbeit, Universität Heidelberg, 2022.
- [12] H. Kolanoski and N. Wermes, *Teilchendetektoren: Grundlagen und Anwendungen*. Springer, 2016.
- [13] H. Chen, K. Briggl, P. Eckert, T. Harion, Y. Munwes, W. Shen, V. Stankova, and H. Schultz-Coulon, "MuTRiG: a mixed signal silicon photomultiplier readout ASIC with high timing resolution and gigabit data link," *Journal of Instrumentation*, vol. 12, no. 01, pp. C01043–C01043, jan 2017. [Online]. Available: https://doi.org/10.1088/1748-0221/12/01/c01043
- [14] L. Lauer, provided through internal communication, 12 2022.
- [15] N. Berger *et al.*, "The Mu3e data acquisition," *IEEE Transactions on Nuclear Science*, vol. 68, no. 8, pp. 1833–1840, aug 2021. [Online]. Available: https://doi.org/10.1109%2Ftns.2021.3084060
- [16] "LMK1D210x low additive jitter LVDS buffer," https://www.ti.com/lit/ds/symlink/lmk1d2104.pdf, 11 2022.
- [17] K. Briggl, provided through internal communication, 05 2022.
- [18] E. Steinkamp, "Hit clustering in the Mu3e tile detector," Bachelorarbeit, Universität Heidelberg, 2022.

# Acknowledgements

First off, I want to thank Prof. Dr. Schultz-Coulon for the opportunity to write my Bachelor Thesis in his group. I am also thankful that Prof. Dr. Masciocchi agreed to be my second supervisor.

I am very grateful for the many interesting conversations and the good time I had in the group during the past months.

In particular, I want to thank Konrad Briggl for his consistent and very informative support, as well as Hannah Klingenmeyer and Alexander Junkermann for many helpful discussions.

Last but not least, I would like to thank my family and friends, especially my girlfriend Johanna, who supported me throughout my studies.

# Declaration

Herewith I declare that I wrote this thesis independently and used no sources or aids other than the ones specified.

# Erklärung

Ich versichere, dass ich diese Arbeit selbstständig verfasst und keine anderen als die angegebenen Quellen und Hilfsmittel benutzt habe.

Heidelberg, den 09.01.2023
