Abstract
Person Re-Identification is a key and challenging task in video surveillance. While traditional methods rely on visual data, issues like poor lighting, occlusion, and suboptimal angles often hinder performance. To address these challenges, we introduce WhoFi, a novel pipeline that utilizes WiFi signals for person re-identification. Biometric features are extracted from Channel State Information (CSI) and processed through a modular Deep Neural Network (DNN) featuring a Transformer-based encoder. The network is trained using an in-batch negative loss function to learn robust and generalizable biometric signatures. Experiments on the NTU-Fi dataset show that our approach achieves competitive results compared to state-of-the-art methods, confirming its effectiveness in identifying individuals via WiFi signals.
Keywords: Person Re-Identification, CSI, Deep Neural Networks, Transformers, WiFi Signals, Radio Biometric Signature
1 Introduction
Person Re-Identification (ReID) plays a central role in surveillance systems, aiming to determine whether two representations belong to the same individual across different times or locations. Traditional ReID systems typically rely on visual data such as images or videos, comparing a probe (the input to be identified) against a set of stored gallery samples by learning discriminative biometric features. Most commonly, these features are based on appearance cues such as clothing texture, color, and body shape. However, visual-based systems suffer from a number of known limitations, including sensitivity to changes in lighting conditions [4], occlusions [6], background clutter [20], and variations in camera viewpoints [12]. These challenges often result in reduced robustness, especially in unconstrained or real-world environments. To overcome these limitations, an alternative research direction explores non-visual modalities such as WiFi-based person ReID. WiFi signals offer several advantages over camera-based approaches: they are not affected by illumination, they can penetrate walls and occlusions, and, most importantly, they offer a privacy-preserving mechanism for sensing. The core insight is that, as a WiFi signal propagates through an environment, its waveform is altered by the presence and physical characteristics of objects and people along its path. These alterations, captured in the form of Channel State Information (CSI), contain rich biometric information. Unlike optical systems that perceive only the outer surface of a person, WiFi signals interact with internal structures such as bones, organs, and body composition, resulting in person-specific signal distortions that act as a unique signature.
Earlier wireless sensing methods primarily relied on coarse signal measurements such as the Received Signal Strength Indicator (RSSI) [11], which proved insufficient for fine-grained recognition tasks. More recently, CSI has emerged as a powerful alternative [17]. CSI provides subcarrier-level measurements across multiple antennas and frequencies, enabling a detailed and time-resolved view of how radio signals interact with the human body and surrounding environment. By learning patterns from CSI sequences, it is possible to perform ReID by capturing and matching these radio biometric signatures. Despite the promising nature of WiFi-based ReID, the field remains under-explored, especially in terms of developing scalable deep learning methods that can generalize across individuals and sensing environments. In this paper, we propose WhoFi, a deep learning pipeline for person ReID using only CSI data. Our model is trained with an in-batch negative loss to learn robust embeddings from CSI sequences. We evaluate multiple backbone architectures for sequence modeling, including Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and Transformer networks, each designed to capture temporal dependencies and contextual patterns. The main contributions of this work are:
- We propose a modular deep learning pipeline for person ReID that relies solely on WiFi CSI data, without requiring visual input.
- We perform a comparative study across three widely used backbone architectures (LSTM, BiLSTM, and Transformer networks) to assess their ability to encode biometric signatures from CSI.
- We adopt an in-batch negative loss training strategy, which enables scalable and effective similarity learning in the absence of labeled pairs.
- We conduct extensive experiments on the public NTU-Fi dataset to demonstrate the accuracy and generalizability of our approach.
- We perform an ablation study to evaluate the impact of preprocessing strategies, input sequence length, model depth, and data augmentation.
By leveraging non-visual biometric features embedded in WiFi CSI, this study offers a privacy-preserving and robust approach for WiFi-based ReID, and it lays the foundation for future work in wireless biometric sensing.
2 Related Work
2.1 Person Re-Identification via Visual Data
In the field of computer vision, person ReID has long been of major importance. Earlier methods primarily relied on RGB images or videos to track people across camera views. Hand-crafted descriptors such as Local Binary Patterns (LBP), color histograms, and Histograms of Oriented Gradients (HOG) were widely used to capture low-level visual cues like texture and silhouette. With the advent of deep learning, Convolutional Neural Networks (CNNs) became the dominant approach, enabling hierarchical spatial feature learning [7]. Training strategies like triplet loss, cross-entropy with label smoothing, and center loss were adopted to optimize embedding-space separability [5, 19]. Recent models often integrate attention mechanisms [10] and part-based representations [13] to handle misalignment and occlusion. Despite strong benchmark performance, these systems rely heavily on high-quality visual input and careful manual tuning, limiting their applicability in uncontrolled environments.
2.2 Person Identification and ReID via WiFi Sensing
Several works have extensively investigated human identification and authentication through WiFi CSI, focusing on features such as amplitude, phase, and heatmap variations [3]. Early methods include line-of-sight waveform modeling combined with PCA or DWT for classification [15], or gait-based identification through hand-crafted features [18]. CAUTION [14] introduced a dataset and a few-shot learning approach for user recognition via downsampled CSI representations. More recent methods leverage deep learning models to enhance generalization capabilities [16]. A recent approach [1] proposed a dual-branch architecture that combines CNN-based processing of amplitude-derived heatmaps with LSTM-based modeling of phase information for re-identification. However, the use of private datasets in such work limits replicability and hinders direct comparison. In contrast, our study relies on a widely available public benchmark, enabling reproducibility and fair evaluation across different architectures.
3 Method
This section presents the data preprocessing and augmentation steps, together with the proposed deep architecture.
3.1 Data Preprocessing
Data extracted from the CSI complex matrix must first be preprocessed to remove noise and sampling offsets before meaningful biometric features can be extracted.
3.1.1 Channel State Information (CSI)
WiFi transmission relies on electromagnetic waves that carry information from a transmitting antenna (TX) to a receiving one (RX). Modern systems adopt Multiple-Input Multiple-Output (MIMO), involving multiple TX/RX antennas, and Orthogonal Frequency-Division Multiplexing (OFDM), a modulation technique that transmits data across orthogonal subcarriers spanning nearly the entire frequency band. The integration of MIMO and OFDM enables sampling of the Channel Frequency Response (CFR) at subcarrier granularity in a CSI matrix. The CSI measurement for each subcarrier $k \in K$ represents the CFR $H^{\theta\gamma}$ between the receiving antenna RX $\theta \in \Theta$ and the transmitting antenna TX $\gamma \in \Gamma$, and is given by

$$H^{\theta\gamma}_k = |H^{\theta\gamma}_k| \, e^{j \angle H^{\theta\gamma}_k} \quad (1)$$

where $|H^{\theta\gamma}_k|$ denotes the signal amplitude and $\angle H^{\theta\gamma}_k$ the signal phase. By collecting the responses across all TX/RX antenna pairs, a CSI complex matrix of size $\Theta \times \Gamma \times K$ is formed, representing the CFR across all subcarriers in $K$.
3.1.2 Amplitude Filtering
Signal amplitude represents the strength of the received signal. For a subcarrier $k \in K$, receiver antenna $\theta \in \Theta$, and transmitter antenna $\gamma \in \Gamma$, the signal amplitude $A^{\theta\gamma}_k$ is defined as

$$A^{\theta\gamma}_k = |H^{\theta\gamma}_k| = \sqrt{\mathrm{real}(H^{\theta\gamma}_k)^2 + \mathrm{img}(H^{\theta\gamma}_k)^2} \quad (2)$$

which corresponds to the magnitude of the CSI measurement. In this work, signal amplitudes are cleaned of outliers using the Hampel filter [2], which identifies outliers based on the median of a local window and the Median Absolute Deviation (MAD). Given a sequence of amplitude values across $p$ packets, the local window $W^{pk}$ of size $w$ (set to 5), centered on packet $p$, is defined as

$$W^{pk} = \left[ A^{p - \lfloor w/2 \rfloor}_k, \ldots, A^{p}_k, \ldots, A^{p + \lfloor w/2 \rfloor}_k \right] \quad (3)$$

$$\mathrm{median}(W^{pk}) = W^{pk}_{\lfloor w/2 \rfloor} \quad (4)$$

$$\mathrm{MAD}(W^{pk}) = \mathrm{median}\left( \left| W^{pk}_i - \mathrm{median}(W^{pk}) \right| \right), \quad \forall i,\ 1 \leq i \leq w \quad (5)$$

where $W^{pk}$ denotes the vector containing the $w$ neighboring data packets centered at packet $p$, sorted in ascending order, for the $k$-th subcarrier. An amplitude value is classified as an outlier if its deviation from the local median exceeds a fixed threshold. Specifically, any value outside the range

$$\mathrm{limit}_{pk} = \mathrm{median}(W^{pk}) \pm \xi \cdot \mathrm{MAD}(W^{pk}) \quad (6)$$

with $\xi$ set to 3, is considered an outlier and removed.
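A minimal NumPy sketch of this Hampel filtering follows. The function and parameter names are ours, and flagged values are replaced with the local median, which is one common way to realize the "removal" of outliers; the paper does not specify the replacement strategy.

```python
import numpy as np

def hampel_filter(amplitudes, w=5, xi=3.0):
    """Clean a 1-D amplitude sequence with a Hampel filter.

    For each packet p, a window of w samples centered on p is taken;
    values whose deviation from the local median exceeds xi * MAD
    are treated as outliers and replaced with that median.
    """
    cleaned = amplitudes.astype(float).copy()
    half = w // 2
    n = len(amplitudes)
    for p in range(n):
        # window is truncated at the sequence boundaries
        lo, hi = max(0, p - half), min(n, p + half + 1)
        window = amplitudes[lo:hi]
        med = np.median(window)
        mad = np.median(np.abs(window - med))
        if np.abs(amplitudes[p] - med) > xi * mad:
            cleaned[p] = med
    return cleaned
```

With $w = 5$ and $\xi = 3$ as in the paper, a single spike surrounded by stable amplitudes is pulled back to the local median while the rest of the sequence passes through unchanged.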
3.1.3 Phase Sanitization
Signal phase represents the temporal shift of a signal. It is calculated as the arctangent of the imaginary and real parts of the CFR:

$$P^{\theta\gamma}_k = \tan^{-1}\left( \frac{\mathrm{img}(H^{\theta\gamma}_k)}{\mathrm{real}(H^{\theta\gamma}_k)} \right) \quad (7)$$

To remove any possible phase shifts caused by imperfect synchronization between the transmitter and receiver hardware components, we apply a standard linear phase sanitization technique. The estimated phase $\angle \hat{H}(f_k)$ at frequency $f$ from the CSI measurements is expressed as

$$\angle \hat{H}(f_k) = \angle H(f_k) - 2\pi \frac{m_k}{N} \Delta t + \beta + Z \quad (8)$$

where $\angle H(f_k)$ is the actual phase, $\Delta t$ is a time offset from any delay in the signal arrival and reception, $\beta$ is the unknown phase offset, and $Z$ is a noise factor. Since the delay factor is a linear function of the subcarrier index $m_k$, it is possible to estimate the phase slope $a$ and offset $b$ as

$$a = \frac{\angle \hat{H}(f_K) - \angle \hat{H}(f_1)}{m_K - m_1} \quad (9)$$

$$b = \frac{1}{K} \sum_{k=1}^{K} \angle \hat{H}(f_k) \quad (10)$$

Therefore, the calibrated phase $\angle H'(f_k)$ for each subcarrier $k \in K$ can be estimated by subtracting a linear term from the raw phase:

$$\angle H'(f_k) = \angle \hat{H}(f_k) - a m_k - b \quad (11)$$
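Equations (9)-(11) translate directly into a short NumPy sketch; it assumes the raw per-subcarrier phase has already been unwrapped, and the function name is ours:

```python
import numpy as np

def sanitize_phase(raw_phase, m):
    """Linear phase sanitization, following Eqs. (9)-(11).

    raw_phase : unwrapped phase angle(H_hat(f_k)) per subcarrier, shape (K,)
    m         : subcarrier indices m_k, shape (K,)
    Returns the calibrated phase with the linear slope and offset removed.
    """
    a = (raw_phase[-1] - raw_phase[0]) / (m[-1] - m[0])  # slope, Eq. (9)
    b = raw_phase.mean()                                  # offset, Eq. (10)
    return raw_phase - a * m - b                          # calibrated, Eq. (11)
```

Applied to a purely linear phase ramp over a symmetric subcarrier grid, the output collapses to a constant, confirming that the synchronization-induced slope is removed.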
3.2 Data Augmentation
To enhance model sensitivity and overall robustness against noise or minor signal fluctuations, we apply several data augmentation techniques during training. These transformations are performed on the extracted amplitude features rather than directly on the raw CSI data. For each amplitude entry, one augmentation is applied with a 90% probability, leaving the remaining 10% unmodified. The first augmentation adds Gaussian noise $n(t) \sim \mathcal{N}(0, \sigma^2)$ to the amplitude value $A^{\theta\gamma}_k(t)$ at each time step $t$, where $\sigma = 0.02$, simulating realistic signal fluctuations and improving generalization in noisy environments. The second augmentation scales the amplitude by a random factor uniformly sampled in $[0.9, 1.1]$, modeling small variations in signal strength due to environmental or device-related factors. Finally, a time shift is applied by offsetting the amplitude sequence forward or backward by a random integer $t' \in [-5, 5]$ within a sequence of length $P = 100$. Any value shifted outside the sequence bounds is replaced with the mean amplitude of the original signal, simulating delays or desynchronizations in signal acquisition.
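The augmentation policy above can be sketched as follows. This is a minimal NumPy version under our own assumptions: the choice among the three transforms is taken as uniform (the paper does not specify the selection rule), and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(amplitude, p_aug=0.9, sigma=0.02, scale_range=(0.9, 1.1), max_shift=5):
    """Apply one randomly chosen augmentation with probability p_aug.

    amplitude : 1-D amplitude sequence of length P (e.g. P = 100).
    """
    if rng.random() > p_aug:
        return amplitude  # leave ~10% of entries unmodified
    choice = rng.integers(3)  # assumed uniform choice among the three transforms
    if choice == 0:
        # additive Gaussian noise, n(t) ~ N(0, sigma^2) with sigma = 0.02
        return amplitude + rng.normal(0.0, sigma, size=amplitude.shape)
    if choice == 1:
        # random amplitude scaling by a factor in [0.9, 1.1]
        return amplitude * rng.uniform(*scale_range)
    # random time shift t' in [-5, 5]; vacated positions get the mean amplitude
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.full_like(amplitude, amplitude.mean())
    if shift >= 0:
        out[shift:] = amplitude[:len(amplitude) - shift]
    else:
        out[:shift] = amplitude[-shift:]
    return out
```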
3.3 Deep Neural Network Architecture
In the proposed pipeline, a DNN is designed to generate a biometric signature from the processed CSI features. The architecture is composed of an Encoder module $M_e$ and a Signature module $M_s$, as shown in Figure 1.

Figure 1: Overview of the proposed framework. The system takes an input signal (e.g., a person's sensing data) and processes it through an encoder that extracts meaningful latent representations. These features are passed to a signature model that computes a compact signature vector $s$. To ensure consistency and comparability, the output signature is normalized through $l^2$ normalization. The resulting signature serves as a unique identifier for the individual, based on the input signal characteristics.
3.3.1 Encoder Module
The encoder module produces a fixed-size vector that contains human-signature-relevant information from the provided CSI measurements. This module aims at extracting a low-dimensional encoding of the high-dimensional, sequential inputs, i.e., the amplitude or phase extracted from the CSI measurement of the wireless channel while a specific person is present between the transmitter and receiver. This work evaluates three types of encoding architectures compatible with sequential data: an LSTM encoder, a BiLSTM encoder, and the encoder part of a Transformer model.
1) LSTM Encoder. LSTMs capture temporal dependencies in input sequences, enabling the model to recognize recurrent patterns. The LSTM encoder consists of $l$ stacked hidden units, where the output of the $l_i$-th unit is passed to the $l_{i+1}$-th unit in the hidden layer. Dropout layers with probability $p_d$ are interleaved between LSTM layers to improve robustness and reduce overfitting during training. The final hidden state $H^l$ from the last LSTM layer serves as the encoded output.

2) BiLSTM Encoder. BiLSTMs capture the correlation between time steps in the input sequence by processing the sequence in both forward and backward directions. This allows the model to capture context from both past and future time steps. Similar to the LSTM encoder, $l$ stacked BiLSTM layers with interleaved dropout layers are used to avoid overfitting. The last hidden states from the forward ($\overrightarrow{H^l}$) and backward ($\overleftarrow{H^l}$) passes are concatenated to form the output encoding $H^l$.

3) Transformer Encoder. The encoder from the Transformer architecture is capable of detecting correlations between elements at distant time steps in the input sequence. The encoder contains $l$ identical layers, each containing a multi-head self-attention sub-layer and a position-wise feed-forward network sub-layer. Standard, non-trainable sinusoidal positional encodings are added to the input embeddings to retain sequence order information. Moreover, residual connections and layer normalization are applied after each sub-layer. A dropout layer with drop probability $p_d$ is used in between encoder layers as a regularization technique. The output of the final Transformer layer acts as the encoded representation.
3.3.2 Signature Module
The Signature module takes the fixed-size vector output by the encoder module and generates the final biometric signature. It consists of a linear layer and an $l^2$ normalization function. The linear layer is a fully connected layer that maps the encoder output to the desired $s$-dimensional signature space. A normalization function is then applied so that the output vector has unit $l^2$ norm. This normalization ensures that the signatures lie on a hypersphere, which simplifies the similarity computations used in the loss function and thus speeds up the training phase.
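As an illustration, the encoder-plus-signature design (in its Transformer variant) can be sketched in PyTorch. Everything here is an assumption for illustration, not the paper's exact configuration: `SignatureNet` and all layer sizes are our own choices, the input width 342 = 3 antenna pairs x 114 subcarriers per packet is inferred from the dataset description, sinusoidal positional encodings are omitted, and mean pooling over time is one plausible way to obtain the fixed-size vector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SignatureNet(nn.Module):
    """Encoder module (Transformer variant) followed by the Signature module.

    A hypothetical sketch: layer sizes and the pooling step are illustrative.
    """
    def __init__(self, in_dim=342, d_model=128, n_heads=4, n_layers=1,
                 sig_dim=128, p_d=0.1):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)  # per-packet CSI features -> model dim
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dropout=p_d,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.signature = nn.Linear(d_model, sig_dim)  # Signature module: linear layer

    def forward(self, x):            # x: (batch, packets, in_dim)
        h = self.encoder(self.proj(x))
        h = h.mean(dim=1)            # pool over time to a fixed-size vector (assumed)
        # l2 normalization puts signatures on the unit hypersphere
        return F.normalize(self.signature(h), p=2, dim=-1)
```

Because of the final normalization, every output signature has unit $l^2$ norm, so cosine similarity between two signatures reduces to their dot product.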
3.4 Loss Function
The training phase requires a loss function that encourages signatures from the same person to be close together in the embedding space while increasing the distance between signatures from different people. While contrastive loss and triplet loss operate on pairs or triplets, they might not leverage information from all available negative samples effectively. To this aim, the pipeline utilizes the in-batch negative loss [8], which is widely used in retrieval tasks. During training, a custom batch sampler is used to construct batches, each composed of two lists of samples: a query list $B_q = \{X_i\}_{i=0}^{N}$ and a gallery list $B_g = \{X_j\}_{j=0}^{N}$, where $X_i$ are the CSI measurements and $N$ is the batch size. The $i$-th sample in $B_q$ and the $j$-th sample in $B_g$ belong to the same person if and only if $i = j$. The entire batch, with both $B_q$ and $B_g$, is fed into the DNN, and the two lists of biometric signatures are computed by the model:

$$S_q = \mathrm{DNN}\left(\{X_i\}_{i=0}^{N}\right), \qquad S_g = \mathrm{DNN}\left(\{X_j\}_{j=0}^{N}\right)$$

A similarity matrix $\mathrm{sim}(q, g)$ of size $N \times N$ is then computed between the query and gallery signatures using cosine similarity. Due to the $l^2$ normalization in the Signature module, this simplifies to the dot product:

$$\mathrm{sim}(q, g) = S_q \cdot S_g^T \quad (12)$$

In the similarity matrix, shown in Figure 2, diagonal elements indicate similarities between each query signature and its corresponding positive gallery signature (same person), while off-diagonal elements correspond to negative pairs (different people). We apply cross-entropy loss across each row to maximize the diagonal (positive) scores and minimize the off-diagonal (negative) ones. For each query $S_{q_i}$, the softmax-normalized row is encouraged to peak at the $i$-th position. This drives the matrix toward an identity structure, promoting separation between individuals and clustering of same-person signatures.

Figure 2: Example of the similarity matrix used in the in-batch negative loss function.
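A minimal PyTorch sketch of this loss follows; the function name is ours, and the paper's implementation may differ in details such as scaling the logits by a temperature, which is common for this loss.

```python
import torch
import torch.nn.functional as F

def in_batch_negative_loss(s_q, s_g):
    """In-batch negative loss over l2-normalized signatures.

    s_q, s_g : (N, d) query / gallery signature matrices; pair i is the positive.
    Cosine similarity reduces to a dot product thanks to l2 normalization,
    and row-wise cross-entropy pushes the N x N similarity matrix toward identity.
    """
    sim = s_q @ s_g.T                                   # Eq. (12)
    targets = torch.arange(s_q.size(0), device=sim.device)
    return F.cross_entropy(sim, targets)                # diagonal entries are positives
```

With perfectly matched query and gallery signatures the similarity matrix is the identity and the loss is small; shuffling the gallery so positives fall off the diagonal increases it.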
4 Experimental Results and Discussion
4.1 Dataset
Experiments are conducted on the NTU-Fi dataset [16, 14]. This dataset was created for WiFi sensing applications and includes samples for both Human Activity Recognition (HAR) and Human Identification (HID); we utilize only the HID part to evaluate person ReID. The dataset collects the CSI measurements of 14 different subjects. For each subject, 60 samples were collected while they performed a short walk inside the designated test area. The samples were collected in three different scenarios: subjects wearing only a T-shirt; a T-shirt and a coat; and a T-shirt, coat, and backpack, respectively. The data were recorded using two TP-Link N750 routers. The transmitter router contains a single antenna, while the receiver contains three antennas. CSI amplitude data were collected across 114 subcarriers per antenna pair and recorded over 2000 packets per sample. As a result, each sample has a dimensionality of $3 \times 114 \times 2000$. The publicly available dataset provides only the amplitude values already extracted from the CSI, with no access to the original complex CSI matrices. The dataset is pre-divided into training and test sets containing 546 and 294 samples, respectively. To allow for evaluation during training, a 3-fold cross-validation strategy is employed, using an 80% training and 20% validation split within each fold.
Table 1: Results of each model on the NTU-Fi test set.

Model       | Rank-1        | Rank-3        | Rank-5        | mAP
LSTM        | 0.777 ± 0.032 | 0.897 ± 0.014 | 0.933 ± 0.005 | 0.568 ± 0.010
BiLSTM      | 0.845 ± 0.045 | 0.934 ± 0.022 | 0.958 ± 0.013 | 0.612 ± 0.026
Transformer | 0.955 ± 0.013 | 0.981 ± 0.006 | 0.991 ± 0.000 | 0.884 ± 0.012
Table 2: Performance comparison of different models with and without amplitude filtering. Metrics reported are Rank-1 accuracy and mean Average Precision (mAP). The results highlight the impact of amplitude filtering on retrieval performance across LSTM-, BiLSTM-, and Transformer-based models.

            | Without filter                | With filter
Model       | Rank-1        | mAP           | Rank-1        | mAP
LSTM        | 0.777 ± 0.032 | 0.568 ± 0.010 | 0.755 ± 0.038 | 0.587 ± 0.018
BiLSTM      | 0.845 ± 0.045 | 0.612 ± 0.026 | 0.786 ± 0.036 | 0.675 ± 0.018
Transformer | 0.955 ± 0.013 | 0.884 ± 0.012 | 0.930 ± 0.025 | 0.851 ± 0.035
Table 3: Effect of varying packet sizes on model performance. Results are reported for LSTM and Transformer architectures across different packet counts (100 to 2000), using Rank-1, Rank-3, and Rank-5 accuracy and mean Average Precision (mAP) as evaluation metrics. The table illustrates how performance trends shift with input granularity for both model types.

Model       | Packets | Rank-1        | Rank-3        | Rank-5        | mAP
LSTM        | 100     | 0.805 ± 0.050 | 0.918 ± 0.029 | 0.939 ± 0.022 | 0.597 ± 0.002
LSTM        | 200     | 0.777 ± 0.032 | 0.897 ± 0.014 | 0.933 ± 0.005 | 0.568 ± 0.010
LSTM        | 500     | 0.777 ± 0.065 | 0.906 ± 0.028 | 0.939 ± 0.017 | 0.592 ± 0.040
LSTM        | 1000    | 0.794 ± 0.048 | 0.991 ± 0.019 | 0.947 ± 0.011 | 0.592 ± 0.046
LSTM        | 2000    | 0.799 ± 0.029 | 0.915 ± 0.019 | 0.943 ± 0.013 | 0.579 ± 0.028
Transformer | 100     | 0.952 ± 0.021 | 0.983 ± 0.006 | 0.990 ± 0.005 | 0.871 ± 0.041
Transformer | 200     | 0.955 ± 0.013 | 0.981 ± 0.006 | 0.991 ± 0.000 | 0.884 ± 0.012
Transformer | 500     | 0.937 ± 0.020 | 0.976 ± 0.012 | 0.984 ± 0.011 | 0.840 ± 0.033
Transformer | 1000    | 0.960 ± 0.013 | 0.984 ± 0.005 | 0.988 ± 0.001 | 0.896 ± 0.020
Transformer | 2000    | 0.960 ± 0.014 | 0.982 ± 0.011 | 0.990 ± 0.008 | 0.850 ± 0.054
Table 4: Impact of data augmentation on model performance. Comparison of Rank-1 accuracy and mean Average Precision (mAP) for LSTM, BiLSTM, and Transformer models evaluated with and without data augmentation.

            | Without augmentation          | With augmentation
Model       | Rank-1        | mAP           | Rank-1        | mAP
LSTM        | 0.777 ± 0.032 | 0.568 ± 0.010 | 0.808 ± 0.038 | 0.587 ± 0.018
BiLSTM      | 0.845 ± 0.045 | 0.612 ± 0.026 | 0.889 ± 0.017 | 0.668 ± 0.016
Transformer | 0.955 ± 0.013 | 0.884 ± 0.012 | 0.949 ± 0.014 | 0.860 ± 0.043
Table 5: Evaluation of encoder type and layer depth on performance. Rank-1, Rank-3, and Rank-5 accuracy and mean Average Precision (mAP) are reported for LSTM, BiLSTM, and Transformer models with 1 and 3 encoder layers.

Model       | Layers | Rank-1        | Rank-3        | Rank-5        | mAP
LSTM        | 1      | 0.777 ± 0.032 | 0.897 ± 0.014 | 0.933 ± 0.005 | 0.568 ± 0.010
LSTM        | 3      | 0.822 ± 0.026 | 0.909 ± 0.004 | 0.941 ± 0.004 | 0.585 ± 0.001
BiLSTM      | 1      | 0.845 ± 0.045 | 0.934 ± 0.022 | 0.958 ± 0.013 | 0.612 ± 0.026
BiLSTM      | 3      | 0.825 ± 0.042 | 0.919 ± 0.012 | 0.955 ± 0.003 | 0.632 ± 0.043
Transformer | 1      | 0.955 ± 0.013 | 0.981 ± 0.006 | 0.991 ± 0.000 | 0.884 ± 0.012
Transformer | 3      | 0.919 ± 0.028 | 0.970 ± 0.008 | 0.984 ± 0.003 | 0.658 ± 0.026
4.2 Implementation Details
We train our models on an AMD Ryzen 7 processor with 8 cores (16 virtual cores), 64 GB of RAM, and an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory. The models are implemented in the PyTorch framework. Regarding the training process, 300 epochs are performed for each model using a batch size of 8. The Adam [9] optimizer is used with a starting learning rate of 0.0001. A StepLR learning rate scheduler decreases the learning rate by a factor of 0.95 every 50 epochs.
4.3 Person Re-Identification Evaluation
To evaluate the performance of our ReID model, mean Average Precision (mAP) has been used together with Rank-$k$ accuracy, defined as

$$\mathrm{Rank}\text{-}k = \frac{1}{N} \sum_{i=1}^{N} \delta(r_i \leq k) \quad (13)$$

which provides the probability of finding the wanted subject among the top $k$ most probable labels. The results obtained during the tests are shown in Table 1. As demonstrated, the model utilizing the Transformer encoder outperforms both the LSTM and BiLSTM ones. The Transformer-based model achieves a 95.5% Rank-1 score and an 88.4% mAP score. The self-attention mechanism of the Transformer makes it more accurate and robust at capturing the discriminative long-range temporal patterns within the WiFi amplitude sequences relevant for ReID, compared to the LSTM-based models.
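The Rank-$k$ metric of Eq. (13) can be sketched in a few lines of NumPy, assuming (as in the batch construction of Section 3.4) that query $i$'s true match is gallery entry $i$; the function name is ours.

```python
import numpy as np

def rank_k_accuracy(sim, k):
    """Rank-k accuracy from an (N, N) query-gallery similarity matrix.

    The rank r_i of query i is 1 plus the number of gallery entries scored
    strictly above the true match (gallery entry i); delta(r_i <= k) checks
    whether the match appears among the k highest scores.
    """
    n = sim.shape[0]
    diag = sim[np.arange(n), np.arange(n)][:, None]
    ranks = 1 + (sim > diag).sum(axis=1)
    return float((ranks <= k).mean())
```

For example, if two of three queries place their true match first and the third places it third, Rank-1 is 2/3 while Rank-3 is 1.0.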
4.4 Ablation Study
Regarding amplitude filtering, Table 2 shows that models trained without the amplitude filtering preprocessing step achieved better performance. This suggests that the filtering process may have inadvertently removed useful signal variations essential for learning highly discriminative biometric signatures. As for data augmentation, Table 4 indicates that the applied transformations improved generalization for both the LSTM and BiLSTM architectures. In contrast, the Transformer encoder did not benefit significantly, although it consistently outperformed the other two models even without augmentation. With respect to packet size, Table 3 reveals that LSTM performance remained mostly stable or slightly degraded with longer sequence lengths, likely due to vanishing-gradient issues and limited context modeling. Conversely, the Transformer benefited from extended input sequences, thanks to its self-attention mechanism that allows efficient modeling of long-range dependencies. Only the LSTM and Transformer models were evaluated in this experiment, due to the increased computational cost associated with longer inputs. Finally, we compared shallow (1-layer) and deeper (3-layer) variants of each encoder in Table 5. The Transformer achieved its best performance with a single layer, as deeper configurations led to overfitting and optimization instability. For the LSTM and BiLSTM models, stacking layers resulted in marginal performance gains but introduced slower convergence and reduced training stability. These findings reinforce the overall robustness and efficiency of the Transformer encoder within the proposed framework.
5 Conclusion
In this paper, we presented a pipeline to address the problem of person ReID using WiFi CSI. The proposed approach leverages a DNN that generates biometric signatures from CSI-derived features. These signatures are then compared to a gallery of known subjects to perform re-identification through similarity matching. We evaluated three encoder architectures (LSTM, BiLSTM, and Transformer) on the publicly available NTU-Fi dataset, with the Transformer-based model delivering the best overall performance. By applying a unified and reproducible pipeline to a public benchmark, this work establishes a valuable baseline for future research in CSI-based person re-identification. The encouraging results confirm the viability of WiFi signals as a robust and privacy-preserving biometric modality and position this study as a meaningful step forward in the development of signal-based ReID systems.
Acknowledgements
This work was supported by the Smart unmannEd AeRial vehiCles for Human l