Archive



2025-12-16 - The NFC Research Archive - WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding

Title: WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding
Date: 2025-12-16 10:26:41 AM
NFC Podcast: https://nofluffcollection.com/podcasts.php?podcast=thebeacon&title=surveillance_without_cameras
Original: https://arxiv.org/html/2507.12869v1
Archive: https://archive.ph/2U7zq

This entry is part of the NFC Research Archive, a permanent text-searchable copy of original research and source materials referenced in our productions.

Abstract

Person Re-Identification is a key and challenging task in video surveillance. While traditional methods rely on visual data, issues like poor lighting, occlusion, and suboptimal angles often hinder performance. To address these challenges, we introduce WhoFi, a novel pipeline that utilizes Wi-Fi signals for person re-identification. Biometric features are extracted from Channel State Information (CSI) and processed through a modular Deep Neural Network (DNN) featuring a Transformer-based encoder. The network is trained using an in-batch negative loss function to learn robust and generalizable biometric signatures. Experiments on the NTU-Fi dataset show that our approach achieves competitive results compared to state-of-the-art methods, confirming its effectiveness in identifying individuals via Wi-Fi signals.

Keywords: Person Re-Identification, CSI, Deep Neural Networks, Transformers, Wi-Fi Signals, Radio Biometric Signature

1. Introduction

Person Re-Identification (ReID) plays a central role in surveillance systems, aiming to determine whether two representations belong to the same individual across different times or locations. Traditional ReID systems typically rely on visual data such as images or videos, comparing a probe (the input to be identified) against a set of stored gallery samples by learning discriminative biometric features. Most commonly, these features are based on appearance cues such as clothing texture, color, and body shape. However, visual-based systems suffer from a number of known limitations, including sensitivity to changes in lighting conditions [4], occlusions [6], background clutter [20], and variations in camera viewpoints [12]. These challenges often result in reduced robustness, especially in unconstrained or real-world environments. To overcome these limitations, an alternative research direction explores non-visual modalities such as Wi-Fi-based person ReID. Wi-Fi signals offer several advantages over camera-based approaches: they are not affected by illumination, they can penetrate walls and occlusions, and, most importantly, they offer a privacy-preserving mechanism for sensing. The core insight is that as a Wi-Fi signal propagates through an environment, its waveform is altered by the presence and physical characteristics of objects and people along its path. These alterations, captured in the form of Channel State Information (CSI), contain rich biometric information. Unlike optical systems that perceive only the outer surface of a person, Wi-Fi signals interact with internal structures such as bones, organs, and body composition, resulting in person-specific signal distortions that act as a unique signature.

Earlier wireless sensing methods primarily relied on coarse signal measurements such as the Received Signal Strength Indicator (RSSI) [11], which proved insufficient for fine-grained recognition tasks. More recently, CSI has emerged as a powerful alternative [17]. CSI provides subcarrier-level measurements across multiple antennas and frequencies, enabling a detailed and time-resolved view of how radio signals interact with the human body and the surrounding environment. By learning patterns from CSI sequences, it is possible to perform ReID by capturing and matching these radio biometric signatures. Despite the promising nature of Wi-Fi-based ReID, the field remains underexplored, especially in terms of developing scalable deep learning methods that can generalize across individuals and sensing environments. In this paper, we propose WhoFi, a deep learning pipeline for person ReID using only CSI data. Our model is trained with an in-batch negative loss to learn robust embeddings from CSI sequences. We evaluate multiple backbone architectures for sequence modeling, including Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and Transformer networks, each designed to capture temporal dependencies and contextual patterns. The main contributions of this work are:

We propose a modular deep learning pipeline for person ReID that relies solely on Wi-Fi CSI data, without requiring visual input.

We perform a comparative study across three widely used backbone architectures (LSTM, BiLSTM, and Transformer networks) to assess their ability to encode biometric signatures from CSI.

We adopt an in-batch negative loss training strategy, which enables scalable and effective similarity learning in the absence of labeled pairs.

We conduct extensive experiments on the public NTU-Fi dataset to demonstrate the accuracy and generalizability of our approach.

We perform an ablation study to evaluate the impact of preprocessing strategies, input sequence length, model depth, and data augmentation.

By leveraging non-visual biometric features embedded in Wi-Fi CSI, this study offers a privacy-preserving and robust approach for Wi-Fi-based ReID, and it lays the foundation for future work in wireless biometric sensing.

2. Related Work

2.1. Person Re-Identification via Visual Data

In the field of computer vision, person ReID has long been of major importance. Earlier methods primarily relied on RGB images or videos to track people across camera views. Hand-crafted descriptors such as Local Binary Patterns (LBP), color histograms, and Histograms of Oriented Gradients (HOG) were widely used to capture low-level visual cues like texture and silhouette. With the advent of deep learning, Convolutional Neural Networks (CNNs) became the dominant approach, enabling hierarchical spatial feature learning [7]. Training strategies like triplet loss, cross-entropy with label smoothing, and center loss were adopted to optimize embedding space separability [5, 19]. Recent models often integrate attention mechanisms [10] and part-based representations [13] to handle misalignment and occlusion. Despite strong benchmark performance, these systems rely heavily on high-quality visual input and careful manual tuning, limiting their applicability in uncontrolled environments.

2.2. Person Identification and ReID via Wi-Fi Sensing

Several works have extensively investigated human identification and authentication through Wi-Fi CSI, focusing on features such as amplitude, phase, and heatmap variations [3]. Early methods include line-of-sight waveform modeling combined with PCA or DWT for classification [15], or gait-based identification through hand-crafted features [18]. CAUTION [14] introduced a dataset and a few-shot learning approach for user recognition via downsampled CSI representations. More recent methods leverage deep learning models to enhance generalization capabilities [16]. A recent approach [1] proposed a dual-branch architecture that combines CNN-based processing of amplitude-derived heatmaps with LSTM-based modeling of phase information for re-identification. However, the use of private datasets in such work limits replicability and hinders direct comparison. In contrast, our study relies on a widely available public benchmark, enabling reproducibility and fair evaluation across different architectures.

3. Method

This section presents the data preprocessing and augmentation steps, together with the proposed deep architecture.

3.1. Data Preprocessing

Data extracted from the complex CSI matrix must first be preprocessed to remove noise and sampling offsets and to extract meaningful biometric features.

3.1.1. Channel State Information (CSI)

Wi-Fi transmission relies on electromagnetic waves that carry information from a transmitting antenna (TX) to a receiving one (RX). Modern systems adopt Multiple-Input Multiple-Output (MIMO), involving multiple TX/RX antennas, and Orthogonal Frequency-Division Multiplexing (OFDM), a modulation technique that transmits data across orthogonal subcarriers spanning nearly the entire frequency band. The integration of MIMO and OFDM enables sampling of the Channel Frequency Response (CFR) at subcarrier granularity in a CSI matrix. The CSI measurement for each subcarrier $k \in K$ represents the CFR $H^{\theta,\gamma}$ between the receiving antenna RX $\theta \in \Theta$ and the transmitting antenna TX $\gamma \in \Gamma$, and is given by

$$H^{\theta,\gamma}_k = |H^{\theta,\gamma}_k| \, e^{j \angle H^{\theta,\gamma}_k} \quad (1)$$

where $|H^{\theta,\gamma}_k|$ denotes the signal amplitude and $\angle H^{\theta,\gamma}_k$ the signal phase. By collecting the responses across all TX/RX antenna pairs, a CSI complex matrix of size $\Theta \times \Gamma \times K$ is formed, representing the CFR across all subcarriers in $K$.
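The decomposition in Eq. (1) can be reproduced directly from a complex CSI matrix. The sketch below uses synthetic data in place of real measurements; the dimensions (3 RX antennas, 1 TX antenna, 114 subcarriers) mirror the NTU-Fi setup described later, and are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: Theta RX antennas, Gamma TX antennas, K subcarriers.
theta, gamma, K = 3, 1, 114

# A synthetic complex CSI matrix standing in for real measurements.
H = rng.standard_normal((theta, gamma, K)) + 1j * rng.standard_normal((theta, gamma, K))

amplitude = np.abs(H)    # |H^{theta,gamma}_k|, the amplitude in Eq. (2)
phase = np.angle(H)      # angle(H^{theta,gamma}_k), the phase in Eq. (7)

# Eq. (1): every entry factors as amplitude * e^{j*phase}.
assert np.allclose(H, amplitude * np.exp(1j * phase))
```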

3.1.2. Amplitude Filtering

Signal amplitude represents the strength of the received signal. For a subcarrier $k \in K$, receiver antenna $\theta \in \Theta$, and transmitter antenna $\gamma \in \Gamma$, the signal amplitude $A^{\theta,\gamma}_k$ is defined as

$$A^{\theta,\gamma}_k = |H^{\theta,\gamma}_k| = \sqrt{\mathrm{real}(H^{\theta,\gamma}_k)^2 + \mathrm{img}(H^{\theta,\gamma}_k)^2} \quad (2)$$

which corresponds to the magnitude of the CSI measurement. In this work, signal amplitudes are cleaned of outliers using the Hampel filter [2], which identifies outliers based on the median of a local window and the Median Absolute Deviation (MAD). Given a sequence of amplitude values across $p$ packets, the local window $W^{p,k}$ of size $w$ (set to 5), centered on packet $p$, is defined as

$$W^{p,k} = \left[ A^{p-\lfloor w/2 \rfloor}_k, \ldots, A^{p}_k, \ldots, A^{p+\lfloor w/2 \rfloor}_k \right] \quad (3)$$

$$\mathrm{median}(W^{p,k}) = W^{p,k}_{\lfloor w/2 \rfloor} \quad (4)$$

$$\mathrm{MAD}(W^{p,k}) = \mathrm{median}\left( \left| W^{p,k}_i - \mathrm{median}(W^{p,k}) \right| \right), \quad \forall i, \; 1 \leq i \leq w \quad (5)$$

where $W^{p,k}$ denotes the vector containing the $w$ neighboring data packets centered at packet $p$, sorted in ascending order, for the $k$-th subcarrier. An amplitude value is classified as an outlier if its deviation from the local median exceeds a fixed threshold. Specifically, any value outside the range

$$\mathrm{limit}_{p,k} = \mathrm{median}(W^{p,k}) \pm \xi \cdot \mathrm{MAD}(W^{p,k}) \quad (6)$$

with $\xi$ set to 3, is considered an outlier and removed.
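The Hampel step in Eqs. (3)-(6) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the paper removes flagged values, while this sketch only marks them, and the boundary handling (shrinking the window at the sequence edges) is an assumption.

```python
import numpy as np

def hampel_outliers(x, w=5, xi=3.0):
    """Flag outliers in a 1-D amplitude sequence using a sliding-window
    median and the Median Absolute Deviation (MAD), Eqs. (3)-(6)."""
    x = np.asarray(x, dtype=float)
    half = w // 2
    outliers = np.zeros(len(x), dtype=bool)
    for p in range(len(x)):
        lo, hi = max(0, p - half), min(len(x), p + half + 1)
        window = x[lo:hi]                       # W^{p,k}, truncated at the edges
        med = np.median(window)                 # Eq. (4)
        mad = np.median(np.abs(window - med))   # Eq. (5)
        if np.abs(x[p] - med) > xi * mad:       # outside the limits of Eq. (6)
            outliers[p] = True
    return outliers

amps = np.array([1.0, 1.1, 0.9, 8.0, 1.0, 1.05, 0.95])
mask = hampel_outliers(amps)
print(np.flatnonzero(mask))  # the spike at index 3 is flagged
```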

3.1.3. Phase Sanitization

Signal phase represents the temporal shift of a signal. It is calculated as the arctangent of the imaginary and real parts of the CFR:

$$P^{\theta,\gamma}_k = \tan^{-1}\left( \frac{\mathrm{img}(H^{\theta,\gamma}_k)}{\mathrm{real}(H^{\theta,\gamma}_k)} \right) \quad (7)$$

To remove any possible phase shifts caused by imperfect synchronization between the transmitter and receiver hardware components, we apply a standard linear phase sanitization technique. The estimated phase $\angle \hat{H}(f_k)$ at frequency $f$ from the CSI measurements is expressed as

$$\angle \hat{H}(f_k) = \angle H(f_k) - 2\pi \frac{m_k}{N} \Delta t + \beta + Z \quad (8)$$

where $\angle H(f_k)$ is the actual phase, $\Delta t$ is a time offset from any delay in the signal arrival and reception, $\beta$ is the unknown phase offset, and $Z$ is a noise factor. Since the delay factor is a linear function of the subcarrier index $m_k$, it is possible to estimate the phase slope $a$ and offset $b$ as

$$a = \frac{\angle \hat{H}(f_K) - \angle \hat{H}(f_1)}{m_K - m_1} \quad (9)$$

$$b = \frac{1}{K} \sum_{k=1}^{K} \angle \hat{H}(f_k) \quad (10)$$

Therefore, the calibrated phase $\angle H'(f_k)$ for each subcarrier $k \in K$ can be estimated by subtracting a linear term from the raw phase as

$$\angle H'(f_k) = \angle \hat{H}(f_k) - a m_k - b \quad (11)$$
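A minimal sketch of the linear sanitization in Eqs. (9)-(11). The symmetric subcarrier indices and the toy sinusoidal phase profile are assumptions made for illustration, not details taken from the paper; the point is that subtracting the estimated linear term $a m_k + b$ removes most of the synchronization-induced trend.

```python
import numpy as np

def sanitize_phase(raw_phase, m):
    """Linear phase sanitization: slope a from the endpoints (Eq. 9),
    offset b as the mean phase (Eq. 10), subtracted per Eq. (11)."""
    a = (raw_phase[-1] - raw_phase[0]) / (m[-1] - m[0])
    b = raw_phase.mean()
    return raw_phase - a * m - b

K = 114
m = np.arange(-K // 2, K // 2).astype(float)     # assumed symmetric subcarrier indices
true_phase = 0.3 * np.sin(2 * np.pi * m / K)     # toy "actual" phase profile
raw = true_phase + 0.02 * m + 1.4                # add a linear distortion a*m + b
clean = sanitize_phase(raw, m)                   # linear trend largely removed
```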

3.2. Data Augmentation

To enhance model sensitivity and overall robustness against noise or minor signal fluctuations, we apply several data augmentation techniques during training. These transformations are performed on the extracted amplitude features rather than directly on the raw CSI data. For each amplitude entry, one augmentation is applied with a 90% probability, leaving the remaining 10% unmodified. The first augmentation adds Gaussian noise $n(t) \sim \mathcal{N}(0, \sigma^2)$ to the amplitude value $A^{\theta,\gamma}_k(t)$ at each time step $t$, where $\sigma = 0.02$, simulating realistic signal fluctuations and improving generalization in noisy environments. The second augmentation scales the amplitude by a random factor uniformly sampled in $[0.9, 1.1]$, modeling small variations in signal strength due to environmental or device-related factors. Finally, a time shift is applied by offsetting the amplitude sequence forward or backward by a random integer $t' \in [-5, 5]$ within a sequence of length $P = 100$. Any value shifted outside the sequence bounds is replaced with the mean amplitude of the original signal, simulating delays or desynchronizations in signal acquisition.
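The three augmentations can be sketched as follows. The selection scheme (a uniform choice among the three transformations once the 90% branch is taken) is an assumption, since the paper does not specify how an augmentation is picked.

```python
import numpy as np

P = 100  # sequence length in packets

def augment(seq, rng):
    """Apply one of three augmentations with 90% probability (assumed
    uniform choice), otherwise return the sequence unmodified."""
    seq = seq.copy()
    if rng.random() >= 0.9:
        return seq                            # 10% of samples left unmodified
    choice = rng.integers(3)
    if choice == 0:                           # additive Gaussian noise, sigma = 0.02
        seq += rng.normal(0.0, 0.02, size=seq.shape)
    elif choice == 1:                         # random scaling in [0.9, 1.1]
        seq *= rng.uniform(0.9, 1.1)
    else:                                     # time shift t' in [-5, 5], pad with mean
        shift = int(rng.integers(-5, 6))
        out = np.full_like(seq, seq.mean())
        if shift >= 0:
            out[shift:] = seq[:len(seq) - shift]
        else:
            out[:len(seq) + shift] = seq[-shift:]
        seq = out
    return seq

rng = np.random.default_rng(7)
x = rng.standard_normal(P)
y = augment(x, rng)
```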

3.3. Deep Neural Network Architecture

In the proposed pipeline, a DNN is designed to generate a biometric signature from the processed CSI features. The architecture is composed of an Encoder module $M_e$ and a Signature Module $M_s$, as shown in Figure 1.

Figure 1: Overview of the proposed framework. The system takes an input signal (e.g., a person's sensing data) and processes it through an encoder that extracts meaningful latent representations. These features are passed to a signature model that computes a compact signature vector $s$. To ensure consistency and comparability, the output signature is normalized through $l_2$ normalization. The resulting signature serves as a unique identifier for the individual based on the input signal characteristics.

3.3.1. Encoder Module

The encoder module produces a fixed-size vector that contains human-signature-relevant information from the provided CSI measurements. This module aims at extracting a low-dimensional encoding of the high-dimensional, sequential inputs, namely the amplitude or phase extracted from the CSI measurement of the wireless channel while a specific person is present between the transmitter and receiver. This work evaluates three types of encoding architectures compatible with sequential data: an LSTM encoder, a BiLSTM encoder, and the encoder part of a Transformer model.

1) LSTM Encoder. LSTMs capture temporal dependencies in input sequences, enabling the model to recognize recurrent patterns. The LSTM encoder consists of $l$ stacked hidden units, where the output of the $l_i$-th unit is passed to the $l_{i+1}$-th unit in the hidden layer. Dropout layers with probability $p_d$ are interleaved between LSTM layers to improve robustness and reduce overfitting during training. The final hidden state $H^l$ from the last LSTM layer serves as the encoded output.

2) BiLSTM Encoder. BiLSTMs capture the correlation between time steps in the input sequence by processing the sequence in both the forward and backward directions. This allows the model to capture context from both past and future time steps. Similar to the LSTM encoder, $l$ stacked BiLSTM layers with interleaved dropout layers are used to avoid overfitting. The last hidden states from the forward ($\overrightarrow{H^l}$) and backward ($\overleftarrow{H^l}$) passes are concatenated to form the output encoding $H^l$.

3) Transformer Encoder. The encoder from the Transformer architecture is capable of detecting correlations between elements at distant time steps in the input sequence. The encoder contains $l$ identical layers, each containing a multi-head self-attention sublayer and a position-wise feed-forward network sublayer. Standard, non-trainable sinusoidal positional encodings are added to the input embeddings to retain sequence order information. Moreover, residual connections and layer normalization are applied after each sublayer. A dropout layer with drop probability $p_d$ is used in between encoder layers as a regularization technique. The output of the final Transformer layer acts as the encoded representation.
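The core operation of the Transformer encoder, scaled dot-product self-attention over a packet sequence with sinusoidal positional encodings, can be illustrated in isolation. This is a single-head, weight-free sketch for intuition only (queries, keys, and values are the inputs themselves), not the multi-head, learned-projection layers used in the paper.

```python
import numpy as np

def positional_encoding(T, d):
    """Standard non-trainable sinusoidal positional encodings."""
    pos = np.arange(T)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def self_attention(X):
    """Single-head scaled dot-product self-attention: every time step
    attends to every other, giving direct long-range dependencies."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                       # (T, T) attention scores
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                  # row-wise softmax
    return w @ X                                        # attention-weighted mix

T, d = 100, 64                                          # packets x feature dimension
X = np.random.default_rng(1).standard_normal((T, d))
out = self_attention(X + positional_encoding(T, d))
```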

3.3.2. Signature Module

The Signature Module takes the fixed-size vector output from the encoder module and generates a final biometric signature. It consists of a linear layer and an $l_2$ normalization function. The linear layer is a fully connected layer that maps the encoder output to the desired $s$-dimensional signature space. A normalization function is then applied so that the output vector has unit $l_2$ norm. This normalization ensures that the signatures lie on a hypersphere, which facilitates the similarity computations used in the loss function, thus speeding up the training phase.
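A minimal sketch of the Signature Module: a linear projection followed by $l_2$ normalization, so every signature lands on the unit hypersphere. The layer sizes (128-d encoding, 64-d signature) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def signature(encoding, W, b):
    """Linear projection followed by l2 normalization; the resulting
    signatures have unit norm, so cosine similarity is a dot product."""
    s = encoding @ W + b
    return s / np.linalg.norm(s, axis=-1, keepdims=True)

rng = np.random.default_rng(3)
enc = rng.standard_normal((8, 128))       # batch of encoder outputs (illustrative)
W = rng.standard_normal((128, 64)) * 0.1  # project to a 64-d signature space
b = np.zeros(64)
s = signature(enc, W, b)
print(np.linalg.norm(s, axis=-1))         # every signature has unit norm
```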

3.4. Loss Function

The training phase requires a loss function that encourages signatures from the same person to be close together in the embedding space while increasing the distance between signatures from different people. While contrastive loss and triplet loss work on pairs or triplets, they might not leverage information from all available negative samples effectively. To this aim, the pipeline utilizes the in-batch negative loss [8], which is widely used in retrieval tasks. During training, a custom batch sampler constructs batches composed of two lists of samples: a query list $B_q = \{X_i\}_{i=0}^{N}$ and a gallery list $B_g = \{X_j\}_{j=0}^{N}$, where $X_i$ are the CSI measurements and $N$ is the batch size. The $i$-th sample in $B_q$ and the $j$-th sample in $B_g$ belong to the same person if and only if $i = j$. The entire batch, with both $B_q$ and $B_g$, is fed into the DNN, and consequently the two lists of biometric signatures are computed by the model:

$$S_q = \mathrm{DNN}(\{X_i\}_{i=0}^{N}), \qquad S_g = \mathrm{DNN}(\{X_j\}_{j=0}^{N})$$

As a result, a similarity matrix $\mathrm{sim}(q,g)$ of size $N \times N$ is computed between the query and gallery signatures using cosine similarity. Due to the $l_2$ normalization in the Signature Module, this simplifies to the dot product

$$\mathrm{sim}(q,g) = S_q \cdot S_g^T \quad (12)$$

In the similarity matrix shown in Figure 2, diagonal elements indicate similarities between each query signature and its corresponding positive gallery signature (same person), while off-diagonal elements correspond to negative pairs (different people). We apply cross-entropy loss across each row to maximize the diagonal (positive) scores and minimize the off-diagonal (negative) ones. For each query $S_{q,i}$, the softmax-normalized row is encouraged to peak at the $i$-th position. This pushes the matrix toward an identity structure, promoting separation between individuals and clustering of same-person signatures.

Figure 2: Similarity matrix example used in the in-batch negative loss function.
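Under unit-norm signatures, the in-batch negative loss reduces to row-wise cross-entropy on the dot-product similarity matrix of Eq. (12). A minimal sketch (batch size and signature dimension are illustrative):

```python
import numpy as np

def in_batch_negative_loss(S_q, S_g):
    """Cross-entropy over each row of the query-gallery similarity matrix;
    the i-th gallery signature is the positive for query i."""
    sim = S_q @ S_g.T                                # Eq. (12), unit-norm inputs
    sim = sim - sim.max(axis=1, keepdims=True)       # numerical stability
    log_softmax = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))            # maximize diagonal scores

rng = np.random.default_rng(5)
N, d = 8, 64
S = rng.standard_normal((N, d))
S /= np.linalg.norm(S, axis=1, keepdims=True)        # signatures on the hypersphere

# Identical query/gallery signatures give a lower loss than shuffled ones,
# since only the aligned batch has its positives on the diagonal.
aligned = in_batch_negative_loss(S, S)
shuffled = in_batch_negative_loss(S, S[::-1])
print(aligned < shuffled)  # True
```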

4. Experimental Results and Discussion

4.1. Dataset

Experiments are conducted on the NTU-Fi dataset [16, 14]. This dataset was created for Wi-Fi sensing applications and includes samples for both Human Activity Recognition (HAR) and Human Identification (HID). We utilize only the HID part to evaluate person ReID. The dataset collects the CSI measurements of 14 different subjects. For each subject, 60 samples were collected while they performed a short walk inside the designated test area. The samples were collected in three different scenarios: subjects wearing only a T-shirt; a T-shirt and a coat; and a T-shirt, coat, and backpack, respectively. The data were recorded using two TP-Link N750 routers. The transmitter router contains a single antenna, while the receiver one contains three antennas. CSI amplitude data were collected across 114 subcarriers per antenna pair and recorded over 2000 packets per sample. As a result, each sample has a dimensionality of $3 \times 114 \times 2000$. The publicly available dataset provides only the amplitude values already extracted from the CSI, with no access to the original complex CSI matrices. The dataset is pre-divided into training and test sets containing 546 and 294 samples, respectively. To allow for evaluation during training, a 3-fold cross-validation strategy is employed using an 80% training and 20% validation split within each fold.

Table 1: Results of each model on the NTU-Fi test set.

Model       | Rank-1        | Rank-3        | Rank-5        | mAP
LSTM        | 0.777 ± 0.032 | 0.897 ± 0.014 | 0.933 ± 0.005 | 0.568 ± 0.010
BiLSTM      | 0.845 ± 0.045 | 0.934 ± 0.022 | 0.958 ± 0.013 | 0.612 ± 0.026
Transformer | 0.955 ± 0.013 | 0.981 ± 0.006 | 0.991 ± 0.000 | 0.884 ± 0.012

Table 2: Performance comparison of different models with and without amplitude filtering. Metrics reported are Rank-1 accuracy and mean Average Precision (mAP). The results highlight the impact of amplitude filtering on retrieval performance across LSTM, BiLSTM, and Transformer-based models.

            | Without filter                | With filter
Model       | Rank-1        | mAP           | Rank-1        | mAP
LSTM        | 0.777 ± 0.032 | 0.568 ± 0.010 | 0.755 ± 0.038 | 0.587 ± 0.018
BiLSTM      | 0.845 ± 0.045 | 0.612 ± 0.026 | 0.786 ± 0.036 | 0.675 ± 0.018
Transformer | 0.955 ± 0.013 | 0.884 ± 0.012 | 0.930 ± 0.025 | 0.851 ± 0.035

Table 3: Effect of varying packet sizes on model performance. Results are reported for LSTM and Transformer architectures across different packet counts (100 to 2000), using Rank-1, Rank-3, Rank-5 accuracy and mean Average Precision (mAP) as evaluation metrics. The table illustrates how performance trends shift with input granularity for both model types.

Model       | Packets | Rank-1        | Rank-3        | Rank-5        | mAP
LSTM        | 100     | 0.805 ± 0.050 | 0.918 ± 0.029 | 0.939 ± 0.022 | 0.597 ± 0.002
LSTM        | 200     | 0.777 ± 0.032 | 0.897 ± 0.014 | 0.933 ± 0.005 | 0.568 ± 0.010
LSTM        | 500     | 0.777 ± 0.065 | 0.906 ± 0.028 | 0.939 ± 0.017 | 0.592 ± 0.040
LSTM        | 1000    | 0.794 ± 0.048 | 0.991 ± 0.019 | 0.947 ± 0.011 | 0.592 ± 0.046
LSTM        | 2000    | 0.799 ± 0.029 | 0.915 ± 0.019 | 0.943 ± 0.013 | 0.579 ± 0.028
Transformer | 100     | 0.952 ± 0.021 | 0.983 ± 0.006 | 0.990 ± 0.005 | 0.871 ± 0.041
Transformer | 200     | 0.955 ± 0.013 | 0.981 ± 0.006 | 0.991 ± 0.000 | 0.884 ± 0.012
Transformer | 500     | 0.937 ± 0.020 | 0.976 ± 0.012 | 0.984 ± 0.011 | 0.840 ± 0.033
Transformer | 1000    | 0.960 ± 0.013 | 0.984 ± 0.005 | 0.988 ± 0.001 | 0.896 ± 0.020
Transformer | 2000    | 0.960 ± 0.014 | 0.982 ± 0.011 | 0.990 ± 0.008 | 0.850 ± 0.054

Table 4: Impact of data augmentation on model performance. Comparison of Rank-1 accuracy and mean Average Precision (mAP) for LSTM, BiLSTM, and Transformer models, evaluated with and without data augmentation.

            | Without augmentation          | With augmentation
Model       | Rank-1        | mAP           | Rank-1        | mAP
LSTM        | 0.777 ± 0.032 | 0.568 ± 0.010 | 0.808 ± 0.038 | 0.587 ± 0.018
BiLSTM      | 0.845 ± 0.045 | 0.612 ± 0.026 | 0.889 ± 0.017 | 0.668 ± 0.016
Transformer | 0.955 ± 0.013 | 0.884 ± 0.012 | 0.949 ± 0.014 | 0.860 ± 0.043

Table 5: Evaluation of encoder type and layer depth on performance. Rank-1, Rank-3, Rank-5 accuracy and mean Average Precision (mAP) are reported for LSTM, BiLSTM, and Transformer models with 1 and 3 encoder layers.

Model       | Layers | Rank-1        | Rank-3        | Rank-5        | mAP
LSTM        | 1      | 0.777 ± 0.032 | 0.897 ± 0.014 | 0.933 ± 0.005 | 0.568 ± 0.010
LSTM        | 3      | 0.822 ± 0.026 | 0.909 ± 0.004 | 0.941 ± 0.004 | 0.585 ± 0.001
BiLSTM      | 1      | 0.845 ± 0.045 | 0.934 ± 0.022 | 0.958 ± 0.013 | 0.612 ± 0.026
BiLSTM      | 3      | 0.825 ± 0.042 | 0.919 ± 0.012 | 0.955 ± 0.003 | 0.632 ± 0.043
Transformer | 1      | 0.955 ± 0.013 | 0.981 ± 0.006 | 0.991 ± 0.000 | 0.884 ± 0.012
Transformer | 3      | 0.919 ± 0.028 | 0.970 ± 0.008 | 0.984 ± 0.003 | 0.658 ± 0.026

4.2. Implementation Details

We train our models on an AMD Ryzen 7 processor with 8 cores (16 virtual cores), 64 GB of RAM, and an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory. The models are implemented in the PyTorch framework. Regarding the training process, 300 epochs are performed for each model using a batch size of 8. The Adam [9] optimizer is used with a starting learning rate of 0.0001. A StepLR learning rate scheduler decreases the learning rate by a factor of 0.95 every 50 epochs.

4.3. Person Re-Identification Evaluation

To evaluate the performance of our ReID model, mean Average Precision (mAP) has been used together with Rank-k accuracy, defined as follows:

$$\mathrm{Rank}\text{-}k = \frac{1}{N} \sum_{i=1}^{N} \delta(r_i \leq k) \quad (13)$$

which provides the probability of finding the wanted subject among the top-$k$ most probable labels. The results obtained during the tests are shown in Table 1. As demonstrated, the model utilizing the Transformer encoder outperforms both the LSTM and BiLSTM ones. The Transformer-based model achieves a 95.5% score on the Rank-1 metric and an mAP score of 88.4%. The self-attention mechanism of the Transformer renders it more accurate and robust at capturing the discriminative long-range temporal patterns within the Wi-Fi amplitude sequences relevant for ReID, compared to the LSTM-based models.
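Eq. (13) can be computed directly from a query-gallery similarity matrix. A toy sketch with illustrative similarities and labels (not data from the paper):

```python
import numpy as np

def rank_k_accuracy(sim, query_labels, gallery_labels, k):
    """Eq. (13): fraction of queries whose correct identity appears
    among the k most similar gallery entries."""
    order = np.argsort(-sim, axis=1)          # best-matching gallery entries first
    hits = 0
    for i, row in enumerate(order):
        hits += query_labels[i] in gallery_labels[row[:k]]
    return hits / len(query_labels)

# Toy example: 3 queries against 4 gallery samples.
sim = np.array([[0.9, 0.1, 0.3, 0.2],
                [0.2, 0.4, 0.8, 0.1],
                [0.5, 0.6, 0.1, 0.3]])
q_labels = np.array([0, 1, 2])
g_labels = np.array([0, 2, 1, 2])
print(rank_k_accuracy(sim, q_labels, g_labels, k=1))  # 1.0
```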

4.4. Ablation Study

Regarding amplitude filtering, Table 2 shows that models trained without the amplitude filtering preprocessing step achieved better performance. This suggests that the filtering process may have inadvertently removed useful signal variations essential for learning highly discriminative biometric signatures. As for data augmentation, Table 4 indicates that the applied transformations improved generalization for both LSTM and BiLSTM architectures. In contrast, the Transformer encoder did not benefit significantly, although it consistently outperformed the other two models even without augmentation. With respect to packet size, Table 3 reveals that LSTM performance remained mostly stable or slightly degraded with longer sequence lengths, likely due to vanishing gradient issues and limited context modeling. Conversely, the Transformer benefited from extended input sequences thanks to its self-attention mechanism, which allows efficient modeling of long-range dependencies. Only LSTM and Transformer models were evaluated in this experiment due to the increased computational cost associated with longer inputs. Finally, we compared shallow (1-layer) and deeper (3-layer) variants of each encoder in Table 5. The Transformer achieved its best performance with a single layer, as deeper configurations led to overfitting and optimization instability. For LSTM and BiLSTM models, stacking layers resulted in marginal performance gains but introduced slower convergence and reduced training stability. These findings reinforce the overall robustness and efficiency of the Transformer encoder within the proposed framework.

5. Conclusion

In this paper, we presented a pipeline to address the problem of person ReID using Wi-Fi CSI. The proposed approach leverages a DNN that generates biometric signatures from CSI-derived features. These signatures are then compared to a gallery of known subjects to perform re-identification through similarity matching. We evaluated three encoder architectures (LSTM, BiLSTM, and Transformer) on the publicly available NTU-Fi dataset, with the Transformer-based model delivering the best overall performance. By applying a unified and reproducible pipeline to a public benchmark, this work establishes a valuable baseline for future research in CSI-based person re-identification. The encouraging results confirm the viability of Wi-Fi signals as a robust and privacy-preserving biometric modality and position this study as a meaningful step forward in the development of signal-based ReID systems.

Acknowledgements

This work was supported by the Smart unmannEd AeRial vehiCles for Human l

Posted on: Dec 16, 2025



© Copyright 2026 NoFluff Collection