REVIEW ON DEEP LEARNING BASED TECHNIQUES FOR PERSON RE-IDENTIFICATION

Human re-identification, a crucial component of automated video surveillance, has recently been the subject of intensive study. Re-identification is the task of recognizing a person in images or videos acquired from other cameras after that person has already been observed in an image or video from one camera. It establishes consistent labelling across several cameras, or even within a single camera, and is required to reconnect missing or interrupted tracks. In addition to surveillance, it may be used in forensics, multimedia, and robotics. Person re-identification is a difficult problem because a person's appearance varies across cameras, with visual ambiguity and spatiotemporal uncertainty. These issues are largely caused by poor-quality video feeds or low-resolution images that contain much irrelevant information and hinder re-identification. The spatial and temporal constraints of the problem are also difficult to capture. Because the problem is so widely applicable and valuable, the computer vision research community has given it a lot of attention. In this article, we examine the problem of human re-identification and discuss some viable approaches.


INTRODUCTION
Person re-identification (Re-ID) has been thoroughly studied as a specific person retrieval problem across non-overlapping cameras [1]. The goal of Re-ID is to determine whether a person of interest has appeared at another location at a different time, captured by a different camera, or even by the same camera at a different time instant [2]. The subject may be described by a photograph [3], a video clip [4], or even a written description [5]. Given the pressing need for public security and the growing number of security cameras, person Re-ID is essential in smart surveillance technology, with substantial academic impact and practical value.
Re-ID is difficult because of a variety of camera motions [6], poor image resolutions [7,8,9,10,11], heterogeneous modalities [11,12], complicated camera environments, background clutter [12], inaccurate bounding box generation, and so on. These factors introduce a great deal of variation and uncertainty. Other factors that significantly increase the difficulty of practical model deployment include dynamically updated camera networks [13], large-scale galleries that require efficient retrieval [14], group ambiguity [15], significant domain shift [16], unseen testing scenarios [17], incremental model updating [18], and changing clothes [19]. As a result of these issues, Re-ID remains an open problem. This encourages us to carry out an extensive survey, establish a solid baseline for various Re-ID efforts, and discuss a wide range of potential future directions. Although person Re-ID is a difficult process, it is essential for enhancing the semantic integrity of the analysis. Re-ID is also crucial for applications that rely on single-camera surveillance systems, for instance, to find out whether a person regularly visits the same place, or whether the same or a different person picks up an abandoned box or bag. In addition to tracking, it has uses in robotics, multimedia, and more popular technologies such as automatic photo labelling and photo browsing [20].
The person Re-ID process is not difficult to understand. As humans, we do it all the time with ease. Our eyes and brains have been trained to locate, identify, and then re-identify objects and people in the real world. As illustrated in Fig. 1, Re-ID is the idea that a person who has been seen earlier is identified as soon as they reappear, using a specific description of the individual.
Although hand-crafted features [21] and metric learning [22] had some early success, the most advanced Re-ID algorithms currently available are built on convolutional neural networks (CNNs), which, when trained under supervision, need a large amount of annotated (labelled) data to learn a stable embedding subspace. Detailed surveys of recent deep learning approaches and of person Re-ID with hand-crafted systems are provided in [23]. Annotating large-scale datasets for Re-ID is exceedingly labour intensive, time-consuming, and expensive, especially for techniques that need numerous bounding boxes per individual to improve accuracy and generalization. One-shot learning and unsupervised learning are combined, for instance, in [24], which employs the ResNet50 [25] architecture pre-trained on ImageNet [26]. Although it has been empirically demonstrated that pre-training and transfer learning significantly boost neural network performance, they are not well suited to adapting parameters across a wide range of domains or topologies. This article highlights the obstacles and unresolved problems in person re-identification and reviews datasets, deep learning algorithms, and current research in these areas.

DEEP LEARNING MODERN RESEARCH
In today's era, intelligent systems and technologically sophisticated automation are the main focus across a diverse range of domains, including smart cities, e-health, enterprise intelligence, innovative treatment, cybersecurity intelligence, and many more [27]. Deep learning techniques have substantially improved in effectiveness across a wide range of applications, particularly in security technologies, as an excellent way to uncover complex structures in high-dimensional data. Because of their exceptional ability to learn from past data, DL techniques can play an important role in creating intelligent data-driven systems that satisfy current expectations. DL has the potential to change both the world and how people live, since it can automate procedures and learn from experience.

DEEP LEARNING TECHNIQUES
This section discusses the various deep neural network techniques. These techniques typically learn using hierarchical structures with multiple levels of information processing. A deep neural network typically comprises an input layer, several hidden layers, and an output layer. Before diving into the details of DL approaches, it is useful to review the main types of learning task: (i) supervised learning, which uses labelled training data, and (ii) unsupervised learning, which analyses unlabelled data sets.

SUPERVISED OR DISCRIMINATIVE LEARNING NETWORK
The term "supervised learning" refers to learning that takes place under a supervisor who acts as a teacher. It is the technique of training a computer system using labelled data, meaning that the data have already been tagged with the correct answer. The supervised learning algorithm analyses the training data (the set of training examples), and the machine is then given a new collection of examples for which it should produce the correct output. Discriminative deep architectures are typically designed to provide discriminative power for pattern classification by modelling the posterior distributions of classes conditioned on observable data [29]. The main categories of discriminative architectures are the Multi-Layer Perceptron (MLP), Convolutional Neural Networks (CNN or ConvNet), Recurrent Neural Networks (RNN), and their variants. We briefly discuss these techniques here.

MULTI-LAYER PERCEPTRON (MLP)
The multi-layer perceptron (MLP) is a feed-forward artificial neural network. The most popular technique for training an MLP is back-propagation [31], a supervised learning method that is frequently described as the fundamental building block of a neural network. During training, a variety of optimization techniques are employed, such as Stochastic Gradient Descent (SGD), Limited-Memory BFGS (L-BFGS), and Adaptive Moment Estimation (Adam).
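To make the training procedure above concrete, the following is a minimal sketch of back-propagation with plain SGD for a tiny MLP. The network size (two inputs, two tanh hidden units, one linear output), the squared-error loss, the initial weights, and the function names `forward`, `sgd_step`, and `train` are all assumptions chosen for illustration; they are not taken from the cited works.

```python
import math

def forward(x, W1, b1, W2, b2):
    """Forward pass: tanh hidden layer followed by a single linear output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hi for w, hi in zip(W2, h)) + b2
    return h, y

def sgd_step(x, target, W1, b1, W2, b2, lr=0.1):
    """One back-propagation pass and SGD update for loss 0.5 * (y - target)^2."""
    h, y = forward(x, W1, b1, W2, b2)
    err = y - target  # dL/dy
    # Error signal at each hidden pre-activation, using tanh'(z) = 1 - tanh(z)^2.
    delta = [err * w * (1.0 - hi * hi) for w, hi in zip(W2, h)]
    W1 = [[w - lr * d * xi for w, xi in zip(row, x)]
          for row, d in zip(W1, delta)]
    b1 = [b - lr * d for b, d in zip(b1, delta)]
    W2 = [w - lr * err * hi for w, hi in zip(W2, h)]
    b2 = b2 - lr * err
    return W1, b1, W2, b2, 0.5 * err * err

def train(x, target, steps=50):
    """Repeat SGD steps on one example and record the loss before each update."""
    # Hypothetical initial weights, fixed so that the run is reproducible.
    W1, b1 = [[0.5, -0.3], [0.1, 0.8]], [0.0, 0.0]
    W2, b2 = [0.7, -0.4], 0.0
    losses = []
    for _ in range(steps):
        W1, b1, W2, b2, loss = sgd_step(x, target, W1, b1, W2, b2)
        losses.append(loss)
    return losses
```

Calling `train([1.0, 0.0], 1.0)` returns the sequence of losses, which should shrink as the weights are updated. Optimizers such as Adam or L-BFGS would replace only the update rule, not the gradient computation.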

CONVOLUTIONAL NEURAL NETWORK (CNN)
The convolutional neural network [33] is a commonly used deep learning architecture that was modelled after the visual cortex of animals [34]. A CNN is illustrated in Fig. 2. It was originally used extensively for object recognition tasks, but it is now also being investigated in areas such as object tracking [35], pose estimation [36], text detection and recognition [37], visual saliency detection [38], action recognition [39], scene labelling, and many more [40].
Fig. 2. An illustration of a convolutional neural network.
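As a rough illustration of the building blocks of a CNN, the sketch below implements a single-channel "valid" convolution (computed as cross-correlation, as most deep learning libraries do), a ReLU activation, and non-overlapping max pooling. The function names and the single-channel, list-of-lists representation are assumptions made for brevity; real Re-ID backbones such as ResNet50 stack many such layers with multiple channels.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Element-wise rectified linear unit."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a square window of the given size."""
    return [[max(fmap[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]
```

For example, `max_pool(relu(conv2d(image, kernel)))` chains the three operations in the usual order of a convolutional block.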

RECURRENT NEURAL NETWORK (RNN)
Another well-known neural network operates on sequential or time-series data and feeds the result of one step as input to the following step. This network is called the recurrent neural network (RNN) [47]. Like feed-forward networks and CNNs, recurrent neural networks learn from training input, but they stand out because of their "memory", which lets information from earlier inputs influence the current input and output. Whereas a typical DNN assumes that inputs and outputs are independent of one another, an RNN's output depends on the preceding elements of the sequence. However, because of the vanishing gradient problem, standard recurrent networks have difficulty learning long data sequences. The popular recurrent network variants that mitigate these problems and perform effectively across a variety of real-world application areas are explored next.
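The recurrence described above can be sketched with a scalar RNN, in which the hidden state h_t = tanh(w_x * x_t + w_h * h_{t-1} + b) carries information from earlier inputs forward. The scalar weights and the function name `rnn_forward` are assumptions for illustration; practical RNNs use weight matrices and vector-valued states.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a scalar RNN over a sequence and return the hidden state at each step."""
    h, states = h0, []
    for x in inputs:
        # The previous hidden state h feeds back into the current step.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states
```

With `w_h = 0` the network has no memory and each state depends only on the current input; with a non-zero `w_h`, the same input produces different states depending on what came before it.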

LONG SHORT-TERM MEMORY (LSTM)
LSTMs are frequently used in video-based person Re-ID tasks and are capable of extracting temporal features. A recurrent feature aggregation network based on LSTM efficiently reduced the interference caused by background noise, shadowing, and recognition failure [48]. It accumulated discriminative features across the LSTM nodes, starting from the first (shallowest) one. In [49], a video sequence is split into multiple segments, and the temporal and spatial features of the segments that contain the probe images are learned. This strategy reduces the number of identical pedestrians in the sample and makes it simpler to learn similarity features. Both of the aforementioned methods process each video frame independently. The features that an LSTM extracts are typically affected by the length of the video sequence. Because the RNN only builds temporal connections on high-level features, it cannot capture the temporal cues of small details in the image [50]. Therefore, research into a more effective technique for extracting spatial-temporal features is still necessary.

GATED RECURRENT UNITS (GRUS)
The Gated Recurrent Unit (GRU) [51] is a popular variant of the recurrent network that uses gating to monitor and regulate the flow of information between neural network units. As seen in Fig. 3, the GRU has only a reset gate and an update gate, making it less complex than an LSTM. The primary difference between the two units is the number of gates: an LSTM has three gates, compared to the GRU's two (the reset and update gates). The GRU's structure allows dependencies in long data sequences to be captured adaptively without discarding information from earlier parts of the sequence. As a consequence, the GRU is a slightly more compact approach that often provides comparable results and is significantly faster to compute [52].
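The two gates described above can be sketched as a single scalar GRU step. The parameter names in the dictionary `p` and the update convention h' = (1 - z) * h + z * h_cand are assumptions for illustration; some references and libraries swap the roles of z and 1 - z, which is equivalent up to relabelling of the gate.

```python
import math

def sigmoid(v):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, p):
    """One scalar GRU step; p holds the (hypothetical) weights and biases."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h + p["b_z"])  # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h + p["b_r"])  # reset gate
    # Candidate state: the reset gate scales how much of the old state is used.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h) + p["b_h"])
    # The update gate blends the old state with the candidate state.
    return (1.0 - z) * h + z * h_cand
```

When the update gate is close to 0 the old state is carried forward almost unchanged, which is how the unit can retain information over long sequences.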

UNSUPERVISED LEARNING NETWORK
DL approaches of this kind are widely used to characterize the joint statistical distributions of the available data and the classes they belong to, as well as the higher-order properties or features for pattern recognition or synthesis [53]. Because the methods in this category are frequently used for feature learning or for data generation and representation, they are fundamentally used for unsupervised learning [54]. Because generative modelling can help preserve the accuracy of the discriminative model, it may also be used as a preliminary step for supervised learning tasks. For generative or unsupervised learning, deep neural network algorithms including the Generative Adversarial Network (GAN), Autoencoder (AE), Restricted Boltzmann Machine (RBM), Self-Organizing Map (SOM), and Deep Belief Network (DBN), as well as their variants, are often used.

GENERATIVE ADVERSARIAL NETWORK (GAN)
GANs make use of neural networks' capacity to learn a function that can approximate a distribution as closely as possible to the real one. They are particularly capable of producing synthetic images with high visual fidelity and do not rely on prior assumptions about the distribution of the data. This important characteristic enables GANs to be applied to any imbalance problem in computer vision tasks. In addition to creating fake images, GANs provide a way to alter an original image. Several GANs with different strengths have been published in the literature to address the imbalance problem in computer vision tasks. For example, GAN variants such as AttGAN [55], IcGAN [56], and ResAttrGAN [57] are frequently employed for tasks that involve modifying facial attributes.
GANs are composed of two neural networks, as shown in Fig. 4. The generator G creates new data with properties similar to the original data, while the discriminator D predicts the likelihood that a given sample was drawn from the real data rather than produced by G. In GAN modelling, the generator and discriminator are then trained to compete with each other. Healthcare, computer vision, data augmentation, video production, voice synthesis, epidemic prevention, traffic control, network security, and many more fields might all benefit from the use of GANs. In general, GANs have proven to be a solid approach to independent data expansion and a solution to problems that require generative techniques.
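The competition between G and D can be expressed through their loss functions. The sketch below assumes the standard binary cross-entropy formulation, with the commonly used non-saturating generator loss; the function names and the convention that the inputs are lists of discriminator output probabilities are assumptions for illustration.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Loss for D: reward high scores on real samples and low scores on fakes.

    d_real: D's probabilities for real samples; d_fake: for generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss(d_fake):
    """Non-saturating loss for G: reward fakes that D scores as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)
```

A discriminator that cannot tell real from generated data outputs 0.5 everywhere, which gives a discriminator loss of 2 ln 2 and a generator loss of ln 2.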

AUTO-ENCODER (AE)
The auto-encoder (AE) is a well-known unsupervised learning approach that uses neural networks to learn representations [59]. Auto-encoders are typically used to process high-dimensional data, where dimensionality reduction describes how a set of data is represented. An autoencoder has three parts: an encoder, a code, and a decoder. The encoder compresses the input into the code, which the decoder then uses to reconstruct the input. AEs have also been used to learn generative data models [60]. Numerous unsupervised learning tasks, including dimensionality reduction, feature extraction, efficient coding, dynamic modelling, noise removal, and outlier or predictive modelling, rely primarily on the auto-encoder [59,61].
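The encoder, code, and decoder described above can be sketched with a linear auto-encoder that shares (ties) one weight matrix between encoder and decoder. This is an untrained toy under stated assumptions: the weights are fixed by hand, there is no non-linearity, and the names `encode`, `decode`, and `reconstruction_error` are hypothetical. In practice the weights would be learned by minimizing the reconstruction error.

```python
def encode(x, W):
    """Compress input x into a shorter code using the rows of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def decode(code, W):
    """Reconstruct the input from the code using the transpose of W."""
    return [sum(W[k][i] * code[k] for k in range(len(code)))
            for i in range(len(W[0]))]

def reconstruction_error(x, W):
    """Squared error between the input and its reconstruction."""
    x_hat = decode(encode(x, W), W)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))
```

With `W = [[0.6, 0.8]]`, a two-dimensional input is compressed to a one-dimensional code. Inputs that lie along the direction (0.6, 0.8) are reconstructed almost exactly, while inputs orthogonal to it are lost, which illustrates that the code keeps only the structure the encoder has captured.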

KOHONEN MAP OR SELF-ORGANIZING MAP (SOM)
The Self-Organizing Map (SOM), or Kohonen Map [62], is another unsupervised learning method that constructs a low-dimensional (usually two-dimensional) representation of a higher-dimensional data set while preserving the topological structure of the data. SOM is a neural network-based method for dimensionality reduction that is commonly used for clustering [63]. A SOM adapts to the topological layout of a dataset by repeatedly moving its neurons closer to the data points, which lets us visualize huge datasets and identify likely clusters. The first layer of a SOM is the input layer, and the second is the output layer, also known as the feature map. In contrast to other neural network models that use error-correction learning, such as back-propagation with gradient descent [64], SOMs use competitive learning, which relies on a neighbourhood function to preserve the topological properties of the input space. SOM is frequently employed for tasks such as sequence identification, disease or health diagnosis, fault diagnosis, and virus or worm attack detection [65]. The main advantage of a SOM is that it facilitates the discovery and recognition of patterns in high-dimensional data.
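The competitive learning step described above can be sketched as follows. For brevity the neurons are arranged on a one-dimensional grid (their list index is their grid position) rather than the usual two-dimensional map, and a Gaussian neighbourhood function is assumed; the function names and default hyperparameters are illustrative only.

```python
import math

def best_matching_unit(neurons, sample):
    """Index of the neuron whose weight vector is closest to the sample."""
    dists = [sum((w - s) ** 2 for w, s in zip(neuron, sample))
             for neuron in neurons]
    return dists.index(min(dists))

def som_update(neurons, sample, lr=0.5, sigma=1.0):
    """Move every neuron towards the sample, weighted by grid distance to the BMU."""
    bmu = best_matching_unit(neurons, sample)
    updated = []
    for idx, neuron in enumerate(neurons):
        # Gaussian neighbourhood: 1 at the BMU, decaying with grid distance.
        influence = math.exp(-((idx - bmu) ** 2) / (2.0 * sigma ** 2))
        updated.append([w + lr * influence * (s - w)
                        for w, s in zip(neuron, sample)])
    return bmu, updated
```

Repeating `som_update` over many samples, usually while shrinking `lr` and `sigma`, pulls neighbouring neurons towards similar regions of the input space, which is what preserves the topology.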

RESTRICTED BOLTZMANN MACHINE (RBM)
The Restricted Boltzmann Machine (RBM) [66] is a generative stochastic neural network that can learn a probability distribution over its inputs. In a Boltzmann machine, each node is either visible or hidden and is connected to every other node. By understanding how the system functions normally, we can better comprehend anomalies. RBMs are a subset of Boltzmann machines in which there is a restriction on the connections between the visible and hidden layers [67]. Because of this restriction, training algorithms such as gradient-based contrastive divergence can be more efficient for RBMs than those for Boltzmann machines in general [68]. Among the various uses of RBMs are dimensionality reduction, classification, prediction, content-based filtering, pattern recognition, topic modelling, and many others.

DEEP BELIEF NETWORK (DBN)
A Deep Belief Network (DBN) [69] is a multi-layer generative graphical model composed of several unsupervised networks, such as AEs or RBMs, stacked one on top of the other and connected sequentially, with the hidden layer of each network serving as the input to the next. As a result, there are two types of DBN: the AE-DBN, also known as a stacked AE, and the RBM-DBN, also known as a stacked RBM. As already indicated, the AE-DBN consists of autoencoders, whereas the RBM-DBN consists of restricted Boltzmann machines. The ultimate objective is to create a faster unsupervised training method for each sub-network that relies on contrastive divergence [70]. The deep structure of a DBN allows it to capture a hierarchical representation of incoming data. The basic idea of the DBN is to train unsupervised feed-forward networks on unlabelled data and then fine-tune the network with labelled input. One of the most significant benefits of a DBN over traditional shallow learning networks is its capacity to identify specific patterns, which strengthens reasoning and the ability to distinguish between true and false data [71].
Thus, the generative learning strategies discussed above typically allow a new representation of the data to be produced through exploratory analysis. Deep generative models of this kind can serve as a preprocessing step for supervised or discriminative learning and help ensure model accuracy, because unsupervised representation learning can improve classifier generalization.

CHALLENGES AND OPEN ISSUES
The fundamental difficulty with Re-ID is the variation in a person's appearance across multiple cameras. Re-ID is challenging to automate for a variety of reasons. Re-ID networks usually consist of two essential parts: extracting a unique description of an individual, and comparing two descriptions to decide whether or not they match. Building a distinctive person description requires the ability to automatically detect and track individuals in images or videos. Numerous difficulties and open problems remain, and they will guide future research in the area of person Re-ID.
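The second of the two parts, comparing descriptions, can be sketched as ranking a gallery of descriptors by their similarity to a query descriptor. Cosine similarity, the dictionary-based gallery, and the function names are assumptions chosen for illustration; the descriptors themselves would come from a feature extractor such as a CNN, which is not shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_gallery(query, gallery):
    """Return gallery identifiers sorted from most to least similar to the query.

    gallery: dict mapping an identifier to its descriptor vector.
    """
    return sorted(gallery,
                  key=lambda name: cosine_similarity(query, gallery[name]),
                  reverse=True)
```

The top of the returned list is the system's best guess for the identity of the query; a match is usually declared when the similarity exceeds a threshold or when the true identity appears within the first few ranks.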

DEPTH-BASED RE-ID
Depth images capture the shape and skeleton of the body. This makes Re-ID possible under variations in lighting and clothing, which is crucial for applications involving personalized human interaction [88]. In [72], a model based on recurrent attention is proposed for depth-based person identification. Convolutional and recurrent neural networks are combined in a reinforcement learning framework to locate small, discriminative local regions of the body.

VISIBLE-INFRARED RE-ID
Visible-infrared Re-ID handles cross-modality matching between visible and infrared (thermal) images [88]. It is essential because only infrared cameras can capture images in low-light conditions [73]. In addition to cross-modality shared embedding learning, [74] also investigates the classifier-level discrepancy. Recent methods [75] apply the GAN approach to generate cross-modality person images and reduce the cross-modality discrepancy at both the image and feature level. [76] models the cross-modal reconstruction using hierarchical factors. [77] presents a dual-attentive aggregation learning strategy to capture multi-level relations.

CROSS-RESOLUTION RE-ID
Cross-resolution Re-ID [88] matches images of different resolutions, taking large resolution variations into account [78]. A cascaded SR-GAN [79] generates high-resolution person images in a cascaded manner while incorporating identity information. Li et al. [80] use adversarial learning to obtain resolution-invariant image representations.

LABEL NOISE FOR RE-ID
Label noise is typically hard to avoid because of annotation errors [88]. Zheng et al. use a label smoothing technique to prevent label overfitting [81]. A Distribution Net (DNet) that encodes feature uncertainty is described in [82]; it allows a Re-ID model to be learned effectively under label noise while mitigating the influence of samples with high feature uncertainty. Unlike in the generic classification problem, there are not enough samples per identity for robust Re-ID model training [83]. The unseen new identities make it even more challenging to learn a robust Re-ID model.
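Label smoothing, mentioned above, can be sketched in its standard uniform form: the one-hot target is mixed with a uniform distribution over the K identities, so the true class receives 1 - epsilon + epsilon / K and every other class receives epsilon / K. This is the generic formulation and may differ from the exact variant used in [81]; the function names and the default epsilon are assumptions.

```python
import math

def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Uniformly smoothed target distribution for one training sample."""
    off = epsilon / num_classes
    return [1.0 - epsilon + off if k == true_class else off
            for k in range(num_classes)]

def cross_entropy(target_dist, predicted_probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p)
                for t, p in zip(target_dist, predicted_probs) if t > 0)
```

Because the smoothed target never puts all its mass on one identity, the model is penalized for becoming overconfident in a label that may be wrong.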

MULTI-CAMERA DYNAMIC NETWORK
The dynamically updated multi-camera network [84], which requires model adaptation for new cameras or probes, is another challenging issue. A human-in-the-loop adaptive learning method [85] can update the Re-ID model and tailor the representation to different probe galleries. Early research on continuous Re-ID in multi-camera networks also incorporated active learning [86]. [87] introduces a continuous adaptation approach that relies on the selection of sparse, non-redundant samples. A transitive inference strategy is developed that exploits the best source camera model based on a geodesic flow kernel. When dealing with large crowds and social interactions, an open-world person Re-ID system adds a number of contextual constraints (such as camera topology) [88].

FEATURE LEARNING
High-level semantic representations of a person's attributes, such as hair, gender, and age, are robust to many environmental changes. Some deep learning-based person Re-ID systems, such as those in [89], have used these attributes to bridge the gap between images and high-level semantic information. Given its promising results, feature recognition is one possible direction for future work.

ARCHITECTURE FOR AUTOMATED RE-ID
Manually designing the architecture of a deep learning model takes time and effort and is prone to mistakes. Neural architecture search (NAS) [90], the automation of architecture engineering, has recently been applied to address this issue, and the study of NAS is currently receiving growing attention. Because most NAS techniques do not guarantee that the recommended CNN is suitable for person Re-ID tasks, the use of NAS for person Re-ID is one of the essential aspects to consider in future research.

ACCURACY VERSUS EFFICIENCY
Large models are typically employed to obtain the highest accuracy, but they can be time and memory intensive, which reduces their usefulness, particularly in real-time video monitoring systems. In pursuit of accuracy, most modern models have not taken CPU speed and memory capacity into account. Researchers working in this field must strike a compromise between processing speed and ranking accuracy.

LIGHTWEIGHT MODEL
The creation of a lightweight Re-ID model is another approach to dealing with the scalability problem. Modifying the network architecture to obtain a lightweight model is investigated in [91,92]. Another strategy is to employ model distillation. For instance, a multi-teacher adaptive similarity distillation framework is presented in [93], which trains a user-specified lightweight student model from several teacher models without access to the source data.

DATASETS AND EVALUATION
The appearance of individuals varies greatly across cameras because lighting, poses, viewing angles, scales, and camera resolutions may all differ. Visual ambiguities are further increased by factors such as occlusions, cluttered backgrounds, and articulated bodies. Therefore, it is crucial to gather data that captures these aspects in order to create viable Re-ID approaches. In addition to good-quality data that replicates real conditions, it is essential to compare and assess the Re-ID methods that are developed and to find ways to improve both the methods and the datasets.

DATASETS BASED ON IMAGES
A variety of datasets exist for image-based person Re-ID; the most common are listed below.
VIPeR [94]: It consists of 1,264 images of 632 people, captured by 2 non-overlapping cameras.
CUHK01 [95]: It consists of 3,884 images of 971 individuals, recorded by two separate cameras on a university campus.
Market-1501 [96]: It consists of 32,643 images of 1,501 people, captured in front of a shop using 2 to 6 separate cameras.

DATASETS BASED ON VIDEOS
PRID2011 [102]: It consists of 24,541 images of 934 individuals from 600 recordings taken by two separate cameras in an airport's multi-camera network.
iLIDS-VID [103]: It consists of 600 video sequences and 42,495 images of 300 people, captured by 2 non-overlapping airport cameras.
MARS [104]: This is the largest video-based person Re-ID dataset. It consists of around 1,191,003 images of 1,261 individuals from 200 recordings captured by 2 to 6 non-overlapping cameras.
RPIfield [105]: It consists of 601,581 images for 112 individuals, captured by 2 separate cameras on an open field at a college.

EVALUATION METRICS
Cumulative matching characteristics (CMC) [106] and mean average precision (mAP) [107] are the two most widely used metrics for assessing Re-ID systems.
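Both metrics can be computed from ranked match lists, as in the minimal sketch below. It assumes that, for each query, the gallery has already been sorted by similarity and reduced to a list of booleans indicating whether each ranked item shares the query's identity; the function names are illustrative, and benchmark-specific rules (such as excluding gallery images from the query's own camera) are omitted.

```python
def average_precision(matches):
    """Average of the precision values at each rank where a true match occurs."""
    hits, precisions = 0, []
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(all_matches):
    """mAP: the mean of the per-query average precision."""
    return sum(average_precision(m) for m in all_matches) / len(all_matches)

def cmc(all_matches, k):
    """CMC at rank k: fraction of queries with a true match in the top k."""
    found = sum(1 for m in all_matches if any(m[:k]))
    return found / len(all_matches)
```

CMC at rank k rewards finding at least one correct match early, whereas mAP also reflects how well all correct matches for a query are ranked, which matters when an identity appears several times in the gallery.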

CONCLUSION
In this paper, we have discussed the problem of human re-identification, its main challenges, and a summary of recent research in person recognition. Both closed-set and open-set Re-ID tasks have been taken into consideration. The approaches employed have been grouped, and their advantages and disadvantages have been covered. Additionally, we have outlined the benefits and drawbacks of the various Re-ID datasets. Popular Re-ID evaluation methods are briefly discussed, along with potential extensions. In conclusion, person Re-ID is a broad and difficult field with much room for growth and research. This work attempts to give a concise overview of the Re-ID problem, its limitations, and related problems.