Results 1 - 20 of 20
1.
Sensors (Basel) ; 23(8)2023 Apr 15.
Article in English | MEDLINE | ID: mdl-37112359

ABSTRACT

During the COVID-19 pandemic, most organizations were forced to implement a work-from-home policy, and in many cases, employees have not been expected to return to the office on a full-time basis. This sudden shift in the work culture was accompanied by an increase in the number of information security-related threats which organizations were unprepared for. The ability to effectively address these threats relies on a comprehensive threat analysis and risk assessment and the creation of relevant asset and threat taxonomies for the new work-from-home culture. In response to this need, we built the required taxonomies and performed a thorough analysis of the threats associated with this new work culture. In this paper, we present our taxonomies and the results of our analysis. We also examine the impact of each threat, indicate when it is expected to occur, describe the various prevention methods available commercially or proposed in academic research, and present specific use cases.


Subjects
COVID-19 , Pandemics , Humans , Pandemics/prevention & control , Computer Security , Risk Assessment
2.
Sensors (Basel) ; 22(11)2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35684879

ABSTRACT

Radar systems are mainly used for tracking aircraft, missiles, satellites, and watercraft. In many cases, information regarding the objects detected by a radar system is sent to, and used by, a peripheral consuming system, such as a missile system or a graphical user interface used by an operator. Those systems process the data stream and make real-time operational decisions based on the data received. Given this, the reliability and availability of information provided by radar systems have grown in importance. Although the field of cyber security has been continuously evolving, no prior research has focused on anomaly detection in radar systems. In this paper, we present an unsupervised deep-learning-based method for detecting anomalies in radar system data streams; we take into consideration the fact that a data stream created by a radar system is heterogeneous, i.e., it contains both numerical and categorical features with non-linear and complex relationships. We propose a novel technique that learns the correlation between numerical features and an embedding representation of categorical features in an unsupervised manner. The proposed technique, which allows for the detection of the malicious manipulation of critical fields in a data stream, is complemented by a timing-interval anomaly-detection mechanism proposed for the detection of message-dropping attempts. Real radar system data were used to evaluate the proposed method. Our experiments demonstrated the method's high detection accuracy on a variety of data-stream manipulation attacks (an average detection rate of 88% with a false-alarm rate of 1.59%) and message-dropping attacks (an average detection rate of 92% with a false-alarm rate of 2.2%).
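
The abstract does not give the architecture's details; the following is a hedged sketch of the core idea only, in which layer sizes, names, and the timing tolerance are all assumptions: predict numerical track fields from embeddings of the categorical ones, score records by prediction error, and flag long inter-message gaps separately.

```python
# A sketch, not the paper's implementation. The model is trained
# unsupervised (e.g., MSE on normal traffic) to predict numerical fields
# from embeddings of categorical ones; records it cannot predict well
# are flagged as possibly manipulated.
import numpy as np
import torch
import torch.nn as nn

class CatNumCorrelator(nn.Module):
    def __init__(self, cat_cardinalities, num_dim, emb_dim=8):
        super().__init__()
        self.embeddings = nn.ModuleList(
            nn.Embedding(card, emb_dim) for card in cat_cardinalities)
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim * len(cat_cardinalities), 64),
            nn.ReLU(),
            nn.Linear(64, num_dim))

    def forward(self, cats):  # cats: LongTensor of shape [batch, n_cat]
        embs = [emb(cats[:, i]) for i, emb in enumerate(self.embeddings)]
        return self.mlp(torch.cat(embs, dim=1))

def anomaly_scores(model, cats, nums):
    # High prediction error = numerical fields inconsistent with the
    # categorical context = suspected manipulation.
    with torch.no_grad():
        return ((model(cats) - nums) ** 2).mean(dim=1)

def dropped_message_alarms(timestamps, expected_interval, tolerance=0.5):
    # The complementary timing-interval mechanism: unusually long gaps
    # between messages suggest message-dropping attempts.
    gaps = np.diff(timestamps)
    return gaps > expected_interval * (1.0 + tolerance)
```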

3.
Article in English | MEDLINE | ID: mdl-35544499

ABSTRACT

Utilizing existing methods for bias detection in machine learning (ML) models is challenging since each method: 1) explores a different ethical aspect of bias, which may result in contradictory output among the different methods; 2) provides output in a different range/scale and therefore cannot be compared with other methods; and 3) requires different input, thereby requiring a human expert's involvement to adjust each method according to the model examined. In this article, we present BENN, a novel bias estimation method that uses a pretrained unsupervised deep neural network. Given an ML model and data samples, BENN provides a bias estimation for every feature based on the examined model's predictions. We evaluated BENN using three benchmark datasets, one proprietary churn prediction model used by a European telecommunications company, and a synthetic dataset that includes both a biased feature and a fair one. BENN's results were compared with an ensemble of 21 existing bias estimation methods. The evaluation results show that BENN provides bias estimations that are aligned with those of the ensemble while offering significant advantages, including the fact that it is a generic approach (i.e., can be applied to any ML model) and does not require a domain expert.
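
BENN's network itself is not specified in the abstract. As a hedged illustration, here is one classic per-feature bias measure of the kind included in the 21-method ensemble BENN is compared against: statistical parity difference for a binary feature.

```python
# Not BENN itself; a baseline bias measure of the kind in the ensemble.
import numpy as np

def statistical_parity_difference(preds, feature):
    """preds: binary model predictions; feature: a binary feature.
    Returns P(pred=1 | feature=1) - P(pred=1 | feature=0);
    0 indicates parity with respect to this feature."""
    preds, feature = np.asarray(preds), np.asarray(feature)
    return preds[feature == 1].mean() - preds[feature == 0].mean()
```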

4.
Sensors (Basel) ; 22(9)2022 May 08.
Article in English | MEDLINE | ID: mdl-35591269

ABSTRACT

Driving under the influence of alcohol is a widespread phenomenon in the US where it is considered a major cause of fatal accidents. In this research, we present Virtual Breathalyzer, a novel approach for detecting intoxication from the measurements obtained by the sensors of smartphones and wrist-worn devices. We formalize the problem of intoxication detection as the supervised machine learning task of binary classification (drunk or sober). In order to evaluate our approach, we conducted a field experiment and collected 60 free gait samples from 30 patrons of three bars using a Microsoft Band and Samsung Galaxy S4. We validated our results against an admissible breathalyzer used by the police. A system based on this concept successfully detected intoxication and achieved the following results: 0.97 AUC and 0.04 FPR, given a fixed TPR of 1.0. Our approach can be used to analyze the free gait of drinkers when they walk from the car to the bar and vice versa, using wearable devices which are ubiquitous and more widespread than admissible breathalyzers. This approach can be utilized to alert people, or even a connected car, and prevent people from driving under the influence of alcohol.
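
As an illustrative sketch only (the paper's actual gait features are not listed in the abstract, so the summary statistics below are assumptions), the supervised drunk/sober formulation might look like this with scikit-learn:

```python
# Hypothetical features over accelerometer gait windows, fed to a
# binary drunk/sober classifier evaluated by AUC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def gait_features(window):  # window: [n_samples, 3] accelerometer axes
    mag = np.linalg.norm(window, axis=1)
    return [mag.mean(), mag.std(), mag.max() - mag.min(),
            np.abs(np.diff(mag)).mean()]

def train_and_eval(windows_train, y_train, windows_test, y_test):
    X_train = np.array([gait_features(w) for w in windows_train])
    X_test = np.array([gait_features(w) for w in windows_test])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    scores = clf.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, scores)
```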


Subjects
Alcoholic Intoxication , Automobile Driving , Wearable Electronic Devices , Alcoholic Intoxication/diagnosis , Breath Tests , Ethanol , Gait , Humans
5.
Sensors (Basel) ; 22(7)2022 Mar 29.
Article in English | MEDLINE | ID: mdl-35408222

ABSTRACT

A Global Positioning System (GPS) spoofing attack can be launched against any commercial GPS sensor in order to interfere with its navigation capabilities. These sensors are installed in a variety of devices and vehicles (e.g., cars, planes, cell phones, ships, UAVs, and more). In this study, we focus on micro UAVs (drones) for several reasons: (1) they are small and inexpensive, (2) they rely on a built-in camera, (3) they use GPS sensors, and (4) it is difficult to add external components to micro UAVs. We propose an innovative method, based on the video stream captured by a drone's camera, for the real-time detection of GPS spoofing attacks targeting drones. The proposed method collects frames from the video stream and their location (GPS coordinates); by calculating the correlation between each frame, our method can detect GPS spoofing attacks on drones. We first analyze the performance of the suggested method in a controlled environment by conducting experiments on a flight simulator that we developed. Then, we analyze its performance in the real world using a DJI drone. Our method can provide different levels of security against GPS spoofing attacks, depending on the detection interval required; for example, it can provide a high level of security to a drone flying at altitudes of 50-100 m over an urban area at an average speed of 4 km/h in conditions of low ambient light; in this scenario, the proposed method can provide a level of security that detects any GPS spoofing attack in which the spoofed location is a distance of 1-4 m (an average of 2.5 m) from the real location.
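
A minimal sketch of the underlying check, under assumed scaling and thresholds: compare the inter-frame motion the camera actually observed with the displacement implied by the reported GPS coordinates. The pixels-per-meter factor (`px_per_m`, which depends on altitude and optics) is a placeholder, and the tolerance mirrors the ~2.5 m average distance reported above.

```python
# Sketch only: the paper's full method and its detection intervals are
# not reproduced here.
import cv2
import numpy as np

def visual_shift_px(frame_a, frame_b):
    a = np.float32(cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY))
    b = np.float32(cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY))
    (dx, dy), _ = cv2.phaseCorrelate(a, b)  # apparent pixel displacement
    return np.hypot(dx, dy)

def gps_shift_m(lat1, lon1, lat2, lon2):
    # Small-distance equirectangular approximation, in meters.
    R = 6371000.0
    x = np.radians(lon2 - lon1) * np.cos(np.radians((lat1 + lat2) / 2))
    y = np.radians(lat2 - lat1)
    return R * np.hypot(x, y)

def spoofing_suspected(frames, coords, px_per_m, tol_m=2.5):
    # Flag when the GPS-implied displacement disagrees with what the
    # camera saw by more than the tolerance.
    for (fa, fb), (ca, cb) in zip(zip(frames, frames[1:]),
                                  zip(coords, coords[1:])):
        seen_m = visual_shift_px(fa, fb) / px_per_m
        told_m = gps_shift_m(*ca, *cb)
        if abs(seen_m - told_m) > tol_m:
            return True
    return False
```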


Subjects
Geographic Information Systems , Unmanned Aerial Devices
6.
J Digit Imaging ; 35(3): 666-677, 2022 06.
Article in English | MEDLINE | ID: mdl-35178644

ABSTRACT

Medical imaging devices (MIDs) are exposed to cyber-security threats. Currently, a comprehensive, efficient methodology dedicated to MID cyber-security risk assessment is lacking. We propose the Threat identification, ontology-based Likelihood, severity Decomposition, and Risk assessment (TLDR) methodology and demonstrate its feasibility and consistency with existing methodologies, while being more efficient, providing details regarding the severity components, and supporting organizational prioritization and customization. Using our methodology, the impact of 23 MIDs attacks (that were previously identified) was decomposed into six severity aspects. Four Radiology Medical Experts (RMEs) were asked to assess these six aspects for each attack. The TLDR methodology's external consistency was demonstrated by calculating paired T-tests between TLDR severity assessments and those of existing methodologies (and between the respective overall risk assessments, using attack likelihood estimates by four healthcare cyber-security experts); the differences were insignificant, implying externally consistent risk assessment. The TLDR methodology's internal consistency was evaluated by calculating the pairwise Spearman rank correlations between the severity assessments of different groups of two to four RMEs and each of their individual group members, showing that the correlations between the severity rankings, using the TLDR methodology, were significant (P < 0.05), demonstrating that the severity rankings were internally consistent for all groups of RMEs. Using existing methodologies, however, the internal correlations were insignificant for groups of less than four RMEs. Furthermore, compared to standard risk assessment techniques, the TLDR methodology is also sensitive to local radiologists' preferences, supports a greater level of flexibility regarding risk prioritization, and produces more transparent risk assessments.
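
The two consistency tests described above correspond to standard statistics; a hedged sketch with SciPy follows (the experts' actual scores are, of course, not reproduced here).

```python
# External consistency: a paired t-test between per-attack risk scores
# of two methodologies; a non-significant difference implies externally
# consistent risk assessment. Internal consistency: Spearman rank
# correlation between a group's severity ranking and a member's.
from scipy.stats import spearmanr, ttest_rel

def external_consistency(tldr_risk, baseline_risk):
    t, p = ttest_rel(tldr_risk, baseline_risk)
    return p  # large p: no significant difference between methodologies

def internal_consistency(group_mean_severity, member_severity):
    rho, p = spearmanr(group_mean_severity, member_severity)
    return rho, p  # significant rho: internally consistent rankings
```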


Subjects
Computer Security , Confidentiality , Humans , Radiography , Radiologists , Risk Assessment
7.
Artif Intell Med ; 123: 102229, 2022 01.
Article in English | MEDLINE | ID: mdl-34998518

ABSTRACT

Complex medical devices are controlled by instructions sent from a host personal computer (PC) to the device. Anomalous instructions can introduce many potentially harmful threats to patients (e.g., radiation overexposure), to physical device components (e.g., manipulation of device motors), or to functionality (e.g., manipulation of medical images). Threats can occur due to cyber-attacks, human error (e.g., a technician using the wrong protocol or misconfiguring the protocol's parameters), or host PC software bugs. Thus, anomalous instructions might represent an intentional threat to the patient or to the device, a human error, or simply a non-optimal operation of the device. To protect medical devices, we propose a new dual-layer architecture. The architecture analyzes the instructions sent from the host PC to the physical components of the device, to detect anomalous instructions using two detection layers: (1) an unsupervised context-free (CF) layer that detects anomalies based solely on the instruction's content and inter-correlations; and (2) a supervised context-sensitive (CS) layer that detects anomalies in both the clinical objective and patient contexts using a set of supervised classifiers pre-trained for each specific context. The proposed dual-layer architecture was evaluated in the computed tomography (CT) domain, using 4842 CT instructions that we recorded, including two types of CF anomalous instructions, four types of clinical objective context instructions, and four types of patient context instructions. The CF layer was evaluated using 14 unsupervised anomaly detection algorithms. The CS layer was evaluated using six supervised classification algorithms applied to each context (i.e., clinical objective or patient). Adding the second CS supervised layer to the architecture improved the overall anomaly detection performance (by improving the detection of CS anomalous instructions [when they were not also CF anomalous]) from an F1 score baseline of 72.6%, to an improved F1 score of 79.1% to 99.5% (depending on the clinical objective or patient context used). Adding the semantics-oriented CS layer enables the detection of CS anomalies using the semantics of the device's procedure, which is not possible when using just the purely syntactic CF layer. However, adding the CS layer also introduced a somewhat increased false positive rate (FPR), and thus somewhat reduced the specificity of the overall process. We conclude that by using both the CF and CS layers, a dual-layer architecture can better detect anomalous instructions to medical devices. The increased FPR might be reduced, in the future, through the use of stronger models, and by training them on more data. The improved accuracy, and the potential capability of adding explanations to both layers, might be useful for creating decision support systems for medical device technicians.
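
A structural sketch of the dual-layer idea follows; the paper evaluates 14 unsupervised and six supervised algorithms, so the IsolationForest and RandomForest below are illustrative stand-ins, not the paper's chosen models.

```python
# Layer 1 (CF): unsupervised, sees only instruction content.
# Layer 2 (CS): one supervised classifier per context (clinical
# objective or patient), applied when the CF layer finds nothing.
from sklearn.ensemble import IsolationForest, RandomForestClassifier

class DualLayerDetector:
    def __init__(self, contexts):
        self.cf = IsolationForest(random_state=0)
        self.cs = {c: RandomForestClassifier(random_state=0)
                   for c in contexts}

    def fit(self, X_normal, labeled_by_context):
        self.cf.fit(X_normal)  # CF layer trained on normal instructions
        for ctx, (X, y) in labeled_by_context.items():
            self.cs[ctx].fit(X, y)  # CS layer learns context semantics

    def is_anomalous(self, x, context):
        if self.cf.predict([x])[0] == -1:  # syntactically anomalous
            return True
        return self.cs[context].predict([x])[0] == 1  # semantic anomaly
```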


Subjects
Algorithms , Software , Humans , Tomography, X-Ray Computed
8.
IEEE Trans Image Process ; 31: 525-540, 2022.
Article in English | MEDLINE | ID: mdl-34793299

ABSTRACT

When neural networks are employed for high-stakes decision-making, it is desirable that they provide explanations for their prediction in order for us to understand the features that have contributed to the decision. At the same time, it is important to flag potential outliers for in-depth verification by domain experts. In this work we propose to unify two differing aspects of explainability with outlier detection. We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction and at the same time identify regions of similarity between the predicted sample and the examples. The examples are real prototypical cases sampled from the training set via a novel iterative prototype replacement algorithm. Furthermore, we propose to use the prototype similarity scores for identifying outliers. We compare performance in terms of the classification, explanation quality and outlier detection of our proposed network with baselines. We show that our prototype-based networks extending beyond similarity kernels deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
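
A minimal sketch of the outlier rule implied above, assuming prototypes and sample embeddings live in a shared space: a sample whose best prototype similarity is low is flagged for in-depth expert review.

```python
# Sketch only; the paper's iterative prototype replacement algorithm
# and similarity definition are not reproduced here.
import torch

def prototype_similarities(embedding, prototypes):
    # Negative squared distance as a simple similarity score.
    d = torch.cdist(embedding.unsqueeze(0), prototypes).squeeze(0)
    return -(d ** 2)

def is_outlier(embedding, prototypes, threshold):
    sims = prototype_similarities(embedding, prototypes)
    return sims.max().item() < threshold  # unlike every prototype
```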

9.
IEEE Trans Neural Netw Learn Syst ; 32(1): 128-138, 2021 01.
Article in English | MEDLINE | ID: mdl-32167916

ABSTRACT

Despite their accuracy, neural network-based classifiers are still prone to manipulation through adversarial perturbations. These perturbations are designed to be misclassified by the neural network while being perceptually identical to some valid inputs. The vast majority of such attack methods rely on white-box conditions, where the attacker has full knowledge of the attacked network's parameters. This allows the attacker to calculate the network's loss gradient with respect to some valid inputs and use this gradient in order to create an adversarial example. The task of blocking white-box attacks has proved difficult to address. While many defense methods have been suggested, they have had limited success. In this article, we examine this difficulty and try to understand it. We systematically explore the capabilities and limitations of defensive distillation, one of the most promising defense mechanisms against adversarial perturbations suggested so far, in order to understand this defense challenge. We show that contrary to commonly held belief, the ability to bypass defensive distillation is not dependent on an attack's level of sophistication. In fact, simple approaches, such as the targeted gradient sign method, are capable of effectively bypassing defensive distillation. We prove that defensive distillation is highly effective against nontargeted attacks but is unsuitable for targeted attacks. This discovery led to our realization that targeted attacks leverage the same input gradient that allows a network to be trained. This implies that blocking them comes at the cost of losing the network's ability to learn, presenting an impossible tradeoff to the research community.
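
The targeted gradient sign method mentioned above is simple to state: step the input down the gradient of the loss computed against the attacker's chosen target class. A PyTorch sketch, where the perturbation budget and the [0, 1] input range are assumptions:

```python
import torch
import torch.nn.functional as F

def targeted_gradient_sign(model, x, target_class, eps=0.03):
    # x: input batch; target_class: LongTensor of attacker-chosen labels.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target_class)
    loss.backward()
    # Subtracting the gradient's sign decreases the target-class loss,
    # pushing the prediction toward the attacker's chosen label.
    x_adv = (x - eps * x.grad.sign()).clamp(0, 1)
    return x_adv.detach()
```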


Subjects
Neural Networks, Computer , Algorithms , Classification/methods , Pattern Recognition, Automated , Reproducibility of Results
11.
Sensors (Basel) ; 20(21)2020 Oct 29.
Article in English | MEDLINE | ID: mdl-33138009

ABSTRACT

Ultrasonic distance sensors use an ultrasonic pulse's time of flight to calculate the distance to the reflecting object. Widely used in industry, these sensors are an important component in autonomous vehicles, where they are used for such tasks as object avoidance and altitude measurement. The proper operation of such autonomous vehicles relies on sensor measurements; therefore, an adversary that has the ability to undermine the sensor's reliability can pose a major risk to the vehicle. Previous attempts to alter the measurements of this sensor using an external signal succeeded in performing a denial-of-service (DoS) attack, in which the sensor's reading showed a constant value, and a spoofing attack, in which the attacker could control the measurement to some extent. However, these attacks require precise knowledge of the sensor and its operation (e.g., timing of the ultrasonic pulse sent by the sensor). In this paper, we present an attack on ultrasonic distance sensors in which the measured distance can be altered (i.e., spoofing attack). The attack exploits a vulnerability discovered in the ultrasonic sensor's receiver that results in a fake pulse that is produced by a constant noise in the input. A major advantage of the proposed attack is that, unlike previous attacks, a constant signal is used, and therefore, no prior knowledge of the sensor's relative location or its timing behavior is required. We demonstrate the attack in both a lab setup (testbed) and a real setup involving a drone to demonstrate its feasibility. Our experiments show that the attack can interfere with the proper operation of the vehicle. In addition to the risk that the attack poses to autonomous vehicles, it can also be used as an effective defensive tool for restricting the movement of unauthorized autonomous vehicles within a protected area.
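
For reference, the measurement the attack subverts is the standard time-of-flight calculation; the fake pulse induced by the constant noise shifts the apparent echo timing and hence the computed distance:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def tof_to_distance(echo_seconds):
    # The pulse travels to the object and back, hence the division by two.
    return SPEED_OF_SOUND * echo_seconds / 2

# Example: an echo after about 5.83 ms corresponds to roughly 1 m.
assert abs(tof_to_distance(0.00583) - 1.0) < 0.01
```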

12.
Sensors (Basel) ; 20(17)2020 Aug 26.
Article in English | MEDLINE | ID: mdl-32858840

ABSTRACT

Over the last decade, video surveillance systems have become a part of the Internet of Things (IoT). These IP-based surveillance systems now protect industrial facilities, railways, gas stations, and even one's own home. Unfortunately, like other IoT systems, there are inherent security risks which can lead to significant violations of a user's privacy. In this review, we explore the attack surface of modern surveillance systems and enumerate the various ways they can be compromised, with real examples. We also identify the threat agents, their attack goals, attack vectors, and the resulting consequences of successful attacks. Finally, we present current countermeasures and best practices and discuss the threat horizon. The purpose of this review is to provide researchers and engineers with a better understanding of modern surveillance systems' security, so that they can harden existing systems and develop improved security solutions.

13.
Neural Netw ; 124: 243-257, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32028053

ABSTRACT

This paper presents TrustSign, a novel, trusted automatic malware signature generation method based on high-level deep features transferred from a VGG-19 neural network model pretrained on the ImageNet dataset. While traditional automatic malware signature generation techniques rely on static or dynamic analysis of the malware's executable, our method overcomes the limitations associated with these techniques by producing signatures based on the presence of the malicious process in the volatile memory. By leveraging the cloud's virtualization technology, TrustSign analyzes the malicious process in a trusted manner, since the malware is unaware and cannot interfere with the inspection procedure. Additionally, by removing the dependency on the malware's executable, our method is fully capable of signing fileless malware as well. TrustSign's signature generation process does not require feature engineering or any additional model training, and it is done in a completely unsupervised manner, eliminating the need for a human expert. Because of this, our method has the advantage of dramatically reducing signature generation and distribution time. In fact, in this paper we rethink the typical use of deep convolutional neural networks and use the VGG-19 model as a topological feature extractor for a vastly different task from the one it was trained for. The results of our experimental evaluation demonstrate TrustSign's ability to generate signatures impervious to the process state over time. By using the signatures generated by TrustSign as input for various supervised classifiers, we achieved up to 99.5% classification accuracy.
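
A sketch of the repurposing described above: the pretrained VGG-19 convolutional stack used as a fixed feature extractor. How TrustSign renders a process's volatile-memory image into network input is the paper's contribution and is not reproduced here; the weights identifier follows current torchvision conventions.

```python
import torch
from torchvision import models

# Load VGG-19 pretrained on ImageNet and freeze it as an extractor.
vgg = models.vgg19(weights="IMAGENET1K_V1")
vgg.eval()

def deep_signature(image_batch):
    # image_batch: [N, 3, 224, 224], ImageNet-normalized renderings of
    # the process memory (the rendering step is assumed here).
    with torch.no_grad():
        feats = vgg.features(image_batch)          # conv feature maps
        return torch.flatten(feats, start_dim=1)   # signature vectors
```

These vectors would then feed the supervised classifiers the paper reports on; no feature engineering or extra model training is involved.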


Subjects
Cloud Computing/standards , Computer Security/standards , Deep Learning
14.
Data Brief ; 26: 104437, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31528674

ABSTRACT

This article presents a dataset for studying the detection of obfuscated malware in volatile computer memory. Several obfuscated reverse remote shells were generated using the Metasploit-Framework, Hyperion, and PEScrambler tools. After the host was compromised, memory snapshots of a Windows 10 virtual machine were acquired using Rekall's open-source WinPmem acquisition tool. The dataset is complemented by memory snapshots of uncompromised virtual machines. The data includes a reference for all running processes as well as a mapping of the designated malware running inside the memory. The datasets are available with the article to advance research on the detection of obfuscated malware in volatile computer memory during forensic analysis.

15.
Data Brief ; 23: 103863, 2019 Apr.
Article in English | MEDLINE | ID: mdl-31372474

ABSTRACT

The datasets in this article were produced to evaluate the ability of MIL-STD-1553 intrusion detection systems to detect attacks that emulate normal non-periodical messages, at differing attack occurrence rates and with different data representations. We present three streams of simulated MIL-STD-1553 traffic containing both normal and attack messages corresponding to packets that were injected into the bus by a malicious remote terminal. The implemented attacks emulate normal non-periodical communication, so detecting them with a low false positive rate is non-trivial. Each stream is separated into a training set of normal messages and a test set of both normal and attack messages. The test sets differ by the occurrence rate of attack messages (0.01%, 0.10%, and 1.00%). Each stream is also preprocessed into a dataset of message sequences so that it can be used for sequential anomaly detection analysis. The sequential test sets differ by the occurrence rate of attack sequences (0.14%, 1.26%, and 11.01%). All dataset files can be found in Mendeley Data, doi:10.17632/jvgdrmjvs3.3.
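
The sequential preprocessing described above amounts to windowing the message stream; a sketch follows, where the window length and stride are assumptions (the published datasets fix their own).

```python
def to_sequences(messages, window=8, stride=1):
    # Turn a flat message stream into overlapping fixed-length
    # sequences for sequential anomaly detection.
    return [messages[i:i + window]
            for i in range(0, len(messages) - window + 1, stride)]
```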

16.
Commun Biol ; 2: 214, 2019.
Article in English | MEDLINE | ID: mdl-31240252

ABSTRACT

The identification and understanding of metabolic pathways is a key aspect in crop improvement and drug design. The common approach for their detection is based on gene annotation and ontology. Correlation-based network analysis, where metabolites are arranged into network formation, is used as a complementary tool. Here, we demonstrate the detection of metabolic pathways based on correlation-based network analysis combined with machine-learning techniques. Metabolites of known tomato pathways, non-tomato pathways, and random sets of metabolites were mapped as subgraphs onto metabolite correlation networks of the tomato pericarp. Network features were computed for each subgraph, generating a machine-learning model. The model predicted the presence of the ß-alanine-degradation-I, tryptophan-degradation-VII-via-indole-3-pyruvate (yet unknown to plants), the ß-alanine-biosynthesis-III, and the melibiose-degradation pathway, although melibiose was not part of the networks. In vivo assays validated the presence of the melibiose-degradation pathway. For the remaining pathways, only some of the genes encoding regulatory enzymes were detected.
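
An illustrative sketch of the approach, with assumed topological features (the paper's exact feature set is not listed in the abstract): score a candidate metabolite set by the shape of its subgraph in the correlation network, training on known pathways versus random sets.

```python
import networkx as nx
from sklearn.ensemble import RandomForestClassifier

def subgraph_features(G, metabolites):
    # Topological summary of the candidate set's induced subgraph.
    S = G.subgraph(metabolites)
    n = max(S.number_of_nodes(), 1)
    largest_cc = max((len(c) for c in nx.connected_components(S)),
                     default=0)
    return [nx.density(S),
            nx.average_clustering(S),
            S.number_of_edges() / n,
            largest_cc / n]

def train_pathway_model(G, pathway_sets, random_sets):
    # Positive examples: known pathways; negatives: random sets.
    X = [subgraph_features(G, s) for s in pathway_sets + random_sets]
    y = [1] * len(pathway_sets) + [0] * len(random_sets)
    return RandomForestClassifier(random_state=0).fit(X, y)
```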


Subjects
Machine Learning , Metabolomics/methods , Solanum lycopersicum/metabolism , Metabolic Networks and Pathways
17.
Artif Intell Med ; 81: 12-32, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28456512

ABSTRACT

BACKGROUND AND OBJECTIVES: Labeling instances by domain experts for classification is often time-consuming and expensive. To reduce such labeling efforts, we previously proposed the application of active learning (AL) methods, introduced our CAESAR-ALE framework for classifying the severity of clinical conditions, and showed its significant reduction of labeling efforts. The use of any of three AL methods (one well known [SVM-Margin], and two that we introduced [Exploitation and Combination_XA]) significantly reduced (by 48% to 64%) condition labeling efforts, compared to standard passive (random instance-selection) SVM learning. Furthermore, our new AL methods achieved maximal accuracy using 12% fewer labeled cases than the SVM-Margin AL method. However, because labelers have varying levels of expertise, a major issue associated with learning methods, and AL methods in particular, is how best to use the labeling provided by a committee of labelers. First, we wanted to know, based on the labelers' learning curves, whether using AL methods (versus standard passive learning methods) has an effect on the intra-labeler variability (within the learning curve of each labeler) and the inter-labeler variability (among the learning curves of different labelers). Then, we wanted to examine the effect of learning (either passively or actively) from the labels created by the majority consensus of a group of labelers. METHODS: We used our CAESAR-ALE framework for classifying the severity of clinical conditions, the three AL methods, and the passive learning method, as mentioned above, to induce the classification models. We used a dataset of 516 clinical conditions and their severity labeling, represented by features aggregated from the medical records of 1.9 million patients treated at Columbia University Medical Center. We analyzed the variance of the classification performance within (intra-labeler) and, especially, among (inter-labeler) the classification models that were induced by using the labels provided by seven labelers. We also compared the performance of the passive and active learning models when using the consensus label. RESULTS: The AL methods produced, for the models induced from each labeler, smoother intra-labeler learning curves during the training phase, compared to the models produced when using the passive learning method. The mean standard deviation of the learning curves of the three AL methods over all labelers (mean: 0.0379; range: [0.0182 to 0.0496]) was significantly lower (p=0.049) than the intra-labeler standard deviation when using the passive learning method (mean: 0.0484; range: [0.0275 to 0.0724]). Using the AL methods resulted in a lower mean inter-labeler AUC standard deviation among the AUC values of the labelers' different models during the training phase, compared to the variance of the induced models' AUC values when using passive learning. The inter-labeler AUC standard deviation using the passive learning method (0.039) was almost twice as high as the inter-labeler standard deviation using our two new AL methods (0.02 and 0.019, respectively). The SVM-Margin AL method resulted in an inter-labeler standard deviation (0.029) that was higher by almost 50% than that of our two AL methods. The difference in the inter-labeler standard deviation between the passive learning method and the SVM-Margin learning method was significant (p=0.042). The difference between the SVM-Margin and Exploitation methods was insignificant (p=0.29), as was the difference between the Combination_XA and Exploitation methods (p=0.67). Finally, using the consensus label led to a learning curve that had a higher mean intra-labeler variance, but eventually resulted in an AUC that was at least as high as the AUC achieved using the gold standard label and that was always higher than the expected mean AUC of a randomly selected labeler, regardless of the choice of learning method (including a passive learning method). Using a paired t-test, the difference between the intra-labeler AUC standard deviation when using the consensus label, versus that value when using the other two labeling strategies, was significant only when using the passive learning method (p=0.014), but not when using any of the three AL methods. CONCLUSIONS: The use of AL methods (a) reduces intra-labeler variability in the performance of the induced models during the training phase, and thus reduces the risk of halting the process at a local minimum that is significantly different in performance from the rest of the learned models; and (b) reduces inter-labeler performance variance, and thus reduces the dependence on the use of a particular labeler. In addition, the use of a consensus label, agreed upon by a rather uneven group of labelers, might be at least as good as using the gold standard labeler, who might not be available, and certainly better than randomly selecting one of the group's individual labelers. Finally, using the AL methods with the consensus label reduced the intra-labeler AUC variance during the learning phase, compared to using passive learning.
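
For concreteness, here is a sketch of the well-known SVM-Margin selection rule referenced above (and in the CAESAR-ALE entry that follows): repeatedly query the unlabeled instance closest to the SVM decision boundary. The paper's Exploitation and Combination_XA methods alter this selection criterion; `oracle` stands in for the human labeler, and the seed set is assumed to contain both classes.

```python
import numpy as np
from sklearn.svm import SVC

def svm_margin_active_learning(X_pool, oracle, X_init, y_init, budget=50):
    # X_pool: unlabeled feature vectors; oracle(i) returns the label of
    # pool item i (the domain expert in the loop).
    X_lab, y_lab = list(X_init), list(y_init)
    pool = list(range(len(X_pool)))
    for _ in range(budget):
        clf = SVC(kernel="linear").fit(X_lab, y_lab)
        margins = np.abs(
            clf.decision_function([X_pool[i] for i in pool]))
        pick = pool.pop(int(np.argmin(margins)))  # most uncertain item
        X_lab.append(X_pool[pick])
        y_lab.append(oracle(pick))
    return clf
```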


Subjects
Data Mining/methods , Electronic Health Records/classification , Supervised Machine Learning , Area Under Curve , Humans , Learning Curve , Observer Variation , Phenotype , Reproducibility of Results , Severity of Illness Index , Time Factors
18.
J Biomed Inform ; 61: 44-54, 2016 06.
Article in English | MEDLINE | ID: mdl-27016383

ABSTRACT

Classification of condition severity can be useful for discriminating among sets of conditions or phenotypes, for example when prioritizing patient care or for other healthcare purposes. Electronic Health Records (EHRs) represent a rich source of labeled information that can be harnessed for severity classification. The labeling of EHRs is expensive and in many cases requires employing professionals with a high level of expertise. In this study, we demonstrate the use of Active Learning (AL) techniques to decrease expert labeling efforts. We employ three AL methods and demonstrate their ability to reduce labeling efforts while effectively discriminating condition severity. We incorporate three AL methods into a new framework based on the original CAESAR (Classification Approach for Extracting Severity Automatically from Electronic Health Records) framework to create the Active Learning Enhancement framework (CAESAR-ALE). We applied CAESAR-ALE to a dataset containing 516 conditions of varying severity levels that were manually labeled by seven experts. Our dataset, called the "CAESAR dataset," was created from the medical records of 1.9 million patients treated at Columbia University Medical Center (CUMC). All three AL methods decreased labelers' efforts compared to the learning methods applied by the original CAESAR framework, in which the classifier was trained on the entire set of conditions; depending on the AL strategy used in the current study, the reduction ranged from 48% to 64%, which can result in significant savings in both time and money. As for the PPV (precision) measure, CAESAR-ALE achieved more than a 13% absolute improvement in the predictive capabilities of the framework when classifying conditions as severe. These results demonstrate the potential of AL methods to decrease the labeling efforts of medical experts, while increasing accuracy given the same (or even a smaller) number of acquired conditions. We also demonstrated that the methods included in the CAESAR-ALE framework (Exploitation and Combination_XA) are more robust to the use of human labelers with different levels of professional expertise.


Subjects
Data Curation , Electronic Health Records , Problem-Based Learning , Algorithms , Automation , Humans
19.
Sci Eng Ethics ; 20(4): 1027-43, 2014 Dec.
Article in English | MEDLINE | ID: mdl-24218141

ABSTRACT

Online social networks (OSNs) have rapidly become a prominent and widely used service, offering a wealth of personal and sensitive information with significant security and privacy implications. Hence, OSNs are also an important and popular subject for research. To perform research based on real-life evidence, however, researchers may need to access OSN data, such as texts and files uploaded by users and connections among users. This raises significant ethical problems. Currently, there are no clear ethical guidelines, and researchers may end up (unintentionally) performing ethically questionable research, sometimes even when more ethical research alternatives exist. For example, several studies have employed "fake identities" to collect data from OSNs, but fake identities may be used for attacks and are considered a security issue. Is it legitimate to use fake identities for studying OSNs or for collecting OSN data for research? We present a taxonomy of the ethical challenges facing researchers of OSNs and compare different approaches. We demonstrate how ethical considerations have been taken into account in previous studies that used fake identities. In addition, several possible approaches are offered to reduce or avoid ethical misconduct. We hope this work will stimulate the development and use of ethical practices and methods in the research of online social networks.


Subjects
Confidentiality/ethics , Deception , Identification (Psychology) , Morals , Privacy , Social Media , Social Sciences/ethics , Ethics, Research , Humans , Records , Research Design , Social Support
20.
Phys Rev E Stat Nonlin Soft Matter Phys ; 76(5 Pt 2): 056709, 2007 Nov.
Article in English | MEDLINE | ID: mdl-18233792

ABSTRACT

In this paper, we propose a method for rapid computation of group betweenness centrality whose running time (after preprocessing) does not depend on network size. The calculation of group betweenness centrality is computationally demanding and, therefore, it is not suitable for applications that compute the centrality of many groups in order to identify new properties. Our method is based on the concept of path betweenness centrality defined in this paper. We demonstrate how the method can be used to find the most prominent group. Then, we apply the method for epidemic control in communication networks. We also show how the method can be used to evaluate distributions of group betweenness centrality and its correlation with group degree. The method may assist in finding further properties of complex networks and may open a wide range of research opportunities.
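
For contrast, the straightforward way to find the most prominent group recomputes group betweenness centrality from scratch for every candidate, which is exactly the cost the proposed preprocessing removes. A sketch using NetworkX's built-in routine, assuming a NetworkX version recent enough to provide it:

```python
import networkx as nx

# Toy network and candidate groups of three nodes each.
G = nx.erdos_renyi_graph(100, 0.05, seed=1)
nodes = list(G.nodes)
candidates = [nodes[i:i + 3] for i in range(0, 30, 3)]

# Naive search: each call traverses the whole graph, so the running
# time grows with network size, unlike the paper's method.
best = max(candidates,
           key=lambda grp: nx.group_betweenness_centrality(G, grp))
print("most prominent candidate group:", best)
```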
