
Visually Attacking and Shielding NLP Systems for Human-like Text Processing
An overview of techniques for visually attacking and shielding NLP systems to improve their robustness. The talk covers making NLP more human-like and resilient against spamming and toxic comments in the open domain, presenting experiments and results from the VIPER project conducted by the UKP Lab at Technische Universität Darmstadt.
Presentation Transcript
Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems. Steffen Eger, Gözde Gül Şahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych. Ubiquitous Knowledge Processing Lab, AIPHES Research Training Group, Technische Universität Darmstadt. https://www.ukp.tu-darmstadt.de/ https://www.aiphes.tu-darmstadt.de
Outline: Motivation | Human vs Machine Text Processing | Visual Perturber (VIPER) | Making NLP Systems more robust | Experiments & Results | Recap & Conclusion. 03.06.2019 | Computer Science Department | UKP Lab | Ji-Ung Lee
Motivation
NLP out in the open
NLP systems deployed in the open face spamming, domain name spoofing, and toxic comments such as "You idiot! You have no idea! Go to hell!" Visually perturbed variants of the same content, e.g. a spoofed "http://wikipedia.org" with look-alike characters, or the same toxic comment with key letters swapped for visual look-alikes, can slip past automatic detection.
Human vs Machine Text Processing
Visually Attacking NLP Systems
Human: shown "You idiot!" with some letters replaced by visual look-alikes, a reader still recognizes the word "idiot" and judges the comment toxic.
Machine: each character is first mapped to a vocabulary index (e.g. 105, 100, 111 for known characters), while the perturbed characters map to rare or unseen indices such as 248 and 427. The corresponding embedding vectors (e.g. 0.1 ... 0.2, 0.6 ... 0.1) carry no useful signal for those characters, so the model cannot recover "idiot" and its prediction becomes unreliable.
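The machine-side failure can be made concrete in a few lines: characters are mapped to vocabulary indices before any embedding lookup, and a visually perturbed character falls outside the vocabulary. The vocabulary below is a made-up toy for illustration, not the one used in the talk.

```python
# Sketch of the lookup step the attack exploits: characters become
# vocabulary indices before any embedding is applied. The vocabulary
# below is a made-up toy, not the one used in the talk.
VOCAB = {"y": 0, "o": 1, "u": 2, " ": 3, "i": 4, "d": 5, "t": 6, "!": 7}
UNK = len(VOCAB)  # shared index for characters never seen in training

def char_indices(text):
    """Map text to character indices; unknown characters collapse to UNK."""
    return [VOCAB.get(ch, UNK) for ch in text.lower()]
```

Here "idiot" maps to distinct, informative indices, while a perturbed "ıdıot" (with dotless ı) maps both perturbed letters to the same uninformative UNK index, which is why the model's prediction degrades.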
VIsual PERturber (VIPER)
Creating Perturbations with VIPER
Character embeddings use the pixel values of each character's glyph as initialization; cosine similarity in this space is then used to obtain visually similar characters.
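The neighbor-selection step can be sketched as follows: represent each character by a flattened pixel bitmap of its glyph and rank other characters by cosine similarity. The bitmaps below are tiny hand-made vectors so the sketch stays self-contained; in VIPER they come from actual font renderings.

```python
import numpy as np

# Toy pixel "embeddings": in VIPER these are flattened glyph bitmaps
# rendered from a font; here we use tiny hand-made vectors instead.
# (Illustrative values, not real glyph pixels.)
PIXELS = {
    "i": np.array([0.0, 1.0, 0.0, 1.0]),
    "ı": np.array([0.0, 1.0, 0.0, 1.1]),  # visually close to "i"
    "x": np.array([1.0, 0.0, 1.0, 0.0]),  # visually distant
}

def cosine(u, v):
    """Cosine similarity between two pixel vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest_neighbors(char, k=1):
    """Return the k characters whose pixel vectors are most similar."""
    sims = [(c, cosine(PIXELS[char], p))
            for c, p in PIXELS.items() if c != char]
    sims.sort(key=lambda t: t[1], reverse=True)
    return [c for c, _ in sims[:k]]
```

With this table, the dotless "ı" comes out as the closest visual neighbor of "i", which is exactly the kind of substitute the perturber picks.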
Attacking NLP Systems
Use VIPER to perturb characters of the test data, where p is the probability of perturbing each character; at p = 0.9, almost every character of a word like "fucking" is replaced by a visual look-alike. How do state-of-the-art models perform on such perturbed test data?
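The attack itself is a simple randomized substitution, sketched below. The neighbor table is hypothetical; VIPER derives such substitutes from visual character embeddings.

```python
import random

# Hypothetical neighbor table: VIPER derives such substitutes from
# visual (pixel-based) character embeddings; these entries are made up.
NEIGHBORS = {"i": ["ı", "í"], "o": ["ö", "ø"], "e": ["é", "è"], "u": ["ü", "û"]}

def viper_perturb(text, p, seed=0):
    """Replace each character with a visual neighbor with probability p."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in NEIGHBORS and rng.random() < p:
            out.append(rng.choice(NEIGHBORS[ch]))
        else:
            out.append(ch)
    return "".join(out)
```

At p = 0 the text passes through unchanged; as p approaches 1, nearly every character with a known look-alike is swapped, matching the perturbation levels shown on the following slides.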
Human vs Machine Performance
The phrase "game over man !" is shown with 0%, 20%, 40%, and 80% of its characters visually perturbed; even at 80% it remains largely readable for humans.
At p = 0.8: human performance ~94%, machine performance ~20-70%.
Making NLP Systems more robust
How to shield against such attacks? Two approaches: visually informed character embeddings and data augmentation.
Visually Informed Character Embeddings
Visually uninformed: character embeddings (e.g. 0.1 ... 0.2, 0.7 ... 0.2) are randomly initialized, so nothing ties a perturbed character to the letter it resembles.
Visually informed: character embeddings (e.g. 0.5 ... 0.3, 0.8 ... 0.1) are initialized from pixel images of the characters.
Both variants classify the clean input "The cat" as non-toxic, but only the visually informed one carries shape information that can survive perturbation.
Visually Informed Character Embeddings
No random initialization of character embeddings; use concatenated pixel values instead. Visual similarities can then be learned during training, and the model is more likely to generalize to new characters during testing.
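A minimal sketch of pixel-based initialization, using toy glyph bitmaps in place of real font renderings: the embedding matrix is built from the flattened pixel values of each character's image rather than from random numbers.

```python
import numpy as np

# Toy glyph bitmaps standing in for rendered character images; in the
# talk, embeddings are initialized from concatenated pixel values of
# each character's glyph. (Illustrative bitmaps, not real glyphs.)
GLYPHS = {
    "a": np.array([[0, 1, 0], [1, 1, 1]], dtype=float),
    "à": np.array([[1, 1, 0], [1, 1, 1]], dtype=float),  # look-alike of "a"
    "x": np.array([[1, 0, 1], [0, 1, 0]], dtype=float),
}
vocab = list(GLYPHS)
emb = np.stack([GLYPHS[c].ravel() for c in vocab])  # one row per character

def embed(char):
    """Look up the pixel-initialized embedding of a character."""
    return emb[vocab.index(char)]
```

Because "à" differs from "a" in a single pixel, their embeddings start close together, so the model sees a perturbed character as a near-neighbor of a known one rather than as an arbitrary vector.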
How to shield against such attacks? Visually Informed Character Embeddings; Data Augmentation
Adversarial Training
Enrich the training data with perturbations [Goodfellow et al., 2015]: each training example (e.g. "fucking", labeled toxic) is additionally perturbed with a small probability such as p = 0.2 and added with its original label. Seeing perturbed data during training allows the model to recognize it during testing.
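The augmentation step can be sketched as follows; the look-alike table is made up here, and in the talk the perturbed copies come from VIPER itself.

```python
import random

# Made-up look-alike table; in the talk, perturbations come from VIPER.
NEIGHBORS = {"i": ["ı", "í"], "o": ["ö", "ø"], "u": ["ü", "û"]}

def perturb(text, p, rng):
    """Replace each character with a look-alike with probability p."""
    return "".join(
        rng.choice(NEIGHBORS[c]) if c in NEIGHBORS and rng.random() < p else c
        for c in text
    )

def augment(dataset, p=0.2, copies=1, seed=0):
    """Return the original examples plus perturbed copies, labels unchanged."""
    rng = random.Random(seed)
    out = list(dataset)
    for text, label in dataset:
        for _ in range(copies):
            out.append((perturb(text, p, rng), label))
    return out
```

Training on the union of clean and perturbed examples is what lets the model map a perturbed "you ıdıot" to the same toxic label as the clean original.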
Experiments & Results
General Setup
ELMo embeddings [Peters et al., 2018]. Evaluation on four downstream tasks: grapheme-to-phoneme conversion (character level); PoS tagging and chunking (word level); toxic comment classification (sentence level). Different embedding spaces are used for generating the perturbed test data.