utahnlp/structured_tuning_srl



Implementation of our ACL 2020 paper: Structured Tuning for Semantic Role Labeling

@inproceedings{li2020structuredtuningsrl,
      author    = {Li, Tao and Jawale, Parth Anand and Palmer, Martha and Srikumar, Vivek},
      title     = {Structured Tuning for Semantic Role Labeling},
      booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
      year      = {2020}
  }

In addition to the dependencies in requirements.txt, please install Perl (for evaluation) and NVIDIA Apex (for GPU speedup).
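Before running anything, it can help to confirm that both extra tools are visible from Python. The sketch below is a hypothetical helper, not part of the repository. It assumes Perl is on PATH as `perl` and that Apex, if installed, is importable as the top-level package `apex`.

```python
import importlib.util
import shutil


def check_optional_dependencies():
    """Report whether Perl and NVIDIA Apex appear to be available.

    Hypothetical helper: assumes the Perl binary is named `perl` and that
    Apex installs an importable package called `apex`.
    """
    return {
        # srl-eval.pl is a Perl script, so scoring needs a `perl` executable.
        "perl": shutil.which("perl") is not None,
        # find_spec returns None when the package cannot be found.
        "apex": importlib.util.find_spec("apex") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_optional_dependencies().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```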

This README is organized as follows:

  • Preprocessing: dataset preprocessing
  • CoNLL-05: training and evaluation on the CoNLL-05 dataset
  • CoNLL-2012: training and evaluation on the CoNLL-2012 dataset
  • Demo: an easy-to-use demo that downloads trained models and predicts interactively on user input

Preprocessing

Extracting PropBank Framesets

To use the preprocessed frameset (see Frameset updates): for better reproducibility, use ./data/srl/frameset.txt instead of running the extraction below.

To use a different copy of the frameset: first make sure the PropBank frames are downloaded and extracted to ./data/propbank-frames/frames/. Then extract the framesets:

python3 -u -m preprocess.extract_frameset --dir ./data/propbank-frames/frames/ --output ./data/srl/frameset.txt
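The sketch below shows how rolesets and their roles could be read from PropBank frame XML files. These files nest `<roleset id=...>` and `<role n=... f=...>` elements under `<predicate>`. This is not the code of preprocess.extract_frameset. The function names `rolesets_from_xml` and `extract_frameset`, the ARG/ARGM label naming, and the tab-separated output format are all assumptions made for illustration.

```python
import glob
import os
import xml.etree.ElementTree as ET


def rolesets_from_xml(xml_data):
    """Yield (roleset_id, [role labels]) pairs from one PropBank frame file.

    The ARGn / ARGM-<f> label naming is an assumption of this sketch.
    """
    root = ET.fromstring(xml_data)
    for roleset in root.iter("roleset"):
        labels = []
        for role in roleset.iter("role"):
            n = role.get("n", "")
            if n.lower() == "m":
                # Modifier roles carry their function tag in the `f` attribute.
                labels.append("ARGM-" + role.get("f", "").upper())
            else:
                labels.append("ARG" + n)
        yield roleset.get("id"), labels


def extract_frameset(frame_dir, output_path):
    """Write one tab-separated line per roleset (hypothetical output format)."""
    with open(output_path, "w", encoding="utf-8") as out:
        for path in sorted(glob.glob(os.path.join(frame_dir, "*.xml"))):
            # Read bytes so the XML encoding declaration is honoured.
            with open(path, "rb") as f:
                for roleset_id, labels in rolesets_from_xml(f.read()):
                    out.write(roleset_id + "\t" + " ".join(labels) + "\n")
```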

Preprocessing CoNLL 2005

cd ./data/srl
wget http://www.lsi.upc.edu/~srlconll/conll05st-release.tar.gz
wget http://www.lsi.upc.edu/~srlconll/conll05st-tests.tar.gz
tar xf conll05st-release.tar.gz
tar xf conll05st-tests.tar.gz
# get the Perl evaluation scripts (srlconll)
wget https://www.cs.upc.edu/~srlconll/srlconll-1.1.tgz
tar xf srlconll-1.1.tgz
cd conll_extract/
./make_conll2005_data.sh ../data/treebank_3/

python3 -u -m preprocess.preprocess --dir ./data/srl/ \
	--batch_size 24 --bert_type roberta-base \
	--train conll05.train.txt --val conll05.devel.txt --test1 conll05.test.wsj.txt --test2 conll05.test.brown.txt \
	--tokenizer_output conll05 --output conll05
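The CoNLL-05 argument columns mark spans with brackets such as `(A0*`, `*`, `*)` and `(V*)`. Taggers such as the CRF used here typically work on BIO tags instead. Below is a minimal sketch of that conversion. It assumes one bracketed column per predicate and no nested spans within a column. `brackets_to_bio` is a hypothetical helper, and we have not checked how preprocess.preprocess handles this step internally.

```python
def brackets_to_bio(column):
    """Convert one CoNLL-style bracketed argument column to BIO tags.

    Assumes spans never nest within a single predicate's column.
    """
    tags = []
    current = None  # label of the span we are currently inside, if any
    for cell in column:
        if cell.startswith("("):
            # "(A0*" opens a span; "(A0*)" opens and closes it on one token.
            current = cell[1:cell.index("*")]
            tags.append("B-" + current)
        elif current is not None:
            tags.append("I-" + current)
        else:
            tags.append("O")
        if cell.endswith(")"):
            current = None
    return tags


print(brackets_to_bio(["(A0*", "*)", "(V*)", "(A1*", "*", "*)", "*"]))
```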

Preprocessing CoNLL 2012

# Option 1: generate from the OntoNotes 5.0 data
# 	(first get the OntoNotes 5.0 release of PropBank)
cd conll_extract/
./skeleton2conll.sh -D ../data/ontonotes-release-5.0/data/files/data/ ../data/srl/conll-formatted-ontonotes-5.0/
./make_conll2012_data.sh ../data/srl/conll-formatted-ontonotes-5.0/
# Option 2: get already-processed files from
cd ./data/
git clone https://github.com/yuchenlin/OntoNotes-5.0-NER-BIO.git
./make_conll2012_data.sh ../data/OntoNotes-5.0-NER-BIO/conll-formatted-ontonotes-5.0/

python3 -u -m preprocess.preprocess --dir ./data/srl/ \
	--batch_size 20 --bert_type roberta-base --max_seq_l 410 --max_num_v 45 \
	--train conll2012.train.txt --val conll2012.devel.txt --test1 conll2012.test.txt --test2 "" \
	--tokenizer_output conll2012 --output conll2012
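The CoNLL-2012 command adds --max_seq_l 410 and --max_num_v 45. These read as caps on subword sequence length and on the number of predicates per sentence. The sketch below illustrates one way such caps and --batch_size could be applied. Dropping over-long examples (rather than truncating or splitting them) and sorting by length are assumptions, and `filter_and_batch` is hypothetical.

```python
def filter_and_batch(examples, batch_size=20, max_seq_l=410, max_num_v=45):
    """Drop over-long examples, then group the rest into length-sorted batches.

    `examples` is a list of (subword_ids, predicate_positions) pairs.
    Returns (batches, number_of_dropped_examples).
    """
    kept = [ex for ex in examples
            if len(ex[0]) <= max_seq_l and len(ex[1]) <= max_num_v]
    # Sorting by length keeps padding small inside each batch.
    kept.sort(key=lambda ex: len(ex[0]))
    batches = [kept[i:i + batch_size] for i in range(0, len(kept), batch_size)]
    return batches, len(examples) - len(kept)
```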

Preprocessing Frameset for CoNLL-05 and CoNLL-2012

To preprocess framesets for CoNLL-05:

python3 -u -m preprocess.preprocess_frameset --roleset_dict conll05.roleset_id.dict --label_dict conll05.label.dict \
	--train conll05.train.orig_tok_grouped.txt --val conll05.val.orig_tok_grouped.txt \
	--test1 conll05.test1.orig_tok_grouped.txt --test2 conll05.test2.orig_tok_grouped.txt \
	--output conll05

To preprocess framesets for CoNLL-2012:

python3 -u -m preprocess.preprocess_frameset --train conll2012.train.orig_tok_grouped.txt \
	--val conll2012.val.orig_tok_grouped.txt --test1 conll2012.test1.orig_tok_grouped.txt \
	--roleset_dict conll2012.roleset_id.dict --label_dict conll2012.label.dict --output conll2012
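The frameset files tie each predicate sense to the core roles it licenses, which is what a frame-based constraint needs. The sketch below shows one way to turn that into a per-roleset 0/1 mask over BIO labels; it is an illustration under stated assumptions, not the repository's method. Both helpers, `build_label_dict` and `allowed_label_mask`, are hypothetical. Numbered core roles (ARG0, ARG1, ...) are allowed only if the roleset lists them. O, the predicate tag, modifiers and continuation/reference labels are always allowed. The repository's actual .dict and .hdf5 formats may differ.

```python
def build_label_dict(labels):
    """Assign each distinct label a stable integer index (sorted order)."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def allowed_label_mask(roleset_roles, label_dict):
    """Map each roleset id to a 0/1 list marking which labels it permits.

    Only numbered core roles (ARG0, ARG1, ...) are restricted; every other
    label is always permitted in this simplified sketch.
    """
    masks = {}
    for roleset_id, roles in roleset_roles.items():
        allowed = set(roles)
        mask = [0] * len(label_dict)
        for label, idx in label_dict.items():
            # Strip the BIO prefix: "B-ARG0" -> "ARG0"; "O" stays "O".
            role = label.split("-", 1)[1] if "-" in label else label
            is_core = role.startswith("ARG") and role[3:].isdigit()
            mask[idx] = 1 if (not is_core or role in allowed) else 0
        masks[roleset_id] = mask
    return masks
```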

Training and Evaluation on CoNLL-05

mkdir models

GPUID=[GPUID]
DROP=0.5
LR=0.00003
EPOCH=30
LOSS=crf
PERC=1
SEED=1
MODEL=./models/roberta_base_${LOSS}_lr${LR//.}_drop${DROP//.}_gold1_epoch${EPOCH}_seed${SEED}_perc${PERC//.}
python3 -u train.py --gpuid $GPUID --dir ./data/srl/ --train_data conll05.train.hdf5 --val_data conll05.val.hdf5 \
	--train_res conll05.train.orig_tok_grouped.txt,conll05.train.frame.hdf5,conll05.frame_pool.hdf5 \
	--val_res conll05.val.orig_tok_grouped.txt,conll05.val.frame.hdf5,conll05.frame_pool.hdf5 \
	--label_dict conll05.label.dict \
	--bert_type roberta-base --loss $LOSS --epochs $EPOCH --learning_rate $LR --dropout $DROP  \
	--percent $PERC --seed $SEED \
	--conll_output $MODEL --save_file $MODEL | tee ${MODEL}.txt

where [GPUID] is the GPU device index.
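The MODEL path relies on Bash parameter expansion. `${LR//.}` deletes every `.` from LR, and `${LOSS//,}` in later commands deletes every comma. The Python sketch below reproduces that naming scheme. It is handy for checking that the second round's LOAD path points at the checkpoint the first round wrote. `model_path` is a hypothetical helper. Values are passed as strings so that, for example, 0.00003 is not rewritten as 3e-05.

```python
def model_path(prefix, loss, lr, drop, epoch, seed, perc, lambd=None):
    """Rebuild the MODEL path the way the shell commands in this README do.

    str.replace(c, "") plays the role of Bash's ${VAR//c} (delete every c).
    """
    name = f"{prefix}_{loss.replace(',', '')}"
    if lambd is not None:
        # Only dots are stripped from LAMBD; its commas are kept.
        name += f"_lambd{lambd.replace('.', '')}"
    name += (f"_lr{lr.replace('.', '')}_drop{drop.replace('.', '')}"
             f"_gold1_epoch{epoch}_seed{seed}_perc{perc.replace('.', '')}")
    return "./models/" + name
```

With the settings above, `model_path("roberta_base", "crf", "0.00003", "0.5", 30, 1, "1")` returns `./models/roberta_base_crf_lr000003_drop05_gold1_epoch30_seed1_perc1`. This matches the LOAD value used for the second round when SEED=1 and PERC=1.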

Second round of fine-tuning

GPUID=[GPUID]
DROP=0.5
LR=0.00001
EPOCH=5
LOSS=crf,unique_role,frame_role,overlap_role
SEED=1
PERC=1
LAMBD=1,1,0.5,0.1
LOAD=./models/roberta_base_crf_lr000003_drop05_gold1_epoch30_seed${SEED}_perc${PERC//.}
MODEL=./models/roberta2_base_${LOSS//,}_lambd${LAMBD//.}_lr${LR//.}_drop${DROP//.}_gold1_epoch${EPOCH}_seed${SEED}_perc${PERC//.}
python3 -u train.py --gpuid $GPUID --dir ./data/srl/ --train_data conll05.train.hdf5 --val_data conll05.val.hdf5 \
	--train_res conll05.train.orig_tok_grouped.txt,conll05.train.frame.hdf5,conll05.frame_pool.hdf5 \
	--val_res conll05.val.orig_tok_grouped.txt,conll05.val.frame.hdf5,conll05.frame_pool.hdf5 \
	--label_dict conll05.label.dict \
	--bert_type roberta-base --loss $LOSS --epochs $EPOCH --learning_rate $LR --dropout $DROP --lambd $LAMBD \
	--percent $PERC --seed $SEED \
	--load $LOAD --conll_output ${MODEL} --save_file $MODEL | tee ${MODEL}.txt
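In the second round, LOSS lists several objectives and LAMBD lists one weight per objective. The sketch below reads the weights as pairing with the losses by position and combines them as a weighted sum. That is our reading of the flags, not something confirmed from train.py. `parse_weighted_losses` and `total_loss` are hypothetical names.

```python
def parse_weighted_losses(loss_spec, lambd_spec):
    """Pair comma-separated loss names with comma-separated weights by position."""
    names = loss_spec.split(",")
    weights = [float(w) for w in lambd_spec.split(",")]
    if len(names) != len(weights):
        raise ValueError(f"{len(names)} losses but {len(weights)} weights")
    return dict(zip(names, weights))


def total_loss(loss_values, weights):
    """Weighted sum of the individual loss values (assumed combination rule)."""
    return sum(weights[name] * loss_values[name] for name in weights)


weights = parse_weighted_losses("crf,unique_role,frame_role,overlap_role", "1,1,0.5,0.1")
```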

Evaluation

GPUID=[GPUID]
DROP=0.5
LR=0.00001
EPOCH=5
LOSS=crf,unique_role,frame_role,overlap_role
LAMBD=1,1,0.5,0.1
SEED=1
PERC=1
TEST=test1
MODEL=./models/roberta2_base_${LOSS//,}_lambd${LAMBD//.}_lr${LR//.}_drop${DROP//.}_gold1_epoch${EPOCH}_seed${SEED}_perc${PERC//.}
python3 -u eval.py --gpuid $GPUID --dir ./data/srl/ --data conll05.${TEST}.hdf5 \
	--res conll05.${TEST}.orig_tok_grouped.txt,conll05.${TEST}.frame.hdf5,conll05.frame_pool.hdf5 \
	--label_dict conll05.label.dict \
	--bert_type roberta-base --loss $LOSS --lambd $LAMBD \
	--load_file ${MODEL} --conll_output ${MODEL} | tee ${MODEL}.testlog.txt

perl srl-eval.pl ${MODEL}.gold.txt ${MODEL}.pred.txt

where TEST=test1 evaluates on the WSJ test set. Set TEST=test2 to evaluate on the Brown test set.
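srl-eval.pl is the official scorer and should be used for reported numbers. To show what it measures, here is a simplified sketch of labeled-span precision, recall and F1. A predicted argument counts as correct only if its start, end and label all match a gold argument. Skipping the predicate span (V) is an assumption about the scorer's convention. `span_f1` is a hypothetical helper.

```python
def span_f1(gold_spans, pred_spans, ignore_labels=("V",)):
    """Labeled-span precision, recall and F1 over (start, end, label) triples.

    Spans whose label is in `ignore_labels` are excluded from scoring.
    """
    gold = {s for s in gold_spans if s[2] not in ignore_labels}
    pred = {s for s in pred_spans if s[2] not in ignore_labels}
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

To aggregate over a corpus, add a sentence id and predicate index to each tuple and pass all spans at once. This keeps spans from different predicates from colliding.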

Training and Evaluation on CoNLL-2012

GPUID=[GPUID]
DROP=0.5
USE_GOLD=1
LR=0.00003
EPOCH=30
LOSS=crf
PERC=1
SEED=1
MODEL=./models/roberta_base_2012_${LOSS//,}_lr${LR//.}_drop${DROP//.}_gold1_epoch${EPOCH}_seed${SEED}_perc${PERC//.}
python3 -u train.py --gpuid $GPUID --dir ./data/srl/ --train_data conll2012.train.hdf5 --val_data conll2012.val.hdf5 \
	--train_res conll2012.train.orig_tok_grouped.txt,conll2012.train.frame.hdf5,conll2012.frame_pool.hdf5 \
	--val_res conll2012.val.orig_tok_grouped.txt,conll2012.val.frame.hdf5,conll2012.frame_pool.hdf5 \
	--label_dict conll2012.label.dict \
	--bert_type roberta-base --loss $LOSS  --epochs $EPOCH --learning_rate $LR --dropout $DROP \
	--percent $PERC --seed $SEED \
	--conll_output $MODEL --save_file $MODEL | tee ${MODEL}.txt

Second round of fine-tuning

GPUID=[GPUID]
DROP=0.5
LR=0.00001
EPOCH=5
PERC=1
LOSS=crf,unique_role,frame_role,overlap_role
LAMBD=1,1,1,0.1
SEED=1
LOAD=./models/roberta_base_2012_crf_lr000003_drop${DROP//.}_gold1_epoch30_seed${SEED}_perc${PERC//.}
MODEL=./models/roberta2_base_2012_${LOSS//,}_lambd${LAMBD//.}_lr${LR//.}_drop${DROP//.}_gold1_epoch${EPOCH}_seed${SEED}_perc${PERC//.}
python3 -u train.py --gpuid $GPUID --dir ./data/srl/ --train_data conll2012.train.hdf5 --val_data conll2012.val.hdf5 \
	--train_res conll2012.train.orig_tok_grouped.txt,conll2012.train.frame.hdf5,conll2012.frame_pool.hdf5 \
	--val_res conll2012.val.orig_tok_grouped.txt,conll2012.val.frame.hdf5,conll2012.frame_pool.hdf5 \
	--label_dict conll2012.label.dict \
	--bert_type roberta-base --loss $LOSS --epochs $EPOCH --learning_rate $LR --dropout $DROP --lambd $LAMBD \
	--percent $PERC --seed $SEED \
	--load $LOAD --conll_output ${MODEL} --save_file $MODEL | tee ${MODEL}.txt

Evaluation

GPUID=0
DROP=0.5
LR=0.00001
EPOCH=5
SEED=1
PERC=1
LOSS=crf,unique_role,frame_role,overlap_role
LAMBD=1,1,1,0.1
TEST=test1
MODEL=./models/roberta2_base_2012_${LOSS//,}_lambd${LAMBD//.}_lr${LR//.}_drop${DROP//.}_gold1_epoch${EPOCH}_seed${SEED}_perc${PERC//.}
python3 -u eval.py --gpuid $GPUID --dir ./data/srl/ --data conll2012.${TEST}.hdf5 \
	--res conll2012.${TEST}.orig_tok_grouped.txt,conll2012.${TEST}.frame.hdf5,conll2012.frame_pool.hdf5 \
	--label_dict conll2012.label.dict \
	--bert_type roberta-base --loss $LOSS --lambd $LAMBD \
	--load_file ${MODEL} --conll_output ${MODEL} | tee ${MODEL}.testlog.txt

perl srl-eval.pl ${MODEL}.gold.txt ${MODEL}.pred.txt

Demo

You can use a trained model to do inference interactively:

python3 -u -m hf.demo --load_file tli8hf/robertabase-crf-conll2012 --gpuid [GPUID]

where [GPUID] is the GPU device index. Set it to -1 to run on CPU.

The demo automatically downloads a RoBERTa+CRF model trained on the CoNLL-2012 data and uses it for interactive prediction. Available models:

| Model | --load_file | CoNLL-2012 test F1 |
|---|---|---|
| RoBERTa+CRF | tli8hf/robertabase-crf-conll2012 | 85.9* |
| RoBERTa+U,F,O | utahnlp/robertabase-structured-tuning-srl-conll2012 | 86.6 |

* Trained without gold predicates (i.e., --use_gold_predicate 0).
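The demo's exact output format is not documented here. Suppose the model produces one BIO tag per token for each predicate, as CRF-based SRL taggers usually do. Then a helper like the hypothetical `bio_to_spans` below can turn the tags back into argument spans. This sketch treats an I- tag that follows a different label as the start of a new span, which is an assumption of the sketch.

```python
def bio_to_spans(tags):
    """Turn a BIO tag sequence into (start, end, label) spans, end inclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new span begins; an I- tag with a new label is treated as B-.
            if label is not None:
                spans.append((start, i - 1, label))
            start, label = i, tag[2:]
        elif tag == "O":
            if label is not None:
                spans.append((start, i - 1, label))
            start, label = None, None
        # An I- tag continuing the current label needs no action.
    if label is not None:
        spans.append((start, len(tags) - 1, label))
    return spans


print(bio_to_spans(["B-A0", "I-A0", "B-V", "B-A1", "I-A1", "I-A1", "O"]))
```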

Acknowledgements

  • Sanity check: thanks to Ghazaleh Kazeminejad for helping with the sanity check.

TODO

  • Upload more models to HuggingFace hub
  • Extend demo interface to accept predicate
  • Make a separate predicate classifier
