sv-order-2021

Commits

List of commits on branch main:

1a5324df870b07e1abef36c3e23d48812261235b: update (committed 2 years ago)
4a164d146bb02c0b45f5b9c405489c24825ce188: add requirements (committed 3 years ago)
9cc73c2717dc8fb7c773022eb4ad5ea8c44d18fa: Create README.md (committed 3 years ago)
58f366f2d21cd8291f0bc98204136f3e3a7375c5: add docs (committed 3 years ago)
2caed76de7755e4bc70f98d549bd1145c3a99483: Update extract_frequencies_from_corpus.py (committed 3 years ago)
5e601dfab27eebc5e16a24ed19f07616630d0545: also collect subject pre/post positions (committed 3 years ago)

README

Subject-verb order experiments

To use our scripts, clone this repository and then install the required libraries with

pip install -r requirements.txt

All relevant scripts have a help section, which you can display with the -h option, for instance:

python add_frequencies_to_df.py -h

Models

We use the spaCy models that were state of the art at the time of writing (December 2021), specifically the nl_udv25_dutchalpino_trf model, which is in part described here.

Before using our scripts, you should install it with the following command (or install from the requirements file):

python -m pip install https://huggingface.co/explosion/nl_udv25_dutchalpino_trf/resolve/main/nl_udv25_dutchalpino_trf-any-py3-none-any.whl

Data

In our research, we calculated frequencies on the SONAR corpus and limited ourselves to components that were written-to-be-read and published (WRP-). However, we excluded the WRPEA component, which contains data from discussion forums. Its data is riddled with non-standard, colloquial language, slang, and internet language, which not only falls outside the scope of our research objectives but also makes the parser's job very difficult (and its results unpredictable).
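The component selection described above could be sketched as follows. This is only an illustration, not the repository's own code: the function name is_included_component, the hyphen normalization, and the example component identifiers are assumptions made for this sketch.

```python
def is_included_component(component_id: str) -> bool:
    """Return True for written-to-be-read, published (WRP-) components,
    except the discussion-forum component WRPEA.

    Assumption: identifiers may be written with or without hyphens
    (e.g. "WR-P-E-A" or "WRPEA"), so hyphens are stripped first.
    """
    normalized = component_id.replace("-", "").upper()
    return normalized.startswith("WRP") and normalized != "WRPEA"


# Illustrative identifiers only; the real corpus layout may differ.
components = ["WR-P-E-A", "WR-P-P-G", "WR-U-E-A", "WS-U-E-A"]
selected = [c for c in components if is_included_component(c)]
print(selected)
```

Normalizing the identifier before comparing keeps the check independent of how a given corpus release spells its component names.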

Sentences shorter than three words (e.g. enumerations like "1 .") or longer than 32 words were excluded. The latter restriction was imposed for computational feasibility.
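A minimal sketch of this length filter is given below. It assumes that sentences are already tokenized and that every token, punctuation included, counts as a word (which is how "1 ." comes out at two words). The names MIN_WORDS, MAX_WORDS and keep_sentence are hypothetical and do not come from the repository's scripts.

```python
MIN_WORDS = 3   # sentences with fewer tokens are dropped
MAX_WORDS = 32  # sentences with more tokens are dropped


def keep_sentence(tokens: list[str]) -> bool:
    """Keep sentences whose token count lies in [MIN_WORDS, MAX_WORDS]."""
    return MIN_WORDS <= len(tokens) <= MAX_WORDS


# Assumption: whitespace-separated, pre-tokenized sentences.
sentences = [
    "1 .",                      # 2 tokens: too short
    "De kat zit op de mat .",   # 7 tokens: kept
    " ".join(["woord"] * 33),   # 33 tokens: too long
]
kept = [s for s in sentences if keep_sentence(s.split())]
print(kept)
```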