Mobile CrowdSensing Dataset
Please use the following citation if you use the generated dataset for academic research:
Chen, Zhiyan, Murat Simsek, and Burak Kantarci. "Region-Aware Bagging and Deep Learning-Based Fake Task Detection in Mobile Crowdsensing Platforms." GLOBECOM 2020-2020 IEEE Global Communications Conference. IEEE, 2020. (bibtex)
Dataset Link:dataset
The dataset is generated by the CrowdSenSim simulation tool [1] and contains both legitimate and fake tasks [2]. The task attributes are as follows: {'ID', 'latitude', 'longitude', 'day', 'hour', 'minute', 'duration', 'remaining time', 'battery requirement %', 'Coverage', 'legitimacy', 'GridNumber', 'OnpeakHour'}.

The location of a task is specified by 'latitude' and 'longitude' together. 'day', 'hour' and 'minute' describe the task publish time. 'duration' denotes how long the task is active, in minutes. 'remaining time' denotes the residual time of a sensing task until its completion. 'battery requirement %' is the percentage of battery required to complete a task. 'Coverage' denotes the task sensing distance. 'legitimacy' describes whether a task is legitimate or illegitimate. This feature is used only for training the machine learning models, since the MCS platform is unaware of task legitimacy when a task is submitted. 'GridNumber' is obtained by splitting the map of the sensing city into small grids numbered from 1. 'OnpeakHour' is a binary flag indicating whether the task start time falls between 7am and 11am; for the sake of simplicity in the simulations, 7am to 11am is defined as the peak period and all other hours as non-peak.

Based on the task generation configuration in Table 1, the dataset contains 14,484 tasks in total: 12,587 legitimate tasks and 1,897 fake tasks.
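As an illustration of the 'OnpeakHour' flag and of the class balance implied by the task counts above, the following is a minimal Python sketch. The function name `on_peak_hour` and the constants are hypothetical, and treating both 7 and 11 as part of the peak window is an assumption; the text does not say whether the end hour is inclusive.

```python
# Peak window as described in the text: 7am to 11am.
# Assumption: both boundary hours are counted as peak.
PEAK_START_HOUR = 7
PEAK_END_HOUR = 11


def on_peak_hour(hour: int) -> int:
    """Return 1 if the publish hour lies in the assumed peak window, else 0."""
    return 1 if PEAK_START_HOUR <= hour <= PEAK_END_HOUR else 0


# Task counts as stated in the dataset description.
N_LEGITIMATE = 12_587
N_FAKE = 1_897
N_TOTAL = N_LEGITIMATE + N_FAKE

# Share of fake tasks in the whole dataset.
fake_fraction = N_FAKE / N_TOTAL

print(on_peak_hour(8), on_peak_hour(15))
print(N_TOTAL, round(fake_fraction, 3))
```

Fake tasks make up roughly 13% of the dataset, so the two classes are imbalanced, which may be worth keeping in mind when choosing evaluation metrics for a detection model.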
Table 1: Task generation configuration

| | Fake Tasks | Legitimate Tasks |
|---|---|---|
| Day | Uniformly distributed in [1, 6] | Uniformly distributed in [1, 6] |
| Hour | 80%: 7am to 11am; 20%: 12pm to 5pm | 8%: 0am to 5am; 92%: 6pm to 23pm |
| Duration (min) | 70% in {40, 50, 60}; 30% in {10, 20, 30} | Uniformly distributed over {10, 20, 30, 40, 50, 60} |
| Battery usage | 80% in {7%-10%}; 20% in {1%-6%} | Uniformly distributed in {1%-10%} |
| Recruitment Radius | Uniformly distributed in 30m to 100m | Uniformly distributed in 30m to 100m |
| Movement Radius | [10m, 80m] | [10m, 80m] |
| Number of Tasks | 1,897 | 12,587 |
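To make the fake-task column of Table 1 concrete, here is a rough Python sketch that samples task attributes from those distributions. It is not the CrowdSenSim implementation. The function name `sample_fake_task` and the dictionary keys are hypothetical, and the sketch assumes integer days, hours and battery percentages drawn uniformly within each sub-range.

```python
import random


def sample_fake_task(rng: random.Random) -> dict:
    """Draw one fake task following the fake-task column of Table 1."""
    # Day: uniform over the integers 1..6.
    day = rng.randint(1, 6)

    # Hour: 80% in 7am-11am, 20% in 12pm-5pm (assumed uniform within each range).
    hour = rng.randint(7, 11) if rng.random() < 0.8 else rng.randint(12, 17)

    # Duration in minutes: 70% from {40, 50, 60}, 30% from {10, 20, 30}.
    if rng.random() < 0.7:
        duration = rng.choice([40, 50, 60])
    else:
        duration = rng.choice([10, 20, 30])

    # Battery usage in percent: 80% in 7-10, 20% in 1-6 (assumed integer values).
    battery = rng.randint(7, 10) if rng.random() < 0.8 else rng.randint(1, 6)

    # Recruitment radius in metres: uniform between 30 and 100.
    recruitment_radius = rng.uniform(30.0, 100.0)

    return {
        "day": day,
        "hour": hour,
        "duration": duration,
        "battery": battery,
        "recruitment_radius": recruitment_radius,
    }


# Fixed seed so repeated runs produce the same sample.
rng = random.Random(0)
tasks = [sample_fake_task(rng) for _ in range(1000)]
peak_share = sum(7 <= t["hour"] <= 11 for t in tasks) / len(tasks)
print(tasks[0])
print(peak_share)
```

With enough samples, the share of tasks published between 7am and 11am should approach the 80% given in the table.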
Further References
Reference 1 is for the CrowdSenSim simulation platform;
Reference 2 is the initial reference in which the threat of fake tasks was introduced.