7. Privacy and security in ICT, data protection in cyberspace

7.1. Digital footprint

The threats, or rather risks, mentioned above very often arise from leaving digital footprints in cyberspace. Depending on whether a user can influence them, digital footprints can generally be divided into those that can be influenced (active) and those that cannot (passive).

Division of digital footprints:

  • Passive digital footprint

-       information from a computer system;

-       connection to computer networks, in particular the Internet;

-       use of provided services, etc.

  • Active digital footprint

-      conscious use of services;

-      voluntary disclosure of information;

·      blogs, forums;

·      social media;

·      email;

·      data storage;

·      cloud services, etc.

In the following section, I will focus on some aspects of individual digital footprints and information contained in them. The purpose is to warn users that their actions in the environment of information and communication systems are not as anonymous as they may think.

In the world of ICT, one rule applies: whatever you upload, transfer, make available or otherwise put into cyberspace stays there “forever”. There will always be a copy of your data, whether created by the functioning of a computer system or stored by another user. Even if you subsequently delete the data, they will not be truly, permanently and irreversibly deleted. It is therefore advisable to pay attention to your digital footprint and to the information or data that you leave behind in cyberspace.


7.1.1 Passive digital footprint

Passive footprints most often arise from the interaction of one computer system with another or from the functioning of a computer system (and its associated software). Examples of such footprints include information from the operating system (such as Windows error messages or system information) and other information and data that are stored as part of the system’s functioning without having to be transmitted (for example on a computer system that has never been connected to any network or other computer system).[1] It would not be entirely correct to claim categorically that these footprints cannot be influenced. A sufficiently experienced user is able to change, mask or suppress a number of “passive” digital footprints (e.g. by using the web browser’s anonymous mode, which discards cookies once the session ends). However, a user’s movement on the Internet can be monitored in a variety of ways.

IP address

A computer system’s connection to the Internet is a typical example of a relatively passive footprint. In this case, an IP address or MAC address is passed on along with other ISP information. An IP address is not anonymous by default, and the computer system uses it as one of its identifiers when communicating with other computer systems. IP addresses are assigned hierarchically, with ICANN playing a dominant role and dividing the real world into regions managed by regional internet registrars (RIR, Regional Internet Registry). These registrars have been assigned ranges of IP addresses by ICANN, which they in turn assign to LIRs within their region. The regional registrars cover the following five territories:

1.     “Euro-Asian” region – RIPE NCC: https://www.ripe.net/

2.     “Asia Pacific” region – APNIC: https://www.apnic.net/

3.     “North American” region – ARIN: https://www.arin.net/

4.     “South American” region – LACNIC: http://www.lacnic.net/

5.     “African” region – AFRINIC: http://www.afrinic.net/


Figure – Division of the world between RIRs

The regional registrars[2] operate the Whois service on their websites, i.e. a database in which data on IP address holders are registered. These databases contain a wide range of information that makes it possible to identify, for example, the range of public IP addresses used, contact information, the abuse contact[3], the hierarchically superior connection provider, etc. To determine the “owner” (operator, provider) of a particular IP address, it is often possible to use these freely available databases.[4]
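The Whois lookup described above can be sketched in Python. This is only a minimal sketch: it assumes the plain-text WHOIS protocol on TCP port 43 and the commonly published WHOIS host names of the five registries, which should be verified against each registry's documentation. The function names are illustrative and do not come from any registry's tooling.

```python
import socket

# Assumed WHOIS host names of the five regional registries.
RIR_WHOIS_SERVERS = {
    "RIPE NCC": "whois.ripe.net",
    "APNIC": "whois.apnic.net",
    "ARIN": "whois.arin.net",
    "LACNIC": "whois.lacnic.net",
    "AFRINIC": "whois.afrinic.net",
}


def build_whois_query(ip_address: str) -> bytes:
    """A WHOIS request is a single line of text terminated by CRLF."""
    return ip_address.encode("ascii") + b"\r\n"


def whois_lookup(ip_address: str, rir: str = "RIPE NCC", timeout: float = 10.0) -> str:
    """Send the query to the chosen registry and return its raw text reply."""
    server = RIR_WHOIS_SERVERS[rir]
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(build_whois_query(ip_address))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `whois_lookup("195.113.149.160")` requires network access and returns the registry's free-text record for the address block, which then has to be read or parsed by the caller.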

Regional registrars further divide the assigned IP ranges between local internet registrars (Local Internet Registry – LIR). A local registrar is usually an ISP (in the Czech Republic, a provider of information society services, specifically a connection provider, whether public or non-public). This registrar can then provide its range of IP addresses to, for example, parts of its organisation or other entities.


Figure – Information extracted from the RIR database

The abbreviated extract from the RIR database shows the LIR (in this case the CESNET, z. s. p. o. association, which uses the IP address range 195.113.0.0/16) and an organisation to which CESNET has assigned part of its public addresses (the Police Academy of the Czech Republic, with the IP address range 195.113.149.160 – 195.113.149.175). The Police Academy can in turn distribute these addresses among other parts of the organisation (e.g. faculties, laboratories, or other sub-networks it manages). Given an IP address and the exact time, it is possible to determine a specific computer system based on the hierarchical address assignment. Information about the connection of an end computer system (source) to a target computer system (e.g. a computer connecting to the Internet and displaying a requested web page) is stored by the individual ISPs along the entire path between the source and the target.
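The hierarchical relationship between the two address ranges mentioned above can be checked with Python's standard `ipaddress` module. The sketch below uses only the ranges quoted in the text; the variable names are illustrative.

```python
import ipaddress

# Address block of the LIR (CESNET) as quoted in the text.
lir_block = ipaddress.ip_network("195.113.0.0/16")

# First and last address of the range assigned to the end organisation.
first = ipaddress.ip_address("195.113.149.160")
last = ipaddress.ip_address("195.113.149.175")

# Express the start-end range as the smallest set of CIDR blocks.
assigned_blocks = list(ipaddress.summarize_address_range(first, last))

for block in assigned_blocks:
    print(block, "is inside", lir_block, ":", block.subnet_of(lir_block))
```

The range collapses into a single block of 16 addresses, which lies entirely inside the LIR's /16 block.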

Due to the strict rules defining the management of IP addresses and publicly accessible RIR databases that contain information about the holders of individual address blocks, it is possible to find out very quickly which network a certain IP address belongs to and who operates the network. Thanks to logging information from network traffic, the operator of a given network is then able to identify who (or which computer system) used a particular IP address at a particular time. This determination is a very important source of information in handling security incidents (cyberattacks) and in searching for their source (originator).
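How a network operator might match an IP address and a time to a computer system can be illustrated with a small sketch. The record layout, device names and timestamps below are entirely hypothetical; real operators keep such information in DHCP, NAT or authentication logs whose formats differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Lease:
    """One hypothetical log record: which device held an address, and when."""
    ip: str
    device: str
    start: datetime
    end: datetime


def find_device(leases, ip: str, moment: datetime) -> Optional[str]:
    """Return the device that held `ip` at `moment`, or None if no record matches."""
    for lease in leases:
        if lease.ip == ip and lease.start <= moment < lease.end:
            return lease.device
    return None


# Hypothetical sample data for illustration only.
LEASES = [
    Lease("195.113.149.161", "lab-pc-01", datetime(2016, 8, 4, 8, 0), datetime(2016, 8, 4, 12, 0)),
    Lease("195.113.149.161", "lab-pc-07", datetime(2016, 8, 4, 12, 0), datetime(2016, 8, 4, 18, 0)),
]
```

The same address maps to different devices at different times, which is why the exact time of an incident matters as much as the address itself.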

Email

Email, as one of the most frequently used services in the Internet environment, is definitely not an anonymous service. A message that is sent from a source to a destination (recipient) typically contains a range of different types of information that can identify both the provider of the email service and the connection provider of the device from which the email was sent. This information is not displayed in the body of the message (i.e. the text we send to a specific person) but in the source code (header) of the message. From this source code, it is possible to determine the path the message took through mail servers, the real sender, the name of the source computer, the time the message was sent (including the time zone), the operating system and mail client used, etc. Below is an example of the header of a forwarded[5] fraudulent email, with potentially interesting information marked.


Figure – View information from the header of an email message
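The kind of information visible in a message header can also be extracted programmatically. The following sketch uses Python's standard `email` package on an invented sample message; all host names, addresses and the mail client name are made up for illustration and are not taken from the figure.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Invented sample message; every value here is hypothetical.
SAMPLE_MESSAGE = """\
Received: from mail.example.org (mail.example.org [192.0.2.25])
    by mx.example.net with ESMTP; Thu, 4 Aug 2016 10:15:02 +0200
Received: from sender-pc (unknown [198.51.100.7])
    by mail.example.org with ESMTPSA; Thu, 4 Aug 2016 10:15:00 +0200
From: Example Bank <support@example.org>
To: recipient@example.net
Subject: Urgent account verification
Date: Thu, 4 Aug 2016 10:14:58 +0200
X-Mailer: ExampleMailer 1.0

Please confirm your password.
"""


def summarize_header(raw_message: str) -> dict:
    """Collect the header fields that help trace where a message came from."""
    msg = message_from_string(raw_message)
    # Each server prepends its own Received line, so the last one in the
    # header is the hop closest to the sender. Reverse to get travel order.
    received = msg.get_all("Received", [])
    hops = [" ".join(line.split()) for line in reversed(received)]
    return {
        "hops": hops,
        "declared_sender": msg.get("From"),
        "sent_at": parsedate_to_datetime(msg["Date"]),
        "mail_client": msg.get("X-Mailer") or msg.get("User-Agent"),
    }
```

Note that only the Received lines added by servers the recipient trusts are reliable; fields such as From or X-Mailer are set by the sender and can be forged.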

Web browser

A web browser is another application that by default passes information about a user and his/her computer system to the computer system (server) of a visited site. From a client’s request, this server can learn, for example, the referrer (the page from which the user arrived), the web browser and operating system used (including the exact version), cookies, flash cookies, history, cache, etc.
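What a server sees in a single request can be illustrated by parsing the text of an HTTP request. The request below is an invented example (the host names, cookie values and browser version are assumptions), and the parser is a deliberately simplified sketch rather than a complete HTTP implementation.

```python
# Invented HTTP request, as a browser might send it to a visited server.
SAMPLE_REQUEST = (
    "GET /article HTTP/1.1\r\n"
    "Host: www.example.org\r\n"
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:47.0) "
    "Gecko/20100101 Firefox/47.0\r\n"
    "Referer: https://search.example.com/?q=digital+footprint\r\n"
    "Cookie: session_id=abc123; lang=en\r\n"
    "\r\n"
)


def parse_request_headers(raw_request: str) -> dict:
    """Return the request headers as a dict with lower-case header names."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:  # lines[0] is the request line ("GET ... HTTP/1.1")
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


headers = parse_request_headers(SAMPLE_REQUEST)
print("Came from:", headers.get("referer"))
print("Browser and OS:", headers.get("user-agent"))
print("Cookies:", headers.get("cookie"))
```

Even this one request reveals the previous page, the browser and operating system versions, and any identifiers the server stored earlier in cookies.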

In addition to the IP address, cookies[6] are among the items that help create a “fingerprint” of the user’s computer system (computer, smartphone, etc.). This fingerprint allows a specific computer system to be identified[7], even if the user switches to a different web browser, deletes cookies, logs in from a different IP address, etc.

One of the many fingerprinting methods currently in use is canvas fingerprinting.[8] Canvas fingerprinting works by having a visited web server instruct the user’s web browser to “draw a hidden image”. The way this image is rendered differs between web browsers and computer systems, which makes it practically unique. The drawn image is then converted into an ID code, which is stored on the web server in case the user visits it again.[9]


Figure – Example of Canvas Fingerprinting
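The step in which the drawn image is converted into an ID code can be sketched as follows. In practice the drawing happens in the browser, typically in JavaScript; the Python sketch below only simulates the conversion, with two invented byte strings standing in for pixel data rendered on two slightly different systems. The use of SHA-256 is an assumption for illustration, since the text does not name a specific function.

```python
import hashlib


def image_to_id(pixel_bytes: bytes) -> str:
    """Reduce rendered pixel data to a short, stable identifier."""
    return hashlib.sha256(pixel_bytes).hexdigest()[:16]


# Invented pixel data: the two renderings differ in a single colour value,
# as might happen with different graphics drivers or font rendering.
render_system_a = bytes([255, 255, 255, 255, 12, 34, 56, 255])
render_system_b = bytes([255, 255, 255, 255, 12, 34, 57, 255])

print(image_to_id(render_system_a))
print(image_to_id(render_system_b))
```

The same rendering always yields the same identifier, while even a one-value difference in the pixels yields a different one, which is what allows a returning system to be recognised without cookies.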

In addition to fingerprinting, it is also worth monitoring how a web browser transfers information to third parties (both entities and services that can make further use of user information). By default, this transfer takes place on the basis of the Terms of Service agreed to with an ISP. For example, any end user can install the Lightbeam application[10], which displays all the sites with which the browser communicates (often without the user’s knowledge) when a website is visited, i.e. the third parties to which data are passed. Passing information about users to third parties is certainly not exceptional. On the contrary, in the digital world it is a matter of course and a “necessary prerequisite” for the functioning of many ISPs.

1.     The first screenshot shows Firefox activity for the period from 30 July 2016 to 4 August 2016. During that period, 154 pages were visited, and 390 third-party pages were linked.


2.     The second screenshot displays the same map but filters out third-party pages that are represented by triangles.


3.     The last screenshot displays the Lightbeam application after its data were cleared and the following pages were displayed: www.seznam.cz; www.google.com.
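The distinction between visited pages and third-party pages shown in the screenshots can be approximated with a short sketch. It is not how Lightbeam itself is implemented; the "last two labels of the host name" rule is a naive assumption (real tools rely on the Public Suffix List), and all requested URLs other than the visited page are invented.

```python
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Naive site name: the last two labels of the host name."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def third_party_sites(page_url: str, requested_urls) -> list:
    """Sites contacted while loading a page that differ from the page's own site."""
    first_party = site_of(page_url)
    contacted = {site_of(url) for url in requested_urls}
    return sorted(contacted - {first_party, ""})


# Invented list of requests made while loading one page.
requests_made = [
    "https://www.seznam.cz/style.css",
    "https://login.seznam.cz/api",
    "https://cdn.tracker.example/pixel.gif",
    "https://ads.example.net/banner.js",
]
print(third_party_sites("https://www.seznam.cz/", requests_made))
```

Requests to other hosts of the visited site count as first-party, while everything else is a third party that learns about the visit.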



Other applications

In the following part of the text, I will partially focus on smart devices (smartphones, tablets, etc.) and the applications associated with their use. I have chosen these devices deliberately because they are the computer systems on which users probably install the largest number of programs (very often unverified and merely recommended by a “friend”). These devices, which partly because of contractual terms and conditions need not be under the full control of the user, administrator, etc., pose a security risk for both the end user and the company (organisation).

The previously mentioned statistical survey[11] shows that on average we spend 4.4 hours per day on the Internet via a computer (desktop PC, laptop, etc.) and 2.7 hours per day via mobile devices. In the case of a computer, the security of the device is usually ensured, but mobile devices (smartphones, tablets, etc.) usually have no policies governing software installation (from either trusted or untrusted sources) and often lack basic protection in the form of an antivirus program.[12]

An end user is free to install software on an Android OS device, and this software may store information about its activities and pass it on to other entities, including the content of the transmitted information. The Play Store service, which Google provides within the Android OS, allows any developer to set rules for, for example, what the application should collect and where it should send these data.

Personally, I believe that it is not a mistake to allow application developers to obtain sufficient information about their applications, their functionality, etc. If we regulate the collection of this information, we will undoubtedly also hamper the progress and further development of these and other applications. On the other hand, there are attackers who, because Play Store does not authenticate and scan applications, can offer malware-infected applications that, once installed on an end computer system, can take control of, for example, an end user’s smartphone.

Identification of a computer system based on information from its components

One of the unique, yet in some circumstances changeable, computer system identifiers is a MAC address, which is tightly bound to a computer system’s network card. However, a network card is not the only hardware component that is able to pass on a unique computer system identifier to another computer system.

Researchers at Princeton University have found that a computer system can be identified, for example, by the system’s battery information, and web browsers are an essential part of transmitting this information.[13]

In practice, the procedure relies on the capabilities of HTML5. This standard includes a function that allows websites (or web servers) to determine the battery level of the computer system that accesses them. (The information passed on is the percentage of battery remaining and the approximate time it will take to discharge or charge.) The idea of web server owners is that a user who is running low on battery will be shown an energy-saving version of a web page. The two scripts described by the Princeton University researchers already use battery data in practice, while also collecting additional information, such as an IP address or a canvas fingerprint. Such combinations can provide a very accurate identification of a computer system.[14]
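Why a combination of battery data and an IP address can link visits together is illustrated by the sketch below. It is not the code of the scripts studied by the researchers; the record layout, field names and sample values are assumptions chosen to show the principle that two visits made within a short time from the same device report the same battery readings, regardless of the browser used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """Hypothetical data a script could collect during one page visit."""
    ip: str
    battery_level: float     # fraction of charge remaining, 0.0 to 1.0
    discharging_time: int    # estimated seconds until the battery is empty
    browser: str


def device_key(visit: Visit) -> tuple:
    """Values that stay the same across browsers on one device at one moment."""
    return (visit.ip, visit.battery_level, visit.discharging_time)


def likely_same_device(a: Visit, b: Visit) -> bool:
    return device_key(a) == device_key(b)


# Invented sample visits.
visit_firefox = Visit("198.51.100.7", 0.43, 8940, "Firefox")
visit_chrome = Visit("198.51.100.7", 0.43, 8940, "Chrome")
visit_other = Visit("198.51.100.7", 0.87, 20000, "Firefox")

print(likely_same_device(visit_firefox, visit_chrome))
print(likely_same_device(visit_firefox, visit_other))
```

Such a key is only short-lived, because the battery readings change as the device discharges, but within that window it can connect visits that the user tried to separate by switching browsers or deleting cookies.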


7.1.2 Active digital footprint

An active digital footprint, i.e. one that can be influenced, comprises all information that a user voluntarily passes on about himself/herself to another person (whether a natural or legal person, or even an ISP). Passing on information may include a number of activities, such as sending an email, adding a post to a discussion or forum, or publishing any media (photo, video, audio, etc.) on social media. The term also covers registration for and use of all conceivable services within cyberspace [e.g. operating systems, email (including freemail), social media, dating services, P2P networks, chats, blogs, bulletin boards, websites, cloud services, data storage, etc.].

Active digital footprints are footprints over which users can have relative control, and it is only up to them what information about themselves they intend to make available to others. However, it is necessary to draw attention to the already mentioned premise: any data or information entered into cyberspace will remain in cyberspace.

Theoretically, it would be possible to define a category of hypothetically active footprints, which is in a way an oxymoron. This category includes facts that a user is theoretically able to influence but usually does not, because doing so would significantly limit his/her functioning in the digital world. These footprints could include, for example, the use of the services of the largest ISPs (Microsoft, Apple, Google, Facebook, etc.), where use of the service is subject to accepting the Terms of Service (EULA), which in turn allow these ISPs to obtain a significant amount of information. This category may also include footprints that arose, for example, by correlating active and passive footprints; information that other users disclose about us; data that are mirrored; EXIF data[15], etc.



[1] This mainly means information about users’ activities that is logged and archived in places to which a user has no access and over which he/she has no control [e.g. the user is not able to delete logs proving his/her activity (e.g. access, sending email, etc.) on the mail server]. On their own computer, users can influence the stored data and information: they are able to delete it (e.g. history, emails, etc.), edit it, etc.

[2] Regional internet registries. [online]. [cit.04/08/2015]. Available from: https://www.nro.net/about-the-nro/regional-internet-registries

[3] This is a contact that a user can get in touch with if he/she is harmed by a given IP address or range of addresses (for example, there is a cyberattack in the form of spam, phishing, etc.). It is the contact closest to the source of the attack.

[4] However, this is not the only database. There are a number of services that offer the same information. I will also mention other databases as an example: http://whois.domaintools.com/; https://www.whois.net/; http://www.nic.cz/whois/; https://whois.smartweb.cz/, etc.

[5] The email was forwarded from jan.kolouch@fit.cvut.cz to kyber.test@seznam.cz.

[6] In HTTP, the term cookie refers to a small amount of data that a visited web server (a visited web page) sends to a web browser, which then stores it on the user’s computer. These data are then sent back to the web server each time the user visits the same server.

[7] If a user wants to learn more about what a web browser reveals about their activity, I recommend the following URLs: http://panopticlick.eff.org, http://browserspy.dk/, http://samy.pl/evercookie.

[8] ANGWIN, Julia. Meet the Online Tracking Device That is Virtually Impossible to block. [online]. [cit.10/06/2016]. Available from: https://www.propublica.org/article/meet-the-online-tracking-device-that-is-virtually-impossible-to-block

[9] Example of canvas fingerprinting. A test showing the fingerprint of your web browser is available within the article ANGWIN, Julia. Meet the Online Tracking Device That is Virtually Impossible to block. [online]. [cit.10/06/2016]. Available from: https://www.propublica.org/article/meet-the-online-tracking-device-that-is-virtually-impossible-to-block

[10] The application enables graphical display of interconnection of individual services and transfer of information to third parties. This is a Firefox web browser add-on that is available at: https://www.mozilla.org/en-US/lightbeam/.

[12] For example, a report issued by Kaspersky Lab shows that there are more than 340,000 types of malware intended primarily for mobile devices. Kaspersky Lab further states that 99% of this malware targets Android devices. This targeting is perfectly understandable, as the variability of individual devices and versions of the Android OS is considerable. (Some reports state that more than 24,000 different types of devices use the Android OS.)

For more details, see e.g.:

The very first mobile malware: how Kaspersky Lab discovered Cabir. [online]. [cit.01/08/2016]. Available from: http://www.kaspersky.com/about/news/virus/2014/The-very-first-mobile-malware-how-Kaspersky-Lab-discovered-Cabir

See also: Interesting Statistics On Mobile Strategies for Digital Transformations. [online]. [cit.15/07/2016]. Available from: http://www.smacnews.com/digital/interesting-statistics-on-mobile-strategies-for-digital-transformations/

The fragmentation of Android has new records: 24 000 different devices. [online]. [cit.15/07/2016]. Available from: http://appleapple.top/the-fragmentation-of-android-has-new-records-24-000-different-devices/ 

[13] For more details see ENGLEHARDT, Steven and Arvind NARAYANAN. Online tracking: A 1-million-site measurement and analysis. [online]. [cit.05/08/2016]. Available from: http://randomwalker.info/publications/OpenWPM_1_million_site_tracking_measurement.pdf

[14] For more details see VOŽENÍLEK, David. Promazání „sušenek“ nepomůže, na internetu vás prozradí i baterie. [online]. [cit.04/08/2016]. Available from: http://mobil.idnes.cz/sledovani-telefonu-na-internetu-stav-baterie-faz-/mob_tech.aspx?c=A160802_142126_sw_internet_dvz

[15] EXIF – Exchangeable image file format. It is a metadata format that digital cameras embed in digital photos. These metadata include, for example:

  • Camera brand and model.
  • Date and time a picture was taken.
  • GPS position.
  • Information about the author (the person who registered the camera).
  • Camera settings.
  • Preview of an image, etc.