AI, Machine Learning, our Minds and Society


  1. Semantics in AI, Topic Modelling and Machine Learning
  2. Closest Content and the Power of Suggestion
  3. Example: Data Driven Video on Demand (VOD)
  4. Psychological Reward System and Behavioural Addiction
  5. Data and Profit Driven Production vs Artistic Expression

Semantics in AI, Topic Modelling and Machine Learning

In computer science, we call semantics the relationship between the elements of a document (e.g. words, expressions, slang...) and the meaning of the document (or of a section of it), as it can be understood by a human. The notion of a topic originally comes from linguistics, the study of natural languages.

In computer science, this notion quickly becomes difficult, in particular because the meaning of a document differs from one person to another (we say that the meaning is subjective), notably depending on the person's culture. Semantic analysis and modelling will therefore depend to some extent on the society to which they are applied.

A typical application of automated topic analysis and processing is the search engine, a case of data mining (i.e. a needle-in-a-haystack type of problem in data processing), through which we can find the documents whose topics relate to some keywords.

In artificial intelligence and machine learning, semantics can first be approached through topic modelling. The first problem is, for instance, to determine the topic of a document. Then, we construct a mathematical model and a digital model of the different topics, which can be queried (for instance by searching for keywords).

Topic modelling is typically approached through the frequency distribution of the words (the vocabulary) contained in different documents. To that aim, we distinguish different kinds of words:

By analysing many documents, for example by crawling the whole Web or entire libraries of books or e-books, and indexing all the documents found, we can quantify the relation of the different words and expressions to different features such as gender or the left-right political spectrum.
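As a minimal sketch of the frequency distribution of words mentioned above, the following Python snippet counts the relative frequency of each word in two toy documents. The documents, the tokenisation rule and the function name `word_frequencies` are assumptions made for the illustration; a real indexing pipeline would be far more elaborate.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return the relative frequency of each word in a text."""
    # Very naive tokenisation: lowercase sequences of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Two toy "documents", invented for the illustration.
documents = {
    "doc_sport": "the team won the match and the fans cheered",
    "doc_politics": "the party won the vote and the debate continued",
}

frequencies = {name: word_frequencies(text) for name, text in documents.items()}

for name, freq in frequencies.items():
    # Show the three most frequent words of each document.
    top = sorted(freq.items(), key=lambda item: item[1], reverse=True)[:3]
    print(name, top)
```

Words such as "the" are frequent in both documents and say little about the topic, whereas "match" or "vote" appear in only one of them.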

Note that the statistics obtained will contain the same biases and stereotypes as the collection of analysed documents, and very likely as the society which produced those documents. This includes the possibility that the creators of the documents are not socially representative (for example politically representative) of the society considered.

In fact, I would argue, the differences objectified through statistics, for instance about the relationship between gender and attitudes, are very difficult to distinguish from gender stereotypes at all, as no one has to behave accordingly. As far as I am concerned, I would not call transgender someone whose psychosocial and behavioural gender seemingly differs from their biological gender; we just don't have to be stereotyped...

Note that the social groups themselves can sometimes be determined or constructed automatically through unsupervised learning techniques for the purpose of clustering, which consists precisely in automatically detecting distinct groups in a data sample set used for learning.
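To give an idea of what such clustering can look like, here is a minimal sketch of k-means, one common unsupervised technique (the text above does not say which technique is actually used by any given service). The sample points, the two features and the choice of initial centroids are assumptions made for the illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Minimal k-means: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from p to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        centroids = new_centroids
    return centroids, clusters


# Invented sample: each point is (feature 1, feature 2) for one person.
points = [(-0.9, -0.8), (-0.7, -1.0), (-0.8, -0.6),
          (0.8, 0.9), (1.0, 0.7), (0.9, 0.6)]

# Assumption: we start from two of the points as initial centroids.
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

The number of groups and the starting centroids are parameters chosen by whoever runs the method, so the "groups" found are partly a product of those choices.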

Then, each piece of content, be it a document, song lyrics, a novel, a story, a movie or a video, can be tagged, in the sense that some value or scale can be attributed to different features (such as gender or left-right), in order to create metadata associated with this content. Some tags can be added manually or modified by a human for finer analysis of the content and to avoid errors, possibly ones previously detected and flagged as outliers.

Closest Content and the Power of Suggestion

One application of those machine learning techniques is to suggest, recommend or present targeted content (presented as "relevant" by advocates) to a known human user. It is assumed that a profile of the user has been determined by observing his/her patterns of behaviour, Web browsing and consumption. See also the page about privacy, Web browsing and media.

For example, a subscriber to a video on demand service (or a tracked/registered user of a video/audio platform, e-book store, news reader, search engine, visited Web pages, etc.) will have the features of all the viewed/bought content averaged, to determine some characteristic features (such as gender psychological orientation or right-left leaning) for that person, which constitute her/his profile.
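The averaging described above can be sketched in a few lines of Python. The tag names, the scale of the values (here between -1 and 1) and the function name `average_profile` are assumptions made for the illustration, not a description of any actual service.

```python
def average_profile(viewed_tags):
    """Average the tag values of all viewed contents, feature by feature."""
    n = len(viewed_tags)
    features = viewed_tags[0].keys()
    return {f: sum(tags[f] for tags in viewed_tags) / n for f in features}


# Invented tags for three contents viewed by the same person.
viewed = [
    {"gender": 0.6, "left_right": -0.2},
    {"gender": 0.2, "left_right": 0.4},
    {"gender": 0.4, "left_right": 0.1},
]

profile = average_profile(viewed)
print(profile)
```

The resulting profile has exactly the same shape as the metadata of a single content, which is what makes the two directly comparable.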

The person is thus associated with some tags, creating metadata about that person which are pretty much the same as the metadata associated with content. Then, a correlation with the tags associated with any content can be estimated, in order to determine which content is closest (most "relevant" according to advocates) to the user.
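As a hedged sketch of this estimation, the snippet below ranks four contents by their cosine similarity with a person's profile. Cosine similarity is only one possible measure, chosen here as an assumption; the coordinates of the profile P and of the contents A, B, C and D are invented for the illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two tag vectors (1 = same direction).

    Note: this sketch does not handle the all-zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented (gender, left_right) tags for a profile and four contents.
profile_p = (0.4, 0.1)
contents = {
    "A": (0.8, 0.2),
    "B": (-0.5, 0.5),
    "C": (0.1, -0.9),
    "D": (-0.4, -0.1),
}

scores = {name: cosine_similarity(profile_p, tags) for name, tags in contents.items()}

# Contents ordered from the "closest" to the "farthest" from the profile.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With these invented values, content A points in the same direction as the profile and comes first, while content D points in the opposite direction and comes last.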

This approach is generally considered "relevant" from the point of view of marketing, in the sense that it indicates the tastes of the person, and therefore the item which the person is most likely to purchase/consume. Note that in a practical application, the suggested content may be somewhat randomized around the optimum to avoid annoyances of deterministic machine behaviours (such as infinite repetition in a loop).
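One simple way to randomize suggestions around the optimum is to draw the suggested content at random, with a probability that grows with its score. The sketch below does this with exponential weights; the scores, the `temperature` parameter and the function name `suggest` are assumptions made for the illustration, and real services may proceed quite differently.

```python
import math
import random


def suggest(scores, temperature=0.5, rng=random):
    """Pick one content at random, favouring the highest scores.

    A low temperature makes the choice almost deterministic; a high
    temperature makes all contents nearly equally likely."""
    names = list(scores)
    weights = [math.exp(scores[name] / temperature) for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Invented closeness scores between a profile and four contents.
scores = {"A": 1.0, "C": -0.13, "B": -0.51, "D": -1.0}

rng = random.Random(0)  # fixed seed so that the run is reproducible
picks = [suggest(scores, rng=rng) for _ in range(1000)]

for name in scores:
    print(name, picks.count(name))
```

The best-scoring content is still suggested most of the time, but the others appear occasionally, which avoids showing exactly the same suggestion forever.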

a) The content's position and person's profile in the 2 features plane

b) Formulas to estimate the correlation of the person's profile with the content's metadata for those two tags.

Figure 1. Correlation between a person's profile P with different contents A, B, C and D according to two features (gender and left-right political spectrum tags).

Note that in Figure 1.a above, the two axes for the tag evaluation of a feature appear to cross at a definite point like a centre point (or origin). In fact, this centre point does not really exist in practice (it just doesn't make any sense), and is rather arbitrarily determined, in general through the setting of a (possibly high) number of method parameters throughout the data processing steps. Different coordinate systems (with different origins) could be chosen without one being arguably better than the other. A small change in one of these parameters can shift the values and the estimation of the correlation. We cannot speak, therefore, of a "balanced policy" for these techniques; it just doesn't make any sense.
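The effect of the choice of origin can be checked numerically. In the sketch below, the same profile and the same content are compared twice, once with each of two different origins, using cosine similarity as the closeness measure. The measure, the coordinates and the two origins are assumptions made for the illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two tag vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def shift(point, origin):
    """Coordinates of a point relative to a chosen origin."""
    return tuple(p - o for p, o in zip(point, origin))


# Invented (gender, left_right) tags, expressed in some initial coordinates.
profile_p = (0.4, 0.1)
content_b = (-0.5, 0.5)

similarities = {}
for origin in [(0.0, 0.0), (-1.0, -1.0)]:
    similarities[origin] = cosine_similarity(
        shift(profile_p, origin), shift(content_b, origin)
    )
    print(origin, round(similarities[origin], 3))
```

With these invented values, the similarity is negative with the first origin and positive with the second, although neither the person nor the content has changed.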

Example: Data Driven Video on Demand (VOD)

Since I already used the example of Amazon's practices in the page about privacy, Web browsing and multimedia content, I will illustrate this section with the practices of Netflix, another big player in the field of Video on Demand (VOD). The principle of the recommendation mechanism appears similar in the two services, possibly with different settings regarding biases and policies.

Evidence about Recommendation Policy

The philosophy behind data driven VOD services and marketing can be very well illustrated by the principle of the Netflix Prize, which is a competition to predict users' ratings for films. This suggests that the marketing strategy is to give customers the cultural products that they expect and like.

This approach to content suggestions and recommendations is combined with a subscription based pricing policy in which the subscribers can watch as much content as they wish at no extra cost (except the energy and bandwidth costs). It can be observed that the average time spent watching VOD by subscribers is clearly unbalanced, as compared to other activities, including social activities and self care such as cooking or exercising. See also these figures about Netflix.

Binge Watching, Environmental Psychology and Cognition

From the point of view of environmental psychology, it can be understood that the subscribers of Netflix will be influenced to a substantial extent, generally without being aware of it, by the consumed content, in their behaviours as well as in their views about the world they live in. The same can be said about any consumer watching video content, however.

The above mentioned policy, however, can be seen to lead to specific cases of addictive behaviour called binge watching. This affects cognition to a large extent, in the first place because the user is passive during the time spent, and does not exert creativity. It can also negatively affect, especially among the youth, other important and formative cognitive activities, especially school, studies and other learning processes.

Indeed, trial-and-error approaches and social feedback in learning, be it positive or negative, are essential to the motivation of schoolchildren and students. See also this article and this other one about relationships between screens and Attention Deficit Disorders.

Biasses, Positive Reinforcement and Social Cliques

Furthermore, the principle of "giving users what they want", as illustrated in the Netflix Prize, leads to a so-called positive feedback loop, which is worrisome from the point of view of environmental psychology:

Psychological Reward System and Behavioural Addiction

The example of Video on Demand above is typical of a binge-whatever behaviour, which is technically characterized as a behavioural addiction. The same neural patterns can be found in drug addictions (including recreational drugs) and in some compulsive patterns, liable to impact health on different levels, involving everyday consumer products that are presumed non-toxic.

Typically, when one yields to easy pleasure and to the desire to consume in an uncontrolled and unbalanced way, the dopamine reward pathways adapt through neural plasticity. This spirals into an ever-growing appetite for the same kind of experiences.

Here are a few examples:

Data and Profit Driven Production vs Artistic Expression

For some time, recreational cultural content has been produced through different processes. In particular, since leisure time has become widespread among individuals and has come to represent an important cash flow, entertainment has become an industry for mass consumption. Three distinct approaches and philosophies must now be distinguished:

  1. True Artistic Creation and Expression, by which an Artist intends to express complex feelings or perceptions about an internal, relational, social, or other reality, possibly surreal. The artist often feels compelled, and sometimes feels an urge, to express something such as experiences about the self or its context through artistic activity, often in relation to other cultural references (which make it possible to build on and involve the experiences and expressions of others).
  2. Mass Entertainment Production, by which some investors decide to grant a budget, possibly on the basis of a scenario or of the prior references of a content creator, to produce entertaining content for the main purpose of profit oriented business, even if it is marketed as a true creation. The standards by which the creator's references or would-be story are assessed involve prospects of return on investment, which substantially limits the creators' possibilities for expression. In particular, the content should contain a limited number of cultural references, which should be presented as self-contained, i.e. explained within the product. This makes such entertainment creations hardly acceptable to the people who already know the references involved...
  3. Data Driven Production takes this even further, as the production is guided not only by a reputable director having a valuable idea, but by the detection of marketing opportunities in a segment of the entertainment business through thorough measurements of consumers' tastes and demand (even if it is marketed as a true creation...).