A Portal for Scientific Audiovisual Media: Analysing User Needs
Read another interesting blog-paper written by Margret Plank, Head of Competence Centre for Non-textual Materials at German National Library of Science and Technology (originally presented at the LIBER Annual Conference 2012 in Tartu)
With more than 6 million media units, 25,000 journal titles and about 12.5 million patents, the German National Library of Science and Technology (TIB) ranks as one of the largest specialized libraries worldwide. It is jointly financed by the federal government and the federal states (“Länder”) of Germany. TIB is a member of the Leibniz-Association, an umbrella organisation for 86 institutions conducting research and providing scientific infrastructure. The TIB's task is to comprehensively acquire and archive literature from around the world pertaining to engineering and the natural sciences. The TIB's portal GetInfo bundles access to leading subject databases, publishing house offerings and library catalogues with integrated full text delivery. In doing this, GetInfo offers a worldwide unrivalled supply of technical and natural scientific information. The TIB actively participates in a large number of projects dealing with the development of corresponding specialist technologies, with a key focus on the digital library.
Besides scientific texts such as books or magazines, the supply, use and significance of non-textual media such as audiovisual media, 3D objects and research data are continually increasing in research and education. However, only a tiny proportion of these materials can currently be searched and used. In addition, the formats of non-textual objects are hugely diverse and constantly changing over time.
Hence there is an urgent need for new concepts, workflows, tools and standards for non-textual materials in order to make searching for non-textual material as easy as it already is for textual information. Therefore the TIB has set up a competence centre for non-textual materials with the goal to fundamentally improve access to, and use of, non-textual objects and to enable new forms of usage for existing inventories.
In this context, the TIB, together with the Hasso-Plattner Institut for software system engineering at Potsdam University (HPI), is developing a web-based platform for audiovisual media. The forthcoming audiovisual portal optimises access to scientific films from the fields of technology and the natural sciences such as computer animations, lecture and conference recordings.
In order to ensure future accessibility and usability of knowledge via the audiovisual portal, the design team makes use of user-centred methods such as focus groups, prototyping and usability testing. A needs analysis has been carried out and on that basis a low-fidelity prototype was developed and optimised in several iterative design steps. This paper aims at describing the user-centred design process, the key findings and a first prototype of the audiovisual portal.
2. User-centred Design
In the spring of 2010, the TIB, together with an agency which specialises in usability, carried out an analysis designed to identify users’ demands with respect to the collection and provision of scientific AV media at the TIB. The User-Centred Design (UCD) approach was chosen for this. As a process model, it offers several methods for the development of information systems which can be meaningfully used in a library context to develop user-friendly services. DIN EN ISO 9241-210 (DIN 2010) serves as a basis for this user-centred approach. It describes the process of designing user-friendly systems in terms of the following phases: analysis of the usage context, definition of the requirements, conception and design/prototyping, and evaluation. The following measures were implemented in the AV-portal project:
- Expert interviews with representatives from scientific institutes, film institutes, libraries and universities
- Context analysis: Research into publicly available AV-portals, content-based search methods and visualisation
- Development of a prototypical AV-portal on the basis of the results
- Focus groups with users from the target groups
3. Expert Interviews
In this phase, and as part of the requirements analysis, experts from the fields of information science, media informatics and media science were asked questions in a structured 60-minute telephone interview. Expert interviews are particularly well-suited when new services based on innovative processes from research are developed and intended to be put into practice. The interviewees were first asked to give their assessment of the state-of-the-art in multimedia retrieval and to name relevant projects with innovative search methods. Finally, the interview participants were asked to make reasonable recommendations as to which processes should be integrated into the prototypes and which should not. The results were then compared to the users’ requirements.
Selected results of the expert interviews:
Linking of videos with other media in the context is important
The experts recommended designing a portal in which AV media are well linked to other objects, such as textual media. In particular, a link to the publications in which the video is cited, as well as to the primary data belonging to the video, is decisive, as users only rarely search within a single media type. The experts knew of no project in which a link has been established between a video and its associated publication or primary data, even though this is a perfectly reasonable approach. As an example of the reverse direction, i.e. getting from a publication to the associated video/s, one of the experts mentioned the Association for Computing Machinery (ACM).
Appropriate presentation of results central to the relevance assessment by the users
In addition to the search query, it is essential to develop solutions for the presentation of the results. To this end, the interviewees suggested different approaches which, in their experience, would make it easier for users to decide quickly whether or not a particular video is relevant to their question. The methods named were a visual abstract or a preview, as well as static images which each represent a chapter in the video.
Speech recognition offers useful and applicable approaches
The experts emphasised that the research field of speech recognition has already produced a few good and usable processes which could also play a role in the context of an AV-portal. For example, the audio content of a video could, at the same time, be made available as a text to aid navigation – various news channels actually already do this.
Influence of portals such as YouTube/vimeo
The experts interviewed stressed that the influence of very well-known video portals should not be underestimated. One of the experts pointed out that YouTube, for example, is so successful because the user interface is so well-constructed. In any case, users do have knowledge and experience gained from these portals which they find difficult to get past (“but YouTube do that differently!”).
The similarity-based search was judged by the experts to be difficult to use for the prototype because the films held at the TIB are too heterogeneous. However, this method is already being used by established search engines such as Google (“search for similar images”) and does work well for static images. It is important to define what type of similarity would form the basis (similar in form or colour) and where the added value would be for the user. Similarity in respect of image/video content, i.e. on a semantic level, however, was clearly seen as added value.
Community features in a scientific context were seen as potentially problematic
Not all of the participants were of the opinion that community features, such as commentary or tagging functions, belong in a portal which is primarily directed at scientists. Whilst some of the experts welcomed their integration, others warned that such community functions would always require moderation and that moderating scientists could present its own problems. Although sharing and mutually commenting upon knowledge is desirable in a scientific context, a few experts were sceptical.
Scientists have to be encouraged
Two experts emphasised the issue of motivation, as scientists have to be encouraged to make the videos they have created available.
Legal issues and geoblocking
Several experts pointed out that it is important to sort out the legal issues early on and they also drew attention to so-called geoblocking, i.e. the fact that some international portals will only allow their films to be accessed in the respective country and will block IP addresses from Europe, for example.
4. Context Analysis
As part of the context analysis, the examples named by the experts of video portals which use content-based methods, as well as visual and semantic searches, were researched and systematically collected.
Both results (context analysis/best practices and expert assessments) were drawn upon in order to decide which methods should be integrated into the prototype and which aspects should be evaluated in group discussions with users. In the further development of the portal, various providers were identified and evaluated, using a video testbed.
In contrast to a keyword-based search, a semantic search offers the advantage that the user can explore and navigate in the video inventory and can, for example, discover connections or follow cross-references. The user can navigate according to his/her respective interests.
A good example of the integration of semantic search is Yovisto (www.yovisto.com), which is a video portal for lecture recordings. Its research and development foci are on automated video analysis and on the integration of so-called user-generated Web 2.0 services like tagging, evaluation and annotation. Yovisto originated from the “OSOTIS – Suche in Multimediadaten” (search in multimedia data) project at the Friedrich Schiller Universität Jena which was funded by the BMWi/ESF. Currently, the project is being continued by the Semantic Web research group at the Hasso-Plattner-Institut in Potsdam.
Another example is the mediaglobe project (www.projekt-mediaglobe.de) which was funded as part of the THESEUS research programme (www.theseus-programm.de) by the Federal Ministry of Economics. The project’s objective was to develop solutions which allow media and radio archives not only to optimally digitise, comprehensively index and efficiently administer their growing inventory of audiovisual documents on German history, but also to make them accessible online. The project partner, the Hasso-Plattner Institut for software system technology GmbH, contributed the development of automated and semantic media analysis and metadata generation, as well as semantic search technologies.
Visual index (of contents)
The index (of contents) is a software module for video structure analysis which, on the basis of structure recognition, divides the AV medium into scenes and shots and makes navigation within the video itself possible. Amongst others the Fraunhofer Heinrich Hertz Institute has developed a patented software module for segmentation which enables rapid and accurate detection of video sequences (www.shotdetection.de).
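The idea of dividing a video into shots can be illustrated with a deliberately simple sketch. It is not the patented Fraunhofer HHI module, whose internals are not described here; it assumes a common textbook technique, comparing grayscale histograms of consecutive frames, and all names (`gray_histogram`, `detect_cuts`, `shots_from_cuts`) and the threshold value are hypothetical.

```python
from typing import List, Sequence, Tuple


def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalised histogram of 8-bit grayscale pixel values (0-255)."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [count / total for count in counts]


def detect_cuts(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return every index i at which frames[i] appears to start a new shot.

    Assumption: a hard cut shows up as a large histogram change between
    two consecutive frames. Gradual transitions are not handled.
    """
    cuts: List[int] = []
    previous = None
    for index, frame in enumerate(frames):
        current = gray_histogram(frame)
        if previous is not None:
            # Half the L1 distance lies in [0, 1]: 0 = identical, 1 = disjoint.
            distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
            if distance > threshold:
                cuts.append(index)
        previous = current
    return cuts


def shots_from_cuts(cuts: Sequence[int], frame_count: int) -> List[Tuple[int, int]]:
    """Turn cut positions into (start, end) frame ranges, end exclusive."""
    starts = [0] + list(cuts)
    ends = list(cuts) + [frame_count]
    return list(zip(starts, ends))
```

The first frame of each resulting range could then serve as the thumbnail for the visual index, so that a user can jump directly to that shot.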
Navigation via voice-over
The option to have the voice-over of a video as text, whereby it is possible to navigate in the video and select specific points, offers real added value to video searches. The quality of the results is, however, dependent upon the quality of the speaker; dialects, background noises and voice overlaps can be problematic. Furthermore, complex domain training is needed in order to recognise the technical terms.
Science Cinema (www.osti.gov/sciencecinema) is a video portal created by OSTI (the Office of Scientific and Technical Information) in the US Department of Energy (DOE), together with Microsoft Research, which renders the voice-over as text on the basis of automated speech recognition. In addition, the search term is highlighted in the audio snippets. Voxalead News (http://voxaleadnews.labs.exalead.com/) also searches the spoken content of radio and television programmes, thus enabling innovative navigation within the video.
On the basis of the evaluated results from the expert interviews and the context analysis, the required functions were integrated into the first prototype. Functions on which the experts had not reached a consensus were nevertheless included in the prototype and were given special attention in the users’ focus groups.
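Navigation via the voice-over can be sketched as a search over a time-coded transcript. The example below assumes that a speech recogniser has already produced text segments with start times; the `Segment` type, the `find_in_transcript` function and the bracket-style highlighting are hypothetical and do not describe how Science Cinema or Voxalead News work internally.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """One recognised stretch of speech and the time at which it starts."""
    start_seconds: float
    text: str


def find_in_transcript(segments: Sequence[Segment], term: str) -> List[Tuple[float, str]]:
    """Return (start time, highlighted text) for every segment containing term.

    Matching is case-insensitive; each hit is wrapped in square brackets so
    that a user interface could render it as a highlight.
    """
    if not term:
        return []
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits: List[Tuple[float, str]] = []
    for segment in segments:
        if pattern.search(segment.text):
            highlighted = pattern.sub(lambda match: f"[{match.group(0)}]", segment.text)
            hits.append((segment.start_seconds, highlighted))
    return hits
```

Each returned start time is a position the player could seek to, which is what allows a user to select a specific point in the video from the text.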
5. Rapid Prototyping
In the AV-portal project, initially paper prototypes were created upon which the basic layout of the AV-portal was set out and discussed with the project participants. Building upon the paper prototypes, the next step involved the development of wireframes for the most important pages of the AV-portal, e.g. homepage, search result page, provision of videos, video upload, etc.
Using the wireframes, detailed questions about functions such as the search query, result presentation, construction of the search filter and uploading of AV media were discussed internally and iteratively optimised. For this, the wireframes were made centrally available on the internet, so that all the project participants had access and were able to provide feedback to the conception team. Because interaction with the innovative processes of visual search and visualisation is central to the use of an AV-portal, a further step involved the development of a Clickdummy on the basis of the wireframes. In this Clickdummy, links were clearly highlighted and first representations of real data sets were integrated, so that processes such as the search for a medium or the uploading of a film could be played through. The Clickdummy was used for the users’ evaluation in the focus groups.
Integrated into the first prototype were the following recommendations which had been made by the experts:
- Linking of the media with the context, e.g. full-texts or research data
- Option of a cross-media search
- Visual index of contents which, on the basis of structure recognition of the AV medium, divides shots and scenes and allows precise navigation within the objects
- Faceted search
- Visualisation of the Voice-over on the basis of automated speech recognition to locate a specific image sequence
- Referencing of the AV media by means of a citing link (DOI).
- Integration of community functions, such as user profiles, tagging, evaluations, recommendations, etc.
6. Focus Groups
A focus group is a moderated group discussion on a particular subject. Using this method, it is possible to hear ideas from the user group about design variants or content, but also to get any thoughts or concerns and the acceptance criteria. Focus groups are typically used to introduce prototypes of a system to future users and to get early feedback about the system. An experienced moderator ensures a constructive and result-orientated approach.
In the AV-portal project, the evaluation with the users ensured that their requirements of the portal were taken into account and implemented early on. There were two focus groups from the fields of physics and mechanical engineering with a total of 15 participants. Analogous to the TIB’s target groups, participants from industry, research and teaching were recruited. The composition of the groups was as follows: research staff (3), doctoral students (2), LfbA (1), academic senior councillor/senior assistant professor (1), students (3), technical assistant (1), graduate librarian (1), engineer (1), not stated (2).
First of all, the focus group participants were asked about their experiences and habits in relation to AV media by means of a structured questionnaire:
Result: Physics focus group
All of the focus group participants from the physics group use the internet many times a day. Apart from two people, all the test participants use AV media frequently to very frequently in their daily work. Five of the participants stated that part of their work actually involves the production of AV media, predominantly simulations (four mentions). Furthermore, visualisations, videos, animation and other graphical presentations are produced. AV media are obtained through the classical channels. The most important source, however, is through direct, personal exchanges with colleagues or referring to the homepages of other working groups (all participants). Four participants use the supply option straight from publishers or via licensed software modules and their user communities. One other person stated that the use of these channels was planned for the future.
In isolated cases (two mentions), AV media are obtained from web portals like YouTube and Wikimedia. Social networks and Web 2.0 channels like blogs or fora are not used by any of the participants to obtain AV media at the moment. All the participants who produce AV media exchange results internally with their colleagues. Apart from this, four participants use their working group’s intranet in order to pass on the AV media produced. Three people also publish material on the homepages of other working groups and four of the focus group participants publish their media in the classical way, i.e. via publishers. Just as with usage, social networks and Web 2.0 channels play no part in publishing. Only one of the participants puts the results of his work up on YouTube, where they can then be downloaded.
Result: Mechanical engineering focus group
All but one of the participants in the mechanical engineering focus group use the internet several times a day. For the majority of the focus group participants, AV media are used frequently to very frequently in their daily work. Five people produce AV media themselves and these are predominantly videos (all participants) of experiments, software or machine processes. Some of these videos are embedded into PowerPoint presentations. The group of students stated that they do not produce AV media. For the mechanical engineers, the most important supply channel for AV media is also through direct, personal exchange with colleagues and fellow students (six mentions), just as it is for the physics focus group.
Licensed software modules, publishers or homepages of other working groups do not play a part in the supply. Students primarily get their AV media from teaching and study platforms like Stud.IP and Elias. In contrast to the other focus group participants, they also use social networks (StudiVZ) or video platforms like YouTube. Furthermore, they exchange AV media on fora. As with the supply of AV media, direct and personal contact is the most important channel for publishing. AV media are thus sent, e.g. by email, to colleagues (five mentions). Other than that, only a few other publication channels were chosen. Two participants share their materials with a working group over an intranet, but only one person publishes on teaching and study platforms or with publishers.
The following subjects were the foci of the group discussions:
- Participants’ general assessment of the benefits of an AV-portal
- Homepage of the AV-portal (Clickdummy)
- Search options: Search with text, experts’ search
- Detailed view of a found medium
- Web 2.0 features (particularly tagging, user profiles, recommendations, evaluations)
Selected results of the focus groups:
All the participants would like to use the planned AV-portal
Overall, the prototype of the AV portal received very positive feedback. The 15 participants from different target groups (i.e. students, research staff, professors, lecturers) all saw the portal as being of great added value for their scientific work.
Quantity and quality of the content are decisive for acceptance
The users named two important success criteria: quantity (a sufficiently large basic inventory of AV media) and quality (high-quality/serious/citable videos must be differentiated from those of lesser quality). A critical mass of AV media must be available: for scientists to become active and start using the portal, it is important that a large quantity of data has been integrated right from the start. One participant put it like this: “The motivation to undertake a pioneering task is limited.”
Unique citability of videos is a great incentive
The opportunity to obtain a DOI for a submitted video is a great incentive for use.
Getting information and browsing are currently not supported
Criticism was directed at the fact that, up until now, the portal has primarily supported a search scenario in which the user knows precisely what he/she is searching for. It would be ideal, however, to first have the option of browsing to see what content and functionalities the portal offers. A student who wants to access basic information about his/her subject, and does not know a great deal about the relevant themes/subject areas/names, is not offered enough support by direct search access alone.
Channels are of real added value, especially for students
The student participants in the focus group voiced their wish to have access to relevant videos by means of so-called channels, just as they are used to from YouTube. These channels (e.g. an “E-technology lecture”) could be subscribed to and would offer a view of a selection of the portal's AV media, organised by theme or target group. The channels could also be subscribed to via RSS feeds.
Visual index (of contents) is seen to be useful
The users did not realise that this concerned automatically recognised film editing (who determines the film edits?). They could, however, envisage application scenarios in a scientific context. As an example, lecture recordings would always be made using two cameras and, with cameras moving between the professor and the experiment, film editing could provide a useful way of navigating in the video. The video segmenting and provision of thumbnails for orientation as to which part of the video could be relevant was, therefore, rated very highly.
Manual segmenting would also be of interest
As another possibility, the participants discussed whether a facility for chapter division should be provided. In this way, the individual who is uploading can decide for him/herself how the film would be best divided into chapters.
Information about the size of the file in the results list is missed
In the Clickdummy, the users missed the option to be able to sort and filter the list of results by file size.
Navigation via audio-text within a video is highly rated
The option of a search or navigation in the video via the audio-text featured next to it was largely unknown to the users, but was felt to be very interesting and helpful.
Faceted search was positively evaluated
The construction of a filter was welcomed by all the participants. In particular, the possibility of collapsing irrelevant categories in order to have more space for the relevant ones was positively evaluated. Language options should not be housed in the filter though, rather the search itself should be able to be carried out in many languages. To assess the content quality of the videos, the participants would like to see the option to filter the results list according to origin, i.e. by institution. The participants also missed having a way of filtering the search results by availability of the video/s concerned, e.g. “only show me videos which are freely available” or “hide videos which are subject to a charge.”
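The filter behaviour the participants asked for (filtering by institution and by availability) can be sketched as follows. This is a minimal illustration under assumed data: the facet names `institution` and `access`, the sample records and both function names are hypothetical, not the portal's actual data model.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence, Set

Video = Mapping[str, str]


def filter_videos(videos: Sequence[Video], selected: Mapping[str, Set[str]]) -> List[Video]:
    """Keep videos whose value matches the selection in every active facet.

    A facet with an empty selection is treated as inactive, i.e. it does
    not restrict the result list.
    """
    return [
        video
        for video in videos
        if all(
            video.get(facet) in allowed
            for facet, allowed in selected.items()
            if allowed
        )
    ]


def facet_counts(videos: Sequence[Video], facets: Sequence[str]) -> Dict[str, Counter]:
    """Count how many videos fall under each value of each facet."""
    return {
        facet: Counter(video[facet] for video in videos if facet in video)
        for facet in facets
    }
```

Calling `filter_videos(videos, {"access": {"free"}})` corresponds to the request “only show me videos which are freely available”, while `facet_counts` supplies the numbers usually shown next to each filter value.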
Video preview image/thumbnail too small
The participants were unanimous that the thumbnail/preview image in the detail view of the videos is too small.
User-generated content/Web 2.0
The subject of tagging was discussed in detail, in particular who should be allowed to assign keywords to a video, i.e. the provider of the video or any user of the AV-portal. In the end, the users reached the consensus that both should be allowed, so that a user could add an important keyword which might have been overlooked. However, it was also said that keywords assigned by the author or the provider of the video should be distinguishable from other keywords. In addition, the necessity of a control mechanism for tagging was mentioned once again: “User tags are important, but should not be assigned unfiltered and uncontrolled.” Both the participants and the experts from the expert interviews were somewhat sceptical about the purely community features (profiles, networking with others, etc.).
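The two requirements from this discussion, distinguishing provider keywords from user keywords and not publishing user tags unfiltered, can be sketched with a small data structure. The moderation rule used here (provider tags are visible immediately, user tags only after approval) is an assumption made for illustration, as are the names `Tag` and `VideoTags`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Tag:
    keyword: str
    source: str  # "provider" or "user"
    approved: bool = False


@dataclass
class VideoTags:
    """Keywords attached to one video, with their origin and review state."""
    tags: List[Tag] = field(default_factory=list)

    def add(self, keyword: str, source: str) -> Tag:
        if source not in ("provider", "user"):
            raise ValueError(f"unknown tag source: {source!r}")
        # Assumption: provider tags need no review, user tags do.
        tag = Tag(keyword.strip().lower(), source, approved=(source == "provider"))
        self.tags.append(tag)
        return tag

    def approve(self, keyword: str) -> None:
        """Mark every tag with this keyword as reviewed by a moderator."""
        normalised = keyword.strip().lower()
        for tag in self.tags:
            if tag.keyword == normalised:
                tag.approved = True

    def visible(self) -> Dict[str, List[str]]:
        """Approved keywords, grouped by who assigned them."""
        grouped: Dict[str, List[str]] = {"provider": [], "user": []}
        for tag in self.tags:
            if tag.approved:
                grouped[tag.source].append(tag.keyword)
        return grouped
```

Keeping the source on each tag lets the interface render provider and user keywords differently, while the `approved` flag stands in for whatever control mechanism a portal actually adopts.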
Only limited online editing desired
The participants discussed the option of being able to edit videos online directly on the AV-portal, but came to the conclusion that it would be sufficient to just be able to choose a scene from a video which they could then download as an excerpt, thus ensuring that they would not have to store the full quantity of data contained in a long video on their computers. A full cutting tool for the new combining of scenes is not required.
Building on these results, a variety of optimisations have been made for the prototype, e.g.:
- The integration of additional access options on the home page (e.g. browse, access by subjects/target groups/…)
- The integration of channels
- The expansion of the thumbnail/preview image of the video
- More differentiated presentation of the evaluations (evaluation of the film quality, scientific quality, etc)
- The option to segment own videos oneself (division into chapters)
[Figure: Wireframes of the AV-portal homepage and search result page]
7. The further development of the AV-portal
Based on the results of the requirements analysis, the TIB, together with the Hasso-Plattner Institut, has been working on the further development of a web-based platform for audiovisual media since mid-2011.
HPI and TIB are developing new multimedia analysis methods, or adapting known ones, such as form, speech or structure recognition in order to enhance metadata. As a result, the portal is intended to provide optimal access to video content. The information extracted via the media analysis is correlated using semantic information so that users can identify new links and can browse the multimedia content as they like. The tools which support the video search include, for example, a visual index as well as navigation via visualised audio-text by means of automatic voice recognition. Furthermore, a semantic explorer visualises connections and links between the videos. These tools are designed to make searching for, and in, audiovisual media as easy as it already is for textual information. Via GetInfo, the TIB's portal for science and technology with integrated full text delivery, the audiovisual media will be linked with other research information such as digital full texts, numerical data and facts as well as research primary data, and will be clearly referenced by the allocation of a Digital Object Identifier (DOI).
The portal will have the following features:
- Customer-friendly user interface
- Powerful textual search functions (simple search, advanced search)
- Intelligent Character Recognition (ICR)
- Semantic search
- Faceted search
- Visual index (of contents) on the basis of shot/scene detection
- Classification on the basis of genre detection
- Navigation via audio text on the basis of speech recognition
- Channels (subjects, institutes, etc) analogous to YouTube
- Easy uploading of own videos, including the granting of right of use
- User-generated Web 2.0 services, such as tagging and rating/evaluation
- Referencing of the AV media by means of a citing link (DOI).
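A citing link of the kind listed above can be derived from a DOI by prefixing it with the public resolver address `https://doi.org/`. The sketch below is illustrative only: the function name `doi_to_url`, the simplified syntax check and the DOI used in the usage note are assumptions, not identifiers issued by the TIB.

```python
import re
from urllib.parse import quote

# Simplified check: "10.", a numeric registrant prefix, "/", then a suffix.
# Real DOI syntax is more permissive; this pattern is an assumption.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Build a resolvable citing link for a DOI via the doi.org resolver."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(doi, safe="/")
```

For the made-up DOI `10.1234/example-video`, `doi_to_url` yields `https://doi.org/10.1234/example-video`, a link that stays stable even if the video's location in the portal changes.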
In 2011, a partially functioning prototype of the AV-portal was developed; further development and beta operation of the system will follow in 2012-2013, and full operation of the portal is planned for 2014. To ensure optimum usability of the AV-portal, development will continue to be guided by user-centred methods: three usability tests are scheduled to take place during the remaining project period.
This blog post is the write-up of the author’s presentation at the 2012 LIBER Annual General Conference in Tartu.