Revision as of 12:57, 18 February 2011 by Peter (talk | contribs)



This page proposes a new feature and reviews design options


This page describes a feature that's in development


This page documents an implemented feature

Technical Requirements

The Dashboard is a web-based tool developed with Django, a Python-based MVC framework.

Functional Requirements

  • provide a web-based, multi-user interface that will report on the status of system events and make it simpler to control and trigger specific micro-services.
  • provide a user-friendly interface to add/edit metadata
  • coordinate the read and write operations of the AIP to file storage and the syncing of metadata updates between the AIPs and the access system.
  • process Consumer AIP requests
  • provide statistical information about Archivematica operation
  • provide preservation planning information

Release 0.7-alpha

1) Provide updates on the Archivematica processes by reading rows from the MCP 'Task' table in its MySQL database.

  • This will likely have to happen through some kind of polling by the Django app of the MySQL database.
  • Another implementation option we've discussed is having Archivematica publish an RSS/Atom feed that the Django app reads.
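
The polling option can be sketched as follows. This is only an illustration under stated assumptions: the column names (id, description, exit_code) and the monotonically increasing integer id are hypothetical, since the real MCP 'Task' schema is not described here, and an in-memory SQLite database stands in for MySQL so the example runs on its own.

```python
import sqlite3


def fetch_new_tasks(conn, last_seen_id):
    """Return (rows, new_last_seen_id) for Task rows with id > last_seen_id."""
    cur = conn.cursor()
    # sqlite3 uses "?" placeholders; MySQL drivers typically use "%s" instead.
    cur.execute(
        "SELECT id, description, exit_code FROM Task WHERE id > ? ORDER BY id",
        (last_seen_id,),
    )
    rows = cur.fetchall()
    if rows:
        # Remember the highest id seen so the next poll only returns newer rows.
        last_seen_id = rows[-1][0]
    return rows, last_seen_id


def make_demo_connection():
    """Build an in-memory stand-in for the MCP database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Task (id INTEGER PRIMARY KEY, description TEXT, exit_code INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Task (description, exit_code) VALUES (?, ?)",
        [("extract package", 0), ("virus scan", 1)],
    )
    conn.commit()
    return conn
```

A Django view or a background job could call fetch_new_tasks on each polling cycle and keep last_seen_id between calls, so that every poll only transfers rows the Dashboard has not yet shown.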

2) Provide notification of errors

  • Can deduce from values in the 'Task' table whether an error output has occurred
  • Will need to backtrack to MCP Task config XML file to identify the Error output directory where the files that incurred the error will be sitting

3) Interact with the Archivematica API:

  1. getListOfJobsAwaitingApproval
  2. approveJob
  • That is, at certain stages in the Archivematica workflow, processing stops and awaits explicit approval from an archivist before the next series of Archivematica tasks is triggered. The Django Dashboard will therefore show a list of jobs awaiting approval (retrieved from the Archivematica API), and an archivist will be able to click a button to approve a job and notify the Archivematica MCP of that action (again, via the Archivematica API).
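
The approval flow could look roughly like the sketch below. Only the two call names, getListOfJobsAwaitingApproval and approveJob, come from this page; their arguments, return values and transport are not specified here, so FakeMCPClient is a hypothetical in-memory stand-in and approve_from_dashboard is an illustrative helper, not the actual API.

```python
class FakeMCPClient:
    """Hypothetical in-memory stand-in for the Archivematica API."""

    def __init__(self, jobs):
        # Maps a job id to a human-readable description (assumed shape).
        self._awaiting = dict(jobs)
        self.approved = []

    def getListOfJobsAwaitingApproval(self):
        """Return (job_id, description) pairs for jobs waiting on an archivist."""
        return sorted(self._awaiting.items())

    def approveJob(self, job_id):
        """Mark a waiting job as approved; unknown ids are an error."""
        if job_id not in self._awaiting:
            raise KeyError("job is not awaiting approval: %s" % job_id)
        del self._awaiting[job_id]
        self.approved.append(job_id)


def approve_from_dashboard(client, job_id):
    """What an 'Approve' button handler might do: re-check, then approve."""
    awaiting_ids = {jid for jid, _ in client.getListOfJobsAwaitingApproval()}
    if job_id not in awaiting_ids:
        # The job was already approved or never existed; nothing to do.
        return False
    client.approveJob(job_id)
    return True
```

Re-checking the list before approving keeps a double click, or two archivists acting on the same job, from sending a second approval for it.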

Post release 0.7-alpha

  • create Accession Record & add appraisal metadata via Dashboard
  • API:
    • listActiveTasks
    • taskCompleteNotification
  • list/search AIP locations on archival storage (ideally a URI but will depend on chosen storage platform)
  • sync access system metadata updates with matching metadata elements in AIP's METS.xml
  • process AIP request
  • preservation monitoring
    • list format types in storage
    • link to media-type preservation plans (URI starting with wiki-page can evolve to RDF registry)
    • notify when preservation actions required for format types
  • solr index

User interface

Release 0.7-alpha (Feb 18, 2011)


Django interface found in Archivematica 0.6.2 (dev tree, 29 Nov 2010)


Early mockup (March 2010)


Real-time interaction

Our preliminary design will be based on periodic refresh, to avoid the risks of more sophisticated solutions before Release 0.7-alpha is launched. In future releases, we will do more research on this topic, aiming for the best user experience while keeping an eye on performance.

The Ajax web application model made the Web UI experience dynamic and asynchronous, replacing the classic page-by-page web application model (see graph). However, Ajax applications don't offer duplex communication, where both client and server can send messages at any time. A newer model of web applications, frequently called Comet, provides bi-directional communication using persistent, long-lasting HTTP connections between the server and the client (see graph). Comet is similar to Ajax in that it's asynchronous, but applications following the Comet model can communicate state changes on the server with almost negligible latency, which makes it suitable for monitoring or multi-user collaboration applications.

There are different methods of implementing a Comet streaming transport (browser transport), but all of them are based on existing browser features: the iframe HTML element, XMLHttpRequest or script tags. Among these methods, I think that we have two candidates:

  • XMLHttpRequest long polling: first, the browser creates an asynchronous XMLHttpRequest with a long time-out. The server holds the request open until an event occurs, then sends the response and closes the connection; the browser launches another XHR request immediately afterward and waits for the next event.
  • Script tag long polling: the browser dynamically creates script HTML elements and sets their source ("src" attribute) to the location of the server, which then sends back JavaScript code. Each time a script request completes, the browser opens a new one, just as in the XHR long polling design. This method bypasses the same-origin policy security mechanism implemented in modern browsers.
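
The server side of long polling can be illustrated with a minimal sketch: a request handler blocks until a new event is published or a time-out expires, then returns. EventChannel and its method names are hypothetical, and a real deployment would sit behind an HTTP server; this only shows the "hold the request until something happens" behaviour, using standard-library threading.

```python
import threading


class EventChannel:
    """Keeps published events and lets callers block until newer ones arrive."""

    def __init__(self):
        self._cond = threading.Condition()
        self._events = []

    def publish(self, event):
        """Record an event and wake up every waiting long-poll request."""
        with self._cond:
            self._events.append(event)
            self._cond.notify_all()

    def wait_for_events(self, since, timeout):
        """Block until events past index `since` exist or `timeout` seconds pass.

        Returns (new_events, cursor); the client sends `cursor` back as
        `since` on its next request so that no event is missed.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > since, timeout=timeout)
            return self._events[since:], len(self._events)
```

A handler would call wait_for_events with the cursor supplied by the browser and return the result as the HTTP response; an empty list simply means the time-out expired and the browser should reconnect.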


There are other alternatives that we should consider:

  • WebSockets: this technology is part of HTML5 and provides full-duplex communication channels over a single TCP socket between the browser and the server. The WebSocket API is being standardized by the W3C and the WebSocket protocol is being standardized by the IETF (HyBi working group). Chrome 4, Safari 5, Firefox 4 (not yet in FF3) and Opera 11 support WebSockets. However, the last two have disabled this protocol by default. HTML5 Labs, from Microsoft's interoperability group, recently launched a prototype compatible with IE8 and IE9 based on Silverlight. WebSockets is a promising technology, but unfortunately it is still in the development phase.
    • Some solutions provide an API that looks like the WebSocket API and fall back to other techniques if WebSocket is not available. A good example is Socket.IO, which supports several transports: WebSocket, Adobe® Flash® Socket, Ajax long polling, etc. However, its server module was designed for Node.JS. Several implementations compatible with the Socket.IO client have been started for other languages and frameworks.
    • Other products like CometD, Lightstreamer and others provide a higher-level API using pubsub (see mod_pubsub) or some other messaging protocol, and use WebSocket or whatever other transport is available that is the fastest and safest option.
  • Server-Sent Events: another draft API included in HTML5, designed for scenarios where the client does not need to send data and only needs updates from the server (server push only). This technology, supported only by some browsers such as Chrome and Opera, could be considered a formal and efficient alternative to Comet, but it is based on the same method: long-held HTTP requests. The big difference from WebSockets is, therefore, that it does not need to implement a new protocol (it is based on HTTP) and that it is not really full-duplex, although full-duplex could be simulated with parallel XHR requests.
  • Periodic refresh (simple polling): to keep users informed about changes occurring on the server, we can make the browser generate requests periodically, at fixed intervals, to fetch new information: for example, one call every five seconds. This is a valid approximation of server push if data latency is not critical for users. A callback function would be responsible for updating the DOM according to the server's latest report, and the browser script can do some monitoring and dynamically adjust the refresh period to minimize the workload (e.g. stopping it when the system detects the user is no longer active, see this article).
    • Lace: Old open source chat based in simple polling (periodic refresh).
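
The idea of adjusting the refresh period to user activity can be sketched as a small pure function. It is written in Python for consistency with the rest of the Dashboard code, although in practice this logic would run in the browser script. The function name and all thresholds are hypothetical; only the five-second base interval comes from the example above.

```python
def next_poll_interval(idle_seconds, base=5.0, max_interval=60.0,
                       idle_threshold=30.0, give_up_after=600.0):
    """Return the delay in seconds before the next refresh, or None to stop.

    idle_seconds is how long the user has shown no activity.
    """
    if idle_seconds >= give_up_after:
        # The user appears to be gone: stop polling until activity resumes.
        return None
    if idle_seconds < idle_threshold:
        # Active user: refresh at the base rate.
        return base
    # Double the interval for every full idle_threshold of inactivity,
    # capped at max_interval.
    doublings = int(idle_seconds // idle_threshold)
    return min(base * (2 ** doublings), max_interval)
```

With these defaults an active user gets a refresh every 5 seconds, a user idle for 90 seconds gets one every 40 seconds, and polling stops entirely after 10 minutes of inactivity.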

Server design and scalability

When a web application creates bi-directional connections between the browser and the server, new server software is often required in order to scale well. Traditional web server setups would break down very quickly because of memory consumption and the framework overhead incurred for each (possibly long-held) HTTP request.

More research on this must be done if we decide to take advantage of Comet or WebSockets technologies. These are some initial notes:

  • Apache MPM event: this experimental module included in Apache 2.2 has the potential to bring Twisted-esque functionality into the Apache pipeline. Keep-alive connections can save significant overhead in creating TCP connections; however, Apache traditionally keeps an entire child process/thread waiting for data from the client, which brings its own disadvantages. To solve this problem, this MPM uses a dedicated thread to handle both the listening sockets and all sockets that are in a Keep Alive state.
  • Tornado: an open source version of the scalable, non-blocking web server and tools that power FriendFeed. It is ideal for real-time web services. More than a web server, it can be considered a real-time web framework. It can serve Django applications.
  • Twisted, eventlet, gevent, Tornado, Node.JS, greenlet, celery

Some recipes:

Debug mode

By default, the dashboard runs in "production" mode. To diagnose application errors it is usually useful to run in debug mode. Debug mode will display error messages. If you want to enable it, please follow these instructions:

  1. Go to the dashboard source directory
  2. Open the settings.py file with your preferred text editor
  3. Find the following line
    DEBUG = False
  4. Change the False flag to True
    DEBUG = True
  5. Save the file
  6. Restart Apache
    sudo /etc/init.d/apache2 restart