Difference between revisions of "Dashboard"

From Archivematica
 
(52 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
[[Main Page]] > [[Development]] > [[:Category:Development documentation|Development documentation]] > Dashboard
 
[[Main Page]] > [[Development]] > [[:Category:Development documentation|Development documentation]] > Dashboard
 +
 +
<div style="padding: 10px 10px; border: 1px solid black; background-color: #F79086;">This page is no longer being maintained and may contain inaccurate information. Please see the [https://www.archivematica.org/docs/latest/ Archivematica documentation] for up-to-date information. </div> <p>
  
 
<div class="status">
 
<div class="status">
 
<div>
 
<div>
 
Design
 
Design
<div class="description">
+
<div class="description">This page proposes a new feature and reviews design options</div>
This page proposes a new feature and reviews design options
 
 
</div>
 
</div>
</div><div class="active">
+
<div>
 
Development
 
Development
<div class="description">
+
<div class="description">This page describes a feature that's in development</div>
This page describes a feature that's in development
 
 
</div>
 
</div>
</div><div>
+
<div class="active">
 
Documentation
 
Documentation
<div class="description">
+
<div class="description">This page documents an implemented feature</div>
This page documents an implemented feature
 
</div>
 
 
</div>
 
</div>
 
</div>
 
</div>
  
=Technical Requirements=
+
The dashboard manages a pipeline’s behaviour. It provides a view into the status of units, allows workflow decisions to be made, handles configuration, and supports arrangement & description and customization of the [[Format policy registry requirements | FPR]].  Code for the dashboard is found in <code>src/dashboard/src/</code>.
 +
 
 +
Previously the dashboard also handled starting transfers from disk & storing files, but much of that functionality has been moved to the storage service.
 +
 
 +
== Sections ==
 +
 
 +
 
 +
* <code>components/access</code>: Internal API to talk to access systems (ArchivesSpace, Archivists’ Toolkit). Primarily supports the appraisal tab, and uses [https://github.com/artefactual-labs/agentarchives agentarchives].
 +
* <code>components/accounts</code>: Views related to account management (create, edit, delete, list)
 +
* <code>components/administration</code>: Views & models related to the Administration tab.  Contains the processing config form, DIP upload settings, etc
 +
* <code>components/api</code>: [[ Archivematica_API | External API]] endpoints.
 +
* <code>components/appraisal</code>: Appraisal tab view.
 +
* <code>components/archival_storage</code>: Views related to the Archival Storage tab.  Contains Elasticsearch queries and views for creating AICs, deleting AIPs and starting reingest.
 +
* <code>components/backlog</code>: Views related to the Backlog tab. Contains Elasticsearch queries and views for deleting & downloading backlogged transfers.
 +
* <code>components/file</code>: Internal API to get file information from Elasticsearch & Storage Service. Primarily supports the appraisal tab.
 +
* <code>components/filesystem_ajax</code>: Internal API to support SIP Arrangement and moving files to and from the pipeline. Much of its previous functionality has been superseded by the storage service
 +
* <code>components/ingest</code>: Views and internal API related to the Ingest tab.
 +
* <code>components/mcp</code>: Internal API for status updates from [[MCPServer]]
 +
* <code>components/rights</code>: Views and forms for editing unit rights.
 +
* <code>components/transfer</code>: Views and internal API related to the Transfer tab.
 +
* <code>components/unit</code>: Views and internal API common between Transfers and SIPs
 +
{| class="wikitable" style="background-color:#ffeecc;" cellpadding="10";
 +
| Improvement Note: This is newer than <code>components/transfer</code> and <code>components/ingest</code>. Functionality that is the same between the two should be moved here as appropriate
 +
|}
 +
* <code>external</code>: External dependencies, often git submodules. '''Note: as of Archivematica 1.7, there is no external/ directory here. The base64-helpers subdirectory is now required by the transfer-browser and appraisal-tab and is downloaded during the NPM build phase.'''
 +
* <code>fpr</code>: Views related to the Preservation Planning tab. This is the FPR-admin submodule from external, and contains all [[Format policy registry requirements | FPR]] modification views. '''Note: as of Archivematica 1.7, there is no fpr/ directory here. The archivematica-fpr-admin module is now a dependency of the dashboard; see dashboard/src/requirements/base.txt.'''
 +
* <code>installer</code>: Views related to installation and setting up Archivematica
 +
* <code>main</code>: Models, views and internal APIs related to core dashboard functionality. Contains all model definitions & migrations, as well as app-wide configuration. Also includes Task & Job display, and views related to the Access tab.
 +
* <code>media</code>: All CSS, JS & images for the dashboard
 +
* <code>middleware</code>: Django middleware definitions.
 +
* <code>requirements</code>: Python requirements files
 +
* <code>settings</code>: Django settings modules for development or production configuration
 +
* <code>templates</code>: All templates for the dashboard. Structure broadly mirrors that of the components directory.
 +
 
 +
== Old documentation ==
 +
 
 +
===Technical Requirements===
 
The Dashboard is a web-based tool that is developed using Python-based [http://djangoproject.com Django] MVC framework.
 
The Dashboard is a web-based tool that is developed using Python-based [http://djangoproject.com Django] MVC framework.
  
=Functional Requirements=
+
===Functional Requirements===
 +
 
 +
*provide a web-based, multi-user interface that will report on the status of system events and make it simpler to control and trigger specific micro-services.
 +
*provide a user-friendly interface to add/edit metadata
 +
*coordinate the read and write operations of the AIP to file storage and the syncing of metadata updates between the AIPs and the access system.
 +
*process Consumer AIP requests
 +
*provide statistical information about Archivematica operation
 +
*provide preservation planning information
 +
 
 +
 
 +
===User interface===
  
==Release 0.7-alpha==
+
Release 0.7-alpha (Feb 18, 2011)
1) Provide updates on the Archivematica processes by reading rows from the MCP 'Task' table in its MySQL dbase.
 
*This will likely have to happen through some kind of polling by the Django app of the MySQL database.
 
*One other implementation option we've discussed is having Archivematica publish a RSS/Atom feed that the Django app reads.
 
  
2) Interact with the the Archivematica API:
+
[[File:Dashboard-0.7.png|680px]]
#getListOfJobsAwaitingApproval
 
#approveJob
 
  
* i.e. at certain stages in the Archivematica workflow we will stop and await the explicit approval from an archivist to trigger the next series of Archivematica tasks. So somewhere in the Django Dashboard there will be a list of jobs awaiting approval (retrieved from the Archivematica API) and then the ability for an archivist to click a button thereby approving a job and notifiying Archivematica MCP of that action (again, via the Archivematica API).
+
Django interface found in Archivematica 0.6.2 (dev tree, 29 Nov 2010)
  
==Post release 0.7-alpha==
+
[[File:Archivematica-dashboard-0.6.2-dev-29Nov.png|480px]]
*create Accession Record & add appraisal metadata via Dashboard
 
*API:
 
**listActiveTasks
 
**taskCompleteNotification
 
  
=User interface=
 
 
Early mockup (March 2010)
 
Early mockup (March 2010)
  
[[File:ArchivematicaDashboardScreencap05Mar2010.png]]
+
[[File:ArchivematicaDashboardScreencap05Mar2010.png|480px]]
 +
 
 +
===Real-time interaction===
 +
 
 +
<div style="background-color: #fffccc; border: 1px solid #ddd; padding: 8px;">
 +
Our preliminary design will be based on '''periodic refresh''', to minimize the risks of more sophisticated solutions before [http://archivematica.org/wiki/index.php?title=Development_roadmap#Release_0.7-alpha Release 0.7-alpha] is launched. In future releases, we will do more research on this topic to achieve the best user experience while keeping an eye on performance.</div>
 +
 
 +
The Ajax web application model made the Web UI experience dynamic and asynchronous, replacing the classic page-by-page web application model (see [http://www.adaptivepath.com/images/publications/essays/ajax-fig2.png graph]). However, Ajax applications don't offer duplex communication in which both client and server can send messages at any time. A newer model of web applications, frequently called [http://infrequently.org/2006/03/comet-low-latency-data-for-the-browser/ Comet], provides bi-directional communication using persistent, long-lasting HTTP connections between the server and the client (see [http://infrequently.org/wp-content/Comet.png graph]). Comet is similar to Ajax in that it's asynchronous, but '''applications following the Comet model can communicate state changes on the server with almost negligible latency, which makes it suitable for monitoring or multi-user collaboration applications'''.
 +
 
 +
There are different methods of implementing a Comet streaming transport (browser transport), but all of them are based on existing browser features: the iframe HTML element, XMLHttpRequest or script tags. Among these methods, I think that we have two candidates:
 +
 
 +
* '''XMLHttpRequest long polling''': first, the browser creates an asynchronous XMLHttpRequest with a long time-out. When the server has an event to report, it sends the response and closes the connection, and the browser immediately launches another XHR request to wait for the next event.
 +
* '''Script tag long polling''': the browser creates script HTML elements dynamically and sets their source ("src" attribute) to the location of the server, which then sends back JavaScript code. Each time a script request is completed, the browser opens a new one, just as in the XHR long polling design. This method bypasses the [http://code.google.com/p/browsersec/wiki/Part2#Same-origin_policy same-origin policy security mechanism] implemented in modern browsers.
 +
 
 +
====Alternatives====
 +
 
 +
There are other alternatives that we should consider:
 +
 
 +
* '''WebSockets''': this technology is part of HTML5 and provides full-duplex communication channels over a single TCP socket between the browser and the server. The [http://dev.w3.org/html5/websockets/ WebSocket API] is being standardized by the W3C and the [http://tools.ietf.org/html/draft-ietf-hybi-thewebsocketprotocol WebSocket protocol] is being standardized by the IETF ([https://datatracker.ietf.org/wg/hybi/charter/ HyBi working group]). Chrome 4, Safari 5, Firefox 4 (not yet in FF3) and Opera 11 support WebSockets. However, the last two have disabled this protocol by default. The HTML5 Labs at Microsoft's interoperability group recently launched a [http://blogs.msdn.com/b/interoperability/archive/2010/12/21/introducing-the-websockets-prototype.aspx prototype], based on Silverlight, that is compatible with IE8 and IE9. WebSockets is a promising technology but unfortunately is still in the development phase.
 +
** There are some solutions which provide an API that looks like the WebSocket API and fall back to other techniques if WebSocket is not available. A good example is Socket.IO, which supports different transports: WebSocket, Adobe® Flash® Socket, Ajax long polling, etc. However, the server module was designed for Node.JS. Several implementations that are compatible with the Socket.IO client have been started for other languages and frameworks.
 +
** Products such as CometD and Lightstreamer provide a higher-level API using pubsub (see mod_pubsub) or some other messaging protocol, and use WebSocket or whichever other available transport is the fastest and safest option.
 +
* '''Server-Sent Events''': another [http://dev.w3.org/html5/eventsource/ draft API] included in HTML5, designed for scenarios where data does not need to be sent from the client and only updates from the server are needed (server push only). This technology, supported only by some browsers such as Chrome and Opera, could be considered a formal and efficient alternative to Comet, but it is based on the same method: long-held HTTP requests. The big difference from WebSockets is, therefore, that it does not need to implement a new protocol (it is based on HTTP) and it is not really full-duplex, although that could be simulated with parallel XHR requests.
 +
* '''Periodic refresh''' (simple polling): to keep users informed about changes occurring on the server, we can make the browser generate requests periodically, at fixed intervals, to fetch new information: for example, one call every five seconds. This is a valid approximation of server push if data latency is not critical for users. A callback function would be responsible for updating the DOM according to the server's latest report, and the browser script can do some monitoring and dynamically adjust the refresh period to minimize the workload (e.g. stop refreshing when the system detects that the user is no longer active; see this [http://ajaxpatterns.org/Heartbeat article]).
 +
** [https://github.com/brettstimmerman/lace Lace]: an old open source chat application based on simple polling (periodic refresh).
 +
 
 +
====Server design and scalability====
 +
 
 +
When a web application creates bi-directional connections between the browser and the server, new server software is often required in order to scale well. Traditional web-based solutions would break down very quickly because of memory consumption and the framework overhead incurred for each HTTP (and possibly long-held) request.
 +
 
 +
More research on this must be done if we decide to take advantage of Comet or WebSockets technologies. These are some initial notes:
 +
 
 +
* [http://httpd.apache.org/docs/2.2/mod/event.html Apache MPM event]: this experimental module included in Apache 2.2 has the potential to bring Twisted-esque functionality within the Apache pipeline. Keep-alive connections can save significant overhead in creating TCP connections. However, Apache traditionally keeps an entire child process/thread waiting for data from the client, which brings its own disadvantages. To solve this problem, this MPM uses a dedicated thread to handle both the listening sockets and all sockets that are in a Keep Alive state.
 +
* [http://www.tornadoweb.org/ Tornado]: an open source version of the scalable, non-blocking web server and tools that power FriendFeed. It is ideal for real-time web services. It is not just a web server; it could be considered a real-time web framework. [http://lincolnloop.com/blog/2009/sep/15/using-django-inside-tornado-web-server/ It can serve] Django applications.
 +
* Twisted, eventlet, gevent, Tornado, Node.JS, greenlet, celery
  
 +
Some recipes:
  
 +
* [http://www.clemesha.org/blog/realtime-web-apps-python-django-orbited-twisted Django + Orbited + Twisted]: and an example, a realtime webapp: [https://github.com/clemesha/hotdot Hotdot].
 +
* [http://media.eflorenzano.com/dropbox/UsingDjangoInNonStandardWays.pdf Using Django in Non-Standard Ways]
 +
* [http://prg10001.blogspot.com/2009/09/simpler-long-polling-with-django-and.html Simple long polling with Django and gevent]
 +
* [http://ajaxian.com/archives/django-and-comet Django and Comet using Orbited]
 +
* [http://blog.gevent.org/2010/02/27/why-gevent/ Comparing gevent to eventlet]
 +
* [http://stackoverflow.com/questions/1824418/a-clean-lightweight-alternative-to-pythons-twisted A clean, lightweight alternative to Python's twisted?]
 +
* [http://stackoverflow.com/questions/4363899/making-moves-w-websockets-and-python-django-twisted Making moves w/ websockets and python/django(/twisted?)]
 +
* [http://www.saltycrane.com/blog/2010/05/quick-notes-trying-twisted-websocket-branch-example/ Quick notes on trying the Twisted websocket branch example]
  
 +
===Debug mode===
  
 +
By default, the dashboard runs in "production" mode. To diagnose application errors it is usually useful to run in debug mode. Debug mode will display error messages. If you want to enable it, please follow these instructions:
  
 +
# Go to the [http://code.google.com/p/archivematica/source/browse/trunk#trunk%2Fsrc%2Fdashboard dashboard sources directory]
 +
# Open the settings.py file with your preferred text editor
 +
# Find the following line <pre>DEBUG = False</pre>
 +
# Update the False flag to True <pre>DEBUG = True</pre>
 +
# Save the file
 +
# Restart Apache <pre>sudo /etc/init.d/apache2 restart</pre>
  
  
 
[[Category:Development documentation]]
 
[[Category:Development documentation]]

Latest revision as of 16:43, 11 February 2020


This page is no longer being maintained and may contain inaccurate information. Please see the Archivematica documentation for up-to-date information.

Status: Documentation (this page documents an implemented feature)

The dashboard manages a pipeline’s behaviour. It provides a view into the status of units, allows workflow decisions to be made, handles configuration, and supports arrangement & description and customization of the FPR. Code for the dashboard is found in src/dashboard/src/.

Previously the dashboard also handled starting transfers from disk & storing files, but much of that functionality has been moved to the storage service.

Sections

  • components/access: Internal API to talk to access systems (ArchivesSpace, Archivists’ Toolkit). Primarily supports the appraisal tab, and uses agentarchives.
  • components/accounts: Views related to account management (create, edit, delete, list)
  • components/administration: Views & models related to the Administration tab. Contains the processing config form, DIP upload settings, etc
  • components/api: External API endpoints.
  • components/appraisal: Appraisal tab view.
  • components/archival_storage: Views related to the Archival Storage tab. Contains Elasticsearch queries and views for creating AICs, deleting AIPs and starting reingest.
  • components/backlog: Views related to the Backlog tab. Contains Elasticsearch queries and views for deleting & downloading backlogged transfers.
  • components/file: Internal API to get file information from Elasticsearch & Storage Service. Primarily supports the appraisal tab.
  • components/filesystem_ajax: Internal API to support SIP Arrangement and moving files to and from the pipeline. Much of its previous functionality has been superseded by the storage service
  • components/ingest: Views and internal API related to the Ingest tab.
  • components/mcp: Internal API for status updates from MCPServer
  • components/rights: Views and forms for editing unit rights.
  • components/transfer: Views and internal API related to the Transfer tab.
  • components/unit: Views and internal API common between Transfers and SIPs
Improvement Note: This is newer than components/transfer and components/ingest. Functionality that is the same between the two should be moved here as appropriate
  • external: External dependencies, often git submodules. Note: as of Archivematica 1.7, there is no external/ directory here. The base64-helpers subdirectory is now required by the transfer-browser and appraisal-tab and is downloaded during the NPM build phase.
  • fpr: Views related to the Preservation Planning tab. This is the FPR-admin submodule from external, and contains all FPR modification views. Note: as of Archivematica 1.7, there is no fpr/ directory here. The archivematica-fpr-admin module is now a dependency of the dashboard; see dashboard/src/requirements/base.txt.
  • installer: Views related to installation and setting up Archivematica
  • main: Models, views and internal APIs related to core dashboard functionality. Contains all model definitions & migrations, as well as app-wide configuration. Also includes Task & Job display, and views related to the Access tab.
  • media: All CSS, JS & images for the dashboard
  • middleware: Django middleware definitions.
  • requirements: Python requirements files
  • settings: Django settings modules for development or production configuration
  • templates: All templates for the dashboard. Structure broadly mirrors that of the components directory.

Old documentation

Technical Requirements

The Dashboard is a web-based tool that is developed using the Python-based Django MVC framework.

Functional Requirements

  • provide a web-based, multi-user interface that will report on the status of system events and make it simpler to control and trigger specific micro-services.
  • provide a user-friendly interface to add/edit metadata
  • coordinate the read and write operations of the AIP to file storage and the syncing of metadata updates between the AIPs and the access system.
  • process Consumer AIP requests
  • provide statistical information about Archivematica operation
  • provide preservation planning information


User interface

Release 0.7-alpha (Feb 18, 2011)

Dashboard-0.7.png

Django interface found in Archivematica 0.6.2 (dev tree, 29 Nov 2010)

Archivematica-dashboard-0.6.2-dev-29Nov.png

Early mockup (March 2010)

ArchivematicaDashboardScreencap05Mar2010.png

Real-time interaction

Our preliminary design will be based on periodic refresh, to minimize the risks of more sophisticated solutions before Release 0.7-alpha is launched. In future releases, we will do more research on this topic to achieve the best user experience while keeping an eye on performance.

The Ajax web application model made the Web UI experience dynamic and asynchronous, replacing the classic page-by-page web application model (see graph). However, Ajax applications don't offer duplex communication in which both client and server can send messages at any time. A newer model of web applications, frequently called Comet, provides bi-directional communication using persistent, long-lasting HTTP connections between the server and the client (see graph). Comet is similar to Ajax in that it's asynchronous, but applications following the Comet model can communicate state changes on the server with almost negligible latency, which makes it suitable for monitoring or multi-user collaboration applications.

There are different methods of implementing a Comet streaming transport (browser transport), but all of them are based on existing browser features: the iframe HTML element, XMLHttpRequest or script tags. Among these methods, I think that we have two candidates:

  • XMLHttpRequest long polling: first, the browser creates an asynchronous XMLHttpRequest with a long time-out. When the server has an event to report, it sends the response and closes the connection, and the browser immediately launches another XHR request to wait for the next event.
  • Script tag long polling: the browser creates script HTML elements dynamically and sets their source ("src" attribute) to the location of the server, which then sends back JavaScript code. Each time a script request is completed, the browser opens a new one, just as in the XHR long polling design. This method bypasses the same-origin policy security mechanism implemented in modern browsers.
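
The request/response cycle of long polling can be sketched in Python for illustration. This is a minimal in-process simulation, not dashboard code: the EventChannel class, its wait_for_events method and the long_poll_client function are hypothetical names, and a threading.Condition stands in for the held-open HTTP connection.

```python
import threading


class EventChannel:
    """Append-only event log that lets a caller block until news arrives."""

    def __init__(self):
        self._cond = threading.Condition()
        self._events = []  # position in this list doubles as the event id

    def publish(self, payload):
        with self._cond:
            self._events.append(payload)
            self._cond.notify_all()  # wake every held request

    def wait_for_events(self, last_seen, timeout):
        """Server side of one long-poll request.

        Blocks until events newer than `last_seen` exist or the timeout
        expires. Returns (new_last_seen, events); an empty list means the
        request timed out and the client should simply ask again.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > last_seen, timeout)
            new = self._events[last_seen:]
            return last_seen + len(new), new


def long_poll_client(channel, expected, timeout=1.0):
    """Client loop: send a request, handle the response, re-issue at once."""
    received, last_seen = [], 0
    while len(received) < expected:
        last_seen, events = channel.wait_for_events(last_seen, timeout)
        received.extend(events)
    return received


if __name__ == "__main__":
    channel = EventChannel()
    producer = threading.Thread(
        target=lambda: [channel.publish(f"job-{i}") for i in range(3)]
    )
    producer.start()
    print(long_poll_client(channel, expected=3))
    producer.join()
```

The client never sleeps between requests: latency comes only from the time the server holds each request open, which is the property that distinguishes long polling from periodic refresh.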

Alternatives

There are other alternatives that we should consider:

  • WebSockets: this technology is part of HTML5 and provides full-duplex communication channels over a single TCP socket between the browser and the server. The WebSocket API is being standardized by the W3C and the WebSocket protocol is being standardized by the IETF (HyBi working group). Chrome 4, Safari 5, Firefox 4 (not yet in FF3) and Opera 11 support WebSockets. However, the last two have disabled this protocol by default. The HTML5 Labs at Microsoft's interoperability group recently launched a prototype, based on Silverlight, that is compatible with IE8 and IE9. WebSockets is a promising technology but unfortunately is still in the development phase.
    • There are some solutions which provide an API that looks like the WebSocket API and fall back to other techniques if WebSocket is not available. A good example is Socket.IO, which supports different transports: WebSocket, Adobe® Flash® Socket, Ajax long polling, etc. However, the server module was designed for Node.JS. Several implementations that are compatible with the Socket.IO client have been started for other languages and frameworks.
    • Products such as CometD and Lightstreamer provide a higher-level API using pubsub (see mod_pubsub) or some other messaging protocol, and use WebSocket or whichever other available transport is the fastest and safest option.
  • Server-Sent Events: another draft API included in HTML5, designed for scenarios where data does not need to be sent from the client and only updates from the server are needed (server push only). This technology, supported only by some browsers such as Chrome and Opera, could be considered a formal and efficient alternative to Comet, but it is based on the same method: long-held HTTP requests. The big difference from WebSockets is, therefore, that it does not need to implement a new protocol (it is based on HTTP) and it is not really full-duplex, although that could be simulated with parallel XHR requests.
  • Periodic refresh (simple polling): to keep users informed about changes occurring on the server, we can make the browser generate requests periodically, at fixed intervals, to fetch new information: for example, one call every five seconds. This is a valid approximation of server push if data latency is not critical for users. A callback function would be responsible for updating the DOM according to the server's latest report, and the browser script can do some monitoring and dynamically adjust the refresh period to minimize the workload (e.g. stop refreshing when the system detects that the user is no longer active; see this article).
    • Lace: an old open source chat application based on simple polling (periodic refresh).
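
The periodic refresh option, including the idea of adjusting the refresh period, can be sketched as follows. In the dashboard this loop would run in the browser; Python is used here only for illustration. The names next_interval, poll, fetch_status, on_update and is_user_active are hypothetical, and the 5 second base, doubling factor and 60 second ceiling are assumed values, not taken from the dashboard.

```python
import time


def next_interval(current, changed, user_active,
                  base=5.0, factor=2.0, ceiling=60.0):
    """Return the delay before the next poll, or None to stop polling."""
    if not user_active:
        return None              # cease refreshing for an inactive user
    if changed:
        return base              # fresh data: go back to the fast rate
    return min(current * factor, ceiling)  # nothing new: back off


def poll(fetch_status, on_update, is_user_active, sleep=time.sleep, base=5.0):
    """Simple polling loop with a callback that applies each new report."""
    interval, last = base, None
    while True:
        status = fetch_status()
        changed = status != last
        if changed:
            on_update(status)    # e.g. update the DOM in a browser client
            last = status
        interval = next_interval(interval, changed, is_user_active(), base=base)
        if interval is None:
            return
        sleep(interval)
```

Passing sleep as a parameter keeps the loop testable without real delays; a caller can substitute a function that only records the requested intervals.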

Server design and scalability

When a web application creates bi-directional connections between the browser and the server, new server software is often required in order to scale well. Traditional web-based solutions would break down very quickly because of memory consumption and the framework overhead incurred for each HTTP (and possibly long-held) request.

More research on this must be done if we decide to take advantage of Comet or WebSockets technologies. These are some initial notes:

  • Apache MPM event: this experimental module included in Apache 2.2 has the potential to bring Twisted-esque functionality within the Apache pipeline. Keep-alive connections can save significant overhead in creating TCP connections. However, Apache traditionally keeps an entire child process/thread waiting for data from the client, which brings its own disadvantages. To solve this problem, this MPM uses a dedicated thread to handle both the listening sockets and all sockets that are in a Keep Alive state.
  • Tornado: an open source version of the scalable, non-blocking web server and tools that power FriendFeed. It is ideal for real-time web services. It is not just a web server; it could be considered a real-time web framework. It can serve Django applications.
  • Twisted, eventlet, gevent, Tornado, Node.JS, greenlet, celery
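
To illustrate why non-blocking servers suit long-held requests, the sketch below parks many simulated requests on a single thread. It uses Python's standard asyncio module as a stand-in for the event-driven tools listed above (it is not one of them), and the names held_request and serve are hypothetical.

```python
import asyncio


async def held_request(client_id, event, results):
    """One long-held request: parked on the event loop until there is news."""
    await event.wait()
    results.append(client_id)


async def serve(n_clients):
    """Hold n_clients requests open on one thread, then answer them all."""
    event = asyncio.Event()
    results = []
    tasks = [
        asyncio.create_task(held_request(i, event, results))
        for i in range(n_clients)
    ]
    await asyncio.sleep(0)   # let every coroutine reach its wait point
    event.set()              # a state change on the server wakes all waiters
    await asyncio.gather(*tasks)
    return len(results)


if __name__ == "__main__":
    print(asyncio.run(serve(10_000)))
```

Each waiting request here costs one small coroutine object rather than a dedicated process or thread, which is the trade-off the notes above are weighing.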

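Some recipes:

  • Django + Orbited + Twisted: and an example, a realtime webapp: Hotdot.
  • Using Django in Non-Standard Ways
  • Simple long polling with Django and gevent
  • Django and Comet using Orbited
  • Comparing gevent to eventlet
  • A clean, lightweight alternative to Python's twisted?
  • Making moves w/ websockets and python/django(/twisted?)
  • Quick notes on trying the Twisted websocket branch example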

Debug mode

By default, the dashboard runs in "production" mode. To diagnose application errors it is usually useful to run in debug mode. Debug mode will display error messages. If you want to enable it, please follow these instructions:

  1. Go to the dashboard sources directory
  2. Open the settings.py file with your preferred text editor
  3. Find the following line
    DEBUG = False
  4. Update the False flag to True
    DEBUG = True
  5. Save the file
  6. Restart Apache
    sudo /etc/init.d/apache2 restart
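
As an alternative to editing the flag by hand each time, the setting could be derived from the environment. This is only a sketch under assumptions: the env_flag helper and the DASHBOARD_DEBUG variable name are hypothetical and are not part of the dashboard's settings, and the web server still has to be restarted with the variable visible to the Django process.

```python
import os


def env_flag(name, default=False, environ=os.environ):
    """Interpret an environment variable as a boolean flag."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


# In settings.py (hypothetical variable name): stays False unless
# DASHBOARD_DEBUG is explicitly set to a truthy value.
DEBUG = env_flag("DASHBOARD_DEBUG")
```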