Logging

dc42

I have an early build of RRF 1.20 that supports logging of events to SD card. I intend to make it available later this week. Before I finalise the specification, I'd like some input:

Should logging be enabled automatically, or by a command in config.g? _- What should be logged? Currently it logs error messages, warning messages, print start/end (real or simulated), and calibration results.
How should RRF manage log files, if at all?
What format should the log file have? Here is a sample log file generated by the current alpha firmware:

power up + 00:00:01 Starting up
power up + 00:00:02 Warning: Heater 0 appears to be over-powered. If left on at full power, its temperature is predicted to reach 257C.
power up + 00:00:02 Warning: Heater 1 appears to be over-powered. If left on at full power, its temperature is predicted to reach 555C.
power up + 00:00:02 WiFi module started
power up + 00:00:09 Wifi module is connected to access point PLUSNET-Q628, IP address 192.168.1.91
2017-09-20 08:14:15 HTTP client 192.168.1.94 login succeeded
2017-09-20 08:28:32 Error: GCode file "/DuetWiFiEnc.gcode" not found
2017-09-20 08:28:53 Started simulating printing file DuetWiFiEnc.gcode
2017-09-20 08:32:31 File DuetWiFiEnc.gcode will print in 3h 30m plus heating time
2017-09-20 08:42:45 Calibrated 8 factors using 16 points, deviation before 0.155 after 0.029
2017-09-20 08:42:57 Started printing file DuetWiFiEnc.gcode
2017-09-20 12:13:17 Finished printing file DuetWiFiEnc.gcode, print time was 3h 30m
2017-09-20 12:32:29 Error: GCode file "RobotdiggFanDuctWithIR.gcode" not found
2017-09-20 12:33:42 Started simulating printing file RobotdiggFanDuctWithIR.gcode
2017-09-20 12:34:35 File RobotdiggFanDuctWithIR.gcode will print in 1h 10m plus heating time
2017-09-20 13:54:19 Calibrated 8 factors using 16 points, deviation before 0.082 after 0.028
2017-09-20 13:54:59 Started printing file RobotdiggFanDuctWithIR.gcode
2017-09-20 15:05:32 Finished printing file RobotdiggFanDuctWithIR.gcode, print time was 1h 11m
power up + 00:00:01 Starting up
power up + 00:00:02 Warning: Heater 0 appears to be over-powered. If left on at full power, its temperature is predicted to reach 257C.
power up + 00:00:02 Warning: Heater 1 appears to be over-powered. If left on at full power, its temperature is predicted to reach 555C.
power up + 00:00:02 WiFi module started
power up + 00:00:09 Wifi module is connected to access point PLUSNET-Q628, IP address 192.168.1.91
2017-09-20 16:42:34 HTTP client 192.168.1.94 login succeeded
2017-09-20 16:44:03 Calibrated 8 factors using 16 points, deviation before 0.114 after 0.023
2017-09-20 16:44:15 Started printing file WheelHalves.gcode

```_

deckingman

Oh definitely NOT automatic for me. Please give me the ability to disable it if I want to. I'd like to see is a running total of print hours so that I can have a maintenance plan of some sort, if this is within the scope of what a log file can do. I guess logging of error messages would be useful in case a print went haywire but I wouldn't want log file of every print cluttering up the system. Just the one file that gets overridden would work for me. I appreciate that other users will have different views/requirements so this is just my own personal preference.

KeeganB

1. I like the ability to enable/disable via config.g
2.

Maybe include some of the data that is shown in M122, so that it can be "saved" in the log after each print?
Currently, I would love any log info that could relate to wifi issues as this has been a pain for me on 1.19.2.
Total print time for the machine (board) would be a neat.
Filament length used? and/or remaining?
3. It would be nice if it could keep more than one log, but maybe only up to 10 logs with the system automatically overwriting the oldest log file each time. Or something similar to that.
4. That format looks fine? I don't have anything to compare it to, so I can't say.

T3P3Tony

If you go for more than one log file then a "totals" file that simply kept the total hours run, filament used etc would be useful.

A log file per unit time, which is periodically auto archived is also a good idea. e.g.:

/logs/YYYY_MM_DD-current_log
/logs/archive/YYYY_MM/YYYY_MM_DD-archive_log

I agree it should be configurable with config.g, arguably though new users should have logging on by default as it may make it easier to diagnose problems.

Thought should be given to the best character to use to delineate between the columns in the log. nice if its easy to import into external software (e.g. excel) to analyse if required. Any character that would not otherwise be in the log would work, Tab and comma are obviously commonly used.

timcurtis67

I vote to enable/disable. But I do like the idea of logging errors and things along that line.

PlasticMetal

Format looks fine, definitely saved as a .csv (though doesn't have to be a comma). Slic3r allows gcode file name to be configured with several different placeholders, so user could choose e.g. date, gcode file name, time started/ended, etc.

I've been chasing fluctuating hot-end temperatures, and so the possibility of temperature logging might really help with troubleshooting.

I'm looking forward to this feature, however it's implemented!

dc42

@T3P3Tony:

If you go for more than one log file then a "totals" file that simply kept the total hours run, filament used etc would be useful.

A log file per unit time, which is periodically auto archived is also a good idea. e.g.:

/logs/YYYY_MM_DD-current_log
/logs/archive/YYYY_MM/YYYY_MM_DD-archive_log

I agree it should be configurable with config.g, arguably though new users should have logging on by default as it may make it easier to diagnose problems.

Thought should be given to the best character to use to delineate between the columns in the log. nice if its easy to import into external software (e.g. excel) to analyse if required. Any character that would not otherwise be in the log would work, Tab and comma are obviously commonly used.

A major problem with auto archiving is that the date and time are not always available. In particular, when logging is started (which is the obvious time to decide whether the old log file should be archived and a new one created), it generally isn't.

Tab should be usable as a field separator, although it may cause issues depending on what program is used to display the log. Comma might be problematic because an error message might contain one.

JohnOCFII

I'd love to see filename, date, print time and filament used per job.
Ideally, once the data is generated, the WebUI could have a History tab similar to this plug-in for OctoPrint:

https://github.com/imrahil/OctoPrint-PrintHistory

John

whosrdaddy

1. Make sure you can turn this feature on/off via GCode
2. It needs to be rocksolid: a log filling up the SD card should not ruin your ongoing print
3. Keep a general counter/statistics file somewhere (ie number of prints, filament used, etc…)
4. A suggestion (I am a developer by profession) : Since you don't have access to time/date (unless you find a way to include NTP sync into 1.20 ) I would implement some sort of rolling log.
There are several parameters for a rolling log:

limit size per logfile (to keep the loading of the logs snappy via the web GUI, for example 4Mb)
limit the number of logs (for example 10)
These limits should be configurable. This has the advantage that the SD card will only fill up space defined by those limits and RRF could provision this space (40Mb in this case), so this can be taken into account when a user tries to upload a gcode file when the SD card is nearly full.
A naming example :
current_log
1_old_log
2_old_log

if current_log would become full, all logs move one step down (ie current_log becomes 1_old_log, 1_old_log becomes 2_old_log and so on) and a new current_log will be written. This has the advantage that you know the order of the logs if you want to find something.
That being said, a log rolling scheme using date/time is far easier to implement

/Kris

fma

Logs are a good idea. But I vote to have them turned off by default, to avoid SD card writings if not necessary.

Having them enable/disabled on the fly would be great, to allow logging of just a part of a print, for exemple.

Do you plan to have configurable logs level (DEBUG/INFO/WARNING/ERROR/CRITICAL)?

WalterSKW

Logs :
High level log : keeps track of startup, printing a object …
kind of low level log for root-cause analysis. I would not opt for having different levels of logging in that one. All or nothing.

IMHO : size should be limited. One could switch between 2 files with a fixed max size and have an indicator to the active one.

Time :
Why not having the printer time automagically set from the DWC ? PC-clock -> printer clock ...

Starting and configuring with G-code (what else?) is a good idea. Size, max number of files...
Fixed location on SD card \logs\ ore something likely

Start/stop of the detailed logging from the DWC

DjDemonD

@dc42:

A major problem with auto archiving is that the date and time are not always available. In particular, when logging is started (which is the obvious time to decide whether the old log file should be archived and a new one created), it generally isn't.

Tab should be usable as a field separator, although it may cause issues depending on what program is used to display the log. Comma might be problematic because an error message might contain one.

Maybe each log entry could just be indexed with a number from 00000 onwards in the same way a digital camera numbers photographs taken, then if date and time are unavailable it will still be clear that print/session 00001 occurred before 00005 even if 00005 has a date/time field filled and 0001 does not.

Would love to see "global statistics" for the printer, run hours, filament used, etc… And for this to be able to be backed up so if I had to change duet card or upgrade I could continue the log.

The sample log entry looks good. Maybe dump an M122 to the end of it after each print, that wouldn't be too onerous, would it?

Definitely have the option to turn logging on/off and configure the level of detail logged, although I'm assuming you aren't too concerned the mighty duet will struggle to process/store this data even if it is verbose. But on by default on new units, as this would be a huge benefit in finding faults for new users.

T3P3Tony

@WalterSKW:

Logs :

Time :
Why not having the printer time automagically set from the DWC ? PC-clock -> printer clock …

People, including myself, use the Panel Due to run prints without connecting via DWC each time. This is especially the case when reprinting the same file a number of times over a number of days.

While time can't be guaranteed it should be used where possible.

olee

@T3P3Tony:

@WalterSKW:

Logs :

Time :
Why not having the printer time automagically set from the DWC ? PC-clock -> printer clock …

People, including myself, use the Panel Due to run prints without connecting via DWC each time. This is especially the case when reprinting the same file a number of times over a number of days.

While time can't be guaranteed it should be used where possible.

Maybe NTP might be an answer (printer is permanently connected to net).

dc42

I need a solution that works for everyone - even those few people who never use the web interface.

fma

A 3D printer is not really a security system as a firewall is. So, is it really important to have a lot of previous logs?

I would just do some simple log rotate based on number of files and size of each file. Both customisable with G-Codes (duet.log for current file, duet.log.1 for previous one, duet.log.2 for previous of previous…)

Then, for the date in the log, use real date if available (ntp), or date from Duet start on if not. Like:

[[language]]
2017-09-19 21:05:27::INFO::Start printing xxx
2017-09-19 21:05:29::LOG::my_var=3
...
2017-09-21 05:08:47::INFO::End of print xxx

or:

[[language]]
0000-00-00 00:00:00::INFO::Start printing xxx
0000-00-00 00:00:02::LOG::my_var=3
...
0000-00-01 08:03:20::INFO::End of print xxx

WalterSKW

@dc42:

I need a solution that works for everyone

Only one solution then : a RTC.
For NTP the printer has to reach out to the internet. I guess not everyone allows that.
Date/time setting could also be implemented as a function on the PanelDue for those not using DWC

Edit : Adding a small RTC board stacked on the PT100/Thermocouple board connector? (http://www.st.com/content/ccc/resource/technical/document/datasheet/group2/40/3f/91/e4/91/a1/47/7c/CD00129012/files/CD00129012.pdf/jcr:content/translations/en.CD00129012.pdf)

olee

But nowadays almost all consumer router have built-in NTP server so no need for WAN access for 3d printer. And I think that NTP would be allright for a big part of users.
Added: Who doesn't need DWC is no need for a precise time in log if nedd log at all. They can be provided with local time of Duet (time from board power up). Any decent investigation of a problem would require time stamp of events IMO.

Whitewolf

Possibly add Riemann logging ability… this way a farm of printers can send these streams to a remote system for logging and monitoring with dashboard alerts and metrics etc.

Riemann is an agentless monitoring system that would work perfectly for something like this and is recommended by and supported by Docker:
https://blog.docker.com/2016/03/realtime-cluster-monitoring-docker-swarm-riemann/

Why Riemann in particular?
There are a variety of reasons to use Riemann. Here are a few of the most obvious.
Riemann’s push-based model lets you detect problems and verify that fixes are working in near real-time.
Traditional monitoring systems such as Nagios usually work with a poll-based model: They wake up periodically (say, every 5 minutes), run a series of health checks to verify that everything in the system is in the desired state (e.g. your services are up), and report the results to you via e-mail or another method. Riemann requires that services send their own events, and consequently outages or network partitions can be detected as soon as the events stop flowing. Likewise, we can collect and process events of any type, opening the doors to proactively providing value through monitoring instead of simply fighting fires.
Riemann’s stream processing allows for powerful transformations of incoming data.
For instance, even if an exception is logged 1000 times in an hour, Riemann can roll up all exceptions of that type into just a few e-mails with information about how many times the exception occured. This helps combat alarm fatigue. Another example- Riemann makes measuring metrics in percentiles quite trivial. This allows to gain true insight into the operations of our systems without having important information masked behind aggregations such as averages.
Riemann is easy to get data into or out of, allowing interesting use cases such as storing event data for analysis later on.
Clients for Riemann can send events using a thin protocol buffers layer over TCP or UDP. Riemann can emit events and metrics from its index into a variety of backends including Graphite, InfluxDB, Librato, and more. Riemann also has built-in support for notification via SMS, e-mail, Slack, etc.
Riemann’s architectural model is simple, preventing the potentially ironic situation of having monitoring software that is difficult to operate and/or failing often.
Riemann is designed from the ground up to be straightforward to operate and reason about. While this has some tradeoffs (for instance, it makes it impossible to distribute a single Riemann instance safely), in practice Riemann’s philosophy is largely that imperfect information right now is better than perfect information which never arrives. If the Riemann server crashes, usually a simple restart will remedy the problem; no finicky reconciling of distributed state needs to occur.
Likewise, the push-based model helps to alleviate the “who monitors the monitoring software” question (i.e. how do we detect when our monitoring itself has issues). We can simply set up a few Riemann servers and forward events to the downstream servers. If the upstream server goes down, the downstream servers will notice this and alert you. Especially running Docker across clouds, this could be particularly valuable.
It’s rather fast.
From the landing page: “Throughput depends on what your streams do with events, but a stock Riemann config on commodity x86 hardware can handle millions of events per second at sub-ms latencies, with 99ths around 5ms. Riemann is fully parallel and leverages Clojure and JVM concurrency primitives throughout.”