2254 Although EtherCAT's timing is highly deterministic and therefore timing issues |
2254 Although EtherCAT's timing is highly deterministic and therefore timing issues |
2255 are rare, there are a few aspects that can (and should be) dealt with. |
2255 are rare, there are a few aspects that can (and should be) dealt with. |
2256 |
2256 |
2257 %------------------------------------------------------------------------------ |
2257 %------------------------------------------------------------------------------ |
2258 |
2258 |
2259 \subsection{Application Interface Profiling} |
2259 \section{Application Interface Profiling} |
2260 \label{sec:timing-profile} |
2260 \label{sec:profiling} |
2261 \index{Profiling} |
2261 \index{Profiling} |
2262 % FIXME |
|
2263 |
2262 |
2264 One of the most important timing aspects is the execution time of the |
2263 One of the most important timing aspects is the execution time of the |
2265 application interface functions that are called in cyclic context. These |
2264 application interface functions that are called in cyclic context. These |
2266 functions make up an important part of the overall timing of the application. |
2265 functions make up an important part of the overall timing of the application. |
2267 To measure the timing of the functions, the following code was used: |
2266 To measure the timing of the functions, the following cyclic code was used: |
2268 |
2267 |
2269 \begin{lstlisting}[gobble=2,language=C] |
2268 \begin{lstlisting}[language=C] |
2270 c0 = get_cycles(); |
2269 c0 = get_cycles(); |
2271 ecrt_master_receive(master); |
2270 ecrt_master_receive(master); |
2272 c1 = get_cycles(); |
2271 c1 = get_cycles(); |
2273 ecrt_domain_process(domain1); |
2272 ecrt_domain_process(domain1); |
2274 c2 = get_cycles(); |
2273 c2 = get_cycles(); |
2275 ecrt_master_run(master); |
2274 ecrt_domain_queue(domain1); |
2276 c3 = get_cycles(); |
2275 c3 = get_cycles(); |
2277 ecrt_master_send(master); |
2276 ecrt_master_send(master); |
2278 c4 = get_cycles(); |
2277 c4 = get_cycles(); |
2279 \end{lstlisting} |
2278 \end{lstlisting} |
2280 |
2279 |
2281 Between each call of an interface function, the CPU timestamp counter is read. |
2280 Between each call of an interface function, the CPU timestamp counter is read. |
2282 The counter differences are converted to \micro\second\ with the help of the |
2281 The counter differences are converted to \micro\second\ via the |
2283 \lstinline+cpu_khz+ variable, which contains the number of increments per |
2282 \lstinline+cpu_khz+ variable, which contains the number of counts per |
2284 \milli\second. |
2283 \milli\second\ for the IA32 architecture's timestamp counter. |
2285 |
2284 |
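The conversion described above can be sketched as follows. This is a minimal illustration, not the code used for the measurement: the helper name \lstinline+cycles_to_us()+ is hypothetical, and it assumes that \lstinline+cpu_khz+ holds the number of counter increments per millisecond, as stated in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a timestamp counter difference to
 * microseconds. cpu_khz is assumed to be the number of counter
 * increments per millisecond, so cpu_khz / 1000 increments make up
 * one microsecond. */
static double cycles_to_us(uint64_t cycles, unsigned int cpu_khz)
{
    assert(cpu_khz > 0); /* guard against division by zero */
    return (double) cycles * 1000.0 / (double) cpu_khz;
}
```

With \lstinline+cpu_khz = 2000000+ (a \unit{2.0}{\giga\hertz} counter), an illustrative difference of 16080 counts corresponds to \unit{8.04}{\micro\second}.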
2286 For the actual measurement, a system with a \unit{2.0}{\giga\hertz} CPU was used, |
2285 For the actual measurement, a system with a \unit{2.0}{\giga\hertz} CPU was |
2287 which ran the above code in an RTAI thread with a period of |
2286 used, which ran the above code in an RTAI thread with a period of |
2288 \unit{100}{\micro\second}. The measurement was repeated $n = 100$ times and the |
2287 \unit{1}{\milli\second}. The measurement was repeated $n = 10000$ times and |
2289 results were averaged. These can be seen in table~\ref{tab:profile}. |
2288 the results were averaged. These can be seen in table~\ref{tab:profile}. |
2290 |
2289 |
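Averaging over $n$ repetitions does not require storing all samples. The following sketch accumulates mean and variance incrementally (Welford's method). All names are hypothetical. It is also an assumption that the population variance is meant, since the text does not say how the deviation in the table was computed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical running statistics over measured durations. */
typedef struct {
    size_t n;    /* number of samples added so far */
    double mean; /* running mean of the samples */
    double m2;   /* sum of squared deviations from the running mean */
} duration_stats_t;

static void stats_init(duration_stats_t *s)
{
    s->n = 0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Add one duration sample (e.g. in microseconds). */
static void stats_add(duration_stats_t *s, double x)
{
    double delta = x - s->mean;

    s->n++;
    s->mean += delta / (double) s->n;
    s->m2 += delta * (x - s->mean); /* uses the updated mean */
}

/* Population variance; the standard deviation is its square root. */
static double stats_variance(const duration_stats_t *s)
{
    assert(s->n > 0);
    return s->m2 / (double) s->n;
}
```

In the cyclic task, \lstinline+stats_add()+ would be called once per cycle for each measured interval, so that only three values per interval have to be kept in memory.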
2291 \begin{table}[htpb] |
2290 \begin{table}[htpb] |
2292 \centering |
2291 \centering |
2293 \caption{Profiling of an Application Cycle on a \unit{2.0}{\giga\hertz} |
2292 \caption{Application Cycle on a \unit{2.0}{\giga\hertz} Processor} |
2294 Processor} |
|
2295 \label{tab:profile} |
2293 \label{tab:profile} |
2296 \vspace{2mm} |
2294 \vspace{2mm} |
2297 \begin{tabular}{l|r|r} |
2295 \begin{tabular}{l|r|r} |
2298 Element & Mean Duration [\micro\second] & Standard Deviation [\micro\second] \\ |
2296 |
|
2297 Function & |
|
2298 $\mu(\Delta t)$ [\micro\second] & |
|
2299 $\sigma(\Delta t)$ [\micro\second] \\ |
2299 \hline |
2300 \hline |
2300 \textit{ecrt\_master\_receive()} & 8.04 & 0.48\\ |
2301 |
2301 \textit{ecrt\_domain\_process()} & 0.14 & 0.03\\ |
2302 \lstinline+ecrt_master_receive()+ & 6.13 & 1.11\\ |
2302 \textit{ecrt\_master\_run()} & 0.29 & 0.12\\ |
2303 |
2303 \textit{ecrt\_master\_send()} & 2.18 & 0.17\\ \hline |
2304 \lstinline+ecrt_domain_process()+ & $<$ 0.01 & 0.07\\ |
2304 Complete Cycle & 10.65 & 0.69\\ \hline |
2305 |
|
2306 \lstinline+ecrt_domain_queue()+ & $<$ 0.01 & 0.17\\ |
|
2307 |
|
2308 \lstinline+ecrt_master_send()+ & 1.15 & 0.65\\ \hline |
|
2309 |
|
2310 Complete Cycle & 7.28 & 1.31\\ \hline |
|
2311 |
2305 \end{tabular} |
2312 \end{tabular} |
2306 \end{table} |
2313 \end{table} |
2307 |
2314 |
2308 It is obvious that the functions accessing hardware make up the |
2315 It is obvious that the functions accessing hardware make up the lion's share. |
2309 lion's share. The \textit{ecrt\_master\_receive()} function executes the ISR of |
2316 The \lstinline+ecrt_master_receive()+ function executes the ISR of the Ethernet device |
2310 the Ethernet device, analyzes datagrams and copies their contents into |
2317 driver, dissects the received frame and copies the datagram contents into the |
2311 the memory of the datagram objects. The \textit{ecrt\_master\_send()} function |
2318 memory of the corresponding datagram objects. The \lstinline+ecrt_master_send()+ |
2312 assembles a frame out of different datagrams and copies it to the |
2319 function assembles a frame from different datagrams and copies it to the |
2313 hardware buffers. Interestingly, this makes up only a quarter of the |
2320 hardware buffers. The functions that only operate on the master's internal data |
2314 receiving time. |
2321 structures are very fast ($\Delta t < \unit{1}{\micro\second}$). |
2315 |
2322 |
2316 The functions that only operate on the master's internal data structures are |
2323 Since a realtime cycle takes about \unit{10}{\micro\second}, the resulting |
2317 very fast ($\Delta t < \unit{1}{\micro\second}$). Interestingly, the runtime of |
2324 theoretical frequency could be up to $1 / \unit{10}{\micro\second} = |
2318 \textit{ecrt\_domain\_process()} has a small standard deviation relative to the |
2325 \unit{100}{\kilo\hertz}$. For two reasons, this frequency remains |
2319 mean value, while this ratio is about twice as big for |
2326 theoretical: |
2320 \textit{ecrt\_master\_run()}: This probably results from the latter function |

2321 having to execute code depending on the current state, and the different state |

2322 functions are more or less complex. |

2323 |

2324 Since a realtime cycle takes about \unit{10}{\micro\second}, the theoretical |

2325 frequency can be up to \unit{100}{\kilo\hertz}. For two reasons, this frequency |

2326 remains theoretical: |
|
2327 |
2327 |
2328 \begin{enumerate} |
2328 \begin{enumerate} |
2329 |
2329 |
2330 \item The processor must still be able to run the operating system between the |
2330 \item The processor must still be able to run the operating system between the |
2331 realtime cycles. |
2331 realtime cycles. |
2355 \end{enumerate} |
2355 \end{enumerate} |
2356 |
2356 |
2357 Both times are difficult to determine. The first reason is that the |
2357 Both times are difficult to determine. The first reason is that the |
2358 interrupts are disabled and the master is not notified when a frame is sent |
2358 interrupts are disabled and the master is not notified when a frame is sent |
2359 or received (polling would distort the results). The second reason is that |
2359 or received (polling would distort the results). The second reason is that |
2360 even with interrupts enabled, the time from the event to the notification is |
2360 even with interrupts enabled, the interrupt latency (i.\,e.\ the time from the |
2361 unknown. Therefore, the only way to confidently determine the bus cycle time is |
2361 event to the notification) is unknown. Therefore, the only way to confidently |
2362 an electrical measurement. |
2362 determine the bus cycle time is an electrical measurement. |
2363 |
2363 |
2364 Anyway, the bus cycle time is an important factor when designing realtime |
2364 Anyway, the bus cycle time is an important factor when designing realtime |
2365 code, because it limits the maximum frequency for the cyclic task of the |
2365 applications, because it limits the maximum frequency for the cyclic task. In |
2366 application. In practice, these timing parameters are highly dependent on the |
2366 practice, these timing parameters are highly dependent on the hardware and |
2367 hardware and often a trial and error method must be used to determine the |
2367 often a trial and error method must be used to determine the limits of the |
2368 limits of the system. |
2368 system. |
2369 |
2369 |
2370 The central question is: What happens if the cycle frequency is too high? The |
2370 An essential question is: What happens if the cycle frequency is too high? |
2371 answer is that the EtherCAT frames that have been sent at the end of the |
2371 The EtherCAT frames that have been sent at the end of the cycle may not yet |
2372 cycle are not yet received when the next cycle starts. First this is noticed |
2372 have been received when the next cycle starts. First this is noticed by the |
2373 by \textit{ecrt\_domain\_process()}, because the working counters of the |
2373 domain, because the working counters of the datagrams are zero. This can be |
2374 process data datagrams were not increased. The function will notify the user |
2374 queried in realtime context via the application interface and is output via |
2375 via Syslog\footnote{To limit Syslog output, a mechanism has been implemented |
2375 Syslog\footnote{To limit Syslog output, a mechanism has been implemented that |
2376 that outputs a summarized notification at most once per second.}. In this |
2376 outputs a summarized notification at most once per second.}. In this case, |
2377 case, the process data remains the same as in the last cycle, because it |
2377 the process data remains the same as in the last cycle, because it is not |
2378 is not erased by the domain. When the domain datagrams are queued again, the |
2378 erased by the domain. When the domain datagrams are queued again, the master |
2379 master notices that they are already queued (and marked as sent). The master |
2379 notices that they are already queued (and marked as sent). The master will |
2380 will mark them as unsent again and output a warning that datagrams were |
2380 mark them as unsent again and output a warning that datagrams were |
2381 ``skipped''. |
2381 ``skipped''. |
2382 |
2382 |
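The skipping behaviour described above can be illustrated with a strongly simplified model. The types and functions below are hypothetical and are not the master's actual data structures. They only show how queuing a datagram that is still marked as sent can be detected and counted.

```c
#include <assert.h>

/* Hypothetical, simplified datagram states (illustration only). */
typedef enum { DG_INIT, DG_QUEUED, DG_SENT, DG_RECEIVED } dg_state_t;

typedef struct {
    dg_state_t state;
    unsigned int skip_count; /* how often the datagram was skipped */
} datagram_t;

/* Queue a datagram for the next frame. Returns 1 if the datagram was
 * still marked as sent (its response never arrived), 0 otherwise. */
static int datagram_queue(datagram_t *dg)
{
    int skipped = 0;

    if (dg->state == DG_SENT) {
        dg->skip_count++; /* the master would output a warning here */
        skipped = 1;
    }
    dg->state = DG_QUEUED; /* marked as unsent again */
    return skipped;
}

/* Mark a queued datagram as sent. */
static void datagram_send(datagram_t *dg)
{
    assert(dg->state == DG_QUEUED);
    dg->state = DG_SENT;
}

/* Mark a sent datagram as received. */
static void datagram_receive(datagram_t *dg)
{
    if (dg->state == DG_SENT)
        dg->state = DG_RECEIVED;
}
```

In a cycle that is too short, \lstinline+datagram_receive()+ is never reached before the next \lstinline+datagram_queue()+ call, which then reports the skip.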
2383 On the mentioned \unit{2.0}{\giga\hertz} system, the possible cycle frequency |
2383 On the mentioned \unit{2.0}{\giga\hertz} system, the possible cycle frequency |
2384 can be up to \unit{25}{\kilo\hertz} without skipped frames. This value can |
2384 can be up to \unit{25}{\kilo\hertz} without skipped frames. This value is |
2385 surely be increased by choosing faster hardware. Especially the RealTek |
2385 highly dependent on the chosen hardware. |
2386 network hardware could be replaced by a faster one. Besides, implementing a |
|
2387 dedicated ISR for EtherCAT devices would also contribute to reducing the |
|
2388 latency. These are two points on the author's to-do list. |
|
2389 |
2386 |
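The figures quoted in this section can be related by a short calculation, using the rounded values from the text. The symbols $f_{\mathrm{theo}}$, $T_{\mathrm{exec}}$ and $T_{\mathrm{min}}$ are introduced only for this illustration.

\[
  f_{\mathrm{theo}} = \frac{1}{T_{\mathrm{exec}}}
    = \frac{1}{\unit{10}{\micro\second}} = \unit{100}{\kilo\hertz},
  \qquad
  T_{\mathrm{min}} = \frac{1}{\unit{25}{\kilo\hertz}} = \unit{40}{\micro\second}
\]

On the measured system, roughly three quarters of the shortest usable period are therefore spent outside the application interface functions.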
2390 %------------------------------------------------------------------------------ |
2387 %------------------------------------------------------------------------------ |
2391 |
2388 |
2392 \chapter{Installation} |
2389 \chapter{Installation} |
2393 \label{sec:installation} |
2390 \label{sec:installation} |