Difference between revisions of "BigFix Server Client Report Ingestion Stalls at Midnight"
(→POTENTIAL Solution) |
|||
Line 1: | Line 1: | ||
= Summary = | = Summary = | ||
+ | |||
+ | This has been resolved. There was a Scheduled SQL Job that was indexing the Fixlets and it was causing the server to stall each night. We disabled that Scheduled Job and it is no longer a problem. | ||
We have been seeing a problem where every night, starting at midnight the all the clients appear to go "off line" (Last Report Time's all show no later than 12:00 Midnight) for a period of about an hour, then they start to report again. | We have been seeing a problem where every night, starting at midnight the all the clients appear to go "off line" (Last Report Time's all show no later than 12:00 Midnight) for a period of about an hour, then they start to report again. |
Revision as of 14:17, 28 June 2022
Summary
This has been resolved. There was a Scheduled SQL Job that was indexing the Fixlets and it was causing the server to stall each night. We disabled that Scheduled Job and it is no longer a problem.
We have been seeing a problem where every night, starting at midnight the all the clients appear to go "off line" (Last Report Time's all show no later than 12:00 Midnight) for a period of about an hour, then they start to report again.
If I look on the server, the BufferDir folder is full of client report files (several hundred files), but for some reason FillDB is not importing the client reports.
I have verified that all of the scheduled tasks (remove duplicate computers, etc) and scheduled SQL jobs (Re-Index DB currently temporarily disabled due to this issue) are NOT scheduled for midnight. Neither the server nor the database are currently being backed up. The server is part of a DSA pair structure. The only other thing I could think of was scheduled Web Reports, but those come from the WebReport database and run at random times of the day, not BFEnterprise. Same with SOAP queries.
To me, the symptoms look like some process somewhere is placing an Exclusive Lock on the BESEnterprise database for about an hour. We have asked our DB team to look at the server but they say that they can't see what is causing the issue.
We have Actions that run overnight (11:30pm to 6:00am) and the Console Operators that need to monitor these tasks are "blind" from midnight to about 1am each night.
Research
- https://docs.microsoft.com/en-us/sql/relational-databases/extended-events/determine-which-queries-are-holding-locks?view=sql-server-ver15
- https://stackoverflow.com/questions/8749426/how-to-find-out-what-is-locking-my-tables
POTENTIAL Solution
Taken from https://stackoverflow.com/questions/8749426/how-to-find-out-what-is-locking-my-tables
--Create Procedure WhoLock --AS set nocount on if object_id('tempdb..#locksummary') is not null Drop table #locksummary if object_id('tempdb..#lock') is not null Drop table #lock create table #lock ( spid int, dbid int, objId int, indId int, Type char(4), resource nchar(32), Mode char(8), status char(6)) Insert into #lock exec sp_lock if object_id('tempdb..#who') is not null Drop table #who create table #who ( spid int, ecid int, status char(30), loginame char(128), hostname char(128), blk char(5), dbname char(128), cmd char(16) -- , request_id INT --Needed for SQL 2008 onwards -- ) Insert into #who exec sp_who Print '-----------------------------------------' Print 'Lock Summary for ' + @@servername + ' (excluding tempdb):' Print '-----------------------------------------' + Char(10) Select left(loginame, 28) as loginame, left(db_name(dbid),128) as DB, left(object_name(objID),30) as object, max(mode) as [ToLevel], Count(*) as [How Many], Max(Case When mode= 'X' Then cmd Else null End) as [Xclusive lock for command], l.spid, hostname into #LockSummary from #lock l join #who w on l.spid= w.spid where dbID != db_id('tempdb') and l.status='GRANT' group by dbID, objID, l.spid, hostname, loginame Select * from #LockSummary order by [ToLevel] Desc, [How Many] Desc, loginame, DB, object Print '--------' Print 'Who is blocking:' Print '--------' + char(10) SELECT p.spid ,convert(char(12), d.name) db_name , program_name , p.loginame , convert(char(12), hostname) hostname , cmd , p.status , p.blocked , login_time , last_batch , p.spid FROM master..sysprocesses p JOIN master..sysdatabases d ON p.dbid = d.dbid WHERE EXISTS ( SELECT 1 FROM master..sysprocesses p2 WHERE p2.blocked = p.spid ) Print '--------' Print 'Details:' Print '--------' + char(10) Select left(loginame, 30) as loginame, l.spid, left(db_name(dbid),15) as DB, left(object_name(objID),40) as object, mode , blk, l.status from #lock l join #who w on l.spid= w.spid where dbID != db_id('tempdb') and blk <>0 Order by mode desc, blk, loginame, dbID, objID, l.status